AraFashion

A Novel Fashion Captioning Dataset Leveraging Attention-Based EfficientNet and xLSTM

Authors

S.A. Ahmed and A.T. Abdulameer

DOI:

https://doi.org/10.14500/aro.12335

Keywords:

Arabic Image Captioning, AraFashion, Dataset, EfficientNetB4, xLSTM

Abstract

The importance of models that can generate precise textual descriptions of images has become apparent, particularly in specialized domains such as fashion. In contrast to the wealth of datasets and studies available for English, Arabic suffers from a severe shortage of publicly available resources, particularly fashion image datasets. This shortage restricts the development of Arabic language models and impedes scholarly research in the area. Our study seeks to close this gap by building a hybrid model that automatically generates Arabic descriptions of fashion images. The model is based on the EfficientNet-B4 architecture, incorporates an attention mechanism to extract visual features, and, for the first time in this field, couples this encoder with an xLSTM module for text generation. The study also produced a new Arabic-captioned dataset, AraFashion; its Arabic descriptions were translated into English using Google Translate. Using real Arabic data improves the model's accuracy and realism, as reflected in its top BLEU-1 score of 0.7335 for Arabic descriptions. The study recommends expanding Arabic datasets in the fashion domain and highlights the need to support the Arabic language in AI technologies.
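
To make the described pipeline concrete, the minimal sketch below (Python, TensorFlow/Keras) wires a frozen EfficientNet-B4 encoder to an additive-attention bridge and a recurrent caption decoder. It is an illustrative approximation rather than the authors' implementation: a standard LSTM stands in for the xLSTM module (which has no built-in Keras layer), and the vocabulary size, caption length, and layer widths are assumed values.

import tensorflow as tf

VOCAB_SIZE = 20_000    # assumed Arabic caption vocabulary size (illustrative)
MAX_LEN = 30           # assumed maximum caption length (illustrative)
EMBED_DIM = 256        # assumed word-embedding width
DECODER_DIM = 512      # assumed recurrent/attention width

# Encoder: frozen EfficientNet-B4 backbone; for a 380x380 input it yields a
# 12x12 grid of 1792-dimensional region features.
cnn = tf.keras.applications.EfficientNetB4(include_top=False, weights="imagenet")
cnn.trainable = False

# Raw RGB pixels in [0, 255]; preprocessing is built into Keras EfficientNet.
image_in = tf.keras.Input(shape=(380, 380, 3))               # B4's native resolution
feature_map = cnn(image_in)                                   # (batch, 12, 12, 1792)
regions = tf.keras.layers.Reshape((-1, 1792))(feature_map)    # (batch, 144, 1792)
regions = tf.keras.layers.Dense(DECODER_DIM)(regions)         # project to decoder width

# Decoder: embed the caption prefix and run it through a recurrent layer.
# A standard LSTM stands in here for the paper's xLSTM text module.
caption_in = tf.keras.Input(shape=(MAX_LEN,), dtype="int32")
embedded = tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM)(caption_in)
states = tf.keras.layers.LSTM(DECODER_DIM, return_sequences=True)(embedded)

# Additive (Bahdanau-style) attention: each decoder step attends over image regions.
context = tf.keras.layers.AdditiveAttention()([states, regions])
merged = tf.keras.layers.Concatenate()([states, context])
logits = tf.keras.layers.Dense(VOCAB_SIZE)(merged)            # next-token scores

model = tf.keras.Model(inputs=[image_in, caption_in], outputs=logits)
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
model.summary()

For reference, the quoted BLEU-1 corresponds to unigram BLEU, which can be computed with NLTK's nltk.translate.bleu_score.corpus_bleu using weights=(1, 0, 0, 0) against the reference Arabic captions.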

References

Al-Malki, R.S., and Al-Aama, A.Y., 2023. Arabic captioning for images of clothing using deep learning. Sensors, 23(8), p.3783.

Al-Malla, M.A., Jafar, A., and Ghneim, N., 2022. Pre-trained CNNs as feature extraction modules for image captioning: An experimental study. ELCVIA Electronic Letters on Computer Vision and Image Analysis, 21(1), pp.1-16.

Anderson, P., Fernando, B., Johnson, M., and Gould, S., 2016. SPICE: Semantic Propositional Image Caption Evaluation. In: Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part V 14. Springer, Berlin.

Banerjee, S., and Lavie, A., 2005. METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments. In: Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization.

Beck, M., Pöppel, K., Spanring, M., Auer, A., Prudnikova, O., Kopp, M., Klambauer, G., Brandstetter, J., and Hochreiter, S., 2024. xLSTM: Extended Long Short-Term Memory. [arXiv Preprint].

Cai, C., Yap, K.H., and Wang, S., 2025. Toward attribute-controlled fashion image captioning. ACM Transactions on Multimedia Computing, Communications, and Applications, 20, p.280.

Ibrahim, H.S., Shati, N.M., and Alsewari, A.A., 2024. A transfer learning approach for Arabic image captions. Al-Mustansiriyah Journal of Science, 35, pp.81-90.

Lasheen, M.T., and Barakat, N.H., 2022. Arabic image captioning: The effect of text pre-processing on the attention weights and the BLEU-N scores. International Journal of Advanced Computer Science and Applications, 13, pp.413-423.

Lin, C.Y., 2004. ROUGE: A package for automatic evaluation of summaries. In: Text Summarization Branches Out. Association for Computational Linguistics, Pennsylvania.

Liu, Z., Luo, P., Qiu, S., Wang, X., and Tang, X., 2016. DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich Annotations. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

Moratelli, N., Barraco, M., Morelli, D., Cornia, M., Baraldi, L., and Cucchiara, R., 2023. Fashion-oriented image captioning with external knowledge retrieval and fully attentive gates. Sensors (Basel), 23(3), p.1286.

Pan, Y., Yao, T., Li, Y., and Mei, T., 2020. X-Linear Attention Networks for Image Captioning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

Papineni, K., Roukos, S., Ward, T., and Zhu, W.J., 2002. BLEU: A Method for Automatic Evaluation of Machine Translation. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics.

Rawate, S., Vayadande, K., Chaudhary, S., Manmode, S., Suryavanshi, R., and Chanda, K., 2022. Fashion Classification Model. In: Techno-Societal 2016: International Conference on Advanced Technologies for Societal Applications. Springer.

Rostamzadeh, N., Hosseini, S., Boquet, T., Stokowiec, W., Zhang, Y., Jauvin, C., and Pal, C., 2018. Fashion-Gen: The Generative Fashion Dataset and Challenge. [arXiv Preprint].

Ruan, T., and Zhang, S., 2024. Towards Understanding How Attention Mechanism Works in Deep Learning [arXiv Preprint].

Sabri, S.M., 2021. Arabic Image Captioning Using Deep Learning with Attention. University of Georgia, Georgia.

Sameer, M., Talib, A., Hussein, A., and Husni, H., 2023. Arabic speech recognition based on encoder-decoder architecture of transformer. Journal of Techniques, 5, pp.176-183.

Shams, 2025. AraFashion: A New Dataset for Fashion Caption. Kaggle, San Francisco.

Tan, M., and Le, Q., 2019. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In: International Conference on Machine Learning. PMLR.

Vedantam, R., Lawrence Zitnick, C., and Parikh, D., 2015. CIDEr: Consensus-Based Image Description Evaluation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

Xiao, H., Rasul, K., and Vollgraf, R., 2017. Fashion-MNIST: A Novel Image Dataset for Benchmarking Machine Learning Algorithms. [arXiv Preprint].

Yang, X., Zhang, H., Jin, D., Liu, Y., Wu, C.H., Tan, J., Xie, D., Wang, J., and Wang, X., 2020. Fashion Captioning: Towards Generating Accurate Descriptions with Semantic Rewards. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part XIII 16. Springer.

Published

2026-03-15

How to Cite

Ahmed, S. A. and Abdulameer, A. T. (2026) “AraFashion: A Novel Fashion Captioning Dataset Leveraging Attention-Based EfficientNet and xLSTM”, ARO-THE SCIENTIFIC JOURNAL OF KOYA UNIVERSITY, 14(1), pp. 100–106. doi: 10.14500/aro.12335.
Received 2025-06-05
Accepted 2025-12-11
Published 2026-03-15