AraFashion: A Novel Fashion Captioning Dataset Leveraging Attention-Based EfficientNet and xLSTM

Abstract

Models that generate accurate textual descriptions of images have become increasingly important, particularly in specialized domains such as fashion. In contrast to the wealth of datasets and studies available for English, Arabic suffers from a severe shortage of publicly available resources, particularly fashion image datasets. This shortage restricts the development of Arabic language models and impedes scholarly research in this area. Our study seeks to close this gap by building a hybrid model that automatically generates Arabic descriptions of fashion images. The model is based on the EfficientNet-B4 architecture, incorporates an attention mechanism to extract visual features, and, for the first time in this field, couples it to an xLSTM module for text generation. The study also produced a new dataset with Arabic captions, called AraFashion; the Arabic descriptions were translated into English through Google Translate. Using real Arabic data improves the model’s accuracy and realism, as shown by the model’s top BLEU-1 score of 0.7335 for Arabic descriptions. This study recommends expanding Arabic datasets in the fashion domain and highlights the need to support the Arabic language in AI technology.
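For readers unfamiliar with the metric reported above, the following is a minimal sketch of a sentence-level BLEU-1 computation (clipped unigram precision multiplied by a brevity penalty). It is an illustration only, not the authors' evaluation code: the function name `bleu1`, the whitespace tokenization, the single-reference setup, and the example captions are all assumptions, and the paper may have used a corpus-level or multi-reference variant.

```python
import math
from collections import Counter


def bleu1(reference: str, candidate: str) -> float:
    """Sentence-level BLEU-1 against a single reference (illustrative sketch).

    Assumes plain whitespace tokenization, which is a simplification
    for Arabic text; a real evaluation would likely normalize first.
    """
    ref_tokens = reference.split()
    cand_tokens = candidate.split()
    if not cand_tokens:
        return 0.0

    ref_counts = Counter(ref_tokens)
    cand_counts = Counter(cand_tokens)

    # Clipped matches: each candidate word counts at most as often
    # as it appears in the reference.
    clipped = sum(min(count, ref_counts[token])
                  for token, count in cand_counts.items())
    precision = clipped / len(cand_tokens)

    # Brevity penalty: penalize candidates shorter than the reference.
    if len(cand_tokens) >= len(ref_tokens):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref_tokens) / len(cand_tokens))

    return brevity_penalty * precision


if __name__ == "__main__":
    # Hypothetical captions, not taken from the AraFashion dataset.
    reference = "فستان أحمر طويل بأكمام قصيرة"
    candidate = "فستان أحمر قصير بأكمام قصيرة"
    print(f"BLEU-1: {bleu1(reference, candidate):.4f}")
```

A score such as 0.7335 therefore means that, after length adjustment, roughly 73% of the generated words also appear in the reference captions; BLEU-1 does not measure word order or fluency.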

ARO-THE SCIENTIFIC JOURNAL OF KOYA UNIVERSITY

Last time updated on 19/04/2026

Licence: https://creativecommons.org/licenses/by-nc-sa/4.0