Search CORE

6,427 research outputs found

Channel and spatial attention mechanism for fashion image captioning

Author: Nguyen Bao T.
Nguyen Son T.
Vo Anh H.
Publication venue: 'Institute of Advanced Engineering and Science'
Publication date: 01/10/2023
Field of study

Image captioning aims to automatically generate one or more description sentences for a given input image. Most of the existing captioning methods use encoder-decoder model which mainly focus on recognizing and capturing the relationship between objects appearing in the input image. However, when generating captions for fashion images, it is important to not only describe the items and their relationships, but also mention attribute features of clothes (shape, texture, style, fabric, and more). In this study, one novel model is proposed for fashion image captioning task which can capture not only the items and their relationship, but also their attribute features. Two different attention mechanisms (spatial-attention and channel-wise attention) is incorporated to the traditional encoder-decoder model, which dynamically interprets the caption sentence in multi-layer feature map in addition to the depth dimension of the feature map. We evaluate our proposed architecture on Fashion-Gen using three different metrics (CIDEr, ROUGE-L, and BLEU-1), and achieve the scores of 89.7, 50.6 and 45.6, respectively. Based on experiments, our proposed method shows significant performance improvement for the task of fashion-image captioning, and outperforms other state-of-the-art image captioning methods

Institute of Advanced Engineering and Science

Improving Domain Generalization by Learning without Forgetting: Application in Retail Checkout

Author: Nguyen Son T.
Nguyen Thuy C.
Phan Nam LH.
Publication venue
Publication date: 12/07/2022
Field of study

Designing an automatic checkout system for retail stores at the human level accuracy is challenging due to similar appearance products and their various poses. This paper addresses the problem by proposing a method with a two-stage pipeline. The first stage detects class-agnostic items, and the second one is dedicated to classify product categories. We also track the objects across video frames to avoid duplicated counting. One major challenge is the domain gap because the models are trained on synthetic data but tested on the real images. To reduce the error gap, we adopt domain generalization methods for the first-stage detector. In addition, model ensemble is used to enhance the robustness of the 2nd-stage classifier. The method is evaluated on the AI City challenge 2022 -- Track 4 and gets the F1 score

40\%

on the test A set. Code is released at the link https://github.com/cybercore-co-ltd/aicity22-track4

arXiv.org e-Print Archive