Learning Cross-Modal Deep Embeddings for Multi-Object Image Retrieval using Text and Sketch

Dey, S; Dutta, A; Ghosh, SK; Llados, J; Pal, U; Valveny, E

Publication date: 16 October 2019
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Doi

Abstract

This is the author accepted manuscript. The final version is available from IEEE via the DOI in this record.

In this work we introduce a cross-modal image retrieval system that accepts both text and sketch as query modalities. A cross-modal deep network architecture jointly models the sketch and text input modalities as well as the image output modality, learning a common embedding between text and images and between sketches and images. In addition, an attention model selectively focuses on the different objects in an image, which allows retrieval with multiple objects in the query. Experiments on standard datasets show that the proposed method achieves the best results among the compared methods for both single-object and multi-object image retrieval.

Funding: European Union Horizon 2020; CERCA Programme/Generalitat de Catalunya
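The abstract does not describe the network internals, so the following is only a minimal, hypothetical sketch of how retrieval in a shared embedding space with per-object attention can work. It assumes that each queried object (from the text or sketch encoder) and each image region have already been mapped into the same D-dimensional space; the function names, the softmax attention over regions, the `temperature` value, and the averaging of per-object scores are illustrative assumptions, not the authors' model.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def attended_score(query_objs, regions, temperature=0.1):
    """Score one image against a (possibly multi-object) query.

    query_objs: (Q, D) array, one embedding per queried object (assumed precomputed).
    regions:    (R, D) array, region embeddings of the image in the same space.
    """
    q = l2_normalize(query_objs)
    r = l2_normalize(regions)
    total = 0.0
    for qi in q:
        sims = r @ qi                            # cosine similarity of each region to this object
        attn = softmax(sims / temperature)       # attend to the regions that match this object
        attended = l2_normalize(attn @ r)        # attention-weighted image vector for this object
        total += float(attended @ qi)            # how well the attended image matches the object
    return total / len(q)                        # average over all queried objects


def rank_images(query_objs, gallery, temperature=0.1):
    """Return gallery indices sorted by descending score, plus the raw scores."""
    scores = [attended_score(query_objs, regs, temperature) for regs in gallery]
    order = sorted(range(len(gallery)), key=lambda i: -scores[i])
    return order, scores


if __name__ == "__main__":
    e = np.eye(4)  # toy orthogonal "object" embeddings
    query = e[[0, 1]]                              # query mentions objects 0 and 1
    gallery = [e[[0, 1]], e[[0, 2]], e[[2, 3]]]    # images with different region contents
    order, scores = rank_images(query, gallery)
    print(order, [round(s, 3) for s in scores])
```

In this toy setup, the image containing both queried objects ranks first, the image containing only one of them scores about half as high, and the image containing neither scores zero. Attending separately for each query object is what lets a single image score well on several objects at once, rather than averaging all regions into one vector.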


Available versions: Crossref (last updated 10/08/2021); Open Research Exeter, oai:ore.exeter.ac.uk:10871/392... (last updated 21/10/2019)