3 research outputs found
Revisiting Cross Modal Retrieval
This paper proposes a cross-modal retrieval system that leverages image and
text encoding. Most multimodal architectures employ separate networks for
each modality to capture the semantic relationship between them. In our work,
however, a fused image-text encoding achieves comparable cross-modal
retrieval results without requiring a separate network for each modality. We
show that text encodings can capture semantic relationships between multiple
modalities. To our knowledge, this work is the first of its kind to employ a
single network and a fused image-text embedding for cross-modal retrieval. We
evaluate our approach on two well-known multimodal datasets: MS-COCO and
Flickr30K.
Comment: 14 pages. Under review at ECCVW (MULA 2018)
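A minimal sketch of the single-network, fused-embedding idea, assuming
PyTorch; the encoder inputs, layer sizes, and the FusedEmbeddingNet name are
hypothetical, since the abstract does not specify the architecture:

```python
# Hypothetical sketch: one shared network maps pre-extracted image and text
# features to a fused joint embedding (all dimensions are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusedEmbeddingNet(nn.Module):
    """A single projection network for both modalities: image and text
    features are concatenated, then mapped into one joint space."""
    def __init__(self, img_dim=2048, txt_dim=300, joint_dim=512):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(img_dim + txt_dim, 1024),
            nn.ReLU(),
            nn.Linear(1024, joint_dim),
        )

    def forward(self, img_feat, txt_feat):
        fused = torch.cat([img_feat, txt_feat], dim=-1)
        return F.normalize(self.proj(fused), dim=-1)  # unit-norm embedding

net = FusedEmbeddingNet()
img = torch.randn(4, 2048)  # e.g. pre-extracted CNN features (assumed)
txt = torch.randn(4, 300)   # e.g. pooled word vectors (assumed)
emb = net(img, txt)         # (4, 512) fused embeddings
scores = emb @ emb.t()      # cosine similarities used for ranking
```

How a single-modality query is fed through the shared network at retrieval
time is a detail of the paper that this sketch does not reproduce.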
Multitask Text-to-Visual Embedding with Titles and Clickthrough Data
Text-visual (also called semantic-visual) embedding is a central problem in
vision-language research. It typically involves mapping an image and a text
description to a common feature space through a CNN image encoder and an RNN
language encoder. In this paper, we propose a new method for learning a
text-visual embedding using both image titles and click-through data from an
image search engine. We also propose a new triplet loss function that models
positive awareness of the embedding, and introduce a novel mini-batch-based
hard negative sampling approach for better data efficiency during learning.
Experimental results show that our proposed method outperforms existing
methods and is also effective for real-world text-to-visual retrieval.
Comment: 4 pages. Language and Vision Workshop, in conjunction with CVPR 201
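The mini-batch-based hard negative sampling resembles the batch-hardest
strategy common in embedding learning; below is a sketch of that standard
variant in PyTorch. The margin value is an assumption, and the paper's
positive-aware weighting of the triplet loss is not reproduced here.

```python
# Sketch of a batch-hardest triplet loss for text-visual embedding.
import torch
import torch.nn.functional as F

def batch_hard_triplet_loss(img_emb, txt_emb, margin=0.2):
    """img_emb, txt_emb: (B, D) L2-normalized embeddings; row i of each
    matrix is a matching image-text pair."""
    scores = img_emb @ txt_emb.t()  # (B, B) cosine similarities
    pos = scores.diag()             # similarity of each matching pair
    mask = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
    # Mask the positives, then take the hardest negative per image and text.
    hardest_txt = scores.masked_fill(mask, -1.0).max(dim=1).values
    hardest_img = scores.masked_fill(mask, -1.0).max(dim=0).values
    loss_i2t = torch.clamp(margin + hardest_txt - pos, min=0)
    loss_t2i = torch.clamp(margin + hardest_img - pos, min=0)
    return (loss_i2t + loss_t2i).mean()

img = F.normalize(torch.randn(8, 512), dim=-1)
txt = F.normalize(torch.randn(8, 512), dim=-1)
print(batch_hard_triplet_loss(img, txt))
```

Mining negatives inside the mini-batch avoids an expensive offline search
over the whole training set, which is presumably the source of the data
efficiency the abstract claims.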
Learning Inward Scaled Hypersphere Embedding: Exploring Projections in Higher Dimensions
The majority of current dimensionality reduction and retrieval techniques
rely on embedding the learned feature representations onto a computable
metric space. Once the learned features are mapped, a distance metric bridges
the gap between similar instances. Since these methods do not exploit the
scaled projection, discriminative embedding onto a hyperspace remains a
challenge. In this paper, we propose to inwardly scale feature
representations in proportion to projecting them onto a hypersphere manifold
for discriminative analysis. We further propose a novel, yet simple,
convolutional neural network based architecture and extensively evaluate the
proposed methodology on classification and retrieval tasks, obtaining results
comparable to state-of-the-art techniques.
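One plausible reading of inward scaling is rescaling each feature vector onto
a fixed-radius hypersphere so that only angular information remains for the
discriminative analysis; the PyTorch sketch below shows that interpretation
(the function name and the radius are assumptions, not the paper's
formulation).

```python
# Hypothetical sketch: project features onto a hypersphere of fixed radius.
import torch
import torch.nn.functional as F

def hypersphere_project(features, radius=16.0):
    """Rescale each vector to lie on a hypersphere of the given radius;
    directions are preserved, so only angles carry discriminative signal."""
    return radius * F.normalize(features, p=2, dim=-1)

feats = torch.randn(8, 128) * 5.0  # unconstrained embeddings
on_sphere = hypersphere_project(feats)
print(on_sphere.norm(dim=-1))      # every norm equals the radius
```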