Learning Cross-Modal Deep Embeddings for Multi-Object Image Retrieval using Text and Sketch

Authors: Dey Sounak; Dutta Anjan; Ghosh Suman K.; Lladós Josep; Pal Umapada; Valveny Ernest
Publication date: 28/04/2018

In this work we introduce a cross-modal image retrieval system that allows both text and sketch as input modalities for the query. A cross-modal deep network architecture is formulated to jointly model the sketch and text input modalities as well as the image output modality, learning a common embedding between text and images and between sketches and images. In addition, an attention model is used to selectively focus on the different objects of the image, allowing for retrieval with multiple objects in the query. Experiments show that the proposed method performs best in both single- and multiple-object image retrieval on standard datasets.

Comment: Accepted at ICPR 201
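
The retrieval step that a common embedding makes possible can be illustrated with a minimal sketch. It assumes the query (text or sketch) and the images have already been mapped into the same vector space; the function names, the toy vectors, and the use of cosine similarity are illustrative assumptions and are not taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_images(query_embedding, image_embeddings):
    """Return (image_name, score) pairs sorted from most to least similar."""
    scored = [
        (name, cosine_similarity(query_embedding, embedding))
        for name, embedding in image_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical, hand-written embeddings; a trained network would produce these.
image_embeddings = {
    "cat_photo": [0.9, 0.1, 0.0],
    "car_photo": [0.0, 0.2, 0.9],
    "dog_photo": [0.7, 0.6, 0.1],
}
sketch_query = [1.0, 0.2, 0.0]

ranking = rank_images(sketch_query, image_embeddings)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Because both modalities share one space, the same ranking function would serve a text query or a sketch query without modification.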

arXiv.org e-Print Archive

Crossref

Open Research Exeter

Doodle to Search: Practical Zero-Shot Sketch-based Image Retrieval

Authors: Dey Sounak; Dutta Anjan; Llados Josep; Riba Pau; Song Yi-Zhe
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2019

In this paper, we investigate the problem of zero-shot sketch-based image retrieval (ZS-SBIR), where human sketches are used as queries to retrieve photos from unseen categories. We advance prior art by proposing a novel ZS-SBIR scenario that represents a firm step forward in its practical application. The new setting recognizes two important yet often neglected challenges of practical ZS-SBIR: (i) the large domain gap between amateur sketch and photo, and (ii) the necessity of moving towards large-scale retrieval. We first contribute to the community a novel ZS-SBIR dataset, QuickDraw-Extended, that consists of 330,000 sketches and 204,000 photos spanning 110 categories. Highly abstract amateur human sketches are purposefully sourced to maximize the domain gap, instead of the ones included in existing datasets, which can often be semi-photorealistic. We then formulate a ZS-SBIR framework to jointly model sketches and photos in a common embedding space. A novel strategy to mine the mutual information among domains is specifically engineered to alleviate the domain gap. External semantic knowledge is further embedded to aid semantic transfer. We show that, rather surprisingly, a reduced version of our model already significantly outperforms the state of the art on existing datasets. We further demonstrate the superior performance of our full model by comparing it with a number of alternatives on the newly proposed dataset. The new dataset, plus all training and testing code of our model, will be publicly released to facilitate future research.

Comment: Oral paper in CVPR 201

arXiv.org e-Print Archive

Crossref

University of Surrey

Open Research Exeter

Surrey Research Insight

Deep Shape Matching

Authors: A Chalechale; A Gordo; A Khosla; AS Razavian; EJ Crowley; F Radenović; H Tabia; LVD Maaten; M Eitz; P Sangkloy; P Xu; R Hu; S Bai; S Parui; S Wang; S Zhang; Y Kalantidis; Z Xu
Publication date: 25/07/2018

We cast shape matching as metric learning with convolutional networks. We break the end-to-end process of image representation into two parts. Firstly, well-established efficient methods are chosen to turn the images into edge maps. Secondly, the network is trained with edge maps of landmark images, which are automatically obtained by a structure-from-motion pipeline. The learned representation is evaluated on a range of different tasks, providing improvements on challenging cases of domain generalization, generic sketch-based image retrieval, and its fine-grained counterpart. In contrast to other methods that learn a different model per task, object category, or domain, we use the same network throughout all our experiments, achieving state-of-the-art results on multiple benchmarks.

Comment: ECCV 201
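
As a rough illustration of what metric learning on pairs of descriptors optimizes, the sketch below implements a standard contrastive loss. The abstract does not say which loss is used, so the choice of contrastive loss, the margin value, and the function names are assumptions made for illustration only.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(a, b, is_match, margin=1.0):
    """Pull matching pairs together; push non-matching pairs beyond the margin."""
    distance = euclidean(a, b)
    if is_match:
        # Matching pair: any remaining distance is penalized.
        return 0.5 * distance ** 2
    # Non-matching pair: penalized only while closer than the margin.
    return 0.5 * max(0.0, margin - distance) ** 2


# Hypothetical descriptors, e.g. network outputs for two edge maps.
anchor = [0.0, 0.0]
near = [0.6, 0.0]
far = [3.0, 4.0]

print(contrastive_loss(anchor, near, is_match=True))
print(contrastive_loss(anchor, near, is_match=False))
print(contrastive_loss(anchor, far, is_match=False))
```

A non-matching pair that already lies beyond the margin contributes zero loss, so training effort concentrates on pairs the embedding still confuses.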

arXiv.org e-Print Archive

Crossref

Deep Spatial-Semantic Attention for Fine-Grained Sketch-Based Image Retrieval

Authors: Hospedales Timothy; Song Jifei; Song Yi-Zhe; Xiang Tao; Yu Qian
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 25/12/2017

Human sketches are unique in being able to capture both the spatial topology of a visual object and its subtle appearance details. Fine-grained sketch-based image retrieval (FG-SBIR) leverages such fine-grained characteristics of sketches to conduct instance-level retrieval of photos. Nevertheless, human sketches are often highly abstract and iconic, resulting in severe misalignments with candidate photos, which in turn make subtle visual detail matching difficult. Existing FG-SBIR approaches focus only on coarse holistic matching via deep cross-domain representation learning, yet do not explicitly account for fine-grained details and their spatial context. In this paper, a novel deep FG-SBIR model is proposed which differs significantly from the existing models in that: (1) it is spatially aware, achieved by introducing an attention module that is sensitive to the spatial position of visual details; (2) it combines coarse and fine semantic information via a shortcut connection fusion block; and (3) it models feature correlation and is robust to misalignments between the extracted features across the two domains by introducing a novel higher-order learnable energy function (HOLEF) based loss. Extensive experiments show that the proposed deep spatial-semantic attention model significantly outperforms the state of the art.

University of Surrey

Edinburgh Research Explorer

Surrey Research Insight

Fine-Grained Image Retrieval: the Text/Sketch Input Dilemma

Authors: Hospedales Timothy; Song Jifei; Song Yi-Zhe; Xiang Tao
Publication venue: British Machine Vision Association and Society for Pattern Recognition
Publication date: 01/01/2017

Crossref

Edinburgh Research Explorer

Deep Learning for Free-Hand Sketch: A Survey

Authors: Hospedales Timothy M.; Song Yi-Zhe; Wang Liang; Xiang Tao; Xu Peng; Yin Qiyue
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2022

Free-hand sketches are highly illustrative and have been widely used by humans to depict objects or stories from ancient times to the present. The recent prevalence of touchscreen devices has made sketch creation much easier than ever and consequently made sketch-oriented applications increasingly popular. The progress of deep learning has immensely benefited free-hand sketch research and applications. This paper presents a comprehensive survey of the deep learning techniques oriented at free-hand sketch data, and the applications that they enable. The main contents of this survey include: (i) a discussion of the intrinsic traits and unique challenges of free-hand sketch, to highlight the essential differences between sketch data and other data modalities, e.g., natural photos; (ii) a review of the developments of free-hand sketch research in the deep learning era, by surveying existing datasets, research topics, and the state-of-the-art methods through a detailed taxonomy and experimental evaluation; and (iii) promotion of future work via a discussion of bottlenecks, open problems, and potential research directions for the community.

Comment: This paper is accepted by IEEE TPAM

arXiv.org e-Print Archive

Edinburgh Research Explorer

DR-NTU (Digital Repository of NTU)

Towards Practicality of Sketch-Based Visual Understanding

Author: Bhunia Ayan Kumar
Publication date: 26/10/2022

Sketches have been used to conceptualise and depict visual objects since prehistoric times. Sketch research has flourished in the past decade, particularly with the proliferation of touchscreen devices. Much of the utilisation of sketch has been anchored around the fact that it can be used to delineate visual concepts universally, irrespective of age, race, language, or demography. The fine-grained interactive nature of sketches facilitates their application to various visual understanding tasks, such as image retrieval, image generation or editing, segmentation, and 3D-shape modelling. However, sketches are highly abstract and subjective, depending on the perception of individuals. Although most agree that sketches give the user fine-grained control to depict a visual object, many consider sketching a tedious process due to their limited sketching skills, compared to other query/support modalities like text/tags. Furthermore, collecting fine-grained sketch-photo associations is a significant bottleneck to commercialising sketch applications. Therefore, this thesis aims to progress sketch-based visual understanding towards more practicality.

Comment: PhD thesis successfully defended by Ayan Kumar Bhunia, Supervisor: Prof. Yi-Zhe Song, Thesis Examiners: Prof Stella Yu and Prof Adrian Hilto

arXiv.org e-Print Archive