
    Delving Deep into Fine-Grained Sketch-Based Image Retrieval.

    PhD Thesis
    To see is to sketch. Since prehistoric times, people have used sketch-like petroglyphs as an effective communicative tool, one that predates written language by tens of thousands of years. This is even more true today: with the ubiquitous proliferation of touchscreen devices, sketching is possibly the only rendering mechanism readily available to all for expressing visual intentions. The intriguing free-hand nature of human sketches, however, becomes a major obstacle in practical applications: humans are not faithful artists, and the sketches they draw are iconic abstractions of mental images that can quickly fall off the visual manifold of natural objects. When sketches are matched discriminatively against their corresponding photos, this problem is known as fine-grained sketch-based image retrieval (FG-SBIR), and it has drawn increasing interest due to its potential for commercial adoption. This thesis delves deep into FG-SBIR by analysing the intrinsic traits unique to human sketches and leveraging this understanding to strengthen sketch-photo matching under deep learning. More specifically, this thesis develops four methods for FG-SBIR, as follows.

    Chapter 3 describes a discriminative-generative hybrid method to better bridge the domain gap between photo and sketch. Existing FG-SBIR models learn a deep joint embedding space with discriminative losses only, pulling matching pairs of photos and sketches close and pushing mismatched pairs apart, and thus align the two domains only indirectly. To address this, we introduce a generative task of cross-domain image synthesis: when an input photo is embedded in the joint space, the embedding vector is used as input to a generative model that synthesises the corresponding sketch. This task forces the learned embedding space to preserve all the domain-invariant information useful for cross-domain reconstruction, thus reducing the domain gap explicitly rather than implicitly as in existing models. This approach achieves the first near-human performance on Sketchy, the largest FG-SBIR dataset to date.

    Chapter 4 presents a new way of modelling human sketches and shows how it can be integrated into the existing FG-SBIR paradigm with promising performance. Instead of modelling the forward sketching pass, we attempt to invert it: iconic free-hand sketches are translated into contours that more closely resemble geometrically realistic projections of object boundaries, while the salient added details are factorised out separately. This factorised re-representation enables more effective sketch-photo matching. Specifically, we propose a novel unsupervised image style transfer model that enforces a cyclic embedding consistency constraint. A deep four-way Siamese model is then formulated to exploit the synthesised contours by extracting distinct, complementary detail features for FG-SBIR.

    Chapter 5 extends the practical applicability of FG-SBIR well beyond its training categories. Existing models, while successful, require instance-level pairings within each coarse-grained category as annotated training data, leaving their ability to deal with out-of-sample data unknown. We identify cross-category generalisation for FG-SBIR as a domain generalisation problem and propose the first solution. Our key contribution is a novel unsupervised learning approach that models a universal manifold of prototypical visual sketch traits. This manifold can then be used to parameterise the learning of a sketch/photo representation. Adaptation to novel categories then becomes automatic: the novel sketch is embedded in the manifold, and the representation and retrieval function are updated accordingly.

    Chapter 6 challenges the ImageNet pre-training long considered crucial by the FG-SBIR community, given the lack of large sketch-photo paired datasets for FG-SBIR training, and proposes a self-supervised alternative for representation pre-training. Specifically, we consider the jigsaw puzzle game of recomposing images from shuffled parts. We identify two key facets of jigsaw task design required for effective performance. The first is formulating the puzzle in a mixed-modality fashion. The second is framing the optimisation as permutation matrix inference via Sinkhorn iterations, which we show to be more effective than the existing classifier instantiation of the jigsaw idea. We show for the first time that ImageNet classification is unnecessary as a pre-training strategy for FG-SBIR and confirm the efficacy of our jigsaw approach.
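The Sinkhorn iteration mentioned for the jigsaw pre-training in Chapter 6 can be illustrated with a minimal sketch. The snippet below is not the thesis implementation: the network producing patch-to-position scores is stubbed with random tensors, and the iteration count and the MSE recomposition loss are illustrative choices. It shows how alternating row and column normalisation in log space turns unnormalised scores into an approximately doubly stochastic soft permutation matrix that can be compared against the ground-truth shuffle.

```python
import torch

def sinkhorn(log_scores: torch.Tensor, n_iters: int = 20) -> torch.Tensor:
    """Turn unnormalised (batch, n, n) patch-to-position scores into an
    approximately doubly stochastic soft permutation matrix by alternately
    normalising rows and columns in log space (Sinkhorn iterations)."""
    for _ in range(n_iters):
        log_scores = log_scores - torch.logsumexp(log_scores, dim=2, keepdim=True)  # rows
        log_scores = log_scores - torch.logsumexp(log_scores, dim=1, keepdim=True)  # columns
    return log_scores.exp()

# Hypothetical usage: `scores` stands in for a network's patch-to-position
# scores for a 3x3 jigsaw; `perm_gt` is the ground-truth shuffle written as a
# permutation matrix.
batch, n = 4, 9
scores = torch.randn(batch, n, n, requires_grad=True)
perm_gt = torch.stack([torch.eye(n)[torch.randperm(n)] for _ in range(batch)])
soft_perm = sinkhorn(scores)
loss = torch.nn.functional.mse_loss(soft_perm, perm_gt)
loss.backward()  # gradients flow back through the Sinkhorn normalisation
```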

    Robust statistical frontalization of human and animal faces

    The unconstrained acquisition of facial data in real-world conditions may result in face images with significant pose variations, illumination changes, and occlusions, affecting the performance of facial landmark localization and recognition methods. In this paper, a novel method robust to pose, illumination variations, and occlusions is proposed for joint face frontalization and landmark localization. Unlike state-of-the-art methods for landmark localization and pose correction, which require large amounts of manually annotated images or 3D facial models, the proposed method relies only on a small set of frontal images. By observing that, for both humans and animals, the frontal facial image is the one with the minimum rank among all poses, a model is devised that jointly recovers the frontalized version of the face and the facial landmarks. To this end, a suitable optimization problem is solved, involving minimization of the nuclear norm (a convex surrogate of the rank function) together with the matrix ℓ1 norm, which accounts for occlusions. The proposed method is assessed in frontal view reconstruction of human and animal faces, landmark localization, pose-invariant face recognition, face verification in unconstrained conditions, and video inpainting, with experiments conducted on 9 databases. The experimental results demonstrate the effectiveness of the proposed method in comparison to state-of-the-art methods for the target problems.
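The nuclear norm plus matrix ℓ1 objective described above belongs to the well-known family of robust low-rank recovery problems. As a hedged illustration only (the paper's actual formulation additionally estimates the frontalizing transformation and landmarks, which is not reproduced here), the sketch below solves the generic decomposition min ||L||_* + lam*||E||_1 subject to D = L + E with a simple augmented Lagrangian scheme, where singular value thresholding handles the nuclear norm and entrywise soft thresholding handles the ℓ1 term; all function names and parameter defaults are illustrative.

```python
import numpy as np

def svt(X, tau):
    """Singular value thresholding: proximal operator of tau * ||.||_* ."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def soft(X, tau):
    """Entrywise soft thresholding: proximal operator of tau * ||.||_1 ."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def low_rank_plus_sparse(D, lam=None, mu=1.0, n_iters=200):
    """Decompose D into a low-rank part L (e.g. rank-minimal frontal structure)
    and a sparse part E (e.g. occlusions) by approximately solving
    min ||L||_* + lam * ||E||_1  s.t.  D = L + E."""
    m, n = D.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))     # common RPCA default
    L = np.zeros_like(D)
    E = np.zeros_like(D)
    Y = np.zeros_like(D)                   # Lagrange multipliers
    for _ in range(n_iters):
        L = svt(D - E + Y / mu, 1.0 / mu)  # low-rank update
        E = soft(D - L + Y / mu, lam / mu) # sparse update
        Y = Y + mu * (D - L - E)           # dual ascent step
    return L, E
```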