14 research outputs found
Doodle to Search: Practical Zero-Shot Sketch-based Image Retrieval
In this paper, we investigate the problem of zero-shot sketch-based image
retrieval (ZS-SBIR), where human sketches are used as queries to conduct
retrieval of photos from unseen categories. We advance prior art
by proposing a novel ZS-SBIR scenario that represents a firm step forward in
its practical application. The new setting uniquely recognizes two important
yet often neglected challenges of practical ZS-SBIR: (i) the large domain gap
between amateur sketch and photo, and (ii) the necessity for moving towards
large-scale retrieval. We first contribute to the community a novel ZS-SBIR
dataset, QuickDraw-Extended, that consists of 330,000 sketches and 204,000
photos spanning 110 categories. Highly abstract amateur human sketches
are purposefully sourced to maximize the domain gap, unlike the sketches
in existing datasets, which are often semi-photorealistic. We then formulate a
ZS-SBIR framework to jointly model sketches and photos into a common embedding
space. A novel strategy to mine the mutual information among domains is
specifically engineered to alleviate the domain gap. External semantic
knowledge is further embedded to aid semantic transfer. We show that, rather
surprisingly, retrieval performance significantly better than the state of
the art on existing datasets can already be achieved using a reduced version
of our model. We further demonstrate the superior performance
of our full model by comparing with a number of alternatives on the newly
proposed dataset. The new dataset, plus all training and testing code of our
model, will be publicly released to facilitate future research.
Comment: Oral paper in CVPR 201
An Incremental On-line Parsing Algorithm for Recognizing Sketching Diagrams
This paper presents a syntactic recognition approach for on-line drawn graphical symbols. The proposed method consists of an incremental on-line predictive parser based on symbol descriptions given by an adjacency grammar. The parser analyzes input strokes as they are drawn by the user and can anticipate which symbols are likely to be recognized while a partial subshape is still in an intermediate state. In addition, the parser takes two issues into account. First, the user may draw symbol strokes in any order; second, since this is an on-line framework, the system requires real-time response. The method has been applied to an on-line sketching interface for architectural symbols.
TTS: Hilbert Transform-based Generative Adversarial Network for Tattoo and Scene Text Spotting
Text spotting in natural scenes is of increasing interest and significance due to its critical role in several applications, such as visual question answering, named entity recognition and event rumor detection on social media. One newly emerging and challenging problem is Tattoo Text Spotting (TTS) in images, for assisting forensic teams and for person identification. Unlike the generally simpler scene text addressed by current state-of-the-art methods, tattoo text is typically characterized by decorative backgrounds, calligraphic handwriting and several distortions due to the deformable nature of the skin. This paper describes the first approach to address TTS in a real-world application context by designing an end-to-end text spotting method employing a Hilbert transform-based Generative Adversarial Network (GAN). To reduce the complexity of the TTS task, the proposed approach first detects fine details in the image using the Hilbert transform and the Optimum Phase Congruency (OPC). To overcome the challenge of having only a relatively small number of training samples, a GAN is then used to generate suitable text samples and descriptors for text spotting (i.e. both detection and recognition). The superior performance of the proposed TTS approach over state-of-the-art methods, for both tattoo and general scene text, is demonstrated on a new, publicly available TTS-specific dataset as well as on the existing benchmark natural scene text datasets Total-Text, CTW1500 and ICDAR 2015.
Optimization of Variable Radius Spiral Micromixer
A novel single-layer passive mixer is designed which takes advantage of gradually increasing the radius of curvature of a spiral micro mixer. [...
Contribution of graphical models to the analysis and indexing of document images (Apport des modèles graphiques à l'analyse et à l'indexation d'images de documents)
This thesis addresses the lack of efficient computational tools for graph-based structural pattern recognition and proposes to exploit the computational strength of statistical pattern recognition. Its contribution is twofold. The first contribution is a new method of explicit graph embedding. The proposed method uses a multilevel analysis of graphs to extract graph-level, structural-level and elementary-level information, and embeds this information into a numeric feature vector. It employs fuzzy overlapping trapezoidal intervals to address the noise sensitivity of graph representations and to minimize the information loss when mapping from continuous graph space to discrete vector space. The method has unsupervised learning abilities and automatically adapts its parameters to the underlying graph dataset, without a prior training phase. The second contribution is a framework for automatic indexing of graph repositories for graph retrieval and subgraph spotting. This framework uses explicit graph embedding to represent the cliques of order 2 by numeric feature vectors, together with classification and clustering tools, to automatically index a graph repository. It does not require a labeled learning set and can be easily deployed to a range of application domains, offering the ease of query by example (QBE) and the granularity of focused retrieval.