14 research outputs found

    Doodle to Search: Practical Zero-Shot Sketch-based Image Retrieval

    In this paper, we investigate the problem of zero-shot sketch-based image retrieval (ZS-SBIR), where human sketches are used as queries to retrieve photos from unseen categories. We advance prior art by proposing a novel ZS-SBIR scenario that represents a firm step forward in its practical application. The new setting recognizes two important yet often neglected challenges of practical ZS-SBIR: (i) the large domain gap between amateur sketch and photo, and (ii) the necessity of moving towards large-scale retrieval. We first contribute to the community a novel ZS-SBIR dataset, QuickDraw-Extended, that consists of 330,000 sketches and 204,000 photos spanning 110 categories. Highly abstract amateur human sketches are purposefully sourced to maximize the domain gap, in contrast to the sketches in existing datasets, which can often be semi-photorealistic. We then formulate a ZS-SBIR framework to jointly model sketches and photos in a common embedding space. A novel strategy to mine the mutual information among domains is specifically engineered to alleviate the domain gap. External semantic knowledge is further embedded to aid semantic transfer. We show that, rather surprisingly, a reduced version of our model already significantly outperforms the state of the art on existing datasets. We further demonstrate the superior performance of our full model by comparing it with a number of alternatives on the newly proposed dataset. The new dataset, plus all training and testing code of our model, will be publicly released to facilitate future research. Comment: Oral paper in CVPR 201

    An Incremental On-line Parsing Algorithm for Recognizing Sketching Diagrams

    This paper presents a syntactic recognition approach for on-line drawn graphical symbols. The proposed method consists of an incremental on-line predictive parser based on symbol descriptions by an adjacency grammar. The parser analyzes input strokes as they are drawn by the user and is able to anticipate which symbols are likely to be recognized when a partial subshape has been drawn in an intermediate state. In addition, the parser takes two issues into account. First, the user may draw symbol strokes in any order; second, since it is an on-line framework, the system requires real-time response. The method has been applied to an on-line sketching interface for architectural symbols.

    TTS: Hilbert Transform-based Generative Adversarial Network for Tattoo and Scene Text Spotting

    Text spotting in natural scenes is of increasing interest and significance due to its critical role in several applications, such as visual question answering, named entity recognition and event rumor detection on social media. One of the newly emerging challenging problems is Tattoo Text Spotting (TTS) in images for assisting forensic teams and for person identification. Unlike the generally simpler scene text addressed by current state-of-the-art methods, tattoo text is typically characterized by the presence of decorative backgrounds, calligraphic handwriting and several distortions due to the deformable nature of the skin. This paper describes the first approach to address TTS in a real-world application context by designing an end-to-end text spotting method employing a Hilbert transform-based Generative Adversarial Network (GAN). To reduce the complexity of the TTS task, the proposed approach first detects fine details in the image using the Hilbert transform and the Optimum Phase Congruency (OPC). To overcome the challenge of having only a relatively small number of training samples, a GAN is then used for generating suitable text samples and descriptors for text spotting (i.e. both detection and recognition). The superior performance of the proposed TTS approach, for both tattoo and general scene text, over the state-of-the-art methods is demonstrated on a new, publicly available TTS-specific dataset as well as on the existing benchmark natural scene text datasets: Total-Text, CTW1500 and ICDAR 2015.

    Optimization of Variable Radius Spiral Micromixer

    A novel single-layer passive mixer is designed which takes advantage of gradually increasing the radius of curvature of a spiral micromixer. [...

    Apport des modèles graphiques à l'analyse et à l'indexation d'images de documents

    This thesis addresses the lack of efficient computational tools for graph-based structural pattern recognition approaches and proposes to exploit the computational strength of statistical pattern recognition. It makes two contributions. The first is a new method of explicit graph embedding. The proposed method uses a multilevel analysis of the graph to extract graph-level, structural-level and elementary-level information, and embeds this information into a numeric feature vector. The method employs fuzzy overlapping trapezoidal intervals to address the noise sensitivity of graph representations and to minimize the information loss when mapping from continuous graph space to discrete vector space. The method has unsupervised learning abilities and can automatically adapt its parameters to the underlying graph dataset. The second contribution is a framework for automatic indexing of graph repositories for graph retrieval and subgraph spotting. This framework uses explicit graph embedding to represent the cliques of order 2 by numeric feature vectors, together with classification and clustering tools for automatically indexing a graph repository. It does not require a labeled learning set and can be easily deployed to a range of application domains, offering the ease of query by example (QBE) and the granularity of focused retrieval.
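    As a rough illustration of the fuzzy overlapping trapezoidal intervals mentioned in the abstract above, the sketch below builds a fuzzy histogram from a list of numeric attribute values. It is not the thesis's implementation: the function names, the interval boundaries and the sample values are all assumptions chosen for illustration. Each value contributes fractionally to every interval it falls in, so a small perturbation shifts mass gradually between neighbouring bins instead of flipping a hard bin assignment.

```python
from typing import List, Sequence, Tuple

# A trapezoid is given by (a, b, c, d): support [a, d], full membership on [b, c].
Trapezoid = Tuple[float, float, float, float]


def trapezoid_membership(x: float, a: float, b: float, c: float, d: float) -> float:
    """Degree of membership of x in the trapezoid (a, b, c, d), in [0, 1]."""
    if b <= x <= c:          # flat top of the trapezoid
        return 1.0
    if a < x < b:            # rising edge
        return (x - a) / (b - a)
    if c < x < d:            # falling edge
        return (d - x) / (d - c)
    return 0.0               # outside the support


def fuzzy_histogram(values: Sequence[float], intervals: Sequence[Trapezoid]) -> List[float]:
    """Accumulate the membership of every value in every (overlapping) interval."""
    hist = [0.0] * len(intervals)
    for x in values:
        for i, (a, b, c, d) in enumerate(intervals):
            hist[i] += trapezoid_membership(x, a, b, c, d)
    return hist


# Hypothetical intervals: neighbouring trapezoids overlap on their sloped edges.
INTERVALS: List[Trapezoid] = [
    (0.0, 0.0, 1.0, 2.0),
    (1.0, 2.0, 3.0, 4.0),
    (3.0, 4.0, 5.0, 5.0),
]

if __name__ == "__main__":
    # Hypothetical attribute values, e.g. edge lengths taken from a graph.
    print(fuzzy_histogram([0.5, 1.5, 2.5], INTERVALS))
```

    With these assumed intervals, a value of 1.5 lies on the overlap between the first two trapezoids and is split evenly between them, whereas a crisp histogram would assign it wholly to one bin.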