
    A Symbol Spotting Approach Based on the Vector Model and a Visual Vocabulary

    This paper addresses the difficult problem of symbol spotting for graphic documents. We propose an approach where each graphic document is indexed as a text document by using the vector model and an inverted file structure. The method relies on a visual vocabulary built from a shape descriptor adapted to the document level and invariant under the classical geometric transforms (rotation, scaling and translation). Regions of interest selected with a high degree of confidence using a voting strategy are considered as occurrences of a query symbol. Experimental results are promising and show the feasibility of our approach.
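    To make the indexing scheme concrete, the sketch below shows how a graphic document can be treated as a "text document" over a visual vocabulary and ranked with tf-idf through an inverted file. This is not the authors' implementation: it assumes scikit-learn's KMeans for vocabulary construction, and the function names (build_vocabulary, index_documents, score) are invented for illustration. Descriptor extraction and the voting stage are out of scope here.

```python
# Hypothetical sketch: graphic documents as bags of visual words,
# indexed in an inverted file and ranked with the tf-idf vector model.
import math
from collections import Counter, defaultdict

import numpy as np
from sklearn.cluster import KMeans  # assumed choice for the vocabulary


def build_vocabulary(descriptors: np.ndarray, n_words: int = 256) -> KMeans:
    """Quantize RST-invariant shape descriptors into visual words."""
    return KMeans(n_clusters=n_words, n_init=10).fit(descriptors)


def index_documents(doc_descriptors: dict, vocab: KMeans):
    """Build an inverted file: visual word -> {doc_id: term frequency}."""
    inverted = defaultdict(Counter)
    for doc_id, desc in doc_descriptors.items():
        for word in vocab.predict(desc):
            inverted[word][doc_id] += 1
    return inverted


def score(query_desc: np.ndarray, vocab: KMeans, inverted, n_docs: int):
    """Rank documents by tf-idf dot product with the query's visual words."""
    scores = Counter()
    for word, q_tf in Counter(vocab.predict(query_desc)).items():
        postings = inverted.get(word, {})
        if not postings:
            continue
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += (q_tf * idf) * (tf * idf)
    return scores.most_common()
```

    In a spotting setting, the query descriptors would come from the symbol to be found, and the ranked documents (or regions) would then be passed to the voting stage described in the abstract.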

    Visually grounded learning of keyword prediction from untranscribed speech

    During language acquisition, infants have the benefit of visual cues to ground spoken language. Robots similarly have access to audio and visual sensors. Recent work has shown that images and spoken captions can be mapped into a meaningful common space, allowing images to be retrieved using speech and vice versa. In this setting of images paired with untranscribed spoken captions, we consider whether computer vision systems can be used to obtain textual labels for the speech. Concretely, we use an image-to-words multi-label visual classifier to tag images with soft textual labels, and then train a neural network to map from the speech to these soft targets. We show that the resulting speech system is able to predict which words occur in an utterance, acting as a spoken bag-of-words classifier, without seeing any parallel speech and text. We find that the model often confuses semantically related words, e.g. "man" and "person", making it even more effective as a semantic keyword spotter.
    Comment: 5 pages, 3 figures, 5 tables; small updates, added link to code; accepted to Interspeech 201
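    A minimal sketch of this training setup follows, assuming PyTorch and hypothetical stand-ins (SpeechNet, visual_tagger) rather than the paper's released code; the multi-label binary cross-entropy against soft visual targets is one plausible loss for this kind of soft-label distillation, not necessarily the one the authors used.

```python
# Sketch: a speech network trained to match the soft word probabilities
# a pretrained visual classifier assigns to the paired image.
# No transcriptions are used anywhere in the loop.
import torch
import torch.nn as nn


class SpeechNet(nn.Module):
    """Toy speech encoder: mean-pooled filterbank frames -> word logits."""

    def __init__(self, n_mels: int = 40, vocab_size: int = 1000):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_mels, 512), nn.ReLU(),
                                 nn.Linear(512, vocab_size))

    def forward(self, feats):          # feats: (batch, frames, n_mels)
        return self.net(feats.mean(dim=1))


def train_step(model, optimizer, speech_feats, images, visual_tagger):
    # Soft targets: per-word probabilities from the visual classifier
    # applied to the image paired with each spoken caption.
    with torch.no_grad():
        targets = torch.sigmoid(visual_tagger(images))
    logits = model(speech_feats)
    # Multi-label cross-entropy against the soft visual targets.
    loss = nn.functional.binary_cross_entropy_with_logits(logits, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

    At test time, thresholding sigmoid(model(speech_feats)) yields the spoken bag-of-words prediction described in the abstract.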

    BoR: Bag-of-Relations for Symbol Retrieval

    In this paper, we address a new scheme for symbol retrieval based on bags of relations (BoRs) computed between extracted visual primitives (e.g. circles and corners). Our features consist of pairwise spatial relations over all possible combinations of individual visual primitives. The key characteristic of the overall process is that topological relation information is indexed in the bags of relations and used for recognition. As a consequence, directional relation matching takes place only with those candidates having similar topological configurations. A comprehensive study is made using several well-known datasets, such as GREC, FRESH and SESYD, and includes a comparison with state-of-the-art descriptors. Experiments provide interesting results on symbol spotting and other user-friendly symbol retrieval applications.
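    To illustrate the matching scheme, here is a sketch with invented names (topological, direction, bag_of_relations, match); the coarse box-based topological relation and the centre-angle directional relation are simplifications of the paper's relation models, shown only to make the "topology keys the bag, direction is matched within a key" idea concrete.

```python
# Sketch: pairwise relations between primitives, indexed by a topological
# key so that directional matching is restricted to compatible pairs.
import math
from collections import defaultdict
from itertools import combinations


def topological(a, b):
    """Coarse topological relation between two boxes (x1, y1, x2, y2)."""
    if a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1]:
        return "disjoint"
    if a[0] <= b[0] and a[1] <= b[1] and a[2] >= b[2] and a[3] >= b[3]:
        return "contains"
    return "overlaps"


def direction(a, b):
    """Directional relation: angle between box centres, in degrees."""
    ax, ay = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    bx, by = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    return math.degrees(math.atan2(by - ay, bx - ax)) % 360


def bag_of_relations(primitives):
    """Index pairwise relations by (primitive-type pair, topology)."""
    bag = defaultdict(list)
    for (ta, a), (tb, b) in combinations(primitives, 2):
        key = (frozenset((ta, tb)), topological(a, b))
        bag[key].append(direction(a, b))
    return bag


def match(query_bag, cand_bag, tol=15.0):
    """Count directional matches, restricted to equal topological keys."""
    hits = 0
    for key, angles in query_bag.items():
        for qa in angles:
            hits += any(abs(qa - ca) % 360 <= tol or
                        abs(qa - ca) % 360 >= 360 - tol
                        for ca in cand_bag.get(key, []))
    return hits
```

    Here primitives are (type, box) tuples such as ("circle", (x1, y1, x2, y2)); gating the angle comparison on the topological key is what keeps directional matching cheap, as the abstract describes.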