Query by String word spotting based on character bi-gram indexing
In this paper we propose a segmentation-free query by string word spotting
method. Both the documents and query strings are encoded using a recently
proposed word representation that projects images and strings into a common
attribute space based on a pyramidal histogram of characters (PHOC). These
attribute models are learned using linear SVMs over the Fisher Vector
representation of the images along with the PHOC labels of the corresponding
strings. In order to search through the whole page, document regions are
indexed per character bi-gram using a similar attribute representation. On top
of that, we propose an integral image representation of the document using a
simplified version of the attribute model for efficient computation. Finally, we
introduce a re-ranking step in order to boost retrieval performance. We show
state-of-the-art results for segmentation-free query by string word spotting in
single-writer and multi-writer standard datasets.
Comment: To be published in ICDAR201
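As a rough illustration of the string side of the PHOC encoding mentioned in the abstract above, the following Python sketch builds a binary pyramidal histogram of characters for a query string. The alphabet, the pyramid levels, the 50% overlap rule, and the function name `phoc` are assumptions based on the commonly described PHOC formulation; none of them are stated in this listing, and the image side (Fisher Vectors and SVM attribute models) is not covered.

```python
import string

# Assumed alphabet: lowercase letters plus digits.
ALPHABET = string.ascii_lowercase + string.digits


def phoc(word, levels=(2, 3), alphabet=ALPHABET):
    """Binary pyramidal histogram of characters for `word` (illustrative sketch).

    At each pyramid level the word is split into `level` equal regions.
    A character sets its bit in a region when at least half of the
    character's span falls inside that region.
    """
    word = word.lower()
    n = len(word)
    vec = []
    for level in levels:
        for region in range(level):
            bits = [0] * len(alphabet)
            # Work in integer units of 1 / (n * level) to avoid float ties.
            r_start, r_end = region * n, (region + 1) * n
            for k, ch in enumerate(word):
                if ch not in alphabet:
                    continue
                c_start, c_end = k * level, (k + 1) * level
                overlap = min(c_end, r_end) - max(c_start, r_start)
                # Character span has length `level` in these units.
                if 2 * overlap >= level:
                    bits[alphabet.index(ch)] = 1
            vec.extend(bits)
    return vec


if __name__ == "__main__":
    v = phoc("abc", levels=(2,))
    print(len(v), sum(v))
```

With a single level of two regions, the middle character of a three-letter word straddles both halves and is therefore marked in each of them.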
Automatic Palaeographic Exploration of Genizah Manuscripts
The Cairo Genizah is a collection of hand-written documents containing approximately
350,000 fragments of mainly Jewish texts discovered in the late 19th century. The
fragments are today spread out in some 75 libraries and private collections worldwide,
but there is an ongoing effort to document and catalogue all extant fragments.
Palaeographic information plays a key role in the study of the Genizah collection.
Script style, and more specifically handwriting, can be used to identify fragments that
might originate from the same original work. Such matched fragments, commonly
referred to as “joins”, are currently identified manually by experts, and presumably only
a small fraction of existing joins have been discovered to date. In this work, we show
that automatic handwriting matching functions, obtained from non-specific features
using a corpus of writing samples, can perform this task quite reliably. In addition, we
explore the problem of grouping various Genizah documents by script style, without
being provided any prior information about the relevant styles. The automatically
obtained grouping agrees, for the most part, with the palaeographic taxonomy. In cases
where the method fails, it is due to apparent similarities between related scripts.
Exploiting saliency for object segmentation from image level labels
There have been remarkable improvements in the semantic labelling task in
recent years. However, state-of-the-art methods rely on large-scale
pixel-level annotations. This paper studies the problem of training a
pixel-wise semantic labeller network from image-level annotations of the
present object classes. Recently, it has been shown that high quality seeds
indicating discriminative object regions can be obtained from image-level
labels. Without additional information, obtaining the full extent of the object
is an inherently ill-posed problem due to co-occurrences. We propose using a
saliency model as additional information and thereby exploit prior knowledge on
the object extent and image statistics. We show how to combine both information
sources in order to recover 80% of the fully supervised performance, which is
the new state of the art in weakly supervised training for pixel-wise semantic
labelling. The code is available at https://goo.gl/KygSeb.
Comment: CVPR 201
Scene Parsing with Multiscale Feature Learning, Purity Trees, and Optimal Covers
Scene parsing, or semantic segmentation, consists of labeling each pixel in
an image with the category of the object it belongs to. It is a challenging
task that involves the simultaneous detection, segmentation and recognition of
all the objects in the image.
The scene parsing method proposed here starts by computing a tree of segments
from a graph of pixel dissimilarities. Simultaneously, a set of dense feature
vectors is computed which encodes regions of multiple sizes centered on each
pixel. The feature extractor is a multiscale convolutional network trained from
raw pixels. The feature vectors associated with the segments covered by each
node in the tree are aggregated and fed to a classifier which produces an
estimate of the distribution of object categories contained in the segment. A
subset of tree nodes that covers the image is then selected so as to maximize
the average "purity" of the class distributions, hence maximizing the overall
likelihood that each segment will contain a single object. The convolutional
network feature extractor is trained end-to-end from raw pixels, alleviating
the need for engineered features. After training, the system is parameter-free.
The system yields record accuracies on the Stanford Background Dataset (8
classes), the SIFT Flow Dataset (33 classes) and the Barcelona Dataset (170
classes) while being an order of magnitude faster than competing approaches,
producing a 320×240 image labeling in less than 1 second.
Comment: 9 pages, 4 figures - Published in 29th International Conference on
Machine Learning (ICML 2012), Jun 2012, Edinburgh, United Kingdo
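The cover-selection step described in the abstract above can be sketched roughly as follows. This is a minimal illustration, assuming entropy of the predicted class distribution as the impurity measure and a simple bottom-up recursion that keeps a node whenever its size-weighted entropy is no worse than that of the best cover of its children. The dictionary layout (`name`, `size`, `dist`, `children`) and the function names are hypothetical, and the procedure is not claimed to be the one used in the paper.

```python
import math


def entropy(dist):
    """Shannon entropy of a class distribution (lower means purer)."""
    return -sum(p * math.log(p) for p in dist if p > 0)


def best_cover(node):
    """Return (cost, names) for the lowest-cost cover of this subtree.

    cost is the sum of size * entropy over the selected segments.
    A node is kept whole when that is no worse than covering it
    with the best covers of its children.
    """
    own_cost = node["size"] * entropy(node["dist"])
    children = node.get("children", [])
    if not children:
        return own_cost, [node["name"]]
    child_cost, child_names = 0.0, []
    for child in children:
        cost, names = best_cover(child)
        child_cost += cost
        child_names.extend(names)
    if own_cost <= child_cost:
        return own_cost, [node["name"]]
    return child_cost, child_names


if __name__ == "__main__":
    tree = {
        "name": "root", "size": 4, "dist": [0.5, 0.5],
        "children": [
            {"name": "left", "size": 2, "dist": [1.0, 0.0]},
            {"name": "right", "size": 2, "dist": [0.0, 1.0]},
        ],
    }
    print(best_cover(tree))
```

In the toy tree, the mixed root is rejected in favor of its two pure children. When a parent is as pure as its children, the tie goes to the parent, which yields larger segments.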
Semantic Instance Annotation of Street Scenes by 3D to 2D Label Transfer
Semantic annotations are vital for training models for object recognition,
semantic segmentation or scene understanding. Unfortunately, pixelwise
annotation of images at very large scale is labor-intensive, and little
labeled data is available, particularly at instance level and for street
scenes. In this paper, we propose to tackle this problem by lifting the
semantic instance labeling task from 2D into 3D. Given reconstructions from
stereo or laser data, we annotate static 3D scene elements with rough bounding
primitives and develop a model which transfers this information into the image
domain. We leverage our method to obtain 2D labels for a novel suburban video
dataset which we have collected, resulting in 400k semantic and instance image
annotations. A comparison of our method to state-of-the-art label transfer
baselines reveals that 3D information enables more efficient annotation while
at the same time resulting in improved accuracy and time-coherent labels.
Comment: 10 pages in Conference on Computer Vision and Pattern Recognition
(CVPR), 201