Cost-Effective HITs for Relative Similarity Comparisons
Similarity comparisons of the form "Is object a more similar to b than to c?"
are useful for computer vision and machine learning applications.
Unfortunately, an embedding of n points is specified by O(n^3) triplets,
making collecting every triplet an expensive task. Noticing this difficulty,
other researchers have investigated more intelligent triplet sampling
techniques, but they do not study their effectiveness or their potential
drawbacks. Although it is important to reduce the number of collected triplets,
it is also important to understand how best to display a triplet collection
task to a user. In this work we explore an alternative display for collecting
triplets and analyze the monetary cost and speed of the display. We propose
best practices for creating cost effective human intelligence tasks for
collecting triplets. We show that rather than changing the sampling algorithm,
simple changes to the crowdsourcing UI can lead to much higher quality
embeddings. We also provide a dataset as well as the labels collected from
crowd workers.
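As a minimal illustration of the underlying embedding task (not the paper's crowdsourcing pipeline), triplet answers of the form "a is more similar to b than to c" can be turned into an embedding with a squared-distance hinge loss; the learning rate, margin, and epoch count below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def embed_from_triplets(n, triplets, dim=2, lr=0.1, margin=1.0, epochs=200):
    """Fit an embedding so that, for each triplet (a, b, c),
    point a ends up closer to b than to c (hinge loss on squared distances)."""
    X = rng.standard_normal((n, dim))
    for _ in range(epochs):
        for a, b, c in triplets:
            dab, dac = X[a] - X[b], X[a] - X[c]
            if dab @ dab + margin > dac @ dac:  # triplet violated or inside margin
                X[a] -= lr * 2 * (dab - dac)   # gradient of the hinge term
                X[b] += lr * 2 * dab
                X[c] -= lr * 2 * dac
    return X

# Toy usage: one answer saying point 0 is more like point 1 than point 2.
X = embed_from_triplets(3, [(0, 1, 2)])
```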
Overcomplete steerable pyramid filters and rotation invariance
A given (overcomplete) discrete oriented pyramid may be converted into a steerable pyramid by interpolation. We present a technique for deriving the optimal interpolation functions (otherwise called 'steering coefficients'). The proposed scheme is demonstrated on a computationally efficient oriented pyramid, which is a variation on the Burt and Adelson (1983) pyramid. We apply the generated steerable pyramid to orientation-invariant texture analysis in order to demonstrate its excellent rotational isotropy. High classification rates and precise rotation identification are demonstrated
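For the simplest first-order (two-basis) case, the interpolation functions reduce to cos(theta) and sin(theta); a tiny sketch of the steering idea (the paper derives optimal coefficients for a larger, overcomplete oriented basis):

```python
import numpy as np

def steer_first_order(r0, r90, theta):
    """Interpolate the response at orientation theta from two basis
    responses at 0 and 90 degrees (first-order steering coefficients)."""
    return np.cos(theta) * r0 + np.sin(theta) * r90

# A first-order oriented filter's response to a feature at angle phi
# behaves like cos(theta - phi), so steering reproduces it exactly.
phi, theta = 0.7, 1.9
steered = steer_first_order(np.cos(phi), np.sin(phi), theta)
```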
Enhanced decoding for the Galileo S-band mission
A coding system under consideration for the Galileo S-band low-gain antenna mission is a concatenated system using a variable redundancy Reed-Solomon outer code and a (14,1/4) convolutional inner code. The 8-bit Reed-Solomon symbols are interleaved to depth 8, and the eight 255-symbol codewords in each interleaved block have redundancies 64, 20, 20, 20, 64, 20, 20, and 20, respectively (or equivalently, the codewords have 191, 235, 235, 235, 191, 235, 235, and 235 8-bit information symbols, respectively). This concatenated code is to be decoded by an enhanced decoder that utilizes a maximum likelihood (Viterbi) convolutional decoder; a Reed-Solomon decoder capable of processing erasures; an algorithm for declaring erasures in undecoded codewords based on known erroneous symbols in neighboring decodable words; a second Viterbi decoding operation (redecoding) constrained to follow only paths consistent with the known symbols from previously decodable Reed-Solomon codewords; and a second Reed-Solomon decoding operation using the output from the Viterbi redecoder and additional erasure declarations to the extent possible. It is estimated that this code and decoder can achieve a decoded bit error rate of 1 x 10^-7 at a concatenated code signal-to-noise ratio of 0.76 dB. By comparison, a threshold of 1.17 dB is required for a baseline coding system consisting of the same (14,1/4) convolutional code, a (255,223) Reed-Solomon code with constant redundancy 32 also interleaved to depth 8, a one-pass Viterbi decoder, and a Reed-Solomon decoder incapable of declaring or utilizing erasures. The relative gain of the enhanced system is thus 0.41 dB. It is predicted from analysis based on an assumption of infinite interleaving that the coding gain could be further improved by approximately 0.2 dB if four stages of Viterbi decoding and four levels of Reed-Solomon redundancy are permitted.
Confirmation of this effect and specification of the optimum four-level redundancy profile for depth-8 interleaving are currently under way
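The redundancy profile stated above pins down the outer and overall code rates; a quick check of the numbers in the abstract:

```python
# Outer-code redundancy profile for one depth-8 interleaved block (from the text).
redundancies = [64, 20, 20, 20, 64, 20, 20, 20]
info_symbols = [255 - r for r in redundancies]  # information symbols per codeword

outer_rate = sum(info_symbols) / (255 * len(redundancies))
overall_rate = outer_rate / 4  # concatenated with the (14,1/4) convolutional code
```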
Enhanced decoding for the Galileo low-gain antenna mission: Viterbi redecoding with four decoding stages
The Galileo low-gain antenna mission will be supported by a coding system that uses a (14,1/4) inner convolutional code concatenated with Reed-Solomon codes of four different redundancies. Decoding for this code is designed to proceed in four distinct stages of Viterbi decoding followed by Reed-Solomon decoding. In each successive stage, the Reed-Solomon decoder only tries to decode the highest redundancy codewords not yet decoded in previous stages, and the Viterbi decoder redecodes its data utilizing the known symbols from previously decoded Reed-Solomon codewords. A previous article analyzed a two-stage decoding option that was not selected by Galileo. The present article analyzes the four-stage decoding scheme and derives the near-optimum set of redundancies selected for use by Galileo. The performance improvements relative to one- and two-stage decoding systems are evaluated
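The stage ordering itself is simple to state: each pass targets the highest redundancy level not yet decoded. A sketch of just that scheduling step (the redundancy values in the usage line are hypothetical, not Galileo's selected profile):

```python
def staged_schedule(redundancies, stages=4):
    """Group codeword indices so that stage k handles the k-th highest
    remaining redundancy level, as in multi-stage Viterbi/Reed-Solomon decoding."""
    levels = sorted(set(redundancies), reverse=True)[:stages]
    return [[i for i, r in enumerate(redundancies) if r == lvl] for lvl in levels]

# Hypothetical four-level profile, for illustration only:
schedule = staged_schedule([80, 40, 40, 24, 16, 16])
```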
Many-to-Many Graph Matching: a Continuous Relaxation Approach
Graphs provide an efficient tool for object representation in various
computer vision applications. Once graph-based representations are constructed,
an important question is how to compare graphs. This problem is often
formulated as a graph matching problem where one seeks a mapping between
vertices of two graphs which optimally aligns their structure. In the classical
formulation of graph matching, only one-to-one correspondences between vertices
are considered. However, in many applications, graphs cannot be matched
perfectly and it is more interesting to consider many-to-many correspondences
where clusters of vertices in one graph are matched to clusters of vertices in
the other graph. In this paper, we formulate the many-to-many graph matching
problem as a discrete optimization problem and propose an approximate algorithm
based on a continuous relaxation of the combinatorial problem. We compare our
method with other existing methods on several benchmark computer vision
datasets.
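A standard ingredient in such continuous relaxations (though not necessarily the authors' exact formulation) is to relax the set of assignment matrices to doubly stochastic matrices, e.g. via Sinkhorn's alternating normalization:

```python
import numpy as np

def sinkhorn(scores, iters=200):
    """Relax a matching-score matrix to an (approximately) doubly
    stochastic matrix by alternating row and column normalization."""
    X = np.exp(scores - scores.max())  # positive entries, numerically stable
    for _ in range(iters):
        X = X / X.sum(axis=1, keepdims=True)
        X = X / X.sum(axis=0, keepdims=True)
    return X

X = sinkhorn(np.random.default_rng(1).standard_normal((4, 4)))
```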
A robust braille recognition system
Braille is the most effective means of written communication between
visually-impaired and sighted people. This paper describes a new system
that recognizes Braille characters in scanned Braille document pages. Unlike
most other approaches, an inexpensive flatbed scanner is used and the system
requires minimal interaction with the user. A unique feature of this system is
the use of context at different levels (from the pre-processing of the image
through to the post-processing of the recognition results) to enhance robustness
and, consequently, recognition results. Braille dots composing characters are
identified on both single and double-sided documents of average quality with
over 99% accuracy, while Braille characters are also correctly recognised in
over 99% of documents of average quality (in both single and double-sided
documents)
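Once dots are detected and grouped into 2x3 cells, character recognition reduces to a pattern lookup. A minimal sketch using the standard dot numbering (dots 1-3 down the left column, 4-6 down the right); only a few letters are shown:

```python
# Standard 6-dot Braille patterns for the first few letters.
BRAILLE_LETTERS = {
    frozenset({1}): 'a',
    frozenset({1, 2}): 'b',
    frozenset({1, 4}): 'c',
    frozenset({1, 4, 5}): 'd',
    frozenset({1, 5}): 'e',
}

def decode_cell(dots):
    """Map the set of detected dot positions in one cell to a character."""
    return BRAILLE_LETTERS.get(frozenset(dots), '?')
```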
Word matching using single closed contours for indexing handwritten historical documents
Effective indexing is crucial for providing convenient access to scanned versions of large collections of historically valuable handwritten manuscripts. Since traditional handwriting recognizers based on optical character recognition (OCR) do not perform well on historical documents, recently a holistic word recognition approach has gained in popularity as an attractive and more straightforward solution (Lavrenko et al. in proc. document Image Analysis for Libraries (DIAL’04), pp. 278–287, 2004). Such techniques attempt to recognize words based on scalar and profile-based features extracted from whole word images. In this paper, we propose a new approach to holistic word recognition for historical handwritten manuscripts based on matching word contours instead of whole images or word profiles. The new method consists of robust extraction of closed word contours and the application of an elastic contour matching technique proposed originally for general shapes (Adamek and O’Connor in IEEE Trans Circuits Syst Video Technol 5:2004). We demonstrate that multiscale contour-based descriptors can effectively capture intrinsic word features avoiding any segmentation of words into smaller subunits. Our experiments show a recognition accuracy of 83%, which considerably exceeds the performance of other systems reported in the literature
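Elastic matching of two contour feature sequences can be sketched with dynamic time warping, used here as a generic stand-in rather than the exact multiscale matcher of Adamek and O'Connor:

```python
import numpy as np

def dtw_distance(a, b):
    """Elastic (order-preserving) distance between two 1-D feature
    sequences via dynamic time warping."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```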
Real-Time Hand Shape Classification
The problem of hand shape classification is challenging since a hand is
characterized by a large number of degrees of freedom. Numerous shape
descriptors have been proposed and applied over the years to estimate and
classify hand poses in reasonable time. In this paper we discuss our parallel
framework for real-time hand shape classification applicable in real-time
applications. We show how the number of gallery images influences the
classification accuracy and execution time of the parallel algorithm. We
present the speedup and efficiency analyses that prove the efficacy of the
parallel implementation. Notably, different methods can be used at each step
of our parallel framework. Here, we combine the shape contexts with the
appearance-based techniques to enhance the robustness of the algorithm and to
increase the classification score. An extensive experimental study proves the
superiority of the proposed approach over existing state-of-the-art methods.
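The shape context descriptor combined above is a log-polar histogram of where the other contour points lie relative to a reference point; a compact sketch (the bin counts are illustrative):

```python
import numpy as np

def shape_context(points, idx, r_bins=5, theta_bins=12):
    """Log-polar histogram of the positions of all other contour points
    relative to points[idx] (the shape context descriptor)."""
    rel = np.delete(points, idx, axis=0) - points[idx]
    r = np.log1p(np.hypot(rel[:, 0], rel[:, 1]))      # log-radius coordinate
    t = np.arctan2(rel[:, 1], rel[:, 0]) % (2 * np.pi)  # angle in [0, 2*pi)
    hist, _, _ = np.histogram2d(
        r, t, bins=[r_bins, theta_bins],
        range=[[0, r.max() + 1e-9], [0, 2 * np.pi]])
    return hist

pts = np.random.default_rng(2).standard_normal((30, 2))
H = shape_context(pts, 0)
```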
Face analysis using curve edge maps
This paper proposes an automatic and real-time system for face analysis, usable in visual communication applications. In this approach, faces are represented with Curve Edge Maps, which are collections of polynomial segments with a convex region. The segments are extracted from edge pixels using an adaptive incremental linear-time fitting algorithm, which is based on constructive polynomial fitting. The face analysis system considers face tracking, face recognition and facial feature detection, using Curve Edge Maps driven by histograms of intensities and histograms of relative positions. When applied to different face databases and video sequences, the average face recognition rate is 95.51%, the average facial feature detection rate is 91.92% and the average facial feature localization error is 2.18% of the face size, which is comparable with or better than the results in the literature. However, our method has the advantages of simplicity, real-time performance and extensibility to the different aspects of face analysis, such as recognition of facial expressions and talking
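The incremental fitting step can be sketched as a greedy scheme: extend the current polynomial segment along the ordered edge pixels until the fit residual exceeds a tolerance. This is a simplification of the paper's constructive polynomial fitting; the degree and tolerance values are illustrative:

```python
import numpy as np

def split_into_segments(xs, ys, deg=2, tol=0.5):
    """Greedily grow polynomial segments along an ordered edge curve:
    extend the current segment while the max fit residual stays below tol.
    Returns (start, end) index ranges, end exclusive."""
    segments, start, n = [], 0, len(xs)
    while start < n:
        end = min(start + deg + 1, n)  # deg+1 points always fit exactly
        while end < n:
            c = np.polyfit(xs[start:end + 1], ys[start:end + 1], deg)
            resid = np.abs(np.polyval(c, xs[start:end + 1]) - ys[start:end + 1])
            if resid.max() > tol:
                break
            end += 1
        segments.append((start, end))
        start = end
    return segments

# A straight edge fits a single degree-1 segment.
xs = np.arange(20.0)
segs = split_into_segments(xs, 2 * xs + 1, deg=1, tol=1e-6)
```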
Topological descriptors for 3D surface analysis
We investigate topological descriptors for 3D surface analysis, i.e. the
classification of surfaces according to their geometric fine structure. On a
dataset of high-resolution 3D surface reconstructions we compute persistence
diagrams for a 2D cubical filtration. In the next step we investigate different
topological descriptors and measure their ability to discriminate structurally
different 3D surface patches. We evaluate their sensitivity to different
parameters and compare the performance of the resulting topological descriptors
to alternative (non-topological) descriptors. We present a comprehensive
evaluation that shows that topological descriptors are (i) robust, (ii) yield
state-of-the-art performance for the task of 3D surface analysis and (iii)
improve classification performance when combined with non-topological
descriptors.
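As a toy illustration of persistence computation, here are the 0-dimensional features of a 1-D sublevel-set filtration via union-find with the elder rule, a much simplified stand-in for the paper's 2-D cubical filtration:

```python
import math

def persistence_0d(values):
    """Birth/death pairs of connected components in the sublevel-set
    filtration of a 1-D sequence (0-dimensional persistence)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    parent, birth, pairs = {}, {}, []

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in order:
        parent[i], birth[i] = i, values[i]
        for j in (i - 1, i + 1):
            if j in parent:                    # neighbor already entered
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                if birth[ri] > birth[rj]:      # elder rule: younger component dies
                    ri, rj = rj, ri
                if values[i] > birth[rj]:      # skip zero-persistence pairs
                    pairs.append((birth[rj], values[i]))
                parent[rj] = ri
    pairs.append((min(values), math.inf))      # global component never dies
    return sorted(pairs)
```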