8,862 research outputs found
Vehicle-Rear: A New Dataset to Explore Feature Fusion for Vehicle Identification Using Convolutional Neural Networks
This work addresses the problem of vehicle identification through
non-overlapping cameras. As our main contribution, we introduce a novel dataset
for vehicle identification, called Vehicle-Rear, that contains more than three
hours of high-resolution videos, with accurate information about the make,
model, color and year of nearly 3,000 vehicles, in addition to the position and
identification of their license plates. To explore our dataset we design a
two-stream CNN that simultaneously uses two of the most distinctive and
persistent features available: the vehicle's appearance and its license plate.
This is an attempt to tackle a major problem: false alarms caused by vehicles
with similar designs or by very close license plate identifiers. In the first
network stream, shape similarities are identified by a Siamese CNN that uses a
pair of low-resolution vehicle patches recorded by two different cameras. In
the second stream, we use a CNN for OCR to extract textual information,
confidence scores, and string similarities from a pair of high-resolution
license plate patches. Then, features from both streams are merged by a
sequence of fully connected layers for decision. In our experiments, we
compared the two-stream network against several well-known CNN architectures
using single or multiple vehicle features. The architectures, trained models,
and dataset are publicly available at https://github.com/icarofua/vehicle-rear
Profiling of OCR'ed Historical Texts Revisited
In the absence of ground truth it is not possible to automatically determine
the exact spectrum and occurrences of OCR errors in an OCR'ed text. Yet, for
interactive postcorrection of OCR'ed historical printings it is extremely
useful to have a statistical profile available that provides an estimate of
error classes with associated frequencies, and that points to conjectured
errors and suspicious tokens. The method introduced in Reffle (2013) computes
such a profile, combining lexica, pattern sets and advanced matching techniques
in a specialized Expectation Maximization (EM) procedure. Here we improve this
method in three respects: First, the method in Reffle (2013) is not adaptive:
user feedback obtained by actual postcorrection steps cannot be used to compute
refined profiles. We introduce a variant of the method that is open for
adaptivity, taking correction steps of the user into account. This leads to
higher precision with respect to recognition of erroneous OCR tokens. Second,
during postcorrection often new historical patterns are found. We show that
adding new historical patterns to the linguistic background resources leads to
a second kind of improvement, enabling even higher precision by telling
historical spellings apart from OCR errors. Third, the method in Reffle (2013)
does not make any active use of tokens that cannot be interpreted in the
underlying channel model. We show that adding these uninterpretable tokens to
the set of conjectured errors leads to a significant improvement of the recall
for error detection, at the same time improving precision
Enhancing Energy Minimization Framework for Scene Text Recognition with Top-Down Cues
Recognizing scene text is a challenging problem, even more so than the
recognition of scanned documents. This problem has gained significant attention
from the computer vision community in recent years, and several methods based
on energy minimization frameworks and deep learning approaches have been
proposed. In this work, we focus on the energy minimization framework and
propose a model that exploits both bottom-up and top-down cues for recognizing
cropped words extracted from street images. The bottom-up cues are derived from
individual character detections from an image. We build a conditional random
field model on these detections to jointly model the strength of the detections
and the interactions between them. These interactions are top-down cues
obtained from a lexicon-based prior, i.e., language statistics. The optimal
word represented by the text image is obtained by minimizing the energy
function corresponding to the random field model. We evaluate our proposed
algorithm extensively on a number of cropped scene text benchmark datasets,
namely Street View Text, ICDAR 2003, 2011 and 2013 datasets, and IIIT 5K-word,
and show better performance than comparable methods. We perform a rigorous
analysis of all the steps in our approach and analyze the results. We also show
that state-of-the-art convolutional neural network features can be integrated
in our framework to further improve the recognition performance
- …