6,011 research outputs found
A Neural Multi-sequence Alignment TeCHnique (NeuMATCH)
The alignment of heterogeneous sequential data (video to text) is an
important and challenging problem. Standard techniques for this task, including
Dynamic Time Warping (DTW) and Conditional Random Fields (CRFs), suffer from
inherent drawbacks. Mainly, the Markov assumption implies that, given the
immediate past, future alignment decisions are independent of further history.
The separation between similarity computation and alignment decision also
prevents end-to-end training. In this paper, we propose an end-to-end neural
architecture where alignment actions are implemented as moving data between
stacks of Long Short-term Memory (LSTM) blocks. This flexible architecture
supports a large variety of alignment tasks, including one-to-one, one-to-many,
skipping unmatched elements, and (with extensions) non-monotonic alignment.
Extensive experiments on semi-synthetic and real datasets show that our
algorithm outperforms state-of-the-art baselines.Comment: Accepted at CVPR 2018 (Spotlight). arXiv file includes the paper and
the supplemental materia
Simultaneous Localization and Recognition of Dynamic Hand Gestures
A framework for the simultaneous localization and recognition of dynamic hand gestures is proposed. At the core of this framework is a dynamic space-time warping (DSTW) algorithm, that aligns a pair of query and model gestures in both space and time. For every frame of the query sequence, feature detectors generate multiple hand region candidates. Dynamic programming is then used to compute both a global matching cost, which is used to recognize the query gesture, and a warping path, which aligns the query and model sequences in time, and also finds the best hand candidate region in every query frame. The proposed framework includes translation invariant recognition of gestures, a desirable property for many HCI systems. The performance of the approach is evaluated on a dataset of hand signed digits gestured by people wearing short sleeve shirts, in front of a background containing other non-hand skin-colored objects. The algorithm simultaneously localizes the gesturing hand and recognizes the hand-signed digit. Although DSTW is illustrated in a gesture recognition setting, the proposed algorithm is a general method for matching time series, that allows for multiple candidate feature vectors to be extracted at each time step.National Science Foundation (CNS-0202067, IIS-0308213, IIS-0329009); Office of Naval Research (N00014-03-1-0108
Word matching using single closed contours for indexing handwritten historical documents
Effective indexing is crucial for providing convenient access to scanned versions of large collections of historically valuable handwritten manuscripts. Since traditional handwriting recognizers based on optical character recognition (OCR) do not perform well on historical documents, recently a holistic word recognition approach has gained in popularity as an attractive and more straightforward solution (Lavrenko et al. in proc. document Image Analysis for Libraries (DIAL’04), pp. 278–287, 2004). Such techniques attempt to recognize words based on scalar and profile-based features extracted from whole word images. In this paper, we propose a new approach to holistic word recognition for historical handwritten manuscripts based on matching word contours instead of whole images or word profiles. The new method consists of robust extraction of closed word contours and the application of an elastic contour matching technique proposed originally for general shapes (Adamek and O’Connor in IEEE Trans Circuits Syst Video Technol 5:2004). We demonstrate that multiscale contour-based descriptors can effectively capture intrinsic word features avoiding any segmentation of words into smaller subunits. Our experiments show a recognition accuracy of 83%, which considerably exceeds the performance of other systems reported in the literature
Computing and Visualizing Dynamic Time Warping Alignments in R: The dtw Package
Dynamic time warping is a popular technique for comparing time series, providing both a distance measure that is insensitive to local compression and stretches and the warping which optimally deforms one of the two input series onto the other. A variety of algorithms and constraints have been discussed in the literature. The dtw package provides an unification of them; it allows R users to compute time series alignments mixing freely a variety of continuity constraints, restriction windows, endpoints, local distance definitions, and so on. The package also provides functions for visualizing alignments and constraints using several classic diagram types.
Continuous Action Recognition Based on Sequence Alignment
Continuous action recognition is more challenging than isolated recognition
because classification and segmentation must be simultaneously carried out. We
build on the well known dynamic time warping (DTW) framework and devise a novel
visual alignment technique, namely dynamic frame warping (DFW), which performs
isolated recognition based on per-frame representation of videos, and on
aligning a test sequence with a model sequence. Moreover, we propose two
extensions which enable to perform recognition concomitant with segmentation,
namely one-pass DFW and two-pass DFW. These two methods have their roots in the
domain of continuous recognition of speech and, to the best of our knowledge,
their extension to continuous visual action recognition has been overlooked. We
test and illustrate the proposed techniques with a recently released dataset
(RAVEL) and with two public-domain datasets widely used in action recognition
(Hollywood-1 and Hollywood-2). We also compare the performances of the proposed
isolated and continuous recognition algorithms with several recently published
methods
Velocity estimation via registration-guided least-squares inversion
This paper introduces an iterative scheme for acoustic model inversion where
the notion of proximity of two traces is not the usual least-squares distance,
but instead involves registration as in image processing. Observed data are
matched to predicted waveforms via piecewise-polynomial warpings, obtained by
solving a nonconvex optimization problem in a multiscale fashion from low to
high frequencies. This multiscale process requires defining low-frequency
augmented signals in order to seed the frequency sweep at zero frequency.
Custom adjoint sources are then defined from the warped waveforms. The proposed
velocity updates are obtained as the migration of these adjoint sources, and
cannot be interpreted as the negative gradient of any given objective function.
The new method, referred to as RGLS, is successfully applied to a few scenarios
of model velocity estimation in the transmission setting. We show that the new
method can converge to the correct model in situations where conventional
least-squares inversion suffers from cycle-skipping and converges to a spurious
model.Comment: 20 pages, 13 figures, 1 tabl
- …