6,011 research outputs found

    A Neural Multi-sequence Alignment TeCHnique (NeuMATCH)

    Full text link
    The alignment of heterogeneous sequential data (video to text) is an important and challenging problem. Standard techniques for this task, including Dynamic Time Warping (DTW) and Conditional Random Fields (CRFs), suffer from inherent drawbacks. Mainly, the Markov assumption implies that, given the immediate past, future alignment decisions are independent of further history. The separation between similarity computation and alignment decision also prevents end-to-end training. In this paper, we propose an end-to-end neural architecture where alignment actions are implemented as moving data between stacks of Long Short-term Memory (LSTM) blocks. This flexible architecture supports a large variety of alignment tasks, including one-to-one, one-to-many, skipping unmatched elements, and (with extensions) non-monotonic alignment. Extensive experiments on semi-synthetic and real datasets show that our algorithm outperforms state-of-the-art baselines.Comment: Accepted at CVPR 2018 (Spotlight). arXiv file includes the paper and the supplemental materia

    Simultaneous Localization and Recognition of Dynamic Hand Gestures

    Full text link
    A framework for the simultaneous localization and recognition of dynamic hand gestures is proposed. At the core of this framework is a dynamic space-time warping (DSTW) algorithm, that aligns a pair of query and model gestures in both space and time. For every frame of the query sequence, feature detectors generate multiple hand region candidates. Dynamic programming is then used to compute both a global matching cost, which is used to recognize the query gesture, and a warping path, which aligns the query and model sequences in time, and also finds the best hand candidate region in every query frame. The proposed framework includes translation invariant recognition of gestures, a desirable property for many HCI systems. The performance of the approach is evaluated on a dataset of hand signed digits gestured by people wearing short sleeve shirts, in front of a background containing other non-hand skin-colored objects. The algorithm simultaneously localizes the gesturing hand and recognizes the hand-signed digit. Although DSTW is illustrated in a gesture recognition setting, the proposed algorithm is a general method for matching time series, that allows for multiple candidate feature vectors to be extracted at each time step.National Science Foundation (CNS-0202067, IIS-0308213, IIS-0329009); Office of Naval Research (N00014-03-1-0108

    Word matching using single closed contours for indexing handwritten historical documents

    Get PDF
    Effective indexing is crucial for providing convenient access to scanned versions of large collections of historically valuable handwritten manuscripts. Since traditional handwriting recognizers based on optical character recognition (OCR) do not perform well on historical documents, recently a holistic word recognition approach has gained in popularity as an attractive and more straightforward solution (Lavrenko et al. in proc. document Image Analysis for Libraries (DIAL’04), pp. 278–287, 2004). Such techniques attempt to recognize words based on scalar and profile-based features extracted from whole word images. In this paper, we propose a new approach to holistic word recognition for historical handwritten manuscripts based on matching word contours instead of whole images or word profiles. The new method consists of robust extraction of closed word contours and the application of an elastic contour matching technique proposed originally for general shapes (Adamek and O’Connor in IEEE Trans Circuits Syst Video Technol 5:2004). We demonstrate that multiscale contour-based descriptors can effectively capture intrinsic word features avoiding any segmentation of words into smaller subunits. Our experiments show a recognition accuracy of 83%, which considerably exceeds the performance of other systems reported in the literature

    Computing and Visualizing Dynamic Time Warping Alignments in R: The dtw Package

    Get PDF
    Dynamic time warping is a popular technique for comparing time series, providing both a distance measure that is insensitive to local compression and stretches and the warping which optimally deforms one of the two input series onto the other. A variety of algorithms and constraints have been discussed in the literature. The dtw package provides an unification of them; it allows R users to compute time series alignments mixing freely a variety of continuity constraints, restriction windows, endpoints, local distance definitions, and so on. The package also provides functions for visualizing alignments and constraints using several classic diagram types.

    Continuous Action Recognition Based on Sequence Alignment

    Get PDF
    Continuous action recognition is more challenging than isolated recognition because classification and segmentation must be simultaneously carried out. We build on the well known dynamic time warping (DTW) framework and devise a novel visual alignment technique, namely dynamic frame warping (DFW), which performs isolated recognition based on per-frame representation of videos, and on aligning a test sequence with a model sequence. Moreover, we propose two extensions which enable to perform recognition concomitant with segmentation, namely one-pass DFW and two-pass DFW. These two methods have their roots in the domain of continuous recognition of speech and, to the best of our knowledge, their extension to continuous visual action recognition has been overlooked. We test and illustrate the proposed techniques with a recently released dataset (RAVEL) and with two public-domain datasets widely used in action recognition (Hollywood-1 and Hollywood-2). We also compare the performances of the proposed isolated and continuous recognition algorithms with several recently published methods

    Velocity estimation via registration-guided least-squares inversion

    Get PDF
    This paper introduces an iterative scheme for acoustic model inversion where the notion of proximity of two traces is not the usual least-squares distance, but instead involves registration as in image processing. Observed data are matched to predicted waveforms via piecewise-polynomial warpings, obtained by solving a nonconvex optimization problem in a multiscale fashion from low to high frequencies. This multiscale process requires defining low-frequency augmented signals in order to seed the frequency sweep at zero frequency. Custom adjoint sources are then defined from the warped waveforms. The proposed velocity updates are obtained as the migration of these adjoint sources, and cannot be interpreted as the negative gradient of any given objective function. The new method, referred to as RGLS, is successfully applied to a few scenarios of model velocity estimation in the transmission setting. We show that the new method can converge to the correct model in situations where conventional least-squares inversion suffers from cycle-skipping and converges to a spurious model.Comment: 20 pages, 13 figures, 1 tabl
    corecore