
    Human Movement Analysis: Ballistic Dynamics, and Edge Continuity for Pose Estimation

    We present two contributions to human movement analysis: (a) a ballistic dynamical model for recognizing movements, and (b) a model for coupling edge continuity with contour matching. We describe a Bayesian approach for visual analysis of ballistic hand movements, namely reaches and strikes. These movements are most commonly used for interacting with objects and the environment. One of the key challenges in recognizing them is the variability of the hand's target location: people can reach above their heads, for something on the floor, etc. Our approach recognizes these movements independently of target location and direction by modelling their ballistic dynamics. A video sequence is automatically segmented into ballistic subsequences without tracking the hands. The segments are then classified into strike and reach movements based on low-level motion features. Each ballistic segment is further analyzed to compute qualitative labels for the movement's target location and direction. Tests are presented on a set of reach and strike movement sequences.

    We also present an approach for whole-body pose contour matching. Contour matching in natural images, in the absence of foreground-background segmentation, is difficult. Usually an asymmetric approach is adopted, in which a contour is said to match well if it aligns with a subset of the image's gradients. This causes problems because the contour can match a portion of an object's outline and ignore the remainder. We present a model that uses edge continuity to address this issue. Pairs of edge elements in the image are linked with affinities if they are likely to belong to the same object. A contour that matches a set of image gradients is constrained to also match other gradients having high affinities with the chosen ones. A Markov Random Field framework couples edge continuity and contour matching into a joint optimization process. The approach is illustrated with applications to pose estimation and human detection.
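The ballistic-segmentation step described in the first part of this abstract can be pictured with a toy sketch: ballistic movements such as reaches and strikes show a single-peaked, bell-shaped speed profile, so candidate segments can be found in a frame-level motion-magnitude signal without tracking the hands. The thresholds and the `motion_magnitude` input below are illustrative assumptions, not the paper's Bayesian model.

```python
# A minimal sketch of ballistic segmentation (not the paper's Bayesian model):
# scan a frame-level motion-magnitude signal for intervals that accelerate
# to a single peak and then decelerate, the signature of a ballistic movement.
import numpy as np

def ballistic_segments(motion_magnitude, min_len=5, rest_thresh=0.1):
    """Return (start, end) frame intervals whose speed profile is single-peaked."""
    moving = motion_magnitude > rest_thresh
    segments, start = [], None
    for t, m in enumerate(moving):
        if m and start is None:
            start = t
        elif not m and start is not None:
            segments.append((start, t))
            start = None
    if start is not None:
        segments.append((start, len(moving)))
    # Keep only segments with one dominant speed peak (rise, then fall).
    ballistic = []
    for s, e in segments:
        if e - s < min_len:
            continue
        seg = motion_magnitude[s:e]
        peak = int(np.argmax(seg))
        rising = np.all(np.diff(seg[: peak + 1]) >= -1e-3)
        falling = np.all(np.diff(seg[peak:]) <= 1e-3)
        if rising and falling:
            ballistic.append((s, e))
    return ballistic

# Example: a synthetic bell-shaped speed profile flanked by rest.
speed = np.concatenate([np.zeros(10), np.sin(np.linspace(0, np.pi, 30)), np.zeros(10)])
print(ballistic_segments(speed))  # -> one interval covering the bell
```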

    Accurate Detection of Wake Word Start and End Using a CNN

    Small-footprint embedded devices require keyword spotters (KWS) with small model size and low detection latency to enable voice assistants. Such a keyword is often referred to as a wake word, as it is used to wake up voice-assistant-enabled devices. Together with wake word detection, accurate estimation of wake word endpoints (start and end) is an important task of KWS. In this paper, we propose two new methods for detecting the endpoints of wake words in neural KWS systems that use single-stage word-level neural networks. Our results show that the new techniques detect wake word endpoints with a standard error of up to 50 msec versus human annotations, on par with conventional Acoustic Model plus HMM forced alignment. To our knowledge, this is the first study of wake word endpoint detection methods for single-stage neural KWS.
    Comment: Proceedings of INTERSPEECH
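As a rough illustration of the single-stage word-level idea (not the paper's exact architecture), a small 1-D CNN can emit a per-frame wakeword posterior, with endpoints read off as the first and last frames above a threshold. The layer sizes, the 40-dimensional filterbank input, and the 0.5 threshold are all assumptions.

```python
# A hedged sketch: a small 1-D CNN produces per-frame wakeword posteriors
# over a window of acoustic features; endpoints are the first/last frame
# whose posterior exceeds a threshold.
import torch
import torch.nn as nn

class EndpointCNN(nn.Module):
    def __init__(self, feat_dim=40):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(feat_dim, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(64, 1, kernel_size=1),  # per-frame wakeword logit
        )

    def forward(self, feats):                  # feats: (batch, feat_dim, frames)
        return torch.sigmoid(self.net(feats))  # (batch, 1, frames) posteriors

def endpoints(posteriors, thresh=0.5):
    """First and last frame whose posterior exceeds the threshold."""
    active = (posteriors.squeeze() > thresh).nonzero().flatten()
    if len(active) == 0:
        return None
    return int(active[0]), int(active[-1])

# Usage with random features; real input would be log-filterbank frames.
model = EndpointCNN()
probs = model(torch.randn(1, 40, 100))
print(endpoints(probs))
```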

    Max-Pooling Loss Training of Long Short-Term Memory Networks for Small-Footprint Keyword Spotting

    We propose a max-pooling based loss function for training Long Short-Term Memory (LSTM) networks for small-footprint keyword spotting (KWS), with low CPU, memory, and latency requirements. Max-pooling loss training can be further guided by initializing with a cross-entropy-loss-trained network. A posterior-smoothing-based evaluation approach is employed to measure keyword spotting performance. Our experimental results show that LSTM models trained with either cross-entropy loss or max-pooling loss outperform a cross-entropy-trained baseline feed-forward Deep Neural Network (DNN). In addition, a randomly initialized LSTM trained with max-pooling loss performs better than a cross-entropy-trained LSTM. Finally, the max-pooling-trained LSTM initialized from a cross-entropy pre-trained network performs best, yielding a 67.6% relative reduction in the Area Under the Curve (AUC) measure compared to the baseline feed-forward DNN.
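The max-pooling loss itself is compact enough to sketch. In one common variant (the paper's exact recipe may differ), a positive utterance is penalized only at the frame where the keyword posterior peaks, while a negative utterance is penalized at every frame; the shapes below are assumed.

```python
# A minimal sketch of max-pooling loss for utterance-level KWS training.
import torch
import torch.nn.functional as F

def max_pooling_loss(frame_logits, is_keyword):
    """frame_logits: (frames, 2) per-frame logits [background, keyword];
    is_keyword: bool label for the whole utterance."""
    log_probs = F.log_softmax(frame_logits, dim=-1)
    if is_keyword:
        # Max-pool over time: train only the single best-scoring frame.
        best = torch.argmax(log_probs[:, 1])
        return -log_probs[best, 1]
    # Negative utterance: every frame should score as background.
    return -log_probs[:, 0].mean()

# Usage on random LSTM outputs for one positive utterance.
logits = torch.randn(50, 2, requires_grad=True)
loss = max_pooling_loss(logits, is_keyword=True)
loss.backward()
print(float(loss))
```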

    Source Error-Projection for Sample Selection in Phrase-Based SMT for Resource-Poor Languages

    The unavailability of parallel training corpora in resource-poor languages is a major bottleneck to cost-effective and rapid deployment of statistical machine translation (SMT) technology. This has spurred significant interest in active learning for SMT, which selects the most informative samples from a large candidate pool. This is especially challenging when irrelevant outliers dominate the pool. We propose two supervised sample selection methods, viz. greedy selection and integer linear programming (ILP), based on a novel measure of benefit derived from error analysis. These methods support the selection of diverse, high-impact, yet relevant batches of source sentences. Comparative experiments on multiple test sets across two resource-poor language pairs (English-Pashto and English-Dari) reveal that the proposed approaches achieve BLEU scores comparable to the full system while using a very small fraction of the available training data (ca. 6% for English-Pashto and 13% for English-Dari). We further demonstrate that the ILP method supports global constraints of significant practical value.
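A minimal sketch of the greedy variant, under stated assumptions: sentences are chosen by estimated benefit per token until a budget is exhausted, discounting units already covered so the batch stays diverse. The per-unigram `benefit` dictionary below stands in for the paper's error-analysis-derived measure.

```python
# Greedy batch selection with a token budget and a diversity discount.
def greedy_select(candidates, benefit, budget):
    """candidates: list of token lists; benefit: per-unigram benefit dict;
    budget: max total tokens to select. Returns indices of chosen sentences."""
    covered, chosen, used = set(), [], 0
    while used < budget:
        best_i, best_gain = None, 0.0
        for i, sent in enumerate(candidates):
            if i in chosen or used + len(sent) > budget:
                continue
            # Marginal gain: benefit of unigrams not yet covered, per token.
            gain = sum(benefit.get(w, 0.0) for w in set(sent) - covered) / len(sent)
            if gain > best_gain:
                best_i, best_gain = i, gain
        if best_i is None:
            break
        chosen.append(best_i)
        covered |= set(candidates[best_i])
        used += len(candidates[best_i])
    return chosen

# Toy pool: sentence 2 covers the highest-benefit unigrams first.
pool = [["a", "b"], ["b", "c", "d"], ["a", "b", "c"]]
scores = {"a": 1.0, "b": 0.5, "c": 2.0, "d": 0.1}
print(greedy_select(pool, scores, budget=5))  # -> [2]
```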

    An audio-based wakeword-independent verification system

    We propose an audio-based wakeword-independent verification model that determines whether a wakeword spotting model correctly woke and should respond, or incorrectly woke and should not. Our model works on any wakeword-initiated audio, independent of the wakeword, by operating only on the audio surrounding the wakeword, yielding a wakeword-agnostic model. The model rests on two key assumptions: that the audio surrounding the wakeword is informative for determining whether the user intended to wake the device, and that this audio is independent of the wakeword itself. We show experimentally that, on wakewords not included in the training set, our model trained without examples or knowledge of the wakeword achieves verification performance comparable to models trained on 5,000 to 10,000 annotated examples of the new wakeword.
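The wakeword-agnostic idea can be sketched minimally: excise the frames the spotter attributed to the wakeword and classify only the surrounding context, so the verifier never sees the wakeword itself. The feature shapes and the stand-in classifier below are illustrative assumptions, not the paper's model.

```python
# A minimal sketch: drop the detected wakeword span and score the rest.
import numpy as np

def surrounding_context(feats, ww_start, ww_end):
    """feats: (frames, dim); remove the detected wakeword frames."""
    return np.concatenate([feats[:ww_start], feats[ww_end:]], axis=0)

def should_respond(feats, ww_start, ww_end, classifier):
    context = surrounding_context(feats, ww_start, ww_end)
    # classifier maps pooled context features to P(true wake); any model works.
    return classifier(context.mean(axis=0)) > 0.5

# Usage with a stand-in classifier (a real one would be trained).
feats = np.random.randn(200, 40)
print(should_respond(feats, 80, 120,
                     classifier=lambda v: 1 / (1 + np.exp(-v.sum()))))
```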

    Co-clustering of image segments using convex optimization applied to EM neuronal reconstruction

    This paper addresses the problem of jointly clustering two segmentations of closely correlated images. We focus in particular on the application of reconstructing neuronal structures in over-segmented electron microscopy images. We formulate the problem of co-clustering as a quadratic semi-assignment problem and investigate convex relaxations using semidefinite and linear programming. We further introduce a linear programming method with a manageable number of constraints and present an approach for learning the cost function. Our method increases computational efficiency by orders of magnitude while maintaining accuracy, automatically finds the optimal number of clusters, and empirically tends to produce binary assignment solutions. We illustrate our approach in simulations and in experiments with real EM data.
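In the spirit of the linear programming relaxation mentioned above (not the paper's exact program), one can relax a pairwise "different cluster" indicator per segment pair to [0, 1], minimize a linear cost that rewards merging high-affinity segments, and enforce triangle inequalities for consistency. The affinity matrix here is a made-up example.

```python
# A hedged LP-relaxation sketch of pairwise co-clustering.
import itertools
import numpy as np
from scipy.optimize import linprog

def cocluster_lp(affinity):
    """affinity[i, j] > 0 favors putting i and j in the same cluster,
    < 0 favors separating them. Returns relaxed pairwise distances."""
    n = affinity.shape[0]
    pairs = list(itertools.combinations(range(n), 2))
    idx = {p: k for k, p in enumerate(pairs)}
    # Minimize sum a_ij * d_ij: high affinity pushes d_ij toward 0 (merge).
    c = np.array([affinity[i, j] for i, j in pairs])
    rows, rhs = [], []
    for i, j, k in itertools.combinations(range(n), 3):
        # Triangle inequalities keep pairwise decisions mutually consistent.
        for (a, b), (p, q), (r, s) in (((i, k), (i, j), (j, k)),
                                       ((i, j), (i, k), (j, k)),
                                       ((j, k), (i, j), (i, k))):
            row = np.zeros(len(pairs))
            row[idx[(a, b)]] = 1.0
            row[idx[(p, q)]] = -1.0
            row[idx[(r, s)]] = -1.0
            rows.append(row)
            rhs.append(0.0)
    res = linprog(c, A_ub=np.array(rows), b_ub=np.array(rhs),
                  bounds=[(0.0, 1.0)] * len(pairs))
    return {p: res.x[idx[p]] for p in pairs}

# Segments 0 and 1 should merge; segment 2 should stay separate.
A = np.array([[0.0, 2.0, -1.0], [2.0, 0.0, -1.0], [-1.0, -1.0, 0.0]])
print(cocluster_lp(A))  # d(0,1) near 0; d(0,2), d(1,2) near 1
```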