Human Movement Analysis: Ballistic Dynamics and Edge Continuity for Pose Estimation
We present two contributions to human movement analysis: (a) a ballistic dynamical model for recognizing movements, and (b) a model for coupling edge continuity with contour matching.
We describe a Bayesian approach for visual analysis of ballistic hand movements, namely reaches and strikes. These movements are most commonly used for interacting with objects and the environment. One of the key challenges to recognizing them is the variability of the target-location of the hand: people can reach above their heads, for something on the floor, etc. Our approach recognizes them independent of the movement's target-location and direction by modelling the ballistic dynamics. A video sequence is automatically segmented into ballistic subsequences without tracking the hands. The segments are then classified into strike and reach movements based on low-level motion features. Each ballistic segment is further analyzed to compute qualitative labels for the movement's target-location and direction. Tests are presented with a set of reach and strike movement sequences.
We present an approach for whole-body pose contour matching. Contour matching in natural images in the absence of foreground-background segmentation is difficult. Usually an asymmetric approach is adopted, where a contour is said to match well if it aligns with a subset of the image's gradients. This leads to problems, as the contour can match with a portion of an object's outline and ignore the remainder. We present a model that uses edge continuity to address this issue. Pairs of edge elements in the image are linked with affinities if they are likely to belong to the same object. A contour that matches with a set of image gradients is constrained to also match with other gradients having high affinities with the chosen ones. A Markov Random Field framework is employed to couple edge continuity and contour matching into a joint optimization process. The approach is illustrated with applications to pose estimation and human detection.
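The affinity idea above can be made concrete with a small sketch. The exact affinity used in the paper is not reproduced here; the function below is a hypothetical stand-in that scores a pair of edge elements (edgels) highly when they are close together and roughly collinear, which is the usual ingredient of edge-continuity cues. The parameters `sigma_d` and `sigma_a` are assumed bandwidths, not values from the paper.

```python
import numpy as np

def edge_affinity(p1, theta1, p2, theta2, sigma_d=10.0, sigma_a=0.5):
    """Hypothetical continuity affinity between two edgels, each given
    as a 2-D position and an orientation angle in radians.  Returns a
    value in (0, 1]: high when the edgels are near each other and have
    similar orientation, low otherwise."""
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    d = np.linalg.norm(p2 - p1)            # spatial gap between edgels
    dtheta = abs(theta1 - theta2)          # orientation difference
    dtheta = min(dtheta, np.pi - dtheta)   # edges are undirected
    return float(np.exp(-(d / sigma_d) ** 2) *
                 np.exp(-(dtheta / sigma_a) ** 2))
```

In the MRF formulation, affinities of this kind would weight the pairwise terms that encourage a contour matched to one gradient to also claim its high-affinity neighbours.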
Accurate Detection of Wake Word Start and End Using a CNN
Small-footprint embedded devices require keyword spotters (KWS) with small model size and low detection latency to enable voice assistants. Such a keyword is often referred to as a "wake word", as it is used to wake up voice-assistant-enabled devices. Together with wake word detection, accurate estimation of wake word endpoints (start and end) is an important task of KWS. In this paper, we propose two new methods for detecting the endpoints of wake words in neural KWS that use single-stage word-level neural networks. Our results show that the new techniques detect wake word endpoints with up to 50 msec standard error versus human annotations, on par with conventional Acoustic Model plus HMM forced alignment. To our knowledge, this is the first study of wake word endpoint detection methods for single-stage neural KWS.
Comment: Proceedings of INTERSPEECH
Max-Pooling Loss Training of Long Short-Term Memory Networks for Small-Footprint Keyword Spotting
We propose a max-pooling based loss function for training Long Short-Term Memory (LSTM) networks for small-footprint keyword spotting (KWS), with low CPU, memory, and latency requirements. Max-pooling loss training can be further guided by initializing with a cross-entropy loss trained network. A posterior smoothing based evaluation approach is employed to measure keyword spotting performance. Our experimental results show that LSTM models trained using cross-entropy loss or max-pooling loss outperform a cross-entropy loss trained baseline feed-forward Deep Neural Network (DNN). In addition, a max-pooling loss trained LSTM with a randomly initialized network performs better than a cross-entropy loss trained LSTM. Finally, the max-pooling loss trained LSTM initialized with a cross-entropy pre-trained network shows the best performance, yielding a relative reduction in the Area Under the Curve (AUC) measure compared to the baseline feed-forward DNN.
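A minimal sketch of the max-pooling loss idea, under simplifying assumptions (a two-class setup with background as class 0 and the keyword as class 1; the paper's exact frame-selection and training recipe may differ): for a keyword utterance, cross-entropy is applied only at the frame whose keyword posterior is highest, while background audio is penalized across all frames.

```python
import numpy as np

def max_pooling_loss(frame_logits, is_keyword, keyword_idx=1):
    """Illustrative max-pooling loss for word-level KWS.
    frame_logits: (T, C) array of per-frame class logits.
    Keyword utterances: negative log posterior of the keyword class at
    the max-pooled frame.  Background: mean cross-entropy vs class 0."""
    z = frame_logits - frame_logits.max(axis=1, keepdims=True)
    post = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)  # (T, C) softmax
    if is_keyword:
        t = int(np.argmax(post[:, keyword_idx]))             # max-pooled frame
        return -float(np.log(post[t, keyword_idx]))
    return float(-np.log(post[:, 0]).mean())                 # background class 0
```

The appeal of max-pooling here is that it removes the need for frame-level alignments: the network itself picks the single frame that best represents the keyword.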
Source Error-Projection for Sample Selection in Phrase-Based SMT for Resource-Poor Languages
The unavailability of parallel training corpora in resource-poor languages is a major bottleneck in cost-effective and rapid deployment of statistical machine translation (SMT) technology. This has spurred significant interest in active learning for SMT to select the most informative samples from a large candidate pool. This is especially challenging when irrelevant outliers dominate the pool. We propose two supervised sample selection methods, viz. greedy selection and integer linear programming (ILP), based on a novel measure of benefit derived from error analysis. These methods support the selection of diverse and high-impact, yet relevant, batches of source sentences. Comparative experiments on multiple test sets across two resource-poor language pairs (English-Pashto and English-Dari) reveal that the proposed approaches achieve BLEU scores comparable to the full system using a very small fraction of all available training data (ca. 6% for E-P and 13% for E-D). We further demonstrate that the ILP method supports global constraints of significant practical value.
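The greedy variant can be sketched as follows. The scoring here is hypothetical: the paper's error-projection benefit measure is not reproduced, so `benefit` is simply a per-phrase score, and diversity is encouraged by geometrically discounting phrases already covered by earlier picks.

```python
def greedy_select(pool, benefit, batch_size, diversity_penalty=0.5):
    """Greedy batch selection sketch for active learning in SMT.
    pool: sentence id -> set of source phrases it contains.
    benefit: phrase -> estimated benefit of covering that phrase.
    Each pick maximizes marginal benefit; phrases already covered are
    discounted so the selected batch stays diverse."""
    covered = {}    # phrase -> how many selected sentences cover it
    selected = []
    for _ in range(batch_size):
        def score(sid):
            return sum(benefit.get(ph, 0.0) *
                       diversity_penalty ** covered.get(ph, 0)
                       for ph in pool[sid])
        best = max((s for s in pool if s not in selected),
                   key=score, default=None)
        if best is None:
            break
        selected.append(best)
        for ph in pool[best]:
            covered[ph] = covered.get(ph, 0) + 1
    return selected
```

The ILP variant described in the paper would instead encode coverage and batch-size limits as linear constraints and solve for the whole batch jointly, which is what makes global constraints easy to add.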
An audio-based wakeword-independent verification system
We propose an audio-based wakeword-independent verification model to determine whether a wakeword spotting model correctly woke and should respond, or incorrectly woke and should not respond. Our model works on any wakeword-initiated audio, independent of the wakeword, by operating only on the audio surrounding the wakeword, yielding a wakeword-agnostic model. This model is based on two key assumptions: that the audio surrounding the wakeword is informative for determining whether the user intended to wake the device, and that this audio is independent of the wakeword itself. We show experimentally that, on wakewords not included in the training set, our model trained without examples or knowledge of the wakeword achieves verification performance comparable to models trained on 5,000 to 10,000 annotated examples of the new wakeword.
Metadata-aware end-to-end keyword spotting
Co-clustering of image segments using convex optimization applied to EM neuronal reconstruction
This paper addresses the problem of jointly clustering two segmentations of closely correlated images. We focus in particular on the application of reconstructing neuronal structures in over-segmented electron microscopy images. We formulate the problem of co-clustering as a quadratic semi-assignment problem and investigate convex relaxations using semidefinite and linear programming. We further introduce a linear programming method with a manageable number of constraints and present an approach for learning the cost function. Our method increases computational efficiency by orders of magnitude while maintaining accuracy, automatically finds the optimal number of clusters, and empirically tends to produce binary assignment solutions. We illustrate our approach in simulations and in experiments with real EM data.
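A small sketch of the objective being relaxed, under stated assumptions: segments from both segmentations share one index space, `cost[i, j]` is the (learned) price of putting segments i and j in the same cluster (negative when merging is favoured), and we only evaluate the objective for a fixed hard assignment; the paper's semidefinite and linear relaxations optimize over a continuous co-assignment matrix instead.

```python
import numpy as np

def coclustering_cost(labels, cost):
    """Quadratic semi-assignment objective for co-clustering
    (illustrative).  labels[i] is the cluster of segment i; cost[i, j]
    is the pairwise cost of co-assigning i and j.  Sums the cost over
    every unordered pair placed in the same cluster."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]  # binary co-assignment matrix
    iu = np.triu_indices(len(labels), k=1)     # count each pair once
    return float((np.asarray(cost)[iu] * same[iu]).sum())
```

Minimizing this over all assignments is combinatorial; relaxing the co-assignment matrix to a continuous, constrained variable is what turns it into the convex programs the paper studies.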