273 research outputs found
Lipreading with Long Short-Term Memory
Lipreading, i.e. speech recognition from visual-only recordings of a
speaker's face, can be achieved with a processing pipeline based solely on
neural networks, yielding significantly better accuracy than conventional
methods. Feed-forward and recurrent neural network layers (namely Long
Short-Term Memory; LSTM) are stacked to form a single structure which is
trained by back-propagating error gradients through all the layers. The
performance of such a stacked network was experimentally evaluated and compared
to a standard Support Vector Machine classifier using conventional computer
vision features (Eigenlips and Histograms of Oriented Gradients). The
evaluation was performed on data from 19 speakers of the publicly available
GRID corpus. With 51 different words to classify, we report a best word
accuracy on held-out evaluation speakers of 79.6% using the end-to-end neural
network-based solution (11.6% improvement over the best feature-based solution
evaluated).Comment: Accepted for publication at ICASSP 201
Advancing Electromyographic Continuous Speech Recognition: Signal Preprocessing and Modeling
Speech is the natural medium of human communication, but audible speech can be overheard by bystanders and excludes speech-disabled people. This work presents a speech recognizer based on surface electromyography, where electric potentials of the facial muscles are captured by surface electrodes, allowing speech to be processed nonacoustically. A system which was state-of-the-art at the beginning of this book is substantially improved in terms of accuracy, flexibility, and robustness
Advancing Electromyographic Continuous Speech Recognition: Signal Preprocessing and Modeling
Speech is the natural medium of human communication, but audible speech can be overheard by bystanders and excludes speech-disabled people. This work presents a speech recognizer based on surface electromyography, where electric potentials of the facial muscles are captured by surface electrodes, allowing speech to be processed nonacoustically. A system which was state-of-the-art at the beginning of this book is substantially improved in terms of accuracy, flexibility, and robustness
A Low-Dimensional Representation for Robust Partial Isometric Correspondences Computation
Intrinsic isometric shape matching has become the standard approach for pose
invariant correspondence estimation among deformable shapes. Most existing
approaches assume global consistency, i.e., the metric structure of the whole
manifold must not change significantly. While global isometric matching is well
understood, only a few heuristic solutions are known for partial matching.
Partial matching is particularly important for robustness to topological noise
(incomplete data and contacts), which is a common problem in real-world 3D
scanner data. In this paper, we introduce a new approach to partial, intrinsic
isometric matching. Our method is based on the observation that isometries are
fully determined by purely local information: a map of a single point and its
tangent space fixes an isometry for both global and the partial maps. From this
idea, we develop a new representation for partial isometric maps based on
equivalence classes of correspondences between pairs of points and their
tangent spaces. From this, we derive a local propagation algorithm that find
such mappings efficiently. In contrast to previous heuristics based on RANSAC
or expectation maximization, our method is based on a simple and sound
theoretical model and fully deterministic. We apply our approach to register
partial point clouds and compare it to the state-of-the-art methods, where we
obtain significant improvements over global methods for real-world data and
stronger guarantees than previous heuristic partial matching algorithms.Comment: 17 pages, 12 figure
3D modeling of indoor environments by a mobile platform with a laser scanner and panoramic camera
One major challenge of 3DTV is content acquisition. Here, we present a method to acquire a realistic, visually convincing D model of indoor environments based on a mobile platform that is equipped with a laser range scanner and a panoramic camera. The data of the 2D laser scans are used to solve the simultaneous lo- calization and mapping problem and to extract walls. Textures for walls and floor are built from the images of a calibrated panoramic camera. Multiresolution blending is used to hide seams in the gen- erated textures. The scene is further enriched by 3D-geometry cal- culated from a graph cut stereo technique. We present experimental results from a moderately large real environment.
- …