
    Hierarchical multi-stream posterior based speech recognition system

    In this paper, we present initial results towards boosting posterior-based speech recognition systems by estimating more informative posteriors using multiple streams of features and taking into account acoustic context (e.g., as available in the whole utterance), as well as possible prior information (such as topological constraints). These posteriors are estimated based on the “state gamma posterior” definition (typically used in standard HMM training), extended to the case of multi-stream HMMs. This approach provides a new, principled, theoretical framework for hierarchical estimation/use of posteriors, multi-stream feature combination, and integrating appropriate context and prior knowledge into posterior estimates. In the present work, we used the resulting gamma posteriors as features for a standard HMM/GMM layer. On the OGI Digits database and on a reduced-vocabulary version (1,000 words) of the DARPA Conversational Telephone Speech-to-text (CTS) task, this resulted in significant performance improvements compared to state-of-the-art Tandem systems.
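    The “state gamma posterior” the abstract refers to is the per-frame state posterior γ_t(i) = P(q_t = i | x_1..T) familiar from HMM training. Below is a minimal single-stream sketch of how it is computed with the standard forward-backward recursions, assuming given HMM parameters; the paper's multi-stream extension, where emission scores would come from combining per-stream posteriors, is not shown.

```python
# Minimal sketch: per-frame state "gamma" posteriors of a single-stream
# HMM via forward-backward. Parameter names are illustrative.
import numpy as np
from scipy.special import logsumexp

def state_gamma_posteriors(log_emis, log_trans, log_init):
    """log_emis: (T, N) frame log-likelihoods log p(x_t | q_t = i);
    log_trans: (N, N) log transition matrix; log_init: (N,) log priors.
    Returns gamma: (T, N) with gamma[t, i] = P(q_t = i | x_1..T)."""
    T, N = log_emis.shape
    log_alpha = np.empty((T, N))
    log_beta = np.zeros((T, N))          # log beta at the last frame is 0
    log_alpha[0] = log_init + log_emis[0]
    for t in range(1, T):                # forward recursion
        log_alpha[t] = log_emis[t] + logsumexp(
            log_alpha[t - 1][:, None] + log_trans, axis=0)
    for t in range(T - 2, -1, -1):       # backward recursion
        log_beta[t] = logsumexp(
            log_trans + log_emis[t + 1] + log_beta[t + 1], axis=1)
    log_gamma = log_alpha + log_beta     # unnormalized log posteriors
    log_gamma -= logsumexp(log_gamma, axis=1, keepdims=True)
    return np.exp(log_gamma)             # one posterior vector per frame
```

    In the system described above, gamma vectors of this kind serve as input features to the subsequent HMM/GMM layer.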

    Learning Generative Models with Visual Attention

    Attention has long been proposed by psychologists as important for effectively dealing with the enormous sensory stimulus available in the neocortex. Inspired by visual attention models in computational neuroscience and by the need for object-centric data in generative models, we describe a generative learning framework that uses attentional mechanisms. Attentional mechanisms can propagate signals from a region of interest in a scene to an aligned canonical representation, where generative modeling takes place. By ignoring background clutter, generative models can concentrate their resources on the object of interest. Our model is a proper graphical model in which the 2D similarity transformation is part of the top-down process. A ConvNet is employed to provide good initializations during posterior inference, which is based on Hamiltonian Monte Carlo. Upon learning images of faces, our model can robustly attend to the face regions of novel test subjects. More importantly, our model can learn generative models of new faces from a novel dataset of large images where the face locations are not known.
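    As a rough, hypothetical illustration of the attention step described above (names and parameterization are ours, not the paper's), a 2D similarity transform can warp a region of interest in the scene into a fixed-size canonical window where the generative model operates:

```python
# Hypothetical sketch: map a scene region onto a canonical window via a
# 2D similarity transform (scale, rotation, translation).
import numpy as np
from scipy.ndimage import affine_transform

def attend(image, scale, theta, tx, ty, canonical_shape=(24, 24)):
    """Warp the scene patch at row/col offset (ty, tx), with the given
    scale and rotation, into a canonical_shape window. affine_transform
    maps output (canonical) coordinates back to input coordinates."""
    c, s = np.cos(theta), np.sin(theta)
    A = scale * np.array([[c, -s], [s, c]])  # similarity matrix sR(theta)
    offset = np.array([ty, tx])              # window origin in the scene
    return affine_transform(image, A, offset=offset,
                            output_shape=canonical_shape, order=1)
```

    In the paper's top-down process, the transform parameters would themselves be latent variables inferred per image rather than supplied by hand.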

    Personalizing gesture recognition using hierarchical Bayesian neural networks

    Building robust classifiers trained on data susceptible to group- or subject-specific variations is a challenging pattern recognition problem. We develop hierarchical Bayesian neural networks to capture subject-specific variations and share statistical strength across subjects. Leveraging recent work on learning Bayesian neural networks, we build fast, scalable algorithms for inferring the posterior distribution over all network weights in the hierarchy. We also develop methods for adapting our model to new subjects when a small amount of subject-specific personalization data is available. Finally, we investigate active learning algorithms for interactively labeling personalization data in resource-constrained scenarios. Focusing on the problem of gesture recognition, where inter-subject variations are commonplace, we demonstrate the effectiveness of our proposed techniques. We test our framework on three widely used gesture recognition datasets, achieving personalization performance competitive with the state-of-the-art. Published version: http://openaccess.thecvf.com/content_cvpr_2017/html/Joshi_Personalizing_Gesture_Recognition_CVPR_2017_paper.html
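    To make the hierarchy concrete, here is a toy sketch of the kind of hierarchical prior the abstract describes, with subject-specific weights tied to shared group-level weights. A logistic-regression likelihood stands in for the actual Bayesian neural network, and all names are illustrative, not the authors' code:

```python
# Toy sketch of a hierarchical Bayesian prior: each subject's weights
# w_s are drawn around shared group weights w_0, sharing statistical
# strength across subjects while allowing subject-specific variation.
import numpy as np

def log_joint(w0, subject_ws, X_by_subj, y_by_subj, tau0=1.0, tau=1.0):
    """log p(w0) + sum_s [log p(w_s | w0) + log p(y_s | X_s, w_s)],
    with Gaussian priors and a Bernoulli (logistic) likelihood."""
    lp = -0.5 * np.sum(w0 ** 2) / tau0 ** 2            # group-level prior
    for w, X, y in zip(subject_ws, X_by_subj, y_by_subj):
        lp += -0.5 * np.sum((w - w0) ** 2) / tau ** 2  # subject prior
        logits = X @ w
        lp += np.sum(y * logits - np.logaddexp(0.0, logits))  # likelihood
    return lp
```

    Posterior inference over (w0, subject_ws) is what the paper's scalable algorithms target; adapting to a new subject amounts to inferring one more w_s around the learned w0.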

    Human Verbal Memory Encoding Is Hierarchically Distributed in a Continuous Processing Stream.

    Processing of memory is supported by coordinated activity in a network of sensory, association, and motor brain regions. It remains a major challenge to determine where memory is encoded for later retrieval. Here, we used direct intracranial brain recordings from epilepsy patients performing free-recall tasks to determine the temporal pattern and anatomical distribution of verbal memory encoding across the entire human cortex. High-γ frequency activity (65-115 Hz) showed consistent power responses during encoding of subsequently recalled and forgotten words on a subset of electrodes localized in 16 distinct cortical areas activated in the tasks. Greater high-γ power during word encoding, and less power before and after word presentation, was characteristic of successful recall and was observed across multiple brain regions. Latencies of the induced power changes and of this subsequent memory effect (SME) between recalled and forgotten words followed an anatomical sequence from visual to prefrontal cortical areas. Finally, the magnitude of the memory effect was unexpectedly found to be largest in selected brain regions at both the top and the bottom of the processing stream. These included the language-processing areas of the prefrontal cortex and the early visual areas at the junction of the occipital and temporal lobes. Our results provide evidence for distributed encoding of verbal memory organized along a hierarchical posterior-to-anterior processing stream.
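    For readers unfamiliar with the signal processing involved, one common way to extract a high-γ (65-115 Hz) power envelope of the kind analyzed here is a band-pass filter followed by a Hilbert-transform amplitude estimate. This is an illustrative sketch, not the authors' exact pipeline:

```python
# Illustrative high-gamma power extraction: zero-phase Butterworth
# band-pass, then the analytic-signal amplitude via Hilbert transform.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def high_gamma_power(x, fs, band=(65.0, 115.0), order=4):
    """x: 1-D voltage trace sampled at fs Hz.
    Returns the instantaneous power envelope in the given band."""
    nyq = fs / 2.0
    b, a = butter(order, [band[0] / nyq, band[1] / nyq], btype="band")
    filtered = filtfilt(b, a, x)          # zero-phase band-pass filter
    envelope = np.abs(hilbert(filtered))  # analytic-signal amplitude
    return envelope ** 2                  # power envelope
```

    Comparing such envelopes between subsequently recalled and forgotten words, time-locked to word presentation, is the essence of the subsequent memory effect analysis.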

    Processing and Linking Audio Events in Large Multimedia Archives: The EU inEvent Project

    In the inEvent EU project [1], we aim at structuring, retrieving, and sharing large archives of networked, and dynamically changing, multimedia recordings, mainly consisting of meetings, videoconferences, and lectures. More specifically, we are developing an integrated system that performs audiovisual processing of multimedia recordings and labels them in terms of interconnected “hyper-events” (a notion inspired by hyper-text). Each hyper-event is composed of simpler facets, including audio-video recordings and metadata, which are then easier to search, retrieve, and share. In the present paper, we mainly cover the audio processing aspects of the system, including speech recognition, speaker diarization and linking (across recordings), the use of these features for hyper-event indexing and recommendation, and the search portal. We present initial results for feature extraction from lecture recordings using the TED talks. Index Terms: networked multimedia events; audio processing; speech recognition; speaker diarization and linking; multimedia indexing and searching; hyper-events.
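    As a purely hypothetical sketch of the “hyper-event” notion (field names and the linking rule are ours, for illustration only), a recording can be decomposed into facets that are indexed and then linked across recordings, e.g., by shared diarized speakers:

```python
# Hypothetical data-structure sketch: hyper-events composed of facets,
# linked across recordings via shared speaker identities.
from dataclasses import dataclass, field

@dataclass
class Facet:
    kind: str       # e.g. "asr_segment", "speaker_turn", "metadata"
    start: float    # seconds into the recording
    end: float
    payload: dict   # transcript text, speaker id, etc.

@dataclass
class HyperEvent:
    recording_id: str
    facets: list = field(default_factory=list)
    links: set = field(default_factory=set)  # ids of linked recordings

def link_by_speaker(events):
    """Connect hyper-events whose diarized speaker turns share a speaker."""
    by_speaker = {}
    for ev in events:
        for f in ev.facets:
            if f.kind == "speaker_turn":
                by_speaker.setdefault(f.payload["speaker"], []).append(ev)
    for group in by_speaker.values():
        for ev in group:
            ev.links |= {o.recording_id for o in group if o is not ev}
```

    In the actual system, the facets would be produced by the speech recognition and diarization components described above, and the links would feed the indexing, recommendation, and search portal.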