21 research outputs found

    Implicitly Supervised Language Model Adaptation for Meeting Transcription

    We describe the use of meeting metadata, acquired using a computerized meeting organization and note-taking system, to improve automatic transcription of meetings. By applying a two-step language model adaptation process based on notes and agenda items, we were able to reduce perplexity by 9% and word error rate by 4% relative on a set of ten meetings recorded in-house. This approach can be used to leverage other types of metadata.
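The adaptation idea above can be sketched as linear interpolation between a background language model and a small model estimated from the meeting notes. The abstract does not give the exact two-step procedure, so the unigram models, the interpolation weight, and the toy data below are purely illustrative:

```python
import math
from collections import Counter

def unigram_lm(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def interpolate(p_bg, p_adapt, lam):
    """Linearly interpolate two distributions over the same vocabulary."""
    return {w: (1 - lam) * p_bg[w] + lam * p_adapt[w] for w in p_bg}

def perplexity(lm, tokens):
    """2 ** (mean negative log2 probability) of the test tokens."""
    return 2 ** (-sum(math.log2(lm[w]) for w in tokens) / len(tokens))

# Toy corpora standing in for a large background corpus and meeting notes.
background = "the budget review is next week please send the slides".split()
notes = "agenda budget review action items for the demo".split()
test_set = "budget review agenda items".split()

vocab = set(background) | set(notes) | set(test_set)
p_bg = unigram_lm(background, vocab)
p_meet = unigram_lm(notes, vocab)
p_adapted = interpolate(p_bg, p_meet, lam=0.3)

ppl_before = perplexity(p_bg, test_set)
ppl_after = perplexity(p_adapted, test_set)
assert ppl_after < ppl_before  # adaptation lowers perplexity on meeting text
```

Because the notes share vocabulary with what is actually said in the meeting, the interpolated model assigns higher probability to in-meeting words, which is what drives the perplexity reduction.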

    Audio-Motor Integration for Robot Audition

    In the context of robotics, audio signal processing in the wild amounts to dealing with sounds recorded by a system that moves and whose actuators produce noise. This creates additional challenges in sound source localization, signal enhancement and recognition. But the specificity of such platforms also brings interesting opportunities: can information about the robot actuators' states be meaningfully integrated in the audio processing pipeline to improve performance and efficiency? While robot audition has grown into an established field, methods that explicitly use motor-state information as a complementary modality to audio are scarcer. This chapter proposes a unified view of this endeavour, referred to as audio-motor integration. A literature review and two learning-based methods for audio-motor integration in robot audition are presented, with application to single-microphone sound source localization and ego-noise reduction on real data.

    Interactive ASR error correction for touchscreen devices

    We will demonstrate a novel graphical interface for correcting search errors in the output of a speech recognizer. This interface allows the user to visualize the word lattice by “pulling apart” regions of the hypothesis to reveal a cloud of words similar to the “tag clouds” popular in many Web applications. This interface is potentially useful for dictation on portable touchscreen devices such as the Nokia N800 and other mobile Internet devices.
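The “pulling apart” interaction can be sketched on a confusion-network view of the lattice, where each region of the hypothesis is a set of competing words with posterior probabilities and the cloud renders each alternative at a size proportional to its posterior. The data structure and sizing rule below are hypothetical illustrations, not the demo's actual code:

```python
# Hypothetical sketch: a confusion-network slot maps candidate words to
# posterior probabilities; tapping that region of the hypothesis expands
# it into a tag-cloud of alternatives sized by posterior.

def expand_region(slot, min_size=10, max_size=32):
    """Return (word, font_size) pairs for a tag-cloud of alternatives,
    largest (most probable) first."""
    top = max(slot.values())
    scale = max_size - min_size
    return sorted(
        ((w, int(min_size + scale * p / top)) for w, p in slot.items()),
        key=lambda wp: -wp[1])

# Illustrative slot for one tapped region of the hypothesis.
slot = {"recognize": 0.6, "wreck a nice": 0.25, "recognise": 0.15}
cloud = expand_region(slot)
assert cloud[0] == ("recognize", 32)  # best hypothesis rendered largest
```

Sizing by posterior lets the user's eye land on plausible corrections first, which matters on a small touchscreen where only a handful of alternatives fit.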

    Mixture Pruning and Roughening for Scalable Acoustic Models

    In an automatic speech recognition system using a tied-mixture acoustic model, the main cost in CPU time and memory lies not in the evaluation and storage of Gaussians themselves but rather in evaluating the mixture likelihoods for each state output distribution. Using a simple entropy-based technique for pruning the mixture weight distributions, we can achieve a significant speedup in recognition for a 5000-word vocabulary with a negligible increase in word error rate. This allows us to achieve real-time connected-word dictation on an ARM-based mobile device.
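The abstract does not specify the exact entropy-based rule, but one plausible sketch derives a per-state pruning threshold from the entropy of the mixture-weight distribution, so that peaked distributions are pruned aggressively while flat ones are left largely intact. The rule and the `beta` parameter below are illustrative assumptions:

```python
import math

def entropy(weights):
    """Shannon entropy (bits) of a mixture-weight distribution."""
    return -sum(w * math.log2(w) for w in weights if w > 0)

def prune_mixture(weights, beta=0.5):
    """Illustrative entropy-based pruning: keep components whose weight
    exceeds beta times the uniform level implied by the distribution's
    entropy (1 / 2**H), then renormalize the survivors."""
    threshold = beta / 2 ** entropy(weights)
    kept = [(i, w) for i, w in enumerate(weights) if w >= threshold]
    total = sum(w for _, w in kept)
    return [(i, w / total) for i, w in kept]

# A peaked distribution prunes aggressively; a flat one keeps everything.
peaked = [0.7, 0.2, 0.05, 0.03, 0.02]
flat = [0.2] * 5
assert len(prune_mixture(peaked)) < len(peaked)
assert len(prune_mixture(flat)) == len(flat)
```

Tying the threshold to 2**H (the weight distribution's perplexity) rather than a fixed cutoff means each state keeps roughly as many components as its weights can actually distinguish, which is one way pruning can speed up likelihood evaluation without a large error-rate penalty.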

    A Constrained Baum-Welch Algorithm for Improved Phoneme Segmentation and Efficient Training

    No full text
    We describe an extension to the Baum-Welch algorithm for training Hidden Markov Models that uses explicit phoneme segmentation to constrain the forward and backward lattice. The HMMs trained with this algorithm can be shown to improve the accuracy of automatic phoneme segmentation. In addition, this algorithm is significantly more computationally efficient than the full Baum-Welch algorithm, while producing models that achieve equivalent accuracy on a standard phoneme recognition task.
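A minimal sketch of the constraint on the forward pass, assuming hypothetical per-state segment bounds: a lattice cell is only filled when its frame index lies inside that state's phoneme segment, which both enforces the segmentation and skips most of the computation (the model and data below are toy illustrations, not the paper's setup):

```python
import math

def constrained_forward(log_b, segments, log_a, log_pi):
    """Forward recursion in log space. log_b[t][s]: log output probability
    of state s at frame t; segments[s] = (start, end): frames (end
    exclusive) where state s is allowed; log_a: log transition matrix;
    log_pi: log initial probabilities."""
    T, S = len(log_b), len(log_pi)
    NEG = float("-inf")
    alpha = [[NEG] * S for _ in range(T)]
    for s in range(S):
        if segments[s][0] <= 0 < segments[s][1]:
            alpha[0][s] = log_pi[s] + log_b[0][s]
    for t in range(1, T):
        for s in range(S):
            if not (segments[s][0] <= t < segments[s][1]):
                continue  # frame outside this state's segment: cell pruned
            terms = [alpha[t - 1][p] + log_a[p][s] for p in range(S)
                     if alpha[t - 1][p] > NEG]
            if terms:  # log-sum-exp over reachable predecessors
                m = max(terms)
                alpha[t][s] = (m + math.log(sum(math.exp(x - m)
                                                for x in terms))
                               + log_b[t][s])
    return alpha

# Toy example: 2 states, 4 frames, uniform probabilities everywhere.
lp = math.log(0.5)
log_b = [[lp, lp] for _ in range(4)]
log_a = [[lp, lp], [lp, lp]]
log_pi = [lp, lp]
segments = [(0, 2), (1, 4)]  # state 0 owns frames 0-1, state 1 frames 1-3
alpha = constrained_forward(log_b, segments, log_a, log_pi)
assert alpha[3][0] == float("-inf")  # state 0 pruned outside its segment
assert alpha[3][1] > float("-inf")   # state 1 still reachable
```

The backward pass would be masked the same way; since each state only contributes inside its segment, the per-frame work drops from all S states to the few states whose segments cover that frame.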