
    Robust automatic transcription of lectures

    Automatic transcription of lectures is becoming an important task. Possible applications can be found in the fields of automatic translation or summarization, information retrieval, digital libraries, education and communication research. Ideally, such systems would operate on distant recordings, freeing the presenter from wearing body-mounted microphones. This task, however, is exceedingly difficult, given that the speech signal is severely degraded by background noise and reverberation.

    Robust Automatic Transcription of Lectures

    Automatic transcription of talks, lectures, and presentations is becoming ever more important: it is what first enables applications such as automatic speech translation, automatic summarization of speech, and targeted information retrieval in audio data, and thus easier access to digital libraries. Ideally, such a system operates with a distant microphone, freeing the presenter from wearing a body-mounted one; this is the focus of this work.

    Minimum Mutual Information Beamforming for Simultaneous Active Speakers

    In this work, we consider an acoustic beamforming application where two speakers are simultaneously active. We construct one subband-domain beamformer in generalized sidelobe canceller (GSC) configuration for each source. In contrast to normal practice, we then jointly optimize the active weight vectors of both GSCs to obtain two output signals with minimum mutual information (MMI). Assuming that the subband snapshots are Gaussian-distributed, this MMI criterion reduces to the requirement that the cross-correlation coefficient of the subband outputs of the two GSCs vanishes. We also compare separation performance under the Gaussian assumption with that obtained from several super-Gaussian probability density functions (pdfs), namely the Laplace, K_0, and Γ pdfs. Our proposed technique provides effective nulling of the undesired source, but without the signal cancellation problems seen in conventional beamforming. Moreover, our technique does not suffer from the source permutation and scaling ambiguities encountered in conventional blind source separation algorithms. We demonstrate the effectiveness of our proposed technique through a series of far-field automatic speech recognition experiments on data from the PASCAL Speech Separation Challenge (SSC). On the SSC development data, the simple delay-and-sum beamformer achieves a word error rate (WER) of 70.4%. The MMI beamformer under a Gaussian assumption achieves a 55.2% WER, which is further reduced to 52.0% with a K_0 pdf, whereas the WER for data recorded with a close-talking microphone is 21.6%.
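    The Gaussian-case criterion described above can be made concrete in a few lines: for zero-mean Gaussian beamformer outputs, the mutual information between the two streams depends only on the magnitude of their normalized cross-correlation coefficient, so driving that coefficient toward zero minimizes it. A minimal numpy sketch (function names are illustrative, not from the paper):

```python
import numpy as np

def cross_corr_coeff(y1, y2):
    """Normalized cross-correlation coefficient of two complex subband outputs."""
    num = np.mean(y1 * np.conj(y2))
    den = np.sqrt(np.mean(np.abs(y1) ** 2) * np.mean(np.abs(y2) ** 2))
    return num / den

def gaussian_mmi_cost(y1, y2):
    """Under the Gaussian assumption, the mutual information between the two
    zero-mean outputs reduces to -0.5 * log(1 - |rho|^2), which vanishes
    exactly when the cross-correlation coefficient rho vanishes."""
    rho = cross_corr_coeff(y1, y2)
    return -0.5 * np.log(1.0 - np.abs(rho) ** 2)
```

    In the actual method, this cost would be minimized over the active weight vectors of both GSCs; the sketch only shows the criterion itself.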

    To separate speech! A system for recognizing simultaneous speech

    The PASCAL Speech Separation Challenge (SSC) is based on a corpus of sentences from the Wall Street Journal task read by two speakers simultaneously and captured with two circular eight-channel microphone arrays. This work describes our system for the recognition of such simultaneous speech. Our system has four principal components: a person tracker returns the locations of both active speakers, as well as segmentation information for each utterance, as the two utterances are often of unequal length; two beamformers in generalized sidelobe canceller (GSC) configuration separate the simultaneous speech by setting their active weight vectors according to a minimum mutual information (MMI) criterion; a postfilter and binary mask operating on the outputs of the beamformers further enhance the separated speech; and finally an automatic speech recognition (ASR) engine based on a weighted finite-state transducer (WFST) returns the most likely word hypotheses for the separated streams. In addition to optimizing each of these components, we investigated the effect of the filter bank design used to perform subband analysis and synthesis during beamforming. On the SSC development data, our system achieved a word error rate of 39.6%.
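    The binary-mask stage of this pipeline admits a very short illustration: each time-frequency bin is assigned entirely to whichever separated stream has the larger magnitude there, suppressing residual cross-talk between the beamformer outputs. A toy numpy sketch (the system's actual postfilter and mask are more elaborate; this shows only the basic masking idea):

```python
import numpy as np

def binary_mask(s1, s2):
    """Toy binary masking of two separated subband streams (complex
    STFT-like arrays of equal shape): each time-frequency bin is kept
    in the stream with the larger magnitude and zeroed in the other."""
    keep1 = np.abs(s1) >= np.abs(s2)
    return np.where(keep1, s1, 0.0), np.where(keep1, 0.0, s2)
```

    The masks are complementary by construction: every bin is non-zero in at most one of the two returned streams.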

    Temporal and spatial analysis of the 2014-2015 Ebola virus outbreak in West Africa

    West Africa is currently witnessing the most extensive Ebola virus (EBOV) outbreak so far recorded. Until now, there have been 27,013 reported cases and 11,134 deaths. The origin of the virus is thought to have been a zoonotic transmission from a bat to a two-year-old boy in December 2013 (ref. 2). From this index case the virus was spread by human-to-human contact throughout Guinea, Sierra Leone and Liberia. However, the origin of the particular virus in each country and time of transmission is not known and currently relies on epidemiological analysis, which may be unreliable owing to the difficulties of obtaining patient information. Here we trace the genetic evolution of EBOV in the current outbreak that has resulted in multiple lineages. Deep sequencing of 179 patient samples processed by the European Mobile Laboratory, the first diagnostics unit to be deployed to the epicentre of the outbreak in Guinea, reveals an epidemiological and evolutionary history of the epidemic from March 2014 to January 2015. Analysis of EBOV genome evolution has also benefited from a similar sequencing effort of patient samples from Sierra Leone. Our results confirm that the EBOV from Guinea moved into Sierra Leone, most likely in April or early May. The viruses of the Guinea/Sierra Leone lineage mixed around June/July 2014. Viral sequences covering August, September and October 2014 indicate that this lineage evolved independently within Guinea. These data can be used in conjunction with epidemiological information to test retrospectively the effectiveness of control measures, and provide an unprecedented window into the evolution of an ongoing viral haemorrhagic fever outbreak.

    Integration of the predicted walk model estimate into the particle filter framework

    Distortion robustness is one of the most significant problems in automatic speech recognition. While much research on speech feature enhancement for automatic recognition has focused on stationary distortions, most observed distortions are non-stationary. To cope with this non-stationary behavior, various particle filter approaches have recently been proposed to track non-stationary distortions of speech features in the logarithmic spectral or cepstral domain. Most of these techniques rely on a noise evolution model given by a linear prediction matrix. Current estimation of the linear prediction matrix, however, requires noise-only observations, which must either be given a priori or be detected by voice activity detection. This makes it impossible to adapt the linear prediction matrix to the dynamics of the noise in speech regions. In this publication we propose to estimate and update the linear prediction matrix directly on the noisy speech observations. This is possible within the particle filter framework by weighting the different noisy estimates (particles) according to their likelihood in the estimation equation of the linear prediction matrix. Speech recognition experiments on actual recordings with different speaker-to-microphone distances confirm the soundness of the proposed approach. Index Terms: speech feature enhancement, particle filter, predicted walk, linear prediction matrix, automatic speech recognition.
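    The likelihood-weighted update described above can be sketched as a weighted least-squares fit of the prediction matrix A in the noise evolution model n_t ≈ A · n_{t-1}, where each particle pair contributes in proportion to its likelihood weight. A hypothetical numpy sketch (variable names and the regularization term are my own assumptions, not taken from the paper):

```python
import numpy as np

def update_prediction_matrix(prev, curr, weights, reg=1e-6):
    """Likelihood-weighted least-squares estimate of the linear prediction
    matrix A in the noise evolution model n_t ~ A @ n_{t-1}.
    prev, curr: (num_particles, dim) noise estimates at times t-1 and t;
    weights:    per-particle likelihoods (need not be normalized);
    reg:        small ridge term keeping the autocorrelation invertible."""
    w = weights / weights.sum()
    # Weighted cross- and auto-correlation accumulators over the particles.
    C = (curr * w[:, None]).T @ prev   # sum_i w_i * n_t  n_{t-1}^T
    R = (prev * w[:, None]).T @ prev   # sum_i w_i * n_{t-1} n_{t-1}^T
    dim = prev.shape[1]
    return C @ np.linalg.inv(R + reg * np.eye(dim))
```

    Because the weights come from the particles' likelihoods under the noisy speech observation, the matrix can be refreshed inside speech regions, which is the point of the proposed approach.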
