
    Robust automatic transcription of lectures

    Automatic transcription of lectures is becoming an important task. Possible applications can be found in the fields of automatic translation or summarization, information retrieval, digital libraries, education and communication research. Ideally, such systems would operate on distant recordings, freeing the presenter from wearing body-mounted microphones. This task, however, is exceedingly difficult, given that the speech signal is severely degraded by background noise and reverberation.

    Robust Automatic Transcription of Lectures

    Automatic transcription of talks, lectures, and presentations is becoming ever more important: it is what first enables applications such as automatic speech translation, automatic summarization of speech, and targeted information retrieval in audio data, and thus easier access to digital libraries. Ideally, such a system operates with a distant microphone, freeing the presenter from wearing a body-mounted one; this is the focus of this work.

    Minimum Mutual Information Beamforming for Simultaneous Active Speakers

    In this work, we consider an acoustic beamforming application where two speakers are simultaneously active. We construct one subband-domain beamformer in generalized sidelobe canceller (GSC) configuration for each source. In contrast to normal practice, we then jointly optimize the active weight vectors of both GSCs to obtain two output signals with minimum mutual information (MMI). Assuming that the subband snapshots are Gaussian-distributed, this MMI criterion reduces to the requirement that the cross-correlation coefficient of the subband outputs of the two GSCs vanishes. We also compare separation performance under the Gaussian assumption with that obtained from several super-Gaussian probability density functions (pdfs), namely the Laplace, K_0, and Γ pdfs. Our proposed technique provides effective nulling of the undesired source, but without the signal cancellation problems seen in conventional beamforming. Moreover, our technique does not suffer from the source permutation and scaling ambiguities encountered in conventional blind source separation algorithms. We demonstrate the effectiveness of our proposed technique through a series of far-field automatic speech recognition experiments on data from the PASCAL Speech Separation Challenge (SSC). On the SSC development data, the simple delay-and-sum beamformer achieves a word error rate (WER) of 70.4%. The MMI beamformer under a Gaussian assumption achieves a 55.2% WER, which is further reduced to 52.0% with a K_0 pdf, whereas the WER for data recorded with a close-talking microphone is 21.6%.
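    The Gaussian-case criterion described above can be made concrete in a few lines: for zero-mean Gaussian beamformer outputs, the mutual information between the two streams depends only on the magnitude of their normalized cross-correlation coefficient, so driving that coefficient toward zero minimizes it. A minimal numpy sketch (function names are illustrative, not from the paper):

```python
import numpy as np

def cross_corr_coeff(y1, y2):
    """Normalized cross-correlation coefficient of two complex subband outputs."""
    num = np.mean(y1 * np.conj(y2))
    den = np.sqrt(np.mean(np.abs(y1) ** 2) * np.mean(np.abs(y2) ** 2))
    return num / den

def gaussian_mmi_cost(y1, y2):
    """Under the Gaussian assumption, the mutual information between the two
    zero-mean outputs reduces to -0.5 * log(1 - |rho|^2), which vanishes
    exactly when the cross-correlation coefficient rho vanishes."""
    rho = cross_corr_coeff(y1, y2)
    return -0.5 * np.log(1.0 - np.abs(rho) ** 2)
```

    In the actual method, this cost would be minimized over the active weight vectors of both GSCs; the sketch only shows the criterion itself.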

    To separate speech! A system for recognizing simultaneous speech

    The PASCAL Speech Separation Challenge (SSC) is based on a corpus of sentences from the Wall Street Journal task read by two speakers simultaneously and captured with two circular eight-channel microphone arrays. This work describes our system for the recognition of such simultaneous speech. Our system has four principal components: a person tracker returns the locations of both active speakers, as well as segmentation information for each utterance, as the two utterances are often of unequal length; two beamformers in generalized sidelobe canceller (GSC) configuration separate the simultaneous speech by setting their active weight vectors according to a minimum mutual information (MMI) criterion; a postfilter and binary mask operating on the outputs of the beamformers further enhance the separated speech; and finally an automatic speech recognition (ASR) engine based on a weighted finite-state transducer (WFST) returns the most likely word hypotheses for the separated streams. In addition to optimizing each of these components, we investigated the effect of the filter bank design used to perform subband analysis and synthesis during beamforming. On the SSC development data, our system achieved a word error rate of 39.6%.
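    The binary-mask stage of this pipeline admits a very short illustration: each time-frequency bin is assigned entirely to whichever separated stream has the larger magnitude there, suppressing residual cross-talk between the beamformer outputs. A toy numpy sketch (the system's actual postfilter and mask are more elaborate; this shows only the basic masking idea):

```python
import numpy as np

def binary_mask(s1, s2):
    """Toy binary masking of two separated subband streams (complex
    STFT-like arrays of equal shape): each time-frequency bin is kept
    in the stream with the larger magnitude and zeroed in the other."""
    keep1 = np.abs(s1) >= np.abs(s2)
    return np.where(keep1, s1, 0.0), np.where(keep1, 0.0, s2)
```

    The masks are complementary by construction: every bin is non-zero in at most one of the two returned streams.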

    Temporal and spatial analysis of the 2014-2015 Ebola virus outbreak in West Africa

    West Africa is currently witnessing the most extensive Ebola virus (EBOV) outbreak so far recorded. Until now, there have been 27,013 reported cases and 11,134 deaths. The origin of the virus is thought to have been a zoonotic transmission from a bat to a two-year-old boy in December 2013 (ref. 2). From this index case the virus was spread by human-to-human contact throughout Guinea, Sierra Leone and Liberia. However, the origin of the particular virus in each country and time of transmission is not known and currently relies on epidemiological analysis, which may be unreliable owing to the difficulties of obtaining patient information. Here we trace the genetic evolution of EBOV in the current outbreak that has resulted in multiple lineages. Deep sequencing of 179 patient samples processed by the European Mobile Laboratory, the first diagnostics unit to be deployed to the epicentre of the outbreak in Guinea, reveals an epidemiological and evolutionary history of the epidemic from March 2014 to January 2015. Analysis of EBOV genome evolution has also benefited from a similar sequencing effort of patient samples from Sierra Leone. Our results confirm that the EBOV from Guinea moved into Sierra Leone, most likely in April or early May. The viruses of the Guinea/Sierra Leone lineage mixed around June/July 2014. Viral sequences covering August, September and October 2014 indicate that this lineage evolved independently within Guinea. These data can be used in conjunction with epidemiological information to test retrospectively the effectiveness of control measures, and provide an unprecedented window into the evolution of an ongoing viral haemorrhagic fever outbreak.

    Integration of the predicted walk model estimate into the particle filter framework

    Distortion robustness is one of the most significant problems in automatic speech recognition. While much research on speech feature enhancement for automatic recognition has focused on stationary distortions, most observed distortions are non-stationary. To cope with this non-stationary behavior, various particle filter approaches have recently been proposed to track non-stationary distortions of speech features in the logarithmic spectral or cepstral domain. Most of these techniques rely on a noise evolution model given by a linear prediction matrix. Current estimation of the linear prediction matrix, however, requires noise-only observations, which must either be given a priori or be detected by voice activity detection. This makes it impossible to adapt the linear prediction matrix to the dynamics of the noise in speech regions. In this publication we propose to estimate and update the linear prediction matrix directly on the noisy speech observations. This is possible within the particle filter framework by weighting the different noisy estimates (particles) according to their likelihood in the estimation equation of the linear prediction matrix. Speech recognition experiments on actual recordings with different speaker-to-microphone distances confirm the soundness of the proposed approach. Index Terms: speech feature enhancement, particle filter, predicted walk, linear prediction matrix, automatic speech recognition.
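    The likelihood-weighted update described above can be sketched as a weighted least-squares fit of the prediction matrix A in the noise evolution model n_t ≈ A · n_{t-1}, where each particle pair contributes in proportion to its likelihood weight. A hypothetical numpy sketch (variable names and the regularization term are my own assumptions, not taken from the paper):

```python
import numpy as np

def update_prediction_matrix(prev, curr, weights, reg=1e-6):
    """Likelihood-weighted least-squares estimate of the linear prediction
    matrix A in the noise evolution model n_t ~ A @ n_{t-1}.
    prev, curr: (num_particles, dim) noise estimates at times t-1 and t;
    weights:    per-particle likelihoods (need not be normalized);
    reg:        small ridge term keeping the autocorrelation invertible."""
    w = weights / weights.sum()
    # Weighted cross- and auto-correlation accumulators over the particles.
    C = (curr * w[:, None]).T @ prev   # sum_i w_i * n_t  n_{t-1}^T
    R = (prev * w[:, None]).T @ prev   # sum_i w_i * n_{t-1} n_{t-1}^T
    dim = prev.shape[1]
    return C @ np.linalg.inv(R + reg * np.eye(dim))
```

    Because the weights come from the particles' likelihoods under the noisy speech observation, the matrix can be refreshed inside speech regions, which is the point of the proposed approach.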
