41 research outputs found

    Why has (reasonably accurate) Automatic Speech Recognition been so hard to achieve?

    Hidden Markov models (HMMs) have been successfully applied to automatic speech recognition for more than 35 years, even though a key HMM assumption -- the statistical independence of frames -- is obviously violated by speech data. In fact, this data/model mismatch has inspired many attempts to modify or replace HMMs with alternative models that are better able to take into account the statistical dependence of frames. However, it is fair to say that in 2010 the HMM is the consensus model of choice for speech recognition and that HMMs are at the heart of both commercially available products and contemporary research systems. In this paper we present a preliminary exploration aimed at understanding how speech data depart from HMMs and what effect this departure has on the accuracy of HMM-based speech recognition. Our analysis uses standard diagnostic tools from the field of statistics -- hypothesis testing, simulation and resampling -- which are rarely used in the field of speech recognition. Our main result, obtained by novel manipulations of real and resampled data, demonstrates that real data have statistical dependency and that this dependency is responsible for significant numbers of recognition errors. We also demonstrate, using simulation and resampling, that if we 'remove' the statistical dependency from data, then the resulting recognition error rates become negligible. Taken together, these results suggest that a better understanding of the structure of the statistical dependency in speech data is a crucial first step towards improving HMM-based speech recognition.

    Inverse systems of spectra and generalizations of a theorem of W.H. Lin

    In this thesis we generalize a theorem of W. H. Lin. Lin's results are concerned with the homotopy and cohomotopy of an inverse system of spectra {P-k}. Using the quadratic construction, we construct an inverse system of spectra {P-k(E)}. We generalize Lin's results by studying the homotopy and cohomotopy of {P-k(E)}.

    THE BLAME GAME IN MEETING ROOM ASR: AN ANALYSIS OF FEATURE VERSUS MODEL ERRORS IN NOISY AND MISMATCHED CONDITIONS

    Given a test waveform, state-of-the-art ASR systems extract a sequence of MFCC features and decode them with a set of trained HMMs. When the test data are clean and match the condition used for training the models, there are few errors. While it is known that ASR systems are brittle in noisy or mismatched conditions, there has been little work on quantitatively attributing the errors to features or to models. This paper attributes the sources of these errors in three conditions: (a) matched near-field, (b) matched far-field, and (c) mismatched. We undertake a series of diagnostic analyses employing the bootstrap method to probe a meeting room ASR system. Results show that when the conditions are matched (even if they are far-field), the model errors dominate; however, in mismatched conditions features are neither invariant nor separable, and this causes as many errors as the model does.

    Non-target species mortality and the measurement of brodifacoum rodenticide residues after a rat (Rattus rattus) eradication on Palmyra Atoll, tropical

    The use of rodenticides to control or eradicate invasive rats (Rattus spp.) for conservation purposes has grown rapidly in the past decades, especially on islands. The non-target consequences and the fate of toxicant residue from such rodent eradication operations have not been well explored. In a cooperative effort, we monitored the application of a rodenticide, 'Brodifacoum 25W: Conservation', during an attempt to eradicate Rattus rattus from Palmyra Atoll. In 2011, Brodifacoum 25W: Conservation was aerially broadcast twice over the entire atoll (2.5 km²) at rates of 80 kg/ha and 75 kg/ha, and a supplemental hand broadcast application (71.6 kg/ha) occurred three weeks after the second aerial application over a 10 ha area. We documented brodifacoum residues in soil, water, and biota, and documented mortality of non-target organisms. Some bait (14-19% of the target application rate) entered the marine environment to distances 7 m from the shore. After the application commenced, carcasses of 84 animals representing 15 species of birds, fish, reptiles and invertebrates were collected opportunistically as potential non-target mortalities. In addition, fish, reptiles, and invertebrates were systematically collected for residue analysis. Brodifacoum residues were detected in most (84.3%) of the animal samples analyzed. Although detection of residues in samples was anticipated, the extent and concentrations in many parts of the food web were greater than expected. Risk assessments should carefully consider application rates and entire food webs prior to operations using rodenticides. Published by Elsevier Ltd.

    Serine-Rich Repeat Protein adhesins from Lactobacillus reuteri display strain specific glycosylation profiles

    Lactobacillus reuteri is a gut symbiont inhabiting the gastrointestinal tract of numerous vertebrates. The surface-exposed Serine-Rich Repeat Protein (SRRP) is a major adhesin in Gram-positive bacteria. Using lectin and sugar nucleotide profiling of wild-type or L. reuteri isogenic mutants, MALDI-ToF-MS, LC-MS and GC-MS analyses of SRRPs, we showed that L. reuteri strains 100-23C (from rodent) and ATCC 53608 (from pig) can perform protein O-glycosylation and modify SRRP100-23 and SRRP53608 with Hex-Glc-GlcNAc and di-GlcNAc moieties, respectively. Furthermore, in vivo glycoengineering in E. coli led to glycosylation of SRRP53608 variants with α-GlcNAc and GlcNAcβ(1→6)GlcNAcα moieties. The glycosyltransferases involved in the modification of these adhesins were identified within the SecA2/Y2 accessory secretion system and their sugar nucleotide preference determined by saturation transfer difference NMR spectroscopy and differential scanning fluorimetry. Together, these findings provide novel insights into the cellular O-protein glycosylation pathways of gut commensal bacteria and potential routes for glycoengineering applications

    Using population admixture to help complete maps of the human genome

    Tens of millions of base pairs of euchromatic human genome sequence, including many protein-coding genes, have no known location in the human genome. We describe an approach for localizing the human genome's missing pieces by utilizing the patterns of genome sequence variation created by population admixture. We mapped the locations of 70 scaffolds spanning four million base pairs of the human genome's unplaced euchromatic sequence, including more than a dozen protein-coding genes, and identified eight large novel inter-chromosomal segmental duplications. We find that most of these sequences are hidden in the genome's heterochromatin, particularly its pericentromeric regions. Many cryptic, pericentromeric genes are expressed in RNA and have been maintained intact for millions of years while their expression patterns diverged from those of paralogous genes elsewhere in the genome. We describe how knowledge of the locations of these sequences can inform disease association and genome biology studies

    Author Correction: An analysis-ready and quality controlled resource for pediatric brain white-matter research

    Dragon Systems' 1997 Broadcast News Transcription System

    This system represents Dragon's first participation in the HUB4 evaluations since the 1995 Marketplace dry run. At that time, we used a fairly complicated system which had three sets of acoustic models: one for clean wide-bandwidth data, one for low-bandwidth data, and one for speech with music in the background. Our system produced small pieces that were labelled by channel type and then decoded with the appropriate model set [1]. Our 1997 evaluation system is much simpler, since we use one set of gender independent, speaker normalized models to recognize all of the data. In the two years between the dry run and the current evaluation, much of our development work focused on the Switchboard corpus [2], including many techniques -- such as speaker normalization and rapid adaptation -- which now make it possible to consolidate the treatment across channels and speakers. The current evaluation system in many ways represents the transfer of these new techniques int

    Dragon Systems' 1997 Mandarin Broadcast News System

    The development of our 1997 HUB4 Mandarin system was an exercise in technology transfer. For this initial implementation, our strategy was to change the structure of our HUB4 English system only when absolutely necessary. In deciding what was "necessary", we tried to bear in mind that there are differences in the languages that have important implications for speech recognition. For example, Mandarin is a toned language and our front-end does not standardly compute pitch, one of the most important indicators of tone. Also, the notion of word is not well defined for Mandarin, which has implications for language model and even acoustic model development in our word-based recognition system. The Mandarin system we developed for the HUB4 evaluation is almost identical to our English system [1] except in the following respects: We used only the data supplied by the LDC specifically for the Mandarin HUB4 evaluation. In contrast, the English