543 research outputs found

    Comparing Human and Machine Errors in Conversational Speech Transcription

    Recent work in automatic recognition of conversational telephone speech (CTS) has achieved accuracy levels comparable to human transcribers on the NIST 2000 CTS evaluation set, although there is some debate over how to precisely quantify human performance on this task. This raises the question of what systematic differences, if any, distinguish human from machine transcription errors. In this paper we approach this question by comparing the output of our most accurate CTS recognition system to that of a standard speech transcription vendor pipeline. We find that the most frequent substitution, deletion and insertion error types of both outputs show a high degree of overlap. The only notable exception is that the automatic recognizer tends to confuse filled pauses ("uh") and backchannel acknowledgments ("uhhuh"). Humans tend not to make this error, presumably due to the distinctive and opposing pragmatic functions attached to these words. Furthermore, we quantify the correlation between human and machine errors at the speaker level, and investigate the effect of speaker overlap between training and test data. Finally, we report on an informal "Turing test" asking humans to discriminate between automatic and human transcription error cases.
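The substitution, deletion and insertion error types mentioned in this abstract are conventionally obtained from a minimum-edit-distance alignment between a reference transcript and a hypothesis. The following is a minimal sketch of that general technique, not the scoring tool used by the authors; the function names `align_errors` and `word_error_rate` and the example sentences are hypothetical.

```python
from collections import Counter


def align_errors(ref, hyp):
    """Count error types from a minimum-edit alignment of two word lists.

    Returns a Counter keyed by ("sub", ref_word, hyp_word),
    ("del", ref_word, None) or ("ins", None, hyp_word).
    """
    n, m = len(ref), len(hyp)
    # cost[i][j] = edit distance between ref[:i] and hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the end, recording one error per non-matching step.
    errors = Counter()
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])):
            if ref[i - 1] != hyp[j - 1]:
                errors[("sub", ref[i - 1], hyp[j - 1])] += 1
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            errors[("del", ref[i - 1], None)] += 1
            i -= 1
        else:
            errors[("ins", None, hyp[j - 1])] += 1
            j -= 1
    return errors


def word_error_rate(ref, hyp):
    """Total errors divided by the number of reference words."""
    return sum(align_errors(ref, hyp).values()) / len(ref)


# Hypothetical example: a filled pause recognized as a backchannel.
reference = "uh i think so".split()
hypothesis = "uhhuh i think".split()
errors = align_errors(reference, hypothesis)
wer = word_error_rate(reference, hypothesis)
```

Tallying the resulting counters separately for a human and a machine transcript of the same audio would give the kind of per-type comparison the abstract describes.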

    Advances in All-Neural Speech Recognition

    This paper advances the design of CTC-based all-neural (or end-to-end) speech recognizers. We propose a novel symbol inventory, and a novel iterated-CTC method in which a second system is used to transform a noisy initial output into a cleaner version. We present a number of stabilization and initialization methods we have found useful in training these networks. We evaluate our system on the commonly used NIST 2000 conversational telephony test set, and significantly exceed the previously published performance of similar systems, both with and without the use of an external language model and decoding technology.

    Acoustic-To-Word Model Without OOV

    Recently, the acoustic-to-word model based on the Connectionist Temporal Classification (CTC) criterion was shown to be a natural end-to-end model directly targeting words as output units. However, this type of word-based CTC model suffers from the out-of-vocabulary (OOV) issue, as it can only model a limited number of words in the output layer and maps all the remaining words into an OOV output node. Therefore, such a word-based CTC model can only recognize the frequent words modeled by the network output nodes. It also cannot easily handle hot-words that emerge after the model is trained. In this study, we improve the acoustic-to-word model with a hybrid CTC model which can predict both words and characters at the same time. With a shared-hidden-layer structure and modular design, the alignments of words generated from the word-based CTC and the character-based CTC are synchronized. Whenever the acoustic-to-word model emits an OOV token, we back off that OOV segment to the word output generated from the character-based CTC, hence solving the OOV or hot-words issue. Evaluated on a Microsoft Cortana voice assistant task, the proposed model can reduce the errors introduced by the OOV output token in the acoustic-to-word model by 30%.
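The back-off step described in this abstract can be illustrated with a short sketch. It assumes the two CTC outputs have already been decoded and synchronized into one entry per word segment, which is the part the paper's shared-hidden-layer design handles and which is not modeled here. The `<OOV>` token string, the function name `backoff_oov` and the example words are hypothetical.

```python
OOV = "<OOV>"  # hypothetical label for the word model's OOV output node


def backoff_oov(word_segments, char_segments):
    """Replace each OOV word with the character-based output for that segment.

    word_segments: decoded word-CTC tokens, one per word segment.
    char_segments: character-CTC strings for the same segments, in order.
    Both lists are assumed to be already time-synchronized.
    """
    if len(word_segments) != len(char_segments):
        raise ValueError("word and character segments must be aligned one-to-one")
    return [
        chars if word == OOV else word
        for word, chars in zip(word_segments, char_segments)
    ]


# Hypothetical example: the word model does not know the middle word.
merged = backoff_oov(
    ["play", OOV, "music"],
    ["play", "despacito", "musik"],
)
```

Note that in-vocabulary segments keep the word-model output even when the character model disagrees (here "music" rather than "musik"); only OOV segments are taken from the character stream.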

    Composite suspended sediment particles and flocculation in glacial meltwaters: preliminary evidence from Alpine and Himalayan basins

    Research over the last decade has shown that the suspended sediment loads of many rivers are dominated by composite particles. These particles are also known as aggregates or flocs, and are commonly made up of constituent mineral particles, which evidence a wide range of grain sizes, and organic matter. The resulting in situ or effective particle size characteristics of fluvial suspended sediment exert a major control on all processes of entrainment, transport and deposition. The significance of composite suspended sediment particles in glacial meltwater streams has, however, not been established. Existing data on the particle size characteristics of suspended sediment in glacial meltwaters relate to the dispersed mineral fraction (absolute particle size), which, for certain size fractions, may bear little relationship to the effective or in situ distribution. Existing understanding of composite particle formation within freshwater environments would suggest that in-stream flocculation processes do not take place in glacial meltwater systems because of the absence of organic binding agents. However, we report preliminary scanning electron microscopy data for one Alpine and two Himalayan glaciers that show composite particles are present in the suspended sediment load of the meltwater system. The genesis and structure of these composite particles and their constituent grain size characteristics are discussed. We present evidence for the existence of both aggregates, or composite particles whose features are largely inherited from source materials, and flocs, which represent composite particles produced by instream flocculation processes. In the absence of organic materials, the latter may result solely from electrochemical flocculation in the meltwater sediment system. This type of floc formation has not been reported previously in the freshwater fluvial environment. 
    Further work is needed to test the wider significance of these data and to investigate the effective particle size characteristics of suspended sediment associated with high concentration outburst events. Such events make a major contribution to suspended sediment fluxes in meltwater streams and may provide conditions that are conducive to composite particle formation by flocculation.

    The Microsoft 2016 Conversational Speech Recognition System

    We describe Microsoft's conversational speech recognition system, in which we combine recent developments in neural-network-based acoustic and language modeling to advance the state of the art on the Switchboard recognition task. Inspired by machine learning ensemble techniques, the system uses a range of convolutional and recurrent neural networks. I-vector modeling and lattice-free MMI training provide significant gains for all acoustic model architectures. Language model rescoring with multiple forward and backward running RNNLMs, and word posterior-based system combination provide a 20% boost. The best single system uses a ResNet architecture acoustic model with RNNLM rescoring, and achieves a word error rate of 6.9% on the NIST 2000 Switchboard task. The combined system has an error rate of 6.2%, representing an improvement over previously reported results on this benchmark task.

    The Microsoft 2017 Conversational Speech Recognition System

    We describe the 2017 version of Microsoft's conversational speech recognition system, in which we update our 2016 system with recent developments in neural-network-based acoustic and language modeling to further advance the state of the art on the Switchboard speech recognition task. The system adds a CNN-BLSTM acoustic model to the set of model architectures we combined previously, and includes character-based and dialog session aware LSTM language models in rescoring. For system combination we adopt a two-stage approach, whereby subsets of acoustic models are first combined at the senone/frame level, followed by a word-level voting via confusion networks. We also added a confusion network rescoring step after system combination. The resulting system yields a 5.1% word error rate on the 2000 Switchboard evaluation set.

    Erosion characteristics and floc strength of Athabasca river cohesive sediments: towards managing sediment-related issues

    Purpose: Most of Canada’s tar sands operations are located in the Athabasca river basin. Cohesive sediments deposited in the Athabasca river and its tributaries are a potential source of PAHs in the basin. The erosional behavior of cohesive sediments depends not only on fluid turbulence but also on sediment structure, particularly the influence of organic content. This research describes this behavior in Athabasca river sediments. Methods: An experimental study of cohesive sediment dynamics in one of the tributaries, the Muskeg river, was carried out in a rotating annular flume. Varying the shear stress allowed the erosional strength to be determined for beds with different consolidation periods. Particle size measurements were made with a laser diffraction device operated in a continuous flow-through mode. Optical analyses of flocs (ESEM and TEM) were performed on samples taken at the end of the experiments. Results: An inverse relationship between suspended sediment concentration (SS) and the consolidation period was found. In this research the differences are attributed to the increase in the organic content of the sediments with consolidation period. The particle size measurements during the experiments showed differences in floc strength that are also related to changing organic content across consolidation periods. ESEM and TEM observations confirm the structural differences for beds with different consolidation periods. The effects of SFGL on floc structure and on biostabilization of the bed are discussed. Conclusions: The paper recommends that the consolidation period be taken into account when modeling the erosion of cohesive sediments in the Athabasca river. For pollutant (PAH) transport models, it is highly recommended that the organic content of flocs, particularly algae, be considered in the resuspension module. Environment Canada, CONACY

    Experimental assessment of Athabasca river cohesive sediment deposition dynamics.

    Polycyclic aromatic hydrocarbons (PAHs) originating from natural sources, and potentially from the Athabasca Oil Sands development, are of concern for the Athabasca River and Lake Athabasca delta ecosystems. In order to model the transport of fine sediments (and associated PAHs), it is important to describe the sediment dynamics within the river system. Flocs possess different settling characteristics compared to individual particles. A key aspect in modelling floc settling behaviour is the mathematical linkage of the floc density to floc size. In this paper, a rotating annular flume is used to determine the settling characteristics of Muskeg River (a tributary of the Athabasca River) sediments under different shear conditions. Simulations of the settling and flocculation behaviour of these sediments were used to calibrate a density vs. floc size model. A relationship of the parameters relating floc size and density with the fractal dimension F shows that as diameter increases flocs become weaker. Recommendations for the practical application of the model are further formulated in this paper. The deposition tests offer a quantitative measure of the relative amount of sediment that is likely to be transported through the river for given flow conditions. Environment Canada, CONACY