12,931 research outputs found

    Contextual confidence measures for continuous speech recognition

    This paper explores the impact of contextual information on confidence measures for continuous speech recognition results. Our approach comprises three steps: extracting confidence predictors from recognition results; compiling those predictors into confidence measures by means of a fuzzy inference system whose parameters are estimated directly from examples with an evolutionary strategy; and, finally, upgrading the confidence measures by including contextual information. Experiments on two different continuous speech application tasks show that the context re-scoring procedure improves the ability of confidence measures to discriminate between correct and incorrect recognition results at every threshold level, even when a rather simple method of adding contextual information is used.
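    The three-step pipeline in this abstract can be sketched roughly as follows. The logistic combination below is a stand-in for the paper's fuzzy inference system, and the weights, the neighbor-averaging rule, and the `alpha` parameter are illustrative assumptions, not the authors' actual method:

    ```python
    import math

    def confidence(predictors, weights, bias=0.0):
        """Step 2 (simplified): combine per-word confidence predictors
        (e.g. acoustic score, language-model score, word duration) into
        a single value in [0, 1] via a logistic squash. The paper uses
        a fuzzy inference system tuned by an evolutionary strategy."""
        z = bias + sum(w * p for w, p in zip(weights, predictors))
        return 1.0 / (1.0 + math.exp(-z))

    def context_rescore(conf, neighbor_confs, alpha=0.2):
        """Step 3 (simplified): upgrade a word's confidence with context
        by pulling it toward the mean confidence of neighboring words."""
        if not neighbor_confs:
            return conf
        ctx = sum(neighbor_confs) / len(neighbor_confs)
        return (1 - alpha) * conf + alpha * ctx
    ```

    A word whose own evidence is ambiguous but whose neighbors are confidently recognized is nudged upward, which is one simple way contextual information can sharpen the correct/incorrect discrimination described above.
    
    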

    Learning Representations of Emotional Speech with Deep Convolutional Generative Adversarial Networks

    Automatically assessing emotional valence in human speech has historically been a difficult task for machine learning algorithms. The subtle changes in the speaker's voice that are indicative of positive or negative emotional states are often "overshadowed" by voice characteristics relating to emotional intensity or emotional activation. In this work we explore a representation learning approach that automatically derives discriminative representations of emotional speech. In particular, we investigate two machine learning strategies to improve classifier performance: (1) utilization of unlabeled data using a deep convolutional generative adversarial network (DCGAN), and (2) multitask learning. In our extensive experiments we leverage a multitask annotated emotional corpus as well as a large unlabeled meeting corpus (around 100 hours). Our speaker-independent classification experiments show that, in particular, the use of unlabeled data improves classifier performance, considerably outperforming both fully supervised baseline approaches. We improve the classification of emotional valence on a discrete 5-point scale to 43.88% and on a 3-point scale to 49.80%, which is competitive with state-of-the-art performance.
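    The two strategies named in this abstract can be illustrated at the smallest possible scale. The weighted-sum loss is the basic mechanism behind multitask learning, and the 5-point-to-3-point label mapping is an assumed convention (the paper reports results on both scales but does not spell out the collapse here):

    ```python
    def multitask_loss(task_losses, task_weights):
        """Multitask learning in its simplest form: a shared network is
        trained on a weighted sum of per-task losses (e.g. valence and
        activation classification), so the learned representation must
        serve all tasks at once. Weights are illustrative."""
        return sum(w * l for w, l in zip(task_weights, task_losses))

    def to_three_point(label_5pt):
        """Collapse a discrete 5-point valence label (1 = very negative,
        5 = very positive) onto a 3-point scale. The exact mapping is an
        assumption, not taken from the paper."""
        if label_5pt <= 2:
            return "negative"
        if label_5pt == 3:
            return "neutral"
        return "positive"
    ```
    
    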

    Improving the translation environment for professional translators

    When using computer-aided translation systems in a typical professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction point of view and from a purely technological side. This paper describes the SCATE research on improved fuzzy matching, parallel treebanks, the integration of translation memories with machine translation, quality estimation, terminology extraction from comparable texts, the use of speech recognition in the translation process, and human-computer interaction and interface design for the professional translation environment. For each of these topics, we describe the experiments we performed and the conclusions drawn, providing an overview of the highlights of the entire SCATE project.
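    Fuzzy matching, the first topic listed in this abstract, is typically implemented in CAT tools as an edit-distance-based similarity between the sentence to translate and translation-memory entries. A minimal sketch; the percentage formula below is one common convention, not necessarily the one SCATE improved on:

    ```python
    def levenshtein(a, b):
        """Edit distance between two strings, using a rolling row of
        the standard dynamic-programming table."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1,        # deletion
                               cur[j - 1] + 1,     # insertion
                               prev[j - 1] + (ca != cb)))  # substitution
            prev = cur
        return prev[-1]

    def fuzzy_match_score(src, tm_entry):
        """Percent similarity as commonly shown in CAT tools:
        100 * (1 - edit_distance / max_length)."""
        if not src and not tm_entry:
            return 100.0
        d = levenshtein(src, tm_entry)
        return 100.0 * (1 - d / max(len(src), len(tm_entry)))
    ```

    Entries scoring above a threshold (often around 70%) are offered to the translator as fuzzy matches; real systems refine this with tokenization, case folding, and linguistically informed weighting.
    
    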

    Segmentation ART: A Neural Network for Word Recognition from Continuous Speech

    The Segmentation ART (Adaptive Resonance Theory) network for word recognition from a continuous speech stream is introduced. An input sequence represents phonemes detected at a preprocessing stage. Segmentation ART is trained rapidly, and uses fast-learning fuzzy ART modules, top-down expectation, and a spatial representation of temporal order. The network performs on-line identification of word boundaries, correcting an initial hypothesis if subsequent phonemes are incompatible with a previous partition. Simulations show that the system's segmentation performance is comparable to that of TRACE, and the ability to segment a number of difficult phrases is also demonstrated. National Science Foundation (NSF-IRI-94-01659); Office of Naval Research (N00014-95-1-0409, N00014-95-1-0657).
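    The word-boundary identification described in this abstract can be loosely illustrated with a greedy longest-match segmenter over a phoneme lexicon. This is a toy stand-in, not the ART network itself: the lexicon format is assumed, and longest-match commitment is a crude analogue of correcting an initial hypothesis when later phonemes are incompatible with it:

    ```python
    def segment(phonemes, lexicon):
        """Split a phoneme stream into lexicon words. At each position,
        extend the candidate word as far as the lexicon allows, commit
        the longest match, and continue from the boundary. Returns None
        if no complete parse exists."""
        words, start = [], 0
        while start < len(phonemes):
            best = None
            for end in range(start + 1, len(phonemes) + 1):
                if tuple(phonemes[start:end]) in lexicon:
                    best = end  # keep extending: prefer the longest word
            if best is None:
                return None  # stream incompatible with the lexicon
            words.append(tuple(phonemes[start:best]))
            start = best
        return words
    ```

    Unlike Segmentation ART, this sketch cannot revise a committed boundary, so phrases where the correct parse requires a shorter early word can fail; handling exactly those cases is what the network's hypothesis-correction mechanism addresses.
    
    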