
    Embedded Speech Technology

    End-to-End models in Automatic Speech Recognition simplify the speech recognition process: they convert audio data directly into text without relying on multiple stages and systems. This direct approach is efficient and reduces potential points of error. In contrast, Sequence-to-Sequence models adopt a more integrative approach, using distinct models to capture acoustic and language-specific features, known respectively as acoustic and language models. This integration allows for better coordination between different aspects of speech, potentially leading to more accurate transcriptions. In this thesis, we explore various Speech-to-Text (STT) models, focusing mainly on End-to-End and Sequence-to-Sequence techniques. We also examine offline STT tools such as Wav2Vec2.0, Kaldi, and Vosk. These tools face challenges when handling new voice data or different accents of the same language. To address this challenge, we fine-tune the models to improve their handling of new, unseen data. In our comparison, Wav2Vec2.0 emerged as the top performer, though with a larger model size. Our approach also shows that combining Kaldi and Vosk yields a robust STT system that can identify new words using phonemes.

    Morphology and speech technology

    This paper describes a morphological component in a speech recognition architecture for German that handles the recognition of compounds from their individual constituents. The specification of our morphological model allows for variation in functionality, e.g. the reconstruction of split compounds, of lexicalised compounds, and of non-lexicalised (unknown) compounds. An implementation and evaluation results for split compounds are presented.

    Real-time interactive speech technology at Threshold Technology, Incorporated

    Basic real-time isolated-word recognition techniques are reviewed. Industrial applications of voice technology are described in chronological order of their development. Future research efforts are also discussed.

    17 ways to say yes: Toward nuanced tone of voice in AAC and speech technology

    People with complex communication needs who use speech-generating devices have very little expressive control over their tone of voice. Despite its importance in human interaction, however, the issue of tone of voice remains all but absent from AAC research and development. In this paper, we describe three interdisciplinary projects, past, present, and future: the critical design collection Six Speaking Chairs has provoked deeper discussion and inspired a social model of tone of voice; the speculative concept Speech Hedge illustrates challenges and opportunities in designing more expressive user interfaces; and the pilot project Tonetable could enable participatory research and seed a research network around tone of voice. We speculate that more radical interactions might expand the frontiers of AAC and disrupt speech technology as a whole.

    Scaling Speech Technology to 1,000+ Languages

    Expanding the language coverage of speech technology has the potential to improve access to information for many more people. However, current speech technology is restricted to about one hundred languages, a small fraction of the more than 7,000 languages spoken around the world. The Massively Multilingual Speech (MMS) project increases the number of supported languages by 10-40x, depending on the task. The main ingredients are a new dataset based on readings of publicly available religious texts, and the effective use of self-supervised learning. We built pre-trained wav2vec 2.0 models covering 1,406 languages, a single multilingual automatic speech recognition model for 1,107 languages, speech synthesis models for the same number of languages, and a language identification model for 4,017 languages. Experiments show that our multilingual speech recognition model more than halves the word error rate of Whisper on 54 languages of the FLEURS benchmark while being trained on a small fraction of the labeled data.
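    The MMS abstract above reports results in terms of word error rate (WER), the standard ASR metric: the word-level edit distance (substitutions, insertions, deletions) between a reference transcript and a hypothesis, divided by the number of reference words. A minimal sketch of the computation, assuming whitespace tokenization (the function name and example strings below are illustrative, not taken from the paper):

    ```python
    def wer(reference: str, hypothesis: str) -> float:
        """Word error rate: edit distance over reference length."""
        ref = reference.split()
        hyp = hypothesis.split()
        # dp[i][j] = edit distance between ref[:i] and hyp[:j]
        dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            dp[i][0] = i  # i deletions
        for j in range(len(hyp) + 1):
            dp[0][j] = j  # j insertions
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                sub = 0 if ref[i - 1] == hyp[j - 1] else 1
                dp[i][j] = min(dp[i - 1][j] + 1,      # deletion
                               dp[i][j - 1] + 1,      # insertion
                               dp[i - 1][j - 1] + sub)  # match/substitution
        return dp[len(ref)][len(hyp)] / len(ref)

    # One substitution out of four reference words -> WER 0.25
    print(wer("a b c d", "a x c d"))
    ```

    "Halving the WER" in the abstract means this ratio drops by more than a factor of two on the same reference transcripts; note that WER can exceed 1.0 when the hypothesis contains many insertions.
    
    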

    MCE 2018: The 1st Multi-target Speaker Detection and Identification Challenge Evaluation

    The Multi-target Challenge aims to assess how well current speech technology can determine whether or not a recorded utterance was spoken by one of a large number of blacklisted speakers. It is a form of multi-target speaker detection based on real-world telephone conversations, with data recordings generated from call center customer-agent conversations. The task is to measure how accurately one can detect 1) whether a test recording is spoken by a blacklisted speaker, and 2) which specific blacklisted speaker was talking. This paper outlines the challenge and provides its baselines, results, and discussions. Comment: http://mce.csail.mit.edu. arXiv admin note: text overlap with arXiv:1807.0666

    Evaluation of the NLP Components of the OVIS2 Spoken Dialogue System

    The NWO Priority Programme Language and Speech Technology is a 5-year research programme aimed at the development of spoken language information systems. In the Programme, two alternative natural language processing (NLP) modules are developed in parallel: a grammar-based (conventional, rule-based) module and a data-oriented (memory-based, stochastic, DOP) module. In order to compare the NLP modules, a formal evaluation was carried out three years after the start of the Programme. This paper describes the evaluation procedure and the evaluation results. The grammar-based component performs much better than the data-oriented one in this comparison. Comment: Proceedings of CLIN 9

    Human factors issues associated with the use of speech technology in the cockpit

    The human factors issues associated with the use of voice technology in the cockpit are summarized. The formulation of the LHX avionics suite is described, and the allocation of tasks to voice in the cockpit is discussed. State-of-the-art speech recognition technology is reviewed. Finally, a questionnaire designed to tap pilot opinions concerning the allocation of tasks to voice input and output in the cockpit is presented. This questionnaire was designed to be administered to operational AH-1G Cobra gunship pilots. Half of the questionnaire deals specifically with the AH-1G cockpit and the types of tasks pilots would like to have performed by voice in this existing rotorcraft. The remaining portion deals with an undefined rotorcraft of the future and aims to determine what types of tasks these pilots would like to have performed by voice technology if anything were possible, i.e., if there were no technological constraints.