12,801 research outputs found

    Low-resource speech recognition and keyword-spotting

    Get PDF
    © Springer International Publishing AG 2017. The IARPA Babel program ran from March 2012 to November 2016. The aim of the program was to develop agile and robust speech technology that can be rapidly applied to any human language in order to provide effective search capability on large quantities of real world data. This paper will describe some of the developments in speech recognition and keyword-spotting during the lifetime of the project. Two technical areas will be briefly discussed with a focus on techniques developed at Cambridge University: the application of deep learning for low-resource speech recognition; and efficient approaches for keyword spotting. Finally a brief analysis of the Babel speech language characteristics and language performance will be presented

    Distant speech recognition for home automation: Preliminary experimental results in a smart home

    Full text link
    International audienceThis paper presents a study that is part of the Sweet-Home project which aims at developing a new home automation system based on voice command. The study focused on two tasks: distant speech recognition and sentence spotting (e.g., recognition of domotic orders). Regarding the first task, different combinations of ASR systems, language and acoustic models were tested. Fusion of ASR outputs by consensus and with a triggered language model (using a priori knowledge) were investigated. For the sentence spotting task, an algorithm based on distance evaluation between the current ASR hypotheses and the predefine set of keyword patterns was introduced in order to retrieve the correct sentences in spite of the ASR errors. The techniques were assessed on real daily living data collected in a 4-room smart home that was fully equipped with standard tactile commands and with 7 wireless microphones set in the ceiling. Thanks to Driven Decoding Algorithm techniques, a classical ASR system reached 7.9% WER against 35% WER in standard configuration and 15% with MLLR adaptation only. The best keyword pattern classification result obtained in distant speech conditions was 7.5% CER

    Distant Speech Recognition for Home Automation: Preliminary Experimental Results in a Smart Home

    No full text
    International audienceThis paper presents a study that is part of the Sweet-Home project which aims at developing a new home automation system based on voice command. The study focused on two tasks: distant speech recognition and sentence spotting (e.g., recognition of domotic orders). Regarding the first task, different combinations of ASR systems, language and acoustic models were tested. Fusion of ASR outputs by consensus and with a triggered language model (using a priori knowledge) were investigated. For the sentence spotting task, an algorithm based on distance evaluation between the current ASR hypotheses and the predefine set of keyword patterns was introduced in order to retrieve the correct sentences in spite of the ASR errors. The techniques were assessed on real daily living data collected in a 4-room smart home that was fully equipped with standard tactile commands and with 7 wireless microphones set in the ceiling. Thanks to Driven Decoding Algorithm techniques, a classical ASR system reached 7.9% WER against 35% WER in standard configuration and 15% with MLLR adaptation only. The best keyword pattern classification result obtained in distant speech conditions was 7.5% CER

    On Distant Speech Recognition for Home Automation

    No full text
    The official version of this draft is available at Springer via http://dx.doi.org/10.1007/978-3-319-16226-3_7International audienceIn the framework of Ambient Assisted Living, home automation may be a solution for helping elderly people living alone at home. This study is part of the Sweet-Home project which aims at developing a new home automation system based on voice command to improve support and well-being of people in loss of autonomy. The goal of the study is vocal order recognition with a focus on two aspects: distance speech recognition and sentence spotting. Several ASR techniques were evaluated on a realistic corpus acquired in a 4-room flat equipped with microphones set in the ceiling. This distant speech French corpus was recorded with 21 speakers who acted scenarios of activities of daily living. Techniques acting at the decoding stage, such as our novel approach called Driven Decoding Algorithm (DDA), gave better speech recognition results than the baseline and other approaches. This solution which uses the two best SNR channels and a priori knowledge (voice commands and distress sentences) has demonstrated an increase in recognition rate without introducing false alarms

    Phonetic Searching

    Get PDF
    An improved method and apparatus is disclosed which uses probabilistic techniques to map an input search string with a prestored audio file, and recognize certain portions of a search string phonetically. An improved interface is disclosed which permits users to input search strings, linguistics, phonetics, or a combination of both, and also allows logic functions to be specified by indicating how far separated specific phonemes are in time.Georgia Tech Research Corporatio

    Integrated speech and morphological processing in a connectionist continuous speech understanding for Korean

    Full text link
    A new tightly coupled speech and natural language integration model is presented for a TDNN-based continuous possibly large vocabulary speech recognition system for Korean. Unlike popular n-best techniques developed for integrating mainly HMM-based speech recognition and natural language processing in a {\em word level}, which is obviously inadequate for morphologically complex agglutinative languages, our model constructs a spoken language system based on a {\em morpheme-level} speech and language integration. With this integration scheme, the spoken Korean processing engine (SKOPE) is designed and implemented using a TDNN-based diphone recognition module integrated with a Viterbi-based lexical decoding and symbolic phonological/morphological co-analysis. Our experiment results show that the speaker-dependent continuous {\em eojeol} (Korean word) recognition and integrated morphological analysis can be achieved with over 80.6% success rate directly from speech inputs for the middle-level vocabularies.Comment: latex source with a4 style, 15 pages, to be published in computer processing of oriental language journa
    corecore