24 research outputs found

    Using Tone Information in Thai Spelling Speech Recognition

    Get PDF

    Cloud-based Automatic Speech Recognition Systems for Southeast Asian Languages

    Full text link
    This paper provides an overall introduction of our Automatic Speech Recognition (ASR) systems for Southeast Asian languages. As not much existing work has been carried out on such regional languages, a few difficulties should be addressed before building the systems: limitation on speech and text resources, lack of linguistic knowledge, etc. This work takes Bahasa Indonesia and Thai as examples to illustrate the strategies of collecting various resources required for building ASR systems.Comment: Published by the 2017 IEEE International Conference on Orange Technologies (ICOT 2017

    Modularity and Neural Integration in Large-Vocabulary Continuous Speech Recognition

    Get PDF
    This Thesis tackles the problems of modularity in Large-Vocabulary Continuous Speech Recognition with use of Neural Network

    Multi-Task Neural Networks for Speech Recognition

    Get PDF
    První část této diplomové práci se zabývá teoretickým rozborem principů neuronových sítí, včetně možnosti jejich použití v oblasti rozpoznávání řeči. Práce pokračuje popisem viceúkolových neuronových sítí a souvisejících experimentů. Praktická část práce obsahovala změny software pro trénování neuronových sítí, které umožnily viceúkolové trénování. Je rovněž popsáno připravené prostředí, včetně několika dedikovaných skriptů. Experimenty představené v této diplomové práci ověřují použití artikulačních characteristik řeči pro viceúkolové trénování. Experimenty byly provedeny na dvou řečových databázích lišících se kvalitou a velikostí a representujících různé jazyky - angličtinu a vietnamštinu. Artikulační charakteristiky byly také kombinovány s jinými sekundárními úkoly, například kontextem, s záměrem ověřit jejich komplementaritu. Porovnaní je provedeno s neuronovými sítěmi různých velikostí tak, aby byl popsán vztah mezi velikostí neuronových sítí a efektivitou viceúkolového trénování. Závěrem provedených experimentů je, že viceúkolové trénování s použitím artikulačnich charakteristik jako sekundárních úkolů vede k lepšímu trénování neuronových sítí a výsledkem tohoto trénování může být přesnější rozpoznávání fonémů. V závěru práce jsou viceúkolové neuronové sítě testovány v systému rozpoznávání řeči jako extraktor příznaků.The first part of this Master's thesis covers theoretical investigation into the principles and usage of neural networks, including their usability for the speech recognition tasks. Then it proceeds to summarize the multi-task neural networks' operating principles and some recent experiments with them. The practical part of the semester project reports changes made to a tool for neural network training which support multi-task training. Then the preparation of the settings is described, including a number of scripts written especially for this purpose. The experiments presented in the thesis explore the idea of using articulatory characteristics of phonemes as secondary tasks for multi-task training. The experiments are conducted on two different datasets of different quality and size and representing different languages - English and Vietnamese. Articulatory characteristics are occasionally combined with different secondary tasks, such as context, to see how well they function together. A comparison is made between the networks of different sizes to see how their size affects the effectiveness of multi-task training. These experiments show that multi-task training with the use of articulatory characteristics as secondary tasks can enhance training and yield better phoneme accuracy as a result. Finally, multi-task training is embedded to a speech recognition system as a feature extractor.

    Automatic Speech Recognition for Low-resource Languages and Accents Using Multilingual and Crosslingual Information

    Get PDF
    This thesis explores methods to rapidly bootstrap automatic speech recognition systems for languages, which lack resources for speech and language processing. We focus on finding approaches which allow using data from multiple languages to improve the performance for those languages on different levels, such as feature extraction, acoustic modeling and language modeling. Under application aspects, this thesis also includes research work on non-native and Code-Switching speech

    Spoken term detection ALBAYZIN 2014 evaluation: overview, systems, results, and discussion

    Get PDF
    The electronic version of this article is the complete one and can be found online at: http://dx.doi.org/10.1186/s13636-015-0063-8Spoken term detection (STD) aims at retrieving data from a speech repository given a textual representation of the search term. Nowadays, it is receiving much interest due to the large volume of multimedia information. STD differs from automatic speech recognition (ASR) in that ASR is interested in all the terms/words that appear in the speech data, whereas STD focuses on a selected list of search terms that must be detected within the speech data. This paper presents the systems submitted to the STD ALBAYZIN 2014 evaluation, held as a part of the ALBAYZIN 2014 evaluation campaign within the context of the IberSPEECH 2014 conference. This is the first STD evaluation that deals with Spanish language. The evaluation consists of retrieving the speech files that contain the search terms, indicating their start and end times within the appropriate speech file, along with a score value that reflects the confidence given to the detection of the search term. The evaluation is conducted on a Spanish spontaneous speech database, which comprises a set of talks from workshops and amounts to about 7 h of speech. We present the database, the evaluation metrics, the systems submitted to the evaluation, the results, and a detailed discussion. Four different research groups took part in the evaluation. Evaluation results show reasonable performance for moderate out-of-vocabulary term rate. This paper compares the systems submitted to the evaluation and makes a deep analysis based on some search term properties (term length, in-vocabulary/out-of-vocabulary terms, single-word/multi-word terms, and in-language/foreign terms).This work has been partly supported by project CMC-V2 (TEC2012-37585-C02-01) from the Spanish Ministry of Economy and Competitiveness. This research was also funded by the European Regional Development Fund, the Galician Regional Government (GRC2014/024, “Consolidation of Research Units: AtlantTIC Project” CN2012/160)
    corecore