    Adaptation of speech recognition systems to selected real-world deployment conditions

    This habilitation thesis deals with the adaptation of automatic speech recognition (ASR) systems to selected real-world deployment conditions. It is presented as a collection of twelve articles on this topic, of which I am the main author or a co-author. They were published during my work on several consecutive research projects, in which I participated as a member of the research team as well as the investigator or a co-investigator. The articles can be divided into three main groups according to their topics. What they have in common is the effort to adapt a particular ASR system to a specific factor or deployment condition that significantly affects its function or accuracy. The first group of articles focuses on unsupervised speaker adaptation, where the ASR system adapts its parameters to the specific voice characteristics of one particular speaker. The second part deals with a) methods allowing the system to identify non-speech events in the input, and b) the related task of recognizing speech with non-speech events, particularly music, in the background. Finally, the third part is devoted to methods that allow the transcription of an audio signal containing utterances in more than one language. It includes a) approaches for adapting an existing recognition system to a new language and b) methods for identifying the language from the audio signal. The two identification tasks mentioned are investigated in particular under the demanding and less explored frame-wise scenario, which is the only one suitable for processing on-line data streams.

    Translating English verbal collocations into Spanish: On distribution and other relevant differences related to diatopic variation

    Language varieties should be taken into account in order to enhance the fluency and naturalness of translated texts. In this paper we examine the collocational verbal range for prima-facie translation equivalents of words like decision and dilemma, which in both languages denote the act or process of reaching a resolution after consideration, resolving a question or deciding something. We are mainly concerned with diatopic variation in Spanish. To this end, we set out to develop a giga-token corpus-based protocol which includes a detailed and reproducible methodology sufficient to detect collocational peculiarities of transnational languages. To our knowledge, this is one of the first observational studies of this kind. The paper is organised as follows. Section 1 introduces some basic issues about the translation of collocations against the background of languages' anisomorphism. Section 2 provides a feature characterisation of collocations. Section 3 deals with the choice of corpora, corpus tools, nodes and patterns. Section 4 covers the automatic retrieval of the selected verb + noun (object) collocations in general Spanish and the co-existing national varieties. Special attention is paid to comparative results in terms of similarities and mismatches. Section 5 presents conclusions and outlines avenues of further research.

    Design of a Controlled Language for Critical Infrastructures Protection

    We describe a project for the construction of a controlled language for critical infrastructures protection (CIP). This project originates from the need to coordinate and categorize communications on CIP at the European level. These communications can take the form of official documents, reports on incidents, informal communications and plain e-mail. To achieve our goal, we explore the application of traditional library science tools to the construction of controlled languages. Our starting point is an analogous work carried out during the sixties in the field of nuclear science, known as the Euratom Thesaurus.

    Efficient speaker recognition for mobile devices

    On Distant Speech Recognition for Home Automation

    The official version of this draft is available at Springer via http://dx.doi.org/10.1007/978-3-319-16226-3_7. In the framework of Ambient Assisted Living, home automation may be a solution for helping elderly people living alone at home. This study is part of the Sweet-Home project, which aims at developing a new home automation system based on voice command to improve the support and well-being of people experiencing loss of autonomy. The goal of the study is vocal order recognition with a focus on two aspects: distant speech recognition and sentence spotting. Several ASR techniques were evaluated on a realistic corpus acquired in a 4-room flat equipped with microphones set in the ceiling. This distant speech French corpus was recorded with 21 speakers who acted out scenarios of activities of daily living. Techniques acting at the decoding stage, such as our novel approach called Driven Decoding Algorithm (DDA), gave better speech recognition results than the baseline and other approaches. This solution, which uses the two best SNR channels and a priori knowledge (voice commands and distress sentences), has demonstrated an increase in recognition rate without introducing false alarms.

    Distant speech recognition for home automation: Preliminary experimental results in a smart home

    This paper presents a study that is part of the Sweet-Home project, which aims at developing a new home automation system based on voice command. The study focused on two tasks: distant speech recognition and sentence spotting (e.g., recognition of domotic orders). Regarding the first task, different combinations of ASR systems, language models and acoustic models were tested. Fusion of ASR outputs by consensus and with a triggered language model (using a priori knowledge) was investigated. For the sentence spotting task, an algorithm based on distance evaluation between the current ASR hypotheses and the predefined set of keyword patterns was introduced in order to retrieve the correct sentences in spite of the ASR errors. The techniques were assessed on real daily living data collected in a 4-room smart home that was fully equipped with standard tactile commands and with 7 wireless microphones set in the ceiling. Thanks to Driven Decoding Algorithm techniques, a classical ASR system reached 7.9% WER against 35% WER in the standard configuration and 15% with MLLR adaptation only. The best keyword pattern classification result obtained in distant speech conditions was 7.5% CER.
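
    The distance-based sentence spotting idea described above can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a word-level Levenshtein distance, normalised by pattern length, and a hypothetical acceptance threshold (max_normalized_distance). The function names and the example patterns are illustrative only.

```python
def edit_distance(a, b):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def spot_sentence(hypothesis, patterns, max_normalized_distance=0.4):
    """Return (pattern, score) for the closest keyword pattern.

    The score is the word-level edit distance divided by the pattern
    length. The threshold is an assumed value, not taken from the paper.
    If no pattern is close enough, the returned pattern is None.
    """
    hyp = hypothesis.lower().split()
    best_pattern, best_score = None, float("inf")
    for pattern in patterns:
        ref = pattern.lower().split()
        score = edit_distance(hyp, ref) / max(len(ref), 1)
        if score < best_score:
            best_pattern, best_score = pattern, score
    if best_score <= max_normalized_distance:
        return best_pattern, best_score
    return None, best_score


# Hypothetical voice commands; the real pattern set is not given in the abstract.
PATTERNS = ["turn on the light", "close the blinds", "call for help"]
```

    With these assumed patterns, a hypothesis containing one misrecognised word, such as "turn on a light", is still mapped to "turn on the light", while an unrelated hypothesis is rejected. The same edit distance, computed against a reference transcript and divided by the reference length, is the usual basis of the WER figures quoted above.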

    Speech Activity and Speaker Change Point Detection for Online Streams

    The main focus of this thesis lies on two closely interrelated tasks, speech activity detection and speaker change point detection, and their application in online processing. These tasks commonly serve as preprocessing steps in speech-processing applications such as automatic speech recognition or speaker diarization. While their use in offline systems is extensively covered in the literature, the number of published works focusing on online use is limited. This is unfortunate, as many speech-processing applications (e.g., monitoring systems) are required to run in real time. The thesis begins with a three-chapter opening part, in which the first, introductory chapter explains the basic concepts and outlines the practical use of both tasks. It is followed by a chapter that reviews the current state of the art and lists the existing toolkits. The opening part concludes with a chapter explaining the motivation behind this work and the practical use of the tasks in monitoring systems; this chapter also sets the main goals of the thesis. The next two chapters cover the theoretical background of both tasks. They present selected approaches that are either relevant to this work (e.g., used for result comparisons) or focused on online processing. The following chapter proposes the final speech activity detection approach for online use. It describes the development of this approach in detail and provides a thorough experimental evaluation. The approach yields state-of-the-art results under low- and medium-noise conditions on the standardized QUT-NOISE-TIMIT corpus. It is also integrated into a monitoring system, where it supplements a speech recognition system. The final speaker change point detection approach is proposed in the following chapter. It was designed in a series of consecutive experiments, which are detailed extensively in this chapter. An experimental evaluation on the COST278 database shows performance approaching that of the offline reference system while operating in online mode with low latency. Finally, the last chapter summarizes the results of this thesis.
