127 research outputs found
Adaptation of speech recognition systems to selected real-world deployment conditions
This habilitation thesis deals with the adaptation of automatic speech
recognition (ASR) systems to selected real-world deployment conditions.
It is presented in the form of a collection of twelve articles
dealing with this task; I am the main author or a co-author of these
articles. They were published during my work on several consecutive
research projects, in which I participated both as a member of the
research team and as the investigator or a co-investigator.
These articles can be divided into three main groups according to
their topics. They have in common the effort to adapt a particular
ASR system to a specific factor or deployment condition that affects
its function or accuracy.
The first group of articles is focused on an unsupervised speaker
adaptation task, where the ASR system adapts its parameters to
the specific voice characteristics of one particular speaker. The second
part deals with a) methods allowing the system to identify
non-speech events on the input, and b) the related task of recognition
of speech with non-speech events, particularly music, in the
background. Finally, the third part is devoted to the methods
that allow the transcription of an audio signal containing multilingual
utterances. It includes a) approaches for adapting the existing
recognition system to a new language and b) methods for identification
of the language from the audio signal.
Both of these identification tasks are investigated mainly
under the demanding and less explored frame-wise scenario,
which is the only one suitable for processing on-line data streams.
Translating English verbal collocations into Spanish: On distribution and other relevant differences related to diatopic variation
Language varieties should be taken into account in order to enhance the fluency and naturalness of translated texts. In this paper we examine the collocational verbal range for prima-facie translation equivalents of words like decision and dilemma, which in both languages denote the act or process of reaching a resolution after consideration, resolving a question or deciding something. We are mainly concerned with diatopic variation in Spanish. To this end, we set out to develop a giga-token corpus-based protocol which includes a detailed and reproducible methodology sufficient to detect collocational peculiarities of transnational languages. To our knowledge, this is one of the first observational studies of this kind. The paper is organised as follows. Section 1 introduces some basic issues about the translation of collocations against the background of languages' anisomorphism. Section 2 provides a feature characterisation of collocations. Section 3 deals with the choice of corpora, corpus tools, nodes and patterns. Section 4 covers the automatic retrieval of the selected verb + noun (object) collocations in general Spanish and the co-existing national varieties. Special attention is paid to comparative results in terms of similarities and mismatches. Section 5 presents conclusions and outlines avenues of further research.
Design of a Controlled Language for Critical Infrastructures Protection
We describe a project for the construction of a controlled language for critical infrastructures protection (CIP). This project originates
from the need to coordinate and categorize communications on CIP at the European level. These communications can take the form of
official documents, incident reports, informal communications and plain e-mail. To achieve this goal, we explore the application of
traditional library science tools to the construction of controlled languages. Our starting point is an
analogous work done during the sixties in the field of nuclear science, known as the Euratom Thesaurus. JRC.G.6-Security technology assessmen
On Distant Speech Recognition for Home Automation
The official version of this draft is available at Springer via http://dx.doi.org/10.1007/978-3-319-16226-3_7. In the framework of Ambient Assisted Living, home automation may be a solution for helping elderly people living alone at home. This study is part of the Sweet-Home project, which aims at developing a new home automation system based on voice command to improve the support and well-being of people losing their autonomy. The goal of the study is vocal order recognition with a focus on two aspects: distant speech recognition and sentence spotting. Several ASR techniques were evaluated on a realistic corpus acquired in a 4-room flat equipped with microphones set in the ceiling. This French distant speech corpus was recorded with 21 speakers who acted out scenarios of activities of daily living. Techniques acting at the decoding stage, such as our novel approach called Driven Decoding Algorithm (DDA), gave better speech recognition results than the baseline and other approaches. This solution, which uses the two best SNR channels and a priori knowledge (voice commands and distress sentences), has demonstrated an increase in recognition rate without introducing false alarms.
Distant speech recognition for home automation: Preliminary experimental results in a smart home
This paper presents a study that is part of the Sweet-Home project, which aims at developing a new home automation system based on voice command. The study focused on two tasks: distant speech recognition and sentence spotting (e.g., recognition of domotic orders). Regarding the first task, different combinations of ASR systems, language models and acoustic models were tested. Fusion of ASR outputs by consensus and with a triggered language model (using a priori knowledge) was investigated. For the sentence spotting task, an algorithm based on distance evaluation between the current ASR hypotheses and the predefined set of keyword patterns was introduced in order to retrieve the correct sentences in spite of ASR errors. The techniques were assessed on real daily living data collected in a 4-room smart home that was fully equipped with standard tactile commands and with 7 wireless microphones set in the ceiling. Thanks to Driven Decoding Algorithm techniques, a classical ASR system reached 7.9% WER, against 35% WER in the standard configuration and 15% with MLLR adaptation only. The best keyword pattern classification result obtained in distant speech conditions was 7.5% CER.
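The abstract above does not specify the distance measure used for sentence spotting. As a minimal sketch only, assuming a word-level Levenshtein distance normalised by pattern length, matching an ASR hypothesis against a set of keyword patterns could look like the following. The function names, the `max_ratio` threshold and the English example commands are hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Tuple


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance between two token lists."""
    # prev[j] holds the distance between the first i-1 words of ref
    # and the first j words of hyp.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def spot_sentence(hypothesis: str,
                  patterns: List[str],
                  max_ratio: float = 0.5) -> Optional[Tuple[str, float]]:
    """Return the closest pattern and its normalised distance, or None.

    max_ratio is an assumed rejection threshold (hypothetical value).
    """
    hyp_words = hypothesis.lower().split()
    best: Optional[Tuple[str, float]] = None
    for pattern in patterns:
        pat_words = pattern.lower().split()
        ratio = edit_distance(pat_words, hyp_words) / max(len(pat_words), 1)
        if best is None or ratio < best[1]:
            best = (pattern, ratio)
    if best is not None and best[1] <= max_ratio:
        return best
    return None


# Hypothetical voice commands, for illustration only.
commands = ["turn on the light", "close the blinds"]
print(spot_sentence("turn on the lights", commands))
```

The same `edit_distance` routine, divided by the number of reference words, is the usual basis for the word error rate (WER) figures quoted in the abstract.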
Speech Activity and Speaker Change Point Detection for Online Streams
The main focus of this thesis lies on two closely interrelated tasks, speech activity detection and speaker change point detection, and their applications in online processing. These tasks commonly play a crucial role as preprocessors in speech-processing applications, such as automatic speech recognition or speaker diarization. While their use in offline systems is extensively covered in the literature, the number of published works focusing on online use is limited. This is unfortunate, as many speech-processing applications (e.g., monitoring systems) are required to run in real time. The thesis begins with a three-chapter opening part, in which the first, introductory chapter explains the basic concepts and outlines the practical use of both tasks. It is followed by a chapter that reviews the current state of the art and lists the existing toolkits. The opening part concludes with a chapter explaining the motivation behind this work and the practical use of both tasks in monitoring systems; this chapter also sets the main goals of the thesis. The next two chapters cover the theoretical background of both tasks. They present selected approaches that are either relevant to this work (e.g., used for result comparisons) or focused on online processing. The following chapter proposes the final speech activity detection approach for online use. It describes the development of this approach in detail and provides a thorough experimental evaluation. The approach yields state-of-the-art results under low- and medium-noise conditions on the standardized QUT-NOISE-TIMIT corpus. It is also integrated into a monitoring system, where it supplements a speech recognition system. The final speaker change point detection approach is proposed in the following chapter. It was designed in a series of consecutive experiments, which are detailed in this chapter. An experimental evaluation on the COST278 database shows performance approaching that of the offline reference system, while operating in online mode with low latency. Finally, the last chapter summarizes the results of this thesis.
- …