7 research outputs found
WAKE WORD DETECTION AND ITS APPLICATIONS
Always-on spoken language interfaces, e.g. personal digital assistants, rely on a wake word to start processing spoken input. Novel methods are proposed to train a wake word detection system from partially labeled training data, and to use it in on-line applications. In the system, the prerequisite of frame-level alignment is removed, permitting the use of un-transcribed training examples that are annotated only for the presence/absence of the wake word. Also, an FST-based decoder is presented to perform online detection. The suite of methods greatly improve the wake word detection performance across several datasets.
A novel neural network for acoustic modeling in wake word detection is also investigated. Specifically, the performance of several variants of chunk-wise streaming Transformers tailored for wake word detection is explored, including looking-ahead to the next chunk, gradient stopping, different positional embedding methods and adding same-layer dependency between chunks. Experiments demonstrate that the proposed Transformer model outperforms the baseline convolutional network significantly with a comparable model size, while still maintaining linear complexity w.r.t. the input length.
For the application of the detected wake word in ASR, the problem of improving speech recognition with the help of the detected wake word is investigated. Voice-controlled house-hold devices face the difficulty of performing speech recognition of device-directed speech in the presence of interfering background speech. Two end-to-end models are proposed to tackle this problem with information extracted from the anchored segment. The anchored segment refers to the wake word segment of the audio stream, which contains valuable speaker information that can be used to suppress interfering speech and background noise. A multi-task learning setup is also explored where the ideal mask, obtained from a data synthesis procedure, is used to guide the model training. In addition, a way to synthesize "noisy" speech from "clean" speech is also proposed to mitigate the mismatch between training and test data. The proposed methods show large word error reduction for Amazon Alexa live data with interfering background speech, without sacrificing the performance on clean speech
An Overview of Indian Spoken Language Recognition from Machine Learning Perspective
International audienceAutomatic spoken language identification (LID) is a very important research field in the era of multilingual voice-command-based human-computer interaction (HCI). A front-end LID module helps to improve the performance of many speech-based applications in the multilingual scenario. India is a populous country with diverse cultures and languages. The majority of the Indian population needs to use their respective native languages for verbal interaction with machines. Therefore, the development of efficient Indian spoken language recognition systems is useful for adapting smart technologies in every section of Indian society. The field of Indian LID has started gaining momentum in the last two decades, mainly due to the development of several standard multilingual speech corpora for the Indian languages. Even though significant research progress has already been made in this field, to the best of our knowledge, there are not many attempts to analytically review them collectively. In this work, we have conducted one of the very first attempts to present a comprehensive review of the Indian spoken language recognition research field. In-depth analysis has been presented to emphasize the unique challenges of low-resource and mutual influences for developing LID systems in the Indian contexts. Several essential aspects of the Indian LID research, such as the detailed description of the available speech corpora, the major research contributions, including the earlier attempts based on statistical modeling to the recent approaches based on different neural network architectures, and the future research trends are discussed. This review work will help assess the state of the present Indian LID research by any active researcher or any research enthusiasts from related fields
Adaptation of speech recognition systems to selected real-world deployment conditions
Tato habilitační práce se zabývá problematikou adaptace systémů
rozpoznávání řeči na vybrané reálné podmínky nasazení. Je koncipována
jako sborník celkem dvanácti článků, které se touto problematikou
zabývají. Jde o publikace, jejichž jsem hlavním autorem
nebo spoluatorem, a které vznikly v rámci několika navazujících
výzkumných projektů. Na řešení těchto projektů jsem se
podílel jak v roli člena výzkumného týmu, tak i v roli řešitele nebo
spoluřešitele.
Publikace zařazené do tohoto sborníku lze rozdělit podle tématu
do tří hlavních skupin. Jejich společným jmenovatelem je
snaha přizpůsobit daný rozpoznávací systém novým podmínkám či
konkrétnímu faktoru, který významným způsobem ovlivňuje jeho
funkci či přesnost.
První skupina článků se zabývá úlohou neřízené adaptace na
mluvčího, kdy systém přizpůsobuje svoje parametry specifickým
hlasovým charakteristikám dané mluvící osoby. Druhá část práce
se pak věnuje problematice identifikace neřečových událostí na vstupu
do systému a související úloze rozpoznávání řeči s hlukem
(a zejména hudbou) na pozadí. Konečně třetí část práce se zabývá
přístupy, které umožňují přepis audio signálu obsahujícího promluvy
ve více než v jednom jazyce. Jde o metody adaptace existujícího
rozpoznávacího systému na nový jazyk a metody identifikace
jazyka z audio signálu.
Obě zmíněné identifikační úlohy jsou přitom vyšetřovány zejména
v náročném a méně probádaném režimu zpracování po jednotlivých
rámcích vstupního signálu, který je jako jediný vhodný pro on-line
nasazení, např. pro streamovaná data.This habilitation thesis deals with adaptation of automatic speech
recognition (ASR) systems to selected real-world deployment conditions.
It is presented in the form of a collection of twelve articles
dealing with this task; I am the main author or a co-author of these
articles. They were published during my work on several consecutive
research projects. I have participated in the solution of them
as a member of the research team as well as the investigator or a
co-investigator.
These articles can be divided into three main groups according to
their topics. They have in common the effort to adapt a particular
ASR system to a specific factor or deployment condition that affects
its function or accuracy.
The first group of articles is focused on an unsupervised speaker
adaptation task, where the ASR system adapts its parameters to
the specific voice characteristics of one particular speaker. The second
part deals with a) methods allowing the system to identify
non-speech events on the input, and b) the related task of recognition
of speech with non-speech events, particularly music, in the
background. Finally, the third part is devoted to the methods
that allow the transcription of an audio signal containing multilingual
utterances. It includes a) approaches for adapting the existing
recognition system to a new language and b) methods for identification
of the language from the audio signal.
The two mentioned identification tasks are in particular investigated
under the demanding and less explored frame-wise scenario,
which is the only one suitable for processing of on-line data streams
Computational Intelligence and Human- Computer Interaction: Modern Methods and Applications
The present book contains all of the articles that were accepted and published in the Special Issue of MDPI’s journal Mathematics titled "Computational Intelligence and Human–Computer Interaction: Modern Methods and Applications". This Special Issue covered a wide range of topics connected to the theory and application of different computational intelligence techniques to the domain of human–computer interaction, such as automatic speech recognition, speech processing and analysis, virtual reality, emotion-aware applications, digital storytelling, natural language processing, smart cars and devices, and online learning. We hope that this book will be interesting and useful for those working in various areas of artificial intelligence, human–computer interaction, and software engineering as well as for those who are interested in how these domains are connected in real-life situations
Minimal Infrastructure Radio Frequency Home Localisation Systems
The ability to track the location of a subject in their home allows the provision of a
number of location based services, such as remote activity monitoring, context sensitive
prompts and detection of safety critical situations such as falls. Such pervasive monitoring
functionality offers the potential for elders to live at home for longer periods of their lives
with minimal human supervision.
The focus of this thesis is on the investigation and development of a home roomlevel
localisation technique which can be readily deployed in a realistic home environment
with minimal hardware requirements. A conveniently deployed Bluetooth ®
localisation
platform is designed and experimentally validated throughout the thesis. The platform
adopts the convenience of a mobile phone and the processing power of a remote location
calculation computer. The use of Bluetooth ®
also ensures the extensibility of the platform
to other home health supervision scenarios such as wireless body sensor monitoring.
Central contributions of this work include the comparison of probabilistic and nonprobabilistic
classifiers for location prediction accuracy and the extension of probabilistic
classifiers to a Hidden Markov Model Bayesian filtering framework. New location
prediction performance metrics are developed and signicant performance improvements
are demonstrated with the novel extension of Hidden Markov Models to higher-order
Markov movement models. With the simple probabilistic classifiers, location is correctly
predicted 80% of the time. This increases to 86% with the application of the Hidden
Markov Models and 88% when high-order Hidden Markov Models are employed.
Further novelty is exhibited in the derivation of a real-time Hidden Markov Model
Viterbi decoding algorithm which presents all the advantages of the original algorithm,
while producing location estimates in real-time. Significant contributions are also made
to the field of human gait-recognition by applying Bayesian filtering to the task of motion
detection from accelerometers which are already present in many mobile phones. Bayesian filtering is demonstrated to enable a 35% improvement in motion recognition rate and even
enables a
floor recognition rate of 68% using only accelerometers. The unique application
of time-varying Hidden Markov Models demonstrates the effect of integrating these freely
available motion predictions on long-term location predictions
Recent Advances in Social Data and Artificial Intelligence 2019
The importance and usefulness of subjects and topics involving social data and artificial intelligence are becoming widely recognized. This book contains invited review, expository, and original research articles dealing with, and presenting state-of-the-art accounts pf, the recent advances in the subjects of social data and artificial intelligence, and potentially their links to Cyberspace