Search CORE

5,760 research outputs found

Detecting autism, emotions and social signals using AdaBoost

Authors: Busa-Fekete Róbert, Gosztolya Gábor, Tóth László
Publication venue: Interspeech
Publication date: 01/01/2013

SZTE Publicatio Repozitórium - SZTE - Repository of Publications

End-to-end Audiovisual Speech Activity Detection with Bimodal Recurrent Neural Models

Authors: Busso Carlos, Tao Fei
Publication date: 12/09/2018

Speech activity detection (SAD) plays an important role in current speech processing systems, including automatic speech recognition (ASR). SAD is particularly difficult in environments with acoustic noise. A practical solution is to incorporate visual information, increasing the robustness of the SAD approach. An audiovisual system has the advantage of being robust to different speech modes (e.g., whispered speech) or background noise. Recent advances in audiovisual speech processing using deep learning have opened opportunities to capture, in a principled way, the temporal relationships between acoustic and visual features. This study explores this idea by proposing a bimodal recurrent neural network (BRNN) framework for SAD. The approach models the temporal dynamics of the sequential audiovisual data, improving the accuracy and robustness of the proposed SAD system. Instead of estimating hand-crafted features, the study investigates an end-to-end training approach, where acoustic and visual features are learned directly from the raw data during training. The experimental evaluation considers a large audiovisual corpus with over 60.8 hours of recordings, collected from 105 speakers. The results demonstrate that the proposed framework leads to absolute improvements of up to 1.2% under practical scenarios over an audio-only VAD baseline implemented with a deep neural network (DNN). The proposed approach achieves a 92.7% F1-score when evaluated using the sensors of a portable tablet in a noisy acoustic environment, which is only 1.0% lower than the performance obtained under ideal conditions (e.g., clean speech recorded with a high-definition camera and a close-talking microphone).
Comment: Submitted to Speech Communicatio

arXiv.org e-Print Archive

Speech Recognition

Publication venue: IntechOpen
Publication date: 20/04/2021

Chapters in the first part of the book cover all the essential speech processing techniques for building robust automatic speech recognition systems: the representation of speech signals and methods for speech-feature extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use information from automatic speech recognition: speaker identification and tracking, prosody modeling in emotion-detection systems, and other speech processing applications that are able to operate in real-world environments, such as mobile communication services and smart homes.

Directory of Open Access Books (DOAB)

Efficient speaker recognition for mobile devices

Author: Karpov Evgeny
Publication venue: University of Eastern Finland

UEF Electronic Publications

Double Layer Architectures for Automatic Speech Recognition Using HMM

Authors: Jose A. R. Fonollosa, Marta Casar
Publication venue: IntechOpen
Publication date: 01/06/2007

IntechOpen

Probabilistic modelling and inference of human behaviour from mobile phone time series

Author: Choujaa Driss
Publication venue: Computing, Imperial College London
Publication date: 01/03/2010

With an estimated 4.1 billion subscribers around the world, the mobile phone offers a unique opportunity to sense and understand human behaviour from location, co-presence and communication data. While the benefit of modelling this unprecedented amount of data is widely recognised, a number of challenges impede the development of accurate behaviour models. In this thesis, we identify and address two modelling problems and show that taking them into account improves the accuracy of behaviour inference. We first examine the modelling of long-range dependencies in human behaviour. Existing human behaviour models take into account only short-range dependencies in mobile phone time series. Using information theory, we quantify long-range dependencies in mobile phone time series for the first time, demonstrate that they exhibit periodic oscillations, and introduce novel tools to analyse them. We further show that considering what the user did 24 hours earlier improves accuracy when predicting user behaviour five or more hours in advance. The second problem we address is the modelling of temporal variations in human behaviour: the time a user spends on an activity varies from one day to the next. To recognise behaviour patterns despite these temporal variations, we establish a methodological connection between human behaviour modelling and biological sequence alignment. This connection allows us to compare, cluster and model behaviour sequences, and to introduce novel features for behaviour recognition that improve its accuracy. The experiments presented in this thesis were conducted on the largest publicly available mobile phone dataset labelled in an unsupervised fashion and are entirely repeatable. Furthermore, our techniques require only cellular data, which can easily be recorded by today's mobile phones, and could benefit a wide range of applications including life logging, health monitoring, customer profiling and large-scale surveillance.

Spiral - Imperial College Digital Repository