Search CORE

862 research outputs found

Studies on noise robust automatic speech recognition

Author: Kurimo Mikko
Palomäki Kalle J.
Remes Ulpu
Publication venue: Teknillinen korkeakoulu
Publication date: 01/01/2009
Field of study

Noise in everyday acoustic environments such as cars, traffic environments, and cafeterias remains one of the main challenges in automatic speech recognition (ASR). As a research theme, it has received wide attention in conferences and scientific journals focused on speech technology. This article collection reviews both the classic and novel approaches suggested for noise robust ASR. The articles are literature reviews written for the spring 2009 seminar course on noise robust automatic speech recognition (course code T-61.6060) held at TKK

Aaltodoc Publication Archive

Driver Drowsiness Estimation from EEG Signals Using Online Weighted Adaptation Regularization for Regression (OwARR)

Author: Gordon S
Lance BJ
Lawhern VJ
Lin CT
Wu D
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 09/02/2017
Field of study

© 1993-2012 IEEE. One big challenge that hinders the transition of brain-computer interfaces (BCIs) from laboratory settings to real-life applications is the availability of high-performance and robust learning algorithms that can effectively handle individual differences, i.e., algorithms that can be applied to a new subject with zero or very little subject-specific calibration data. Transfer learning and domain adaptation have been extensively used for this purpose. However, most previous works focused on classification problems. This paper considers an important regression problem in BCI, namely, online driver drowsiness estimation from EEG signals. By integrating fuzzy sets with domain adaptation, we propose a novel online weighted adaptation regularization for regression (OwARR) algorithm to reduce the amount of subject-specific calibration data, and also a source domain selection (SDS) approach to save about half of the computational cost of OwARR. Using a simulated driving dataset with 15 subjects, we show that OwARR and OwARR-SDS can achieve significantly smaller estimation errors than several other approaches. We also provide comprehensive analyses on the robustness of OwARR and OwARR-SDS

arXiv.org e-Print Archive

OPUS - University of Technology Sydney

Histogram equalization for robust text-independent speaker verification in telephone environments

Author: Skosan Marshalleno
Publication venue: Department of Electrical Engineering
Publication date: 01/01/2005
Field of study

Word processed copy. Includes bibliographical references

Cape Town University OpenUCT

Fundamental Approaches to Software Engineering

Author
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 13/04/2022
Field of study

This open access book constitutes the proceedings of the 25th International Conference on Fundamental Approaches to Software Engineering, FASE 2022, which was held during April 4-5, 2022, in Munich, Germany, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2022. The 17 regular papers presented in this volume were carefully reviewed and selected from 64 submissions. The proceedings also contain 3 contributions from the Test-Comp Competition. The papers deal with the foundations on which software engineering is built, including topics like software engineering as an engineering discipline, requirements engineering, software architectures, software quality, model-driven development, software processes, software evolution, AI-based software engineering, and the specification, design, and implementation of particular classes of systems, such as (self-)adaptive, collaborative, AI, embedded, distributed, mobile, pervasive, cyber-physical, or service-oriented applications

Directory of Open Access Books (DOAB)

Naturalistic Emotional Speech Corpora with Large Scale Emotional Dimension Ratings

Author: Vaughan Brian
Publication venue: Dublin Institute of Technology
Publication date: 01/01/2011
Field of study

The investigation of the emotional dimensions of speech is dependent on large sets of reliable data. Existing work has been carried out on the creation of emotional speech corpora and the acoustic analysis of emotional speech and this research seeks to buildupon this work while suggesting new methods and areas of potential. A review of the literature determined that a two dimensional emotional model of activation and evaluation was the ideal method for representing the emotional states expressed inspeech. Two case studies were carried out to investigate methods of obtaining naturalunderlying emotional speech in a high quality audio environment, the results of which were used to design a final experimental procedure to elicit natural underlying emotional speech. The speech obtained in this experiment was used in the creation ofa speech corpus that was underpinned by a persistent backend database that incorporated a three-tiered annotation methodology. This methodology was used to comprehensively annotate the metadata, acoustic data and emotional data of the recorded speech. Structuring the three levels of annotation and the assets in a persistent backend database allowed interactive web-based tools to be developed; aweb-based listening tool was developed to obtain a large amount of ratings for the assets that were then written back to the database for analysis. Once a large amount of ratings had been obtained, statistical analysis was used to determine the dimensionalrating for each asset. Acoustic analysis of the underlying emotional speech was then carried out and determined that certain acoustic parameters were correlated with the activation dimension of the dimensional model. This substantiated some of thefindings in the literature review and further determined that spectral energy was strongly correlated with the activation dimension in relation to underlying emotional speech. The lack of a correlation for certain acoustic parameters in relation to the evaluation dimension was also determined, again substantiating some of the findings in the literature.The work contained in this thesis makes a number of contributions to the field: the development of an experimental design to elicit natural underlying emotional speech in a high quality audio environment; the development and implementation of acomprehensive three-tiered corpus annotation methodology; the development and implementation of large scale web based listening tests to rate the emotional dimensions of emotional speech; the determination that certain acoustic parameters are correlated with the activation dimension of a dimensional emotional model inrelation to natural underlying emotional speech and the determination that certain acoustic parameters are not correlated with the evaluation dimension of a twodimensional emotional model in relation to natural underlying emotional speech

Arrow@TUDublin

Fundamental Approaches to Software Engineering

Author
Publication venue: 'Springer Science and Business Media LLC'
Publication date
Field of study

OAPEN Library

Speech Recognition

Author
Publication venue: 'IntechOpen'
Publication date: 20/04/2021
Field of study

Chapters in the first part of the book cover all the essential speech processing techniques for building robust, automatic speech recognition systems: the representation for speech signals and the methods for speech-features extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition for speaker identification and tracking, for prosody modeling in emotion-detection systems and in other speech processing applications that are able to operate in real-world environments, like mobile communication services and smart homes

Directory of Open Access Books (DOAB)

Recommended from our members

Digital phenotyping through multimodal, unobtrusive sensing

Author: Perez Pozuelo Ignacio
Publication venue: University of Cambridge
Publication date: 28/08/2020
Field of study

The growing adoption of multimodal wearable and mobile devices, such as smartphones and wrist-worn watches has generated an increase in the collection of physiological and behavioural data at scale. This digital phenotyping data enables researchers to make inferences regarding users’ physical and mental health at scale, for the first time. However, translating this data into actionable insights requires computational approaches that turn unlabelled, multimodal time-series sensor data into validated measures that can be interpreted at scale. This thesis describes the derivation of novel computational methods that leverage digital phenotyping data from wearable devices in large-scale populations to infer physical behaviours. These methods combine insights from signal processing, data mining and machine learning alongside domain knowledge in physical activity and sleep epidemiology. First, the inference of sleeping windows in free-living conditions through a heart rate sensing approach is explored. This algorithm is particularly valuable in the absence of ground truth or sleep diaries given its simplicity, adaptability and capacity for personalization. I then explore multistage sleep classification through combined movement and cardiac wearable sensing and machine learning. Further, I demonstrate that postural changes detected through wrist accelerometers can inform habitual behaviours and are valuable complements to traditional, intensity-based physical activity metrics. I then leverage the concomitant responses of heart rate to physical activity that can be captured through multimodal wearable sensors through a self-supervised training task. The resulting embeddings from this task are shown to be useful for the downstream classification of demographic factors, BMI, energy expenditure and cardiorespiratory fitness. Finally, I describe a deep learning model for the adaptive inference of cardiorespiratory fitness (VO2max) using wearable data in free living conditions. I demonstrate the robustness of the model in a large UK population and show the models’ adaptability by evaluating its performance in a subset of the population with repeated measures ~6 years after the original recordings. Together, this work increases the potential of multimodal wearable and mobile sensors for physical activity and behavioural inferences in population studies. In particular, this thesis showcases the potential of using wearable devices to make valuable physical activity, sleep and fitness inferences in large cohort studies. Given the nature of the data collected and the fact that most of this data is currently generated by commercial providers and not research institutes, laying the foundations for responsible data governance and ethical use of these technologies will be critical to building trust and enabling the development of the field of digital phenotyping.I was funded by GlaxoSmithKline and the Engineering and Physical Sciences Research Council. I was also supported by the Alan Turing Institute through their Enrichment Scheme

Apollo (Cambridge)

Models and analysis of vocal emissions for biomedical applications

Author
Publication venue: 'Firenze University Press'
Publication date: 31/05/2022
Field of study

This book of Proceedings collects the papers presented at the 3rd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA 2003, held 10-12 December 2003, Firenze, Italy. The workshop is organised every two years, and aims to stimulate contacts between specialists active in research and industrial developments, in the area of voice analysis for biomedical applications. The scope of the Workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies

Directory of Open Access Books (DOAB)

Finding Frustration: a Dive into the EEG of Drivers

Author: Klosterkamp Marie
Publication venue
Publication date: 01/09/2021
Field of study

Emotion recognition technologies for driving are increasingly used to render automotive travel more pleasurable and, more importantly, safer. Since emotions such as frustration and anger can lead to an increase in traffic accidents, this thesis explored the utility of electroencephalogram (EEG) features to recognize the driver’s frustration level. It, therefore, sought to find a balance between the ecologically valid emotion induction of a driving simulator and the noise-sensitive but highly informative measure of the EEG. Participants’ brain activity was captured with the CGX quick-30 mobile EEG system. 19 participants completed four different frustration-inducing and two baseline driving scenarios in a 360° driving simulator. Subsequently, the participants continuously rated their frustration level based on the replay of each scenario. The resulting subjective measures were used to classify EEG time periods into episodes with or without frustration. Results showed that the frequently used measure of the Alpha Asymmetry Index (AAI) had, as hypothesized, significantly more negative indices for high frustration (vs. no frustration). However, a commingling effect of anger on this result could not be dismissed. The results could not provide evidence for the yet to be replicated previous research of frustration correlates within narrow-band oscillations (delta, theta, alpha, and beta) at specified electrode positions (frontal, central, and posterior). This thesis concludes with suggestions for subsequent research endeavors and forthcoming practical implications in the form of insights acquired

Institute of Transport Research:Publications