
    Applications of missing feature theory to speaker recognition

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2000. Includes bibliographical references (p. 100-101). An important problem in speaker recognition is the degradation that occurs when speaker models trained with speech from one type of channel are used to score speech from another type of channel, known as channel mismatch. This thesis investigates various channel compensation techniques and approaches from missing feature theory for improving Gaussian mixture model (GMM)-based speaker verification under this mismatch condition. Experiments are performed using a speech corpus consisting of "clean" training speech and "dirty" test speech, the latter being the clean speech corrupted by additive Gaussian noise. Channel compensation methods studied are cepstral mean subtraction, RASTA, and spectral subtraction. Approaches to missing feature theory include missing feature compensation, which removes corrupted features, and missing feature restoration, which predicts such features from neighboring features in both frequency and time. These methods are investigated both individually and in combination. In particular, missing feature compensation combined with spectral subtraction in the discrete Fourier transform domain significantly improves GMM speaker verification accuracy and outperforms all other methods examined in this thesis, reducing the equal error rate by about 10% more than the other methods over an SNR range of 5-25 dB. Moreover, it considerably outperforms a state-of-the-art GMM recognizer for the mismatch application that combines missing feature theory with spectral subtraction developed in a mel-filter energy domain. Finally, the concept of missing feature restoration is explored. A novel linear minimum mean-squared-error missing feature estimator is derived and applied to pure vowels as well as a clean/dirty verification trial. While it does not improve performance in the verification trial, a large SNR improvement for features estimated in the pure vowel case indicates promise for the application of this method. by Michael Thomas Padilla. S.M.
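
    The best-performing combination reported above (spectral subtraction in the DFT domain followed by missing feature compensation when scoring a GMM) can be sketched roughly as below. This is a minimal illustration under assumed inputs (a frames-by-bins power spectrogram and a pre-estimated noise power spectrum), not the thesis implementation; the SNR threshold, spectral floor, and diagonal-covariance GMM parameters are placeholders.

```python
import numpy as np
from scipy.special import logsumexp

def spectral_subtract(power_spec, noise_psd, floor=0.01):
    """Subtract an estimated noise power spectrum frame-wise, flooring the
    result to avoid negative energies (power_spec: frames x bins)."""
    return np.maximum(power_spec - noise_psd, floor * power_spec)

def reliability_mask(power_spec, noise_psd, snr_threshold_db=0.0):
    """Flag a time-frequency bin as reliable when its estimated local SNR
    exceeds the threshold; all other bins are treated as 'missing'."""
    snr_db = 10.0 * np.log10(np.maximum(power_spec, 1e-12)
                             / np.maximum(noise_psd, 1e-12))
    return snr_db > snr_threshold_db

def marginalised_gmm_score(weights, means, variances, frames, mask):
    """Average per-frame log-likelihood under a diagonal-covariance GMM,
    keeping only the reliable dimensions of each frame (missing feature
    compensation by marginalisation)."""
    scores = []
    for x, keep in zip(frames, mask):
        if not np.any(keep):
            continue  # no reliable bins in this frame
        diff = x[keep] - means[:, keep]
        log_comp = (np.log(weights)
                    - 0.5 * np.sum(diff ** 2 / variances[:, keep]
                                   + np.log(2 * np.pi * variances[:, keep]),
                                   axis=1))
        scores.append(logsumexp(log_comp))
    return float(np.mean(scores))
```

    For diagonal covariances, marginalising out the unreliable dimensions amounts to simply dropping them from each component's log-likelihood, which is what the last function does.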

    Speech Recognition

    Chapters in the first part of the book cover all the essential speech processing techniques for building robust automatic speech recognition systems: the representation of speech signals and the methods for speech feature extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition for speaker identification and tracking, for prosody modeling in emotion-detection systems, and in other speech processing applications able to operate in real-world environments, like mobile communication services and smart homes
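
    As a concrete hint of the feature-extraction stage mentioned above, the sketch below computes MFCCs, a common input representation for acoustic models. It assumes the librosa package is available; the file name and parameter choices are illustrative only.

```python
import librosa

def extract_mfcc(wav_path, n_mfcc=13):
    """Load an utterance and return an (n_mfcc x frames) MFCC matrix,
    a typical front-end representation for acoustic modeling."""
    y, sr = librosa.load(wav_path, sr=16000)  # resample to 16 kHz
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)

# features = extract_mfcc("utterance.wav")  # hypothetical file name
```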

    Towards Probabilistic and Partially-Supervised Structural Health Monitoring

    One of the most significant challenges for signal processing in data-based structural health monitoring (SHM) is a lack of comprehensive data; in particular, recording labels to describe what each of the measured signals represents. For example, consider an offshore wind turbine monitored by an SHM strategy. It is infeasible to artificially damage such a high-value asset to collect signals that might relate to the damaged structure in situ; additionally, signals that correspond to abnormal wave-loading, or unusually low temperatures, could take several years to be recorded. Regular inspections of the turbine in operation, to describe (and label) what measured data represent, would also prove impracticable -- conventionally, it is only possible to check various components (such as the turbine blades) following manual inspection; this involves travelling to a remote, offshore location, which is a high-cost procedure. Therefore, the collection of labelled data is generally limited by some expense incurred when investigating the signals; this might include direct costs, or loss of income due to down-time. Conventionally, incomplete label information forces a dependence on unsupervised machine learning, limiting SHM strategies to damage (i.e. novelty) detection. However, while comprehensive and fully labelled data can be rare, it is often possible to provide labels for a limited subset of data, given a label budget. In this scenario, partially-supervised machine learning becomes relevant. The associated algorithms offer an alternative approach to monitoring measured data, as they can utilise both labelled and unlabelled signals within a unifying training scheme. In consequence, this work introduces (and adapts) partially-supervised algorithms for SHM; specifically, semi-supervised and active learning methods. Through applications to experimental data, semi-supervised learning is shown to utilise information in the unlabelled signals, alongside a limited set of labelled data, to further update a predictive model. On the other hand, active learning improves the predictive performance by querying specific signals to investigate, which are assumed to be the most informative. Both discriminative and generative methods are investigated, leading towards a novel, probabilistic framework to classify, investigate, and label signals for online SHM. The findings indicate that, through partially-supervised learning, the cost associated with labelling data can be managed, as the information in a selected subset of labelled signals can be combined with larger sets of unlabelled data -- increasing the potential scope and predictive performance of data-driven SHM
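
    As a rough illustration of the active-learning side of this partially-supervised setting, the sketch below fits a simple classifier to the small labelled set and spends a fixed label budget on the unlabelled signals whose predictions are most uncertain. The classifier choice (Gaussian naive Bayes) and the entropy-based query rule are illustrative assumptions, not the probabilistic framework developed in the thesis.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def query_most_informative(X_labelled, y_labelled, X_unlabelled, budget=5):
    """Fit a simple generative classifier to the labelled signals, then
    return the indices of the unlabelled signals with the highest predictive
    entropy -- the candidates worth spending the label budget on."""
    model = GaussianNB().fit(X_labelled, y_labelled)
    proba = model.predict_proba(X_unlabelled)
    entropy = -np.sum(proba * np.log(proba + 1e-12), axis=1)
    return np.argsort(entropy)[-budget:]
```

    Querying by predictive entropy is only one heuristic; the work described above investigates both discriminative and generative alternatives within a unified probabilistic scheme.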

    Online learning of personalised human activity recognition models from user-provided annotations

    PhD Thesis. In Human Activity Recognition (HAR), supervised and semi-supervised training are important tools for devising parametric activity models. For the best modelling performance, large amounts of annotated personalised sample data are typically required. Annotating often represents the bottleneck in the overall modelling process, as it usually involves retrospective analysis of experimental ground truth, like video footage. These approaches typically neglect that prospective users of HAR systems are themselves key sources of ground truth for their own activities. This research therefore involves the users of HAR monitors in the annotation process. The process relies solely on users' short-term memory and engages with them to parsimoniously provide annotations for their own activities as they unfold. Effects of user input are optimised by using Online Active Learning (OAL) to identify the most critical annotations, which are expected to lead to the greatest gains in HAR model performance. Personalised HAR models are trained from user-provided annotations as part of the evaluation, focusing mainly on objective model accuracy. The OAL approach is contrasted with Random Selection (RS) -- a naive method which makes uninformed annotation requests. A range of simulation-based annotation scenarios demonstrates that using OAL brings benefits in terms of HAR model performance over RS. Additionally, a mobile application is implemented and deployed in a naturalistic context to collect annotations from a panel of human participants. The deployment is proof that the method can truly run in online mode, and it also shows that considerable HAR model performance gains can be registered even under realistic conditions. The findings from this research point to the conclusion that online learning from user-provided annotations is a valid solution to the problem of constructing personalised HAR models
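
    A stream-oriented sketch of the OAL-versus-RS contrast might look as follows: for each incoming feature window, the learner either requests an annotation only when its prediction margin is small (OAL) or, in the RS baseline, uniformly at random. The model, margin, query rate, and feature shapes are placeholders for illustration, not the system evaluated in the thesis.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

class OnlineActiveLearner:
    """Decides, per incoming feature window, whether to ask the user for an
    annotation, and updates the HAR model online when one is provided."""

    def __init__(self, classes, margin=0.2, query_rate=0.1, seed=0):
        self.model = SGDClassifier(loss="log_loss")  # supports partial_fit
        self.classes = classes
        self.margin = margin
        self.query_rate = query_rate
        self.rng = np.random.default_rng(seed)
        self.trained = False

    def should_query_oal(self, x):
        """OAL rule: request an annotation when the two most probable
        classes are closer than the margin (the model is unsure)."""
        if not self.trained:
            return True
        p = np.sort(self.model.predict_proba(x.reshape(1, -1))[0])
        return (p[-1] - p[-2]) < self.margin

    def should_query_rs(self):
        """RS baseline: request an annotation uniformly at random."""
        return self.rng.random() < self.query_rate

    def update(self, x, label):
        """Incorporate a user-provided annotation immediately (online)."""
        self.model.partial_fit(x.reshape(1, -1), [label], classes=self.classes)
        self.trained = True
```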

    Time- and value-continuous explainable affect estimation in-the-wild

    Today, the relevance of Affective Computing, i.e., of making computers recognise and simulate human emotions, cannot be overstated. All technology giants (from manufacturers of laptops to mobile phones to smart speakers) are in a fierce competition to make their devices understand not only what is being said, but also how it is being said, in order to recognise the user's emotions. The goals have evolved from predicting the basic emotions (e.g., happy, sad) to the more nuanced affective states (e.g., relaxed, bored), predicted in real time. The databases used in such research have also evolved, from featuring acted behaviours to featuring spontaneous behaviours. A more powerful recent shift is in-the-wild affect recognition, i.e., taking the research out of the laboratory and into the uncontrolled real world. This thesis discusses, for the very first time, affect recognition for two unique in-the-wild audiovisual databases, GRAS2 and SEWA. GRAS2 is, to date, the only database with time- and value-continuous affect annotations for Labov effect-free affective behaviours, i.e., recorded without the participant's awareness of being recorded (which is otherwise known to affect the naturalness of one's affective behaviour). SEWA features participants from six different cultural backgrounds, conversing using a video-calling platform. Thus, SEWA features in-the-wild recordings further corrupted by unpredictable artifacts, such as network-induced delays, frame-freezing and echoes. The two databases present a unique opportunity to study time- and value-continuous affect estimation that is truly in-the-wild. A novel ‘Evaluator Weighted Estimation’ formulation is proposed to generate a gold standard sequence from several annotations. An illustration is presented demonstrating that the moving bag-of-words (BoW) representation better preserves the temporal context of the features, while remaining more robust against outliers compared to other statistical summaries, e.g., the moving average. A novel, data-independent randomised codebook is proposed for the BoW representation; this is especially useful for cross-corpus model generalisation testing when the feature spaces of the databases differ drastically. Various deep learning models and support vector regressors are used to predict affect dimensions time- and value-continuously. The better generalisability of the models trained on GRAS2, despite the smaller training size, makes a strong case for the collection and use of Labov effect-free data. A further foundational contribution is the discovery of the missing many-to-many mapping between the mean square error (MSE) and the concordance correlation coefficient (CCC), i.e., between two of the most popular utility functions to date. The newly invented cost function |MSE_{XY}/σ_{XY}| has been evaluated in experiments aimed at demystifying the inner workings of a well-performing, simple, low-cost neural network that effectively utilises the BoW text features. Also proposed herein is the shallowest-possible convolutional neural network (CNN) that uses the facial action unit (FAU) features. The CNN exploits sequential context but, unlike RNNs, also inherently allows data- and process-parallelism. Interestingly, for the most part, these white-box AI models have been shown to utilise the provided features in a manner consistent with the human perception of emotion expression
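
    To make the two utility functions named above concrete, the sketch below computes the concordance correlation coefficient and one plausible reading of the proposed cost |MSE_{XY}/σ_{XY}| (the mean squared error scaled by the covariance between prediction and gold standard). It only illustrates the quantities; the exact training objective and normalisation used in the thesis may differ.

```python
import numpy as np

def ccc(x, y):
    """Concordance correlation coefficient between two sequences."""
    mx, my = x.mean(), y.mean()
    cov = np.mean((x - mx) * (y - my))
    return 2 * cov / (x.var() + y.var() + (mx - my) ** 2)

def scaled_mse_cost(pred, gold):
    """|MSE / cov(pred, gold)|: small when the error is low and the
    prediction co-varies strongly with the gold standard."""
    mse = np.mean((pred - gold) ** 2)
    cov = np.mean((pred - pred.mean()) * (gold - gold.mean()))
    return np.abs(mse / cov)  # assumes a non-zero covariance
```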

    Human-Computer Interaction

    In this book the reader will find a collection of 31 papers presenting different facets of Human-Computer Interaction, the result of research projects and experiments as well as new approaches to designing user interfaces. The book is organized according to the following main topics, in sequential order: new interaction paradigms, multimodality, usability studies on several interaction mechanisms, human factors, universal design, and development methodologies and tools

    Multimedia

    The ubiquitous and effortless digital data capture and processing capabilities offered by the majority of today's devices lead to an unprecedented penetration of multimedia content into our everyday lives. To make the most of this phenomenon, the rapidly increasing volume and usage of digitised content require constant re-evaluation and adaptation of multimedia methodologies, in order to meet the relentless change of requirements from both the user and system perspectives. Advances in Multimedia provides readers with an overview of the ever-growing field of multimedia by bringing together various research studies and surveys from different subfields that point out such important aspects. Some of the main topics that this book deals with include: multimedia management in peer-to-peer structures and wireless networks, security characteristics in multimedia, semantic gap bridging for multimedia content, and novel multimedia applications

    Nondestructive Testing (NDT)

    The aim of this book is to collect the newest contributions by eminent authors in the field of NDT-SHM, both at the material and structure scale. It therefore provides novel insight at experimental and numerical levels on the application of NDT to a wide variety of materials (concrete, steel, masonry, composites, etc.) in the field of Civil Engineering and Architecture

    Proceedings of the Second International Mobile Satellite Conference (IMSC 1990)

    Presented here are the proceedings of the Second International Mobile Satellite Conference (IMSC), held June 17-20, 1990 in Ottawa, Canada. Topics covered include future mobile satellite communications concepts, aeronautical applications, modulation and coding, propagation and experimental systems, mobile terminal equipment, network architecture and control, regulatory and policy considerations, vehicle antennas, and speech compression