49 research outputs found

    On-line adaptive learning of the correlated continuous density hidden Markov models for speech recognition

    We extend our previously proposed quasi-Bayes adaptive learning framework to cope with correlated continuous density hidden Markov models (HMMs) with Gaussian mixture state observation densities, in which all mean vectors are assumed to be correlated and to have a joint prior distribution. A successive approximation algorithm is proposed to implement the updating of the correlated mean vectors. As an example, by applying the method to on-line speaker adaptation, the algorithm is experimentally shown to be asymptotically convergent and to enhance the efficiency and effectiveness of Bayes learning by taking into account the correlation information between different model parameters. The technique can be used to cope with the time-varying nature of some acoustic and environmental variabilities, including mismatches caused by changing speakers, channels, transducers, environments, and so on.
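The benefit of a joint prior over correlated mean vectors, namely that data for one parameter also moves parameters correlated with it, can be illustrated with a much simpler case. The sketch below is not the paper's quasi-Bayes successive-approximation algorithm for Gaussian mixtures. It assumes a single Gaussian per state, a known observation variance, a hypothetical two-state correlated prior, and adaptation frames already aligned to state 1. Under those assumptions the exact on-line posterior update is a Kalman-style conjugate update; `conjugate_update` and all numbers are illustrative.

```python
import numpy as np


def conjugate_update(mean, cov, x, H, R):
    """Exact posterior of a Gaussian prior N(mean, cov) over the stacked
    mean vectors theta, after observing x ~ N(H @ theta, R)."""
    S = H @ cov @ H.T + R                 # predictive covariance of x
    K = np.linalg.solve(S, H @ cov).T     # gain, equal to cov @ H.T @ inv(S)
    new_mean = mean + K @ (x - H @ mean)
    new_cov = cov - K @ H @ cov
    return new_mean, new_cov


# Hypothetical example: two scalar state means with a correlated joint prior.
prior_mean = np.array([0.0, 0.0])
prior_cov = np.array([[1.0, 0.8],
                      [0.8, 1.0]])
H = np.array([[1.0, 0.0]])   # adaptation frames are aligned to state 1 only
R = np.array([[0.5]])        # assumed known observation variance

frames = [1.2, 0.9, 1.1]     # hypothetical adaptation data
mean, cov = prior_mean, prior_cov
for frame in frames:         # on-line: one update per incoming frame
    mean, cov = conjugate_update(mean, cov, np.array([frame]), H, R)

print("adapted means:", mean)
```

Only state 1 receives data, so any shift in the second mean comes entirely from the prior correlation of 0.8; with a diagonal prior covariance the gain for that component would be zero and the second mean would stay at its prior value.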

    Advances in Hyperspectral Image Classification: Earth monitoring with statistical learning methods

    Hyperspectral images show similar statistical properties to natural grayscale or color photographic images. However, the classification of hyperspectral images is more challenging because of the very high dimensionality of the pixels and the small number of labeled examples typically available for learning. These peculiarities lead to particular signal processing problems, mainly characterized by indetermination and complex manifolds. The framework of statistical learning has gained popularity in the last decade. New methods have been presented to account for the spatial homogeneity of images, to include user interaction via active learning, to take advantage of the manifold structure with semisupervised learning, to extract and encode invariances, and to adapt classifiers and image representations to unseen yet similar scenes. This tutorial reviews the main advances in hyperspectral remote sensing image classification through illustrative examples. Comment: IEEE Signal Processing Magazine, 201

    The use of speaker correlation information for automatic speech recognition

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1998. Includes bibliographical references (pp. 171-179). By Timothy J. Hazen.

    A Study of the Automatic Speech Recognition Process and Speaker Adaptation

    This thesis considers the entire automatic speech recognition process and presents a standardised approach to LVCSR experimentation with HMMs. It also discusses various approaches to speaker adaptation, such as MLLR and multiscale, and presents experimental results for cross-task speaker adaptation. An analysis of training parameters and data sufficiency for reasonable system performance estimates is also included. Maximum Likelihood Linear Regression (MLLR) supervised adaptation is found to yield a 6% absolute reduction in word error rate (WER) given only one minute of adaptation data, compared with an unadapted model set trained on a different task: the unadapted system performed at 24% WER and the adapted system at 18% WER. This is achieved with only 4 to 7 adaptation classes per speaker, generated from a regression tree.
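MLLR adapts each Gaussian mean through a shared affine transform per regression class, mu_hat = A mu + b = W xi with extended mean xi = [1, mu]. The sketch below estimates one such transform under two simplifying assumptions that are not taken from the thesis: every Gaussian in the class shares a single covariance, and each adaptation frame is hard-aligned to one Gaussian. Under those assumptions the maximum-likelihood solution has the closed form W = Z G^{-1}; with Gaussian-specific diagonal covariances, standard MLLR instead solves for W row by row. The function names and synthetic data are hypothetical.

```python
import numpy as np


def extended(means):
    """Extended mean vectors xi = [1, mu], one row per Gaussian."""
    return np.hstack([np.ones((means.shape[0], 1)), means])


def estimate_mllr(means, frames, labels):
    """ML estimate of one MLLR mean transform W = [b A] for a regression
    class, assuming a shared covariance and hard frame alignment."""
    xi = extended(means)[labels]          # xi of the Gaussian for each frame
    Z = frames.T @ xi                     # sum_t o_t xi_t^T
    G = xi.T @ xi                         # sum_t xi_t xi_t^T
    return np.linalg.solve(G, Z.T).T      # W = Z G^{-1} (G is symmetric)


def adapt_means(means, W):
    """Adapted means mu_hat = A mu + b, computed as W xi."""
    return extended(means) @ W.T


# Hypothetical speaker-independent means and a "true" speaker transform.
rng = np.random.default_rng(0)
means = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
A_true = np.array([[1.1, 0.2], [-0.1, 0.9]])
b_true = np.array([0.3, -0.2])
labels = rng.integers(0, len(means), size=400)
frames = means[labels] @ A_true.T + b_true + 0.05 * rng.standard_normal((400, 2))

W = estimate_mllr(means, frames, labels)
print(np.round(W, 2))        # columns: [b | A]
print(adapt_means(means, W))
```

Because one transform is shared by all Gaussians in a class, a few minutes (or less) of speech can adapt many means at once, which is why a small number of regression classes can suffice. For the figures quoted above, going from 24% to 18% WER is a 6-point absolute reduction, or 6/24 = 25% relative.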

    Recurrent neural networks for multi-microphone speech separation

    This thesis takes the classical signal processing problem of separating the speech of a target speaker from a real-world audio recording containing noise, background interference (from competing speech or other non-speech sources), and reverberation, and seeks data-driven solutions based on supervised learning methods, particularly recurrent neural networks (RNNs). Such speech separation methods can make automatic speech recognition (ASR) systems more robust and have been an active area of research for the past two decades. We focus particularly on applications where multi-channel recordings are available. Stand-alone beamformers cannot simultaneously suppress diffuse noise and protect the desired signal from distortion. Post-filters complement beamformers in obtaining the minimum mean squared error (MMSE) estimate of the desired signal. Time-frequency (TF) masking, a method with roots in computational auditory scene analysis (CASA), is a suitable candidate for post-filtering, but the challenge lies in estimating the TF masks. We propose using RNNs, in particular the bi-directional long short-term memory (BLSTM) architecture, as a post-filter that estimates TF masks for a delay-and-sum beamformer (DSB) from magnitude-spectral and phase-based features. Data from the CHiME-3 challenge, recorded in four challenging real-world environments, are used. Two TF masks, the Wiener-filter mask and the log-ratio mask, are identified as suitable learning targets. The separated speech is evaluated with objective speech intelligibility measures: short-term objective intelligibility (STOI) and frequency-weighted segmental SNR (fwSNR). The word error rates (WERs) reported by the previous state-of-the-art ASR back-end on the CHiME-3 test data are interpreted against these objective scores to understand how the two relate.
Overall, the RNNs bring a consistent improvement in the objective scores compared with feed-forward neural networks and a baseline MVDR beamformer.
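A Wiener-filter-style TF mask weights each time-frequency bin by the fraction of its power that belongs to speech, |S|^2 / (|S|^2 + |N|^2). The sketch below computes this mask from oracle speech and noise STFTs, as one would to build a training target, and applies it to a mixture. It assumes the beamformer output is simply speech plus residual noise; the STFT size, random data, and function names are hypothetical, the exact mask definitions in the thesis may differ, and the log-ratio target is not shown.

```python
import numpy as np


def wiener_mask(speech_stft, noise_stft, eps=1e-12):
    """Oracle Wiener-style mask |S|^2 / (|S|^2 + |N|^2), bounded in [0, 1]."""
    ps = np.abs(speech_stft) ** 2
    pn = np.abs(noise_stft) ** 2
    return ps / (ps + pn + eps)          # eps avoids division by zero


def apply_mask(mixture_stft, mask):
    """Post-filter: scale each TF bin of the mixture by the mask."""
    return mask * mixture_stft


# Hypothetical STFTs with shape (frequency bins, frames).
rng = np.random.default_rng(1)
shape = (257, 50)
speech = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
noise = 0.5 * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))
mixture = speech + noise                 # stand-in for the DSB output

mask = wiener_mask(speech, noise)        # what a BLSTM would learn to predict
enhanced = apply_mask(mixture, mask)
print(mask.min(), mask.max())
```

At inference time the oracle spectra are unavailable, so a network such as the BLSTM described above predicts the mask from features of the beamformed signal, and the prediction replaces `mask` in `apply_mask`.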

    Foundations and Recent Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

    Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design computer agents with intelligent capabilities such as understanding, reasoning, and learning by integrating multiple communicative modalities, including linguistic, acoustic, visual, tactile, and physiological messages. With the recent interest in video understanding, embodied autonomous agents, text-to-image generation, and multisensor fusion in application domains such as healthcare and robotics, multimodal machine learning has brought unique computational and theoretical challenges to the machine learning community, given the heterogeneity of data sources and the interconnections often found between modalities. However, the breadth of progress in multimodal research has made it difficult to identify the common themes and open questions in the field. By synthesizing a broad range of application domains and theoretical frameworks from both historical and recent perspectives, this paper is designed to provide an overview of the computational and theoretical foundations of multimodal machine learning. We start by defining two key principles, modality heterogeneity and interconnections, that have driven subsequent innovations, and propose a taxonomy of six core technical challenges (representation, alignment, reasoning, generation, transference, and quantification) covering historical and recent trends. Recent technical achievements are presented through the lens of this taxonomy, allowing researchers to understand the similarities and differences across new approaches. We end by motivating several open problems for future research identified by our taxonomy.

    PROGRAM REVIEW 1993: Self Study Report Department of Biometry

    The CSRS review team applauds the statistical expertise of the Biometry Department, which began as the Statistical Laboratory in 1957 and culminated in the current academic Department of Biometry. This growth has been highlighted by a significant increase in the number of faculty and staff, the initiation of a Master of Science program, and the provision of graduate assistant stipends. With seven faculty, the imminent increase from seven to fifteen graduate students, the establishment of statistical consulting with numerous IANR faculty, and the diverse research and teaching expertise of the faculty, the department is poised to provide greater service to the University. Future goals may include: (1) establishing a statistics department worthy of national recognition by joining the faculties of the Department of Biometry and the Division of Statistics of the Department of Mathematics and Statistics, and (2) forming a PhD program in Statistics that encompasses biometry and theoretical statistics. It is apparent that the faculty is capable of conducting statistical research of a more theoretical nature. However, securing research grants as principal investigators or as co-investigators with other UNL faculty is required to fully support those research endeavors. With successful grant activity, a greater portion of research results should be published in statistical journals. Based on discussions, consultations on experimental design and data analysis are appropriate and much appreciated by IANR faculty. While personal consultations have been highly beneficial, the initiation of a Help Desk provides rapid and accurate responses to straightforward statistical questions, thereby freeing the Biometry faculty for personal consultation on more complex statistical issues. The Help Desk also provides valuable real-world training for graduate students.
Courses taught as a service to undergraduate and graduate IANR students appear to be appropriate in number and content. With a master's program successfully started, more formal policies for recruitment, selection, advising, and placement should be initiated. Further attention is required to provide space, computers, and advisers for graduate students. Faculty expressed appreciation for the strong support provided to Biometry by the administration, including the Head, Deans, and Vice Chancellor. The team, however, notes several management concerns, including the lack of faculty meetings; inadequate communication among the Head, faculty, and graduate students; the need for continual curriculum improvement; and the lack of sufficient office and laboratory space. Concern is also raised about potential over-commitment to international consulting at the expense of departmental functions. The team is reluctant to recommend the immediate initiation of a PhD program in the Biometry Department; establishing a successful master's program before pursuing a doctoral program appears prudent. Merging the two UNL statistical groups into one department would position UNL for a strong PhD program.

    Security of Ubiquitous Computing Systems

    The chapters in this open access book arise out of the EU COST Action project Cryptacus, the objective of which was to improve and adapt existing cryptanalysis methodologies and tools to the ubiquitous computing framework. The cryptanalysis implemented lies along four axes: cryptographic models, cryptanalysis of building blocks, hardware and software security engineering, and security assessment of real-world systems. The authors are top-class researchers in security and cryptography, and the contributions are of value to researchers and practitioners in these domains. This book is open access under a CC BY license.

    Adaptive multi-classifier systems for face re-identification applications

    In video surveillance, decision support systems increasingly rely on face recognition (FR) to rapidly determine whether facial regions captured over a network of cameras correspond to individuals of interest. Systems for FR in video surveillance are applied in a range of scenarios, for instance watchlist screening, face re-identification, and search and retrieval. The focus of this Thesis is video-to-video FR, as found in face re-identification applications, where facial models are designed on reference data and updated using operational captures from video streams. Several challenges emerge from the task of recognizing individuals of interest from faces captured with video cameras. Most notably, it is often assumed that the facial appearance of target individuals does not change over time, and that the proportions of faces captured for target and non-target individuals are balanced, known a priori, and fixed. However, faces captured during operations vary due to several factors, including illumination, blur, resolution, pose, expression, and camera interoperability. In addition, the facial models used for matching are commonly not representative, since they are designed a priori with a limited number of reference samples that are collected and labeled at high cost. Finally, the proportions of target and non-target individuals continuously change during operations. In the literature, adaptive multiple classifier systems (MCSs) have been successfully applied to video-to-video FR, where the facial model for each target individual is designed using an ensemble of 2-class classifiers (trained using target vs. non-target reference samples). Recent approaches employ ensembles of 2-class Fuzzy ARTMAP classifiers, with a DPSO strategy to generate a pool of classifiers with optimized hyperparameters, and Boolean combination to merge their responses in the ROC space.
In addition, skew-sensitive ensembles were recently proposed to adapt the fusion function of an ensemble according to the class imbalance measured on operational data. These active approaches periodically estimate target vs. non-target proportions during operations and adapt the fusion of classifier ensembles to this imbalance. Finally, face tracking can be used to regroup the system responses linked to a facial trajectory (facial captures of a single person in the scene) for robust spatio-temporal recognition, and to update facial models over time using operational data. In this Thesis, new techniques are proposed to adapt the facial models of individuals enrolled in a video-to-video FR system. Trajectory-based self-updating is proposed to update the system, considering gradual and abrupt changes in the classification environment. Then, skew-sensitive ensembles are proposed to adapt the system to the operational imbalance. In Chapter 2, an adaptive framework is proposed for partially-supervised learning of facial models over time based on facial trajectories. During operations, information from a face tracker and individual-specific ensembles is integrated for robust spatio-temporal recognition and for self-update of facial models. The tracker defines a facial trajectory for each individual in the video. A target individual is recognized if the positive predictions accumulated along a trajectory surpass the detection threshold of an ensemble. If the accumulated positive predictions surpass a higher update threshold, then all target face samples from the trajectory are combined with non-target samples (selected from the cohort and universal models) to update the corresponding facial model. A learn-and-combine strategy is employed to avoid knowledge corruption during self-update of ensembles.
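The two-threshold decision described for Chapter 2 can be sketched as follows: count an ensemble's positive predictions along one facial trajectory, declare a detection when the count surpasses the detection threshold, and trigger self-update only when it surpasses the higher update threshold. This is a minimal sketch, assuming one boolean ensemble prediction per face capture and a plain count as the accumulated score; `decide`, `TrajectoryDecision`, and the threshold values are hypothetical, not the thesis's implementation.

```python
from dataclasses import dataclass


@dataclass
class TrajectoryDecision:
    detected: bool      # positive predictions surpass the detection threshold
    self_update: bool   # positive predictions surpass the update threshold
    score: float        # accumulated positive predictions


def decide(predictions, detection_threshold, update_threshold):
    """Accumulate one ensemble's positive predictions along a trajectory."""
    if update_threshold <= detection_threshold:
        raise ValueError("update threshold must exceed detection threshold")
    score = float(sum(1 for p in predictions if p))
    return TrajectoryDecision(
        detected=score > detection_threshold,
        self_update=score > update_threshold,
        score=score,
    )


# Hypothetical trajectories: one boolean prediction per face capture.
print(decide([True, True, False, True, True], detection_threshold=2, update_threshold=3))
print(decide([True, False, True, True], detection_threshold=2, update_threshold=3))
```

Requiring the update threshold to exceed the detection threshold encodes the idea that the system should be more confident before it adds a trajectory's samples to a facial model than before it merely reports a match, which limits the risk of corrupting the model with mislabeled captures.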
In addition, a memory management strategy based on Kullback-Leibler divergence is proposed to rank and select the most relevant target and non-target reference samples to be stored in memory as the ensembles evolve. The proposed system was validated with synthetic data and real videos from the Face in Action dataset, emulating a passport checking scenario. Initially, enrollment trajectories were used for supervised learning of ensembles, and videos from three capture sessions were presented to the system for FR and self-update. Transaction-level analysis shows that the proposed approach outperforms baseline systems that do not adapt to new trajectories, and provides performance comparable to ideal systems that adapt, through supervised learning, to all relevant target trajectories. Subject-level analysis reveals the existence of individuals for whom self-updated ensembles provide a considerable benefit. Trajectory-level analysis indicates that the proposed system allows for robust spatio-temporal video-to-video FR. In Chapter 3, an extension and a particular implementation of the ensemble-based system for spatio-temporal FR is proposed and characterized in scenarios with gradual and abrupt changes in the classification environment. Transaction-level results show that the proposed system increases AUC by about 3% in scenarios with abrupt changes, and by about 5% in scenarios with gradual changes. Subject-based analysis reveals the difficulty of FR with different poses, which more significantly affects lamb- and goat-like individuals. Compared to reference spatio-temporal fusion approaches, the proposed accumulation scheme produces the highest discrimination. In Chapter 4, adaptive skew-sensitive ensembles are proposed to combine classifiers trained by selecting data with varying levels of imbalance and complexity, in order to sustain a high level of performance for video-to-video FR.
During operations, the level of imbalance is periodically estimated from the input trajectories using the HDx quantification method and pre-computed histogram representations of imbalanced data distributions. Ensemble scores are accumulated over trajectories for robust skew-sensitive spatio-temporal recognition. Results on synthetic data show that adapting the fusion function with the proposed approach can significantly improve performance. Results on real data show that the proposed method can outperform reference techniques in imbalanced video surveillance environments.
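HDx estimates the class proportion of unlabeled data by comparing, feature by feature, the histogram of that data with mixtures of the class-conditional training histograms, and choosing the proportion whose mixture is closest under the Hellinger distance. The sketch below follows that idea; forming mixtures over a grid of candidate proportions plays the role of the pre-computed histograms of imbalanced distributions mentioned above. The bin count, grid resolution, function names, and synthetic Gaussian features (standing in for facial descriptors) are all assumptions for illustration, not the thesis's settings.

```python
import numpy as np


def normalized_hist(values, edges):
    """Histogram of values over fixed edges, normalized to sum to 1."""
    counts, _ = np.histogram(values, bins=edges)
    return counts / max(counts.sum(), 1)


def hellinger(p, q):
    """Hellinger-type distance between discrete distributions (the usual
    1/sqrt(2) factor is omitted; it does not change the argmin)."""
    return np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))


def hdx_prevalence(X_pos, X_neg, X_test, n_bins=10, n_grid=101):
    """Estimate the positive-class (target) proportion of X_test, HDx style."""
    candidates = np.linspace(0.0, 1.0, n_grid)
    dist = np.zeros(n_grid)
    for j in range(X_test.shape[1]):
        col = np.concatenate([X_pos[:, j], X_neg[:, j], X_test[:, j]])
        lo, hi = col.min(), col.max()
        if hi == lo:                      # constant feature: no information
            continue
        edges = np.linspace(lo, hi, n_bins + 1)
        h_pos = normalized_hist(X_pos[:, j], edges)
        h_neg = normalized_hist(X_neg[:, j], edges)
        h_test = normalized_hist(X_test[:, j], edges)
        # Sum the distance of each candidate mixture to the test histogram.
        dist += np.array([hellinger(p * h_pos + (1 - p) * h_neg, h_test)
                          for p in candidates])
    return float(candidates[np.argmin(dist)])


# Hypothetical target / non-target features and an imbalanced test set.
rng = np.random.default_rng(2)
X_pos = rng.normal(2.0, 1.0, size=(2000, 2))
X_neg = rng.normal(0.0, 1.0, size=(2000, 2))
n_test_pos = 300                          # true target proportion is 0.3
X_test = np.vstack([rng.normal(2.0, 1.0, size=(n_test_pos, 2)),
                    rng.normal(0.0, 1.0, size=(1000 - n_test_pos, 2))])
estimate = hdx_prevalence(X_pos, X_neg, X_test)
print(estimate)
```

An estimate like this could then select or weight the ensemble's fusion function for the current operational imbalance, which is the role the abstract assigns to the quantification step.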