35 research outputs found

    Improving model-based convolutive blind source separation techniques via bootstrap

    Get PDF
    Blind source separation for underdetermined reverberant mixtures is often achieved by assuming a statistical model for cues of interest where the unknown parameters of the statistical model depend on hidden variables. Here, the expectation-maximization (EM) algorithm is employed to compute maximum-likelihood estimates of the unknown model parameters. A by-product of the EM algorithm is a time-frequency (T-F) mask which allows the estimation of the target source from the given mixture. In this paper, we propose the idea of bootstrap averaging to improve separation quality from mixtures recorded under reverberant conditions. Our experiments on real speech mixture signals show an increase in the signal-to-distortion ratio (SDR) over a stateof- the-art baseline algorithm, to our knowledge, currently, the best performing technique in this class of methods

    Bootstrap averaging for model-based source separation in reverberant conditions

    Get PDF
    Recently proposed model-based methods use timefrequency (T-F) masking for source separation, where the T-F masks are derived from various cues described by a frequency domain Gaussian Mixture Model (GMM). These methods work well for separating mixtures recorded in low-to-medium level of reverberation, however, their performance degrades as the level of reverberation is increased. We note that the relatively poor performance of these methods under reverberant conditions can be attributed to the high variance of the frequency-dependent GMM parameter estimates. To address this limitation, a novel bootstrap-based approach is proposed to improve the accuracy of expectation maximization (EM) estimates of a frequencydependent GMM based on an a priori chosen initialization scheme. It is shown how the proposed technique allows us to construct time-frequency masks which lead to improved model-based source separation for reverberant speech mixtures. Experiments and analysis are performed on speech mixtures formed using real room-recorded impulse responses

    Robust variational Bayesian clustering for underdetermined speech separation

    Get PDF
    The main focus of this thesis is the enhancement of the statistical framework employed for underdetermined T-F masking blind separation of speech. While humans are capable of extracting a speech signal of interest in the presence of other interference and noise; actual speech recognition systems and hearing aids cannot match this psychoacoustic ability. They perform well in noise and reverberant free environments but suffer in realistic environments. Time-frequency masking algorithms based on computational auditory scene analysis attempt to separate multiple sound sources from only two reverberant stereo mixtures. They essentially rely on the sparsity that binaural cues exhibit in the time-frequency domain to generate masks which extract individual sources from their corresponding spectrogram points to solve the problem of underdetermined convolutive speech separation. Statistically, this can be interpreted as a classical clustering problem. Due to analytical simplicity, a finite mixture of Gaussian distributions is commonly used in T-F masking algorithms for modelling interaural cues. Such a model is however sensitive to outliers, therefore, a robust probabilistic model based on the Student's t-distribution is first proposed to improve the robustness of the statistical framework. This heavy tailed distribution, as compared to the Gaussian distribution, can potentially better capture outlier values and thereby lead to more accurate probabilistic masks for source separation. This non-Gaussian approach is applied to the state-of the-art MESSL algorithm and comparative studies are undertaken to confirm the improved separation quality. A Bayesian clustering framework that can better model uncertainties in reverberant environments is then exploited to replace the conventional expectation-maximization (EM) algorithm within a maximum likelihood estimation (MLE) framework. A variational Bayesian (VB) approach is then applied to the MESSL algorithm to cluster interaural phase differences thereby avoiding the drawbacks of MLE; specifically the probable presence of singularities and experimental results confirm an improvement in the separation performance. Finally, the joint modelling of the interaural phase and level differences and the integration of their non-Gaussian modelling within a variational Bayesian framework, is proposed. This approach combines the advantages of the robust estimation provided by the Student's t-distribution and the robust clustering inherent in the Bayesian approach. In other words, this general framework avoids the difficulties associated with MLE and makes use of the heavy tailed Student's t-distribution to improve the estimation of the soft probabilistic masks at various reverberation times particularly for sources in close proximity. Through an extensive set of simulation studies which compares the proposed approach with other T-F masking algorithms under different scenarios, a significant improvement in terms of objective and subjective performance measures is achieved

    Online Audio-Visual Multi-Source Tracking and Separation: A Labeled Random Finite Set Approach

    Get PDF
    The dissertation proposes an online solution for separating an unknown and time-varying number of moving sources using audio and visual data. The random finite set framework is used for the modeling and fusion of audio and visual data. This enables an online tracking algorithm to estimate the source positions and identities for each time point. With this information, a set of beamformers can be designed to separate each desired source and suppress the interfering sources

    Machine Autonomy : Definition, Approaches, Challenges and Research Gaps

    Get PDF
    Postprin

    Efficient and Robust Methods for Audio and Video Signal Analysis

    Get PDF
    This thesis presents my research concerning audio and video signal processing and machine learning. Specifically, the topics of my research include computationally efficient classifier compounds, automatic speech recognition (ASR), music dereverberation, video cut point detection and video classification.Computational efficacy of information retrieval based on multiple measurement modalities has been considered in this thesis. Specifically, a cascade processing framework, including a training algorithm to set its parameters has been developed for combining multiple detectors or binary classifiers in computationally efficient way. The developed cascade processing framework has been applied on video information retrieval tasks of video cut point detection and video classification. The results in video classification, compared to others found in the literature, indicate that the developed framework is capable of both accurate and computationally efficient classification. The idea of cascade processing has been additionally adapted for the ASR task. A procedure for combining multiple speech state likelihood estimation methods within an ASR framework in cascaded manner has been developed. The results obtained clearly show that without impairing the transcription accuracy the computational load of ASR can be reduced using the cascaded speech state likelihood estimation process.Additionally, this thesis presents my work on noise robustness of ASR using a nonnegative matrix factorization (NMF) -based approach. Specifically, methods for transformation of sparse NMF-features into speech state likelihoods has been explored. The results reveal that learned transformations from NMF activations to speech state likelihoods provide better ASR transcription accuracy than dictionary label -based transformations. The results, compared to others in a noisy speech recognition -challenge show that NMF-based processing is an efficient strategy for noise robustness in ASR.The thesis also presents my work on audio signal enhancement, specifically, on removing the detrimental effect of reverberation from music audio. In the work, a linear prediction -based dereverberation algorithm, which has originally been developed for speech signal enhancement, was applied for music. The results obtained show that the algorithm performs well in conjunction with music signals and indicate that dynamic compression of music does not impair the dereverberation performance

    Improving Monitoring and Diagnosis for Process Control using Independent Component Analysis

    Get PDF
    Statistical Process Control (SPC) is the general field concerned with monitoring the operation and performance of systems. SPC consists of a collection of techniques for characterizing the operation of a system using a probability distribution consistent with the system\u27s inputs and outputs. Classical SPC monitors a single variable to characterize the operation of a single machine tool or process step using tools such as Shewart charts. The traditional approach works well for simple small to medium size processes. For more complex processes a number of multivariate SPC techniques have been developed in recent decades. These advanced methods suffer from several disadvantages compared to univariate techniques: they tend to be statistically less powerful, and they tend to complicate process diagnosis when a disturbance is detected. This research introduces a general method for simplifying multivariate process monitoring in such a manner as to allow the use of traditional SPC tools while facilitating process diagnosis. Latent variable representations of complex processes are developed which directly relate disturbances with process steps or segments. The method models disturbances in the process rather than the process itself. The basic tool used is Independent Component Analysis (ICA). The methodology is illustrated on the problem of monitoring Electrical Test (E-Test) data from a semiconductor manufacturing process. Development and production data from a working semiconductor plant are used to estimate a factor model that is then used to develop univariate control charts for particular types of process disturbances. Detection and false alarm rates for data with known disturbances are given. The charts correctly detect and classify all the disturbance cases with a very low false alarm rate. A secondary contribution is the introduction of a method for performing an ICA like analysis using possibilistic data instead of probabilistic data. This technique extends the general ICA framework to apply to a broader range of uncertainty types. Further development of this technique could lead to the capability to use extremely sparse data to estimate ICA process models

    Improving Maternal and Fetal Cardiac Monitoring Using Artificial Intelligence

    Get PDF
    Early diagnosis of possible risks in the physiological status of fetus and mother during pregnancy and delivery is critical and can reduce mortality and morbidity. For example, early detection of life-threatening congenital heart disease may increase survival rate and reduce morbidity while allowing parents to make informed decisions. To study cardiac function, a variety of signals are required to be collected. In practice, several heart monitoring methods, such as electrocardiogram (ECG) and photoplethysmography (PPG), are commonly performed. Although there are several methods for monitoring fetal and maternal health, research is currently underway to enhance the mobility, accuracy, automation, and noise resistance of these methods to be used extensively, even at home. Artificial Intelligence (AI) can help to design a precise and convenient monitoring system. To achieve the goals, the following objectives are defined in this research: The first step for a signal acquisition system is to obtain high-quality signals. As the first objective, a signal processing scheme is explored to improve the signal-to-noise ratio (SNR) of signals and extract the desired signal from a noisy one with negative SNR (i.e., power of noise is greater than signal). It is worth mentioning that ECG and PPG signals are sensitive to noise from a variety of sources, increasing the risk of misunderstanding and interfering with the diagnostic process. The noises typically arise from power line interference, white noise, electrode contact noise, muscle contraction, baseline wandering, instrument noise, motion artifacts, electrosurgical noise. Even a slight variation in the obtained ECG waveform can impair the understanding of the patient's heart condition and affect the treatment procedure. Recent solutions, such as adaptive and blind source separation (BSS) algorithms, still have drawbacks, such as the need for noise or desired signal model, tuning and calibration, and inefficiency when dealing with excessively noisy signals. Therefore, the final goal of this step is to develop a robust algorithm that can estimate noise, even when SNR is negative, using the BSS method and remove it based on an adaptive filter. The second objective is defined for monitoring maternal and fetal ECG. Previous methods that were non-invasive used maternal abdominal ECG (MECG) for extracting fetal ECG (FECG). These methods need to be calibrated to generalize well. In other words, for each new subject, a calibration with a trustable device is required, which makes it difficult and time-consuming. The calibration is also susceptible to errors. We explore deep learning (DL) models for domain mapping, such as Cycle-Consistent Adversarial Networks, to map MECG to fetal ECG (FECG) and vice versa. The advantages of the proposed DL method over state-of-the-art approaches, such as adaptive filters or blind source separation, are that the proposed method is generalized well on unseen subjects. Moreover, it does not need calibration and is not sensitive to the heart rate variability of mother and fetal; it can also handle low signal-to-noise ratio (SNR) conditions. Thirdly, AI-based system that can measure continuous systolic blood pressure (SBP) and diastolic blood pressure (DBP) with minimum electrode requirements is explored. The most common method of measuring blood pressure is using cuff-based equipment, which cannot monitor blood pressure continuously, requires calibration, and is difficult to use. Other solutions use a synchronized ECG and PPG combination, which is still inconvenient and challenging to synchronize. The proposed method overcomes those issues and only uses PPG signal, comparing to other solutions. Using only PPG for blood pressure is more convenient since it is only one electrode on the finger where its acquisition is more resilient against error due to movement. The fourth objective is to detect anomalies on FECG data. The requirement of thousands of manually annotated samples is a concern for state-of-the-art detection systems, especially for fetal ECG (FECG), where there are few publicly available FECG datasets annotated for each FECG beat. Therefore, we will utilize active learning and transfer-learning concept to train a FECG anomaly detection system with the least training samples and high accuracy. In this part, a model is trained for detecting ECG anomalies in adults. Later this model is trained to detect anomalies on FECG. We only select more influential samples from the training set for training, which leads to training with the least effort. Because of physician shortages and rural geography, pregnant women's ability to get prenatal care might be improved through remote monitoring, especially when access to prenatal care is limited. Increased compliance with prenatal treatment and linked care amongst various providers are two possible benefits of remote monitoring. If recorded signals are transmitted correctly, maternal and fetal remote monitoring can be effective. Therefore, the last objective is to design a compression algorithm that can compress signals (like ECG) with a higher ratio than state-of-the-art and perform decompression fast without distortion. The proposed compression is fast thanks to the time domain B-Spline approach, and compressed data can be used for visualization and monitoring without decompression owing to the B-spline properties. Moreover, the stochastic optimization is designed to retain the signal quality and does not distort signal for diagnosis purposes while having a high compression ratio. In summary, components for creating an end-to-end system for day-to-day maternal and fetal cardiac monitoring can be envisioned as a mix of all tasks listed above. PPG and ECG recorded from the mother can be denoised using deconvolution strategy. Then, compression can be employed for transmitting signal. The trained CycleGAN model can be used for extracting FECG from MECG. Then, trained model using active transfer learning can detect anomaly on both MECG and FECG. Simultaneously, maternal BP is retrieved from the PPG signal. This information can be used for monitoring the cardiac status of mother and fetus, and also can be used for filling reports such as partogram
    corecore