14 research outputs found

    Speech Modeling and Robust Estimation for Diagnosis of Parkinson’s Disease


    Blind dereverberation of speech from moving and stationary speakers using sequential Monte Carlo methods

    Speech signals radiated in confined spaces are subject to reverberation due to reflections off surrounding walls and obstacles. Reverberation leads to severe degradation of speech intelligibility and can be prohibitive for applications where speech is digitally recorded, such as audio conferencing or hearing aids. Dereverberation of speech is therefore an important field in speech enhancement. Driven by consumer demand, blind speech dereverberation has become a popular field in the research community and has led to many interesting approaches in the literature. However, most existing methods are dictated by their underlying models and hence suffer from assumptions that constrain the approaches to specific subproblems of blind speech dereverberation. For example, many approaches limit the dereverberation to voiced speech sounds, leading to poor results for unvoiced speech. Few approaches tackle single-sensor blind speech dereverberation, and only a very limited subset allows for dereverberation of speech from moving speakers. Therefore, the aim of this dissertation is the development of a flexible and extendible framework for blind speech dereverberation accommodating different speech sound types, single- or multiple-sensor setups, as well as stationary and moving speakers. Bayesian methods benefit from, rather than being dictated by, appropriate model choices. Therefore, the problem of blind speech dereverberation is considered from a Bayesian perspective in this thesis. A generic sequential Monte Carlo approach accommodating a multitude of models for the speech production mechanism and room transfer function is consequently derived. In this approach, both the anechoic source signal and the reverberant channel are estimated using their optimal estimators by means of Rao-Blackwellisation of the state space of unknown variables. The remaining model parameters are estimated using sequential importance resampling. The proposed approach is implemented for two different speech production models for stationary speakers, demonstrating substantial reduction in reverberation for both unvoiced and voiced speech sounds. Furthermore, the channel model is extended to facilitate blind dereverberation of speech from moving speakers. Due to the structure of the measurement model, single- as well as multi-microphone processing is supported, accommodating physically constrained scenarios where only a single sensor can be used as well as allowing for the exploitation of spatial diversity in scenarios where the physical size of microphone arrays is of no concern. This dissertation concludes with a survey of possible directions for future research, including the use of switching Markov source models, joint target tracking and enhancement, and an extension to subband processing for improved computational efficiency.
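
    A rough illustration of the Rao-Blackwellised sequential Monte Carlo structure described above is sketched below. This is not the speech production or room acoustic model used in the thesis: the scalar AR(1) state, the noise variances and the random-walk proposal for the unknown coefficient are illustrative assumptions. Each particle samples the nonlinear parameter and delegates the conditionally linear-Gaussian state to its own Kalman filter, whose innovation likelihood supplies the importance weight.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbpf_step(particles, y, q, r, sigma_a=0.02):
    """One Rao-Blackwellised particle filter step (illustrative sketch).

    Each particle carries a hypothesised AR coefficient `a` (the sampled,
    nonlinear parameter) plus the Kalman mean/variance of the linear state
    conditioned on that coefficient (handled analytically).
    """
    new, log_w = [], []
    for a, m, P in particles:
        # Jitter the nonlinear parameter (simple random-walk proposal).
        a = a + sigma_a * rng.standard_normal()
        # Kalman prediction of the linear state under this particle's `a`.
        m_pred, P_pred = a * m, a * a * P + q
        # The innovation likelihood gives the particle weight.
        S = P_pred + r
        log_w.append(-0.5 * (np.log(2 * np.pi * S) + (y - m_pred) ** 2 / S))
        # Kalman update of the linear state.
        K = P_pred / S
        new.append((a, m_pred + K * (y - m_pred), (1 - K) * P_pred))
    # Normalise the weights and resample (sequential importance resampling).
    w = np.exp(np.array(log_w) - np.max(log_w))
    w /= w.sum()
    idx = rng.choice(len(new), size=len(new), p=w)
    return [new[i] for i in idx]

# Toy run: infer the coefficient of an AR(1) signal observed in noise.
N, T, a_true = 200, 300, 0.9
particles = [(rng.uniform(0.0, 1.0), 0.0, 1.0) for _ in range(N)]
x, ys = 0.0, []
for _ in range(T):
    x = a_true * x + 0.2 * rng.standard_normal()
    ys.append(x + 0.1 * rng.standard_normal())
for y in ys:
    particles = rbpf_step(particles, y, q=0.04, r=0.01)
# The posterior mean should concentrate near the true coefficient of 0.9.
print("posterior mean of AR coefficient:", np.mean([p[0] for p in particles]))
```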

    Multichannel source separation and tracking with phase differences by random sample consensus

    Blind audio source separation (BASS) is a fascinating problem that has been tackled from many different angles. The use case of interest in this thesis is that of multiple moving and simultaneously active speakers in a reverberant room. This is a common situation, for example, in social gatherings. We human beings have the remarkable ability to focus attention on a particular speaker while effectively ignoring the rest. This is referred to as the "cocktail party effect" and has been the holy grail of source separation for many decades. Replicating this feat in real time with a machine is the goal of BASS. Single-channel methods attempt to identify the individual speakers from a single recording. However, with the advent of hand-held consumer electronics, techniques based on microphone array processing are becoming increasingly popular. Multichannel methods record a sound field from various locations to incorporate spatial information. If the speakers move over time, we need an algorithm capable of tracking their positions in the room. For compact arrays with 1-10 cm of separation between the microphones, this can be accomplished by applying a temporal filter on estimates of the directions-of-arrival (DOA) of the speakers. In this thesis, we review recent work on BASS with inter-channel phase difference (IPD) features and provide extensions to the case of moving speakers. It is shown that IPD features form a noisy circular-linear dataset. This data is clustered with the RANdom SAmple Consensus (RANSAC) algorithm in the presence of strong reverberation to simultaneously localize and separate speakers. The remarkable performance of RANSAC is due to its natural tendency to reject outliers. To handle the case of non-stationary speakers, a factorial wrapped Kalman filter (FWKF) and a factorial von Mises-Fisher particle filter (FvMFPF) are proposed that track source DOAs directly on the unit circle and unit sphere, respectively. These algorithms combine directional statistics, Bayesian filtering theory, and probabilistic data association techniques to track the speakers with mixtures of directional distributions.
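
    The circular-state tracking idea can be illustrated with a deliberately simplified single-source sketch: a scalar Kalman-style filter whose only circular ingredient is wrapping the innovation onto [-pi, pi). The factorial wrapped Kalman filter proposed in the thesis instead propagates mixtures of wrapped distributions over several simultaneously active sources; the motion and noise parameters below are illustrative.

```python
import numpy as np

def wrap(angle):
    """Wrap an angle to the interval [-pi, pi)."""
    return (angle + np.pi) % (2 * np.pi) - np.pi

def wrapped_kf_step(mu, P, z, q, r):
    """One predict/update step of a simplified circular DOA filter.

    The state is a single direction-of-arrival angle with a random-walk
    motion model; the only circular ingredient here is wrapping the
    innovation, so a 359 deg -> 1 deg jump is treated as a 2 deg change.
    """
    mu_pred, P_pred = mu, P + q          # predict (random walk on the circle)
    innov = wrap(z - mu_pred)            # wrapped innovation
    S = P_pred + r
    K = P_pred / S
    return wrap(mu_pred + K * innov), (1 - K) * P_pred

# Toy run: a speaker sweeping through the +/- 180 degree discontinuity.
rng = np.random.default_rng(1)
mu, P = np.deg2rad(170.0), np.deg2rad(5.0) ** 2
for step in range(10):
    true_doa = wrap(np.deg2rad(170.0 + 4.0 * step))
    z = wrap(true_doa + np.deg2rad(3.0) * rng.standard_normal())
    mu, P = wrapped_kf_step(mu, P, z,
                            q=np.deg2rad(2.0) ** 2, r=np.deg2rad(3.0) ** 2)
print("final DOA estimate (deg):", np.rad2deg(mu))
```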

    Tracking Rhythmicity in Biomedical Signals using Sequential Monte Carlo methods

    Cyclical patterns are common in signals that originate from natural systems such as the human body and man-made machinery. Often these cyclical patterns are not perfectly periodic. In that case, the signals are called pseudo-periodic or quasi-periodic and can be modeled as a sum of time-varying sinusoids, whose frequencies, phases, and amplitudes change slowly over time. Each time-varying sinusoid represents an individual rhythmical component, called a partial, that can be characterized by three parameters: frequency, phase, and amplitude. Quasi-periodic signals often contain multiple partials that are harmonically related. In that case, the frequencies of the other partials are exact integer multiples of the frequency of the slowest partial. These signals are referred to as multi-harmonic signals. Examples of such signals are the electrocardiogram (ECG), arterial blood pressure (ABP), and the human voice. A Markov process is a mathematical model for a random system whose future and past states are independent conditional on the present state. Multi-harmonic signals can be modeled as a stochastic process with the Markov property. The Markovian representation of multi-harmonic signals enables us to use state-space tracking methods to continuously estimate the frequencies, phases, and amplitudes of the partials. Several research groups have proposed various signal analysis methods such as hidden Markov models (HMMs), the short-time Fourier transform (STFT), and the Wigner-Ville distribution to solve this problem. Recently, a few groups of researchers have proposed Monte Carlo methods which sequentially estimate the posterior distribution of the fundamental frequency in multi-harmonic signals. However, multi-harmonic tracking is more challenging than single-frequency tracking, though the reason for this has not been well understood. The main objectives of this dissertation are to elucidate the fundamental obstacles to multi-harmonic tracking and to develop a reliable multi-harmonic tracker that can track cyclical patterns in multi-harmonic signals.
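
    A toy sequential Monte Carlo sketch for tracking the fundamental frequency of a two-partial harmonic signal is given below. It is a plain bootstrap particle filter, not the multi-harmonic tracker developed in the dissertation, and it assumes known partial amplitudes, a known observation noise level and a random-walk frequency model.

```python
import numpy as np

rng = np.random.default_rng(2)
fs = 500.0                       # sampling rate in Hz (illustrative)
amps = np.array([1.0, 0.5])      # partial amplitudes, assumed known here
harmonics = np.arange(1, 3)      # two harmonically related partials

def synth(f0, T):
    """Synthesise a two-partial harmonic signal with a slowly drifting f0."""
    phase, f, y = 0.0, f0, []
    for _ in range(T):
        f += 0.02 * rng.standard_normal()            # slow frequency drift
        phase += 2 * np.pi * f / fs
        y.append(amps @ np.cos(harmonics * phase) + 0.2 * rng.standard_normal())
    return np.array(y)

def pf_track_f0(y, N=500):
    """Bootstrap particle filter over (f0, phase) with a harmonic likelihood."""
    f = 8.0 + 2.0 * rng.standard_normal(N)            # initial f0 particles (Hz)
    phase = rng.uniform(0.0, 2 * np.pi, N)
    estimates = []
    for obs in y:
        # Propagate: random walk on f0, deterministic phase accumulation.
        f = f + 0.05 * rng.standard_normal(N)
        phase = phase + 2 * np.pi * f / fs
        pred = (amps[:, None] * np.cos(harmonics[:, None] * phase)).sum(axis=0)
        # Weight by the Gaussian observation likelihood, then resample.
        log_w = -0.5 * ((obs - pred) / 0.2) ** 2
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        idx = rng.choice(N, N, p=w)
        f, phase = f[idx], phase[idx]
        estimates.append(f.mean())
    return np.array(estimates)

y = synth(f0=10.0, T=400)
print("final f0 estimate (Hz):", pf_track_f0(y)[-1])
```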

    Colocated multiple-input multiple-output radars for smart mobility

    In recent years, radars have been used in many applications such as precision agriculture and advanced driver assistance systems. Optimal techniques for the estimation of the number of targets and of their coordinates require solving multidimensional optimization problems entailing huge computational efforts. This has motivated the development of sub-optimal estimation techniques able to achieve good accuracy at a manageable computational cost. Another technical issue in advanced driver assistance systems is the tracking of multiple targets. Even if various filtering techniques have been developed, new efficient and robust algorithms for target tracking can be devised exploiting a probabilistic approach based on the use of the factor graph and the sum-product algorithm. The two contributions provided by this dissertation are the investigation of the filtering and smoothing problems from a factor graph perspective and the development of efficient algorithms for two- and three-dimensional radar imaging. Concerning the first contribution, a new factor graph for filtering is derived and the sum-product rule is applied to this graphical model; this makes it possible to interpret known algorithms and to develop new filtering techniques. Then, a general method, based on graphical modelling, is proposed to derive filtering algorithms that involve a network of interconnected Bayesian filters. Finally, the proposed graphical approach is exploited to devise a new smoothing algorithm. Numerical results for dynamic systems show that our algorithms can achieve a better complexity-accuracy tradeoff and tracking capability than other techniques in the literature. Regarding radar imaging, various algorithms are developed for frequency-modulated continuous-wave radars; these algorithms rely on novel and efficient methods for the detection and estimation of multiple superimposed tones in noise. The accuracy achieved in the presence of multiple closely spaced targets is assessed on the basis of both synthetically generated data and measurements acquired with two commercial multiple-input multiple-output radars.
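
    The radar-imaging contribution relies on detecting and estimating multiple superimposed tones in noise. The sketch below is a generic CLEAN-style routine (iterative periodogram peak picking with successive cancellation) rather than the algorithms developed in the dissertation; the detection threshold, zero-padding factor and local refinement grid are illustrative choices.

```python
import numpy as np

def estimate_tones(y, fs, max_tones=5, thresh_db=12.0):
    """Estimate superimposed complex tones by iterative periodogram peak
    picking with successive cancellation (a CLEAN-style sketch).

    Returns a list of (frequency_hz, complex_amplitude) pairs.
    """
    n = len(y)
    t = np.arange(n) / fs
    residual = y.astype(complex).copy()
    tones = []
    for _ in range(max_tones):
        spec = np.fft.fft(residual, 8 * n)                 # zero-padded FFT
        freqs = np.fft.fftfreq(8 * n, 1.0 / fs)
        k = int(np.argmax(np.abs(spec)))
        peak_db = 20 * np.log10(np.abs(spec[k]) / n + 1e-12)
        floor_db = 20 * np.log10(np.median(np.abs(spec)) / n + 1e-12)
        if peak_db - floor_db < thresh_db:                 # no peak stands out
            break
        # Refine the frequency on a fine grid around the coarse FFT peak.
        cand = freqs[k] + np.linspace(-0.6, 0.6, 61) * fs / (8 * n)
        obj = [abs(np.exp(-2j * np.pi * fc * t) @ residual) for fc in cand]
        f_hat = cand[int(np.argmax(obj))]
        # Least-squares amplitude of the detected tone, then cancel it.
        basis = np.exp(2j * np.pi * f_hat * t)
        a_hat = basis.conj() @ residual / n
        residual -= a_hat * basis
        tones.append((f_hat, a_hat))
    return tones

# Toy scene: two closely spaced beat frequencies, as in an FMCW range profile.
fs, n = 1000.0, 256
t = np.arange(n) / fs
rng = np.random.default_rng(3)
y = (1.0 * np.exp(2j * np.pi * 100.0 * t)
     + 0.6 * np.exp(2j * np.pi * 112.0 * t)
     + 0.05 * (rng.standard_normal(n) + 1j * rng.standard_normal(n)))
for f, a in estimate_tones(y, fs):
    print(f"tone at {f:6.1f} Hz, amplitude {abs(a):.2f}")
```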

    Bayesian framework for multiple acoustic source tracking

    Acoustic source (speaker) tracking in the room environment plays an important role in many speech and audio applications such as multimedia, hearing aids, and hands-free speech communication and teleconferencing systems; the position information can be fed into a higher processing stage for high-quality speech acquisition, enhancement of a specific speech signal in the presence of other competing talkers, or keeping a camera focused on the speaker in a video-conferencing scenario. Most existing systems focus on the single-source tracking problem, which assumes one and only one source is active all the time, and the state to be estimated is simply the source position. However, in practical scenarios, multiple speakers may be simultaneously active, and the tracking algorithm should be able to localise each individual source and estimate the number of sources. This thesis contains three contributions towards solutions to multiple acoustic source tracking in moderately noisy and reverberant environments. The first contribution of this thesis is a time-difference of arrival (TDOA) estimation approach for multiple sources. Although the phase transform (PHAT) weighted generalised cross-correlation (GCC) method has been employed to extract the TDOAs of multiple sources, it is primarily used for a single-source scenario and its performance for multiple TDOA estimation has not been comprehensively studied. The proposed approach combines the degenerate unmixing estimation technique (DUET) and the GCC method. Since the speech mixtures are assumed window-disjoint orthogonal (WDO) in the time-frequency domain, the spectrograms can be separated by employing DUET, and the GCC method can then be applied to the spectrogram of each individual source. Probabilities of detection and false alarm are also proposed to evaluate the TDOA estimation performance under a series of experimental parameters. Next, considering that multiple acoustic sources may appear nonconcurrently, an extended Kalman particle filter (EKPF) is developed for a special multiple acoustic source tracking problem, namely "nonconcurrent multiple acoustic tracking (NMAT)". The extended Kalman filter (EKF) is used to approximate the optimum weights, and the subsequent particle filtering (PF) naturally takes the previous position estimates as well as the current TDOA measurements into account. The proposed approach is thus able to lock onto sharp changes in the source position quickly and to avoid the tracking lag of the generic sequential importance resampling (SIR) PF. Finally, these investigations are extended into an approach to track an unknown and time-varying number of acoustic sources. The DUET-GCC method is used to obtain the TDOA measurements for multiple sources, and a random finite set (RFS) based Rao-Blackwellised PF is employed and modified to track the sources. Each particle has an RFS form encapsulating the states of all sources and is capable of addressing source dynamics: source survival, new source appearance, and source deactivation. A data association variable is defined to describe the source dynamics and their relation to the measurements. The Rao-Blackwellisation step is used to decompose the state: the source positions are marginalised using an EKF, and only the data association variable needs to be handled by a PF. The performance of all the proposed approaches is extensively studied under different noisy and reverberant environments and compares favourably with existing tracking techniques.
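
    A minimal single-source GCC-PHAT sketch is given below to illustrate the correlation step; the DUET separation of the spectrograms that precedes it in the proposed DUET-GCC method is not reproduced, and the signal lengths and noise levels are illustrative.

```python
import numpy as np

def gcc_phat(x1, x2, fs, max_tau=None):
    """GCC-PHAT time-delay estimate between two microphone signals.

    Returns the estimated delay of x2 relative to x1, in seconds
    (positive when x2 lags x1).
    """
    n = len(x1) + len(x2)
    X1 = np.fft.rfft(x1, n)
    X2 = np.fft.rfft(x2, n)
    # Phase transform: keep only the phase of the cross-spectrum.
    cs = X2 * np.conj(X1)
    cc = np.fft.irfft(cs / (np.abs(cs) + 1e-12), n)
    max_shift = n // 2 if max_tau is None else min(int(fs * max_tau), n // 2)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (int(np.argmax(np.abs(cc))) - max_shift) / fs

# Toy check: a noise source delayed by 12 samples on the second microphone.
rng = np.random.default_rng(4)
fs, L = 16000, 4096
s = rng.standard_normal(L)
x1 = s + 0.05 * rng.standard_normal(L)
x2 = np.concatenate((np.zeros(12), s[:-12])) + 0.05 * rng.standard_normal(L)
print("estimated TDOA (samples):", gcc_phat(x1, x2, fs) * fs)
```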

    Online Audio-Visual Multi-Source Tracking and Separation: A Labeled Random Finite Set Approach

    The dissertation proposes an online solution for separating an unknown and time-varying number of moving sources using audio and visual data. The random finite set framework is used for the modeling and fusion of audio and visual data. This enables an online tracking algorithm to estimate the source positions and identities for each time point. With this information, a set of beamformers can be designed to separate each desired source and suppress the interfering sources
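
    The separation stage mentioned above can be illustrated with a simple frequency-domain delay-and-sum beamformer steered towards an estimated source direction; this is a generic sketch, not the beamformer design used in the dissertation, and the array geometry, sampling rate and look direction are assumptions for the example.

```python
import numpy as np

def delay_and_sum(frames_fft, freqs, mic_pos, doa_deg, c=343.0):
    """Frequency-domain delay-and-sum beamformer for a linear array.

    frames_fft : (n_mics, n_freqs) one STFT frame per microphone
    freqs      : (n_freqs,) bin frequencies in Hz
    mic_pos    : (n_mics,) microphone positions along the array axis in metres
    doa_deg    : look direction in degrees (0 = broadside)
    """
    delays = mic_pos * np.sin(np.deg2rad(doa_deg)) / c        # per-mic delays (s)
    steering = np.exp(-2j * np.pi * np.outer(delays, freqs))  # (n_mics, n_freqs)
    # Phase-align the microphones to the look direction and average.
    return (np.conj(steering) * frames_fft).mean(axis=0)

# Toy usage: a 1 kHz plane wave arriving from 30 degrees on a 4-mic array.
fs, n = 16000, 512
mic_pos = np.arange(4) * 0.04                 # 4 microphones, 4 cm spacing
freqs = np.fft.rfftfreq(n, 1.0 / fs)
t = np.arange(n) / fs
delays = mic_pos * np.sin(np.deg2rad(30.0)) / 343.0
x = np.array([np.cos(2 * np.pi * 1000.0 * (t - d)) for d in delays])
X = np.fft.rfft(x, axis=1)
y = np.fft.irfft(delay_and_sum(X, freqs, mic_pos, doa_deg=30.0), n)
print("beamformed output power:", float(np.mean(y ** 2)))
```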

    Sound Event Localization, Detection, and Tracking by Deep Neural Networks

    In this thesis, we present novel sound representations and classification methods for the task of sound event localization, detection, and tracking (SELDT). The human auditory system has evolved to localize multiple sound events, recognize them, and further track their motion individually in an acoustic environment. This ability makes humans context-aware and enables them to interact with their surroundings naturally. Developing similar methods for machines will provide an automatic description of social and human activities around them and enable machines to be context-aware in a similar way. Such methods can be employed to assist the hearing impaired to visualize sounds, for robot navigation, and to monitor biodiversity, the home, and cities. A real-life acoustic scene is complex in nature, with multiple sound events that are temporally and spatially overlapping, including stationary and moving events with varying angular velocities. Additionally, each individual sound event class, for example a car horn, can exhibit substantial variability: different cars have different horns, and even for the same car model, the duration and the temporal structure of the horn sound are driver dependent. Performing SELDT robustly in such overlapping and dynamic sound scenes is challenging for machines. Hence, in this thesis we investigate the SELDT task using a data-driven approach based on deep neural networks (DNNs). The sound event detection (SED) task requires the detection of onset and offset times for individual sound events and their corresponding labels. In this regard, we propose to use spatial and perceptual features extracted from multichannel audio for SED using two different DNNs, recurrent neural networks (RNNs) and convolutional recurrent neural networks (CRNNs). We show that using multichannel audio features improves the SED performance for overlapping sound events in comparison to traditional single-channel audio features. The proposed novel features and methods produced state-of-the-art performance for the real-life SED task and won the IEEE AASP DCASE challenge in both 2016 and 2017. Sound event localization is the task of spatially locating the position of individual sound events. Traditionally, this has been approached using parametric methods. In this thesis, we propose a CRNN for detecting the azimuth and elevation angles of multiple temporally overlapping sound events. This is the first DNN-based method performing localization in the complete azimuth and elevation space. In comparison to parametric methods, which require knowledge of the number of active sources, the proposed method learns this information directly from the input data and estimates the respective spatial locations. Further, the proposed CRNN is shown to be more robust than parametric methods in reverberant scenarios. Finally, the detection and localization tasks are performed jointly using a CRNN. This method additionally tracks the spatial location over time, thus producing the SELDT results. This is the first DNN-based SELDT method and is shown to perform on par with stand-alone baselines for SED, localization, and tracking. The proposed SELDT method is evaluated on nine datasets that represent anechoic and reverberant sound scenes, stationary and moving sources with varying velocities, different numbers of overlapping sound events, and different microphone array formats. The results show that the SELDT method can track multiple overlapping sound events that are both spatially stationary and moving.
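
    A minimal PyTorch sketch of a CRNN with the joint SED/DOA output structure described above is given below; the layer sizes, number of classes, pooling and feature dimensions are illustrative assumptions rather than the exact architecture evaluated in the thesis.

```python
import torch
import torch.nn as nn

class CRNNSeld(nn.Module):
    """Minimal CRNN sketch for joint sound event detection and DOA estimation.

    Input:  (batch, channels, time, freq) multichannel spectrogram features.
    Output: per-frame event activities (sigmoid) and per-class azimuth and
            elevation estimates (tanh, i.e. scaled to [-1, 1]).
    """

    def __init__(self, in_ch=4, n_classes=11, rnn_size=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((1, 4)),                   # pool frequency, keep time
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((1, 4)),
        )
        self.rnn = nn.GRU(64 * 4, rnn_size, batch_first=True, bidirectional=True)
        self.sed_head = nn.Linear(2 * rnn_size, n_classes)       # event activity
        self.doa_head = nn.Linear(2 * rnn_size, 2 * n_classes)   # azimuth, elevation

    def forward(self, x):
        b, _, t, _ = x.shape
        h = self.cnn(x)                              # (b, 64, t, f')
        h = h.permute(0, 2, 1, 3).reshape(b, t, -1)  # (b, t, 64 * f')
        h, _ = self.rnn(h)
        return torch.sigmoid(self.sed_head(h)), torch.tanh(self.doa_head(h))

# Toy forward pass: 4-channel features, 100 frames, 64 frequency bins.
model = CRNNSeld()
sed, doa = model(torch.randn(2, 4, 100, 64))
print(sed.shape, doa.shape)   # torch.Size([2, 100, 11]) torch.Size([2, 100, 22])
```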

    Channel and Carrier Frequency Offset Estimation Techniques for Uplink Multicarrier Wireless Systems

    Multicarrier modulation is the common feature of high-data-rate mobile wireless systems. In that case, two phenomena disturb the symbol detection. Firstly, due to the relative transmitter-receiver motion and a difference between the local oscillator (LO) frequencies at the transmitter and the receiver, a carrier frequency offset (CFO) affects the received signal. This leads to intercarrier interference (ICI). Secondly, several versions of the transmitted signal are received due to the wireless propagation channel. These unwanted phenomena must be taken into account when designing a receiver. As estimating the multipath channel and the CFO is essential, this PhD thesis deals with several CFO and channel estimation methods based on optimal filtering. Firstly, as the estimation issue is nonlinear, we suggest using the extended Kalman filter (EKF). It is based on a local linearization of the equations around the last state estimate. However, this approach requires a linearization based on the calculation of Jacobian and Hessian matrices and may not be a sufficient description of the nonlinearity. For these reasons, we can consider sigma-point Kalman filters (SPKF), namely the unscented Kalman filter (UKF) and the central difference Kalman filter (CDKF). The UKF is based on the unscented transformation, whereas the CDKF is based on the second-order Stirling polynomial interpolation formula. Nevertheless, the above methods require an exact and accurate a priori system model as well as perfect knowledge of the additive measurement noise statistics. Therefore, we propose to use H∞ filtering, which is known to be more robust to uncertainties than Kalman filtering. As the state-space representation of the system is nonlinear, we first evaluate the "extended H∞ filter", which is based on a linearization of the state-space equations like the EKF. As an alternative, the "unscented H∞ filter", which has recently been proposed in the literature, is implemented by embedding the unscented transformation into the "extended H∞ filter" and carrying out the filtering using the statistical linear error propagation approach.
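
    A toy extended Kalman filter sketch for jointly estimating a carrier frequency offset and a single complex channel tap from known pilot symbols is shown below; the random-walk state model, noise covariances and pilot sequence are illustrative assumptions, and the multicarrier (ICI) signal model of the thesis is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(5)

def ekf_cfo_channel(y, pilots, q=1e-6, r=1e-2):
    """EKF sketch for joint carrier frequency offset (CFO) and one-tap
    channel estimation from known pilot symbols.

    State x = [nu, h_re, h_im]; measurement model
    y_n = (h_re + j*h_im) * exp(j*2*pi*nu*n) * p_n + noise.
    """
    x = np.array([0.0, 1.0, 0.0])            # initial guess: no CFO, unit channel
    P = np.diag([1e-2, 1.0, 1.0])
    Q, R = q * np.eye(3), r * np.eye(2)
    for n, (yn, pn) in enumerate(zip(y, pilots)):
        P = P + Q                             # random-walk state: predict step
        nu, h = x[0], x[1] + 1j * x[2]
        rot = np.exp(2j * np.pi * nu * n) * pn
        y_pred = h * rot
        # Jacobian of [Re(y), Im(y)] w.r.t. [nu, h_re, h_im].
        d_nu = 2j * np.pi * n * h * rot
        H = np.array([[d_nu.real, rot.real, -rot.imag],
                      [d_nu.imag, rot.imag,  rot.real]])
        innov = np.array([yn.real - y_pred.real, yn.imag - y_pred.imag])
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ innov
        P = (np.eye(3) - K @ H) @ P
    return x

# Toy run: 200 QPSK pilots through a one-tap channel with a small CFO.
N, nu_true, h_true = 200, 0.002, 0.8 * np.exp(1j * 0.5)
pilots = np.exp(1j * np.pi / 4) * (1j ** rng.integers(0, 4, N))
idx = np.arange(N)
y = (h_true * np.exp(2j * np.pi * nu_true * idx) * pilots
     + 0.05 * (rng.standard_normal(N) + 1j * rng.standard_normal(N)))
x_hat = ekf_cfo_channel(y, pilots)
print("estimated CFO:", x_hat[0], " channel tap:", x_hat[1] + 1j * x_hat[2])
```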

    Computational intelligence approaches to robotics, automation, and control [Volume guest editors]

    No abstract available