113 research outputs found

    Speech recognition in noise using weighted matching algorithms

    Video enhancement : content classification and model selection

    The purpose of video enhancement is to improve the subjective picture quality. The field covers a broad range of research topics, such as removing noise from the video, highlighting specified features, and improving the appearance or visibility of the video content. The common difficulty in this field is how to make images or videos more beautiful, or subjectively better. Traditional approaches involve many iterations between subjective assessment experiments and algorithm redesigns, which are very time consuming. Researchers have attempted to design a video quality metric to replace the subjective assessment, but so far without success. As a way to avoid heuristics in the enhancement algorithm design, least mean square methods have received considerable attention. They can optimize filter coefficients automatically by minimizing the difference between processed videos and desired versions through training. However, these methods are only optimal on average, not locally. To solve this problem, one can apply the least mean square optimization to individual categories that are classified by local image content. The most interesting example is Kondo’s concept of local content adaptivity for image interpolation, which we found could be generalized into an ideal framework for content-adaptive video processing. We identify two parts in the concept: content classification and adaptive processing. By exploring new classifiers for the content classification and new models for the adaptive processing, we have generalized the framework to more enhancement applications. For content classification, new classifiers have been proposed to classify different image degradations such as coding artifacts and focal blur. For coding artifacts, a novel classifier has been proposed based on the combination of local structure and contrast, which does not require coding block grid detection.
For focal blur, we have proposed a novel local blur estimation method based on edges, which does not require edge orientation detection and gives more robust blur estimates. With these classifiers, the proposed framework has been extended to coding-artifact-robust enhancement and blur-dependent enhancement. As content adaptivity is extended to more image features, the number of content classes can increase significantly. We show that it is possible to reduce the number of classes without sacrificing much performance. For model selection, we have introduced several nonlinear filters into the proposed framework. We have also proposed a new type of nonlinear filter, the trained bilateral filter, which combines the advantages of the original bilateral filter and of least mean square optimization. With these nonlinear filters, the proposed framework shows better performance than with linear filters. Furthermore, we have shown a proof of concept for a trained approach that obtains contrast enhancement by supervised learning. The transfer curves are optimized based on the classification of global or local image content. The results show that it is possible to obtain the desired effect by learning from other computationally expensive enhancement algorithms or from expert-tuned examples through the trained approach. Looking back, the thesis reveals a single versatile framework for video enhancement applications. It widens the application scope by including new content classifiers and new processing models, and it offers scalability through solutions that reduce the number of classes, which can greatly accelerate the algorithm design.

    Interference mitigation techniques for wireless OFDM

    Orthogonal Frequency Division Multiplexing (OFDM) is a promising multicarrier wireless system for transmitting high-rate data streams with spectral efficiency and fading immunity. Conventional OFDM systems use the efficient IFFT and FFT to multiplex the signals in parallel at the transmitter and receiver, respectively. Wavelet-based OFDM systems, on the other hand, use orthonormal wavelets derived from a multistage tree-structured wavelet family. Both Fourier-based and wavelet-based OFDM systems are studied in this dissertation. Two types of QAM schemes, circular and square modulation, are used to compare the performance of the two OFDM systems. A new approach to determining the exact BER for optimal circular QAM is proposed. In addition, the presence of narrowband interference (NBI) degrades the performance of OFDM systems, so a mitigation technique is necessary to suppress NBI. Recent mitigation techniques can be broadly categorized into frequency-domain cancellation, receiver windowing and excision filtering. However, none of these techniques considers wavelet-based OFDM. Therefore, an interference cancellation algorithm that works for both OFDM platforms is proposed. The performance of the two OFDM schemes when applied to a digital video broadcasting (DVB)-terrestrial system and under impulsive noise interference is also studied. BER performance is reported for all results. It is shown that the wavelet-based OFDM system outperforms the Fourier-based OFDM system in many cases.
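The IFFT/FFT multiplexing mentioned in the abstract above can be illustrated with a minimal round-trip sketch. It is not the dissertation's implementation: the square 4-QAM bit mapping, the cyclic prefix length, the noiseless channel, and all function names are assumptions made for illustration, and a naive DFT stands in for the FFT to keep the example dependency-free.

```python
import cmath

def idft(X):
    """Naive inverse DFT (stand-in for the IFFT at the transmitter)."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def dft(x):
    """Naive forward DFT (stand-in for the FFT at the receiver)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

# Assumed square 4-QAM mapping: first bit sets the sign of the real part,
# second bit sets the sign of the imaginary part.
QAM4 = {(0, 0): 1 + 1j, (0, 1): 1 - 1j, (1, 0): -1 + 1j, (1, 1): -1 - 1j}

def ofdm_modulate(bits, cp_len):
    """Map bit pairs to subcarrier symbols, apply the IDFT, prepend a cyclic prefix."""
    symbols = [QAM4[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]
    time_samples = idft(symbols)
    return time_samples[-cp_len:] + time_samples

def ofdm_demodulate(samples, cp_len):
    """Drop the cyclic prefix, apply the DFT, and decide bits by sign."""
    freq = dft(samples[cp_len:])
    bits = []
    for s in freq:
        bits += [0 if s.real > 0 else 1, 0 if s.imag > 0 else 1]
    return bits

bits = [0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1]  # 8 subcarriers
tx = ofdm_modulate(bits, cp_len=2)
rx_bits = ofdm_demodulate(tx, cp_len=2)
```

Over this ideal channel the receiver recovers the transmitted bits; interference or noise would be added to `tx` before demodulation to study BER.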

    Deep Learning for Distant Speech Recognition

    Deep learning is an emerging technology that is considered one of the most promising directions for reaching higher levels of artificial intelligence. Among other achievements, building computers that understand speech represents a crucial leap towards intelligent machines. Despite the great efforts of the past decades, however, natural and robust human-machine speech interaction still appears to be out of reach, especially when users interact with a distant microphone in noisy and reverberant environments. These disturbances severely hamper the intelligibility of a speech signal, making Distant Speech Recognition (DSR) one of the major open challenges in the field. This thesis addresses this scenario and proposes novel techniques, architectures, and algorithms to improve the robustness of distant-talking acoustic models. We first elaborate on methodologies for realistic data contamination, with a particular emphasis on DNN training with simulated data. We then investigate approaches for better exploiting speech contexts, proposing original methodologies for both feed-forward and recurrent neural networks. Lastly, inspired by the idea that cooperation across different DNNs could be the key to counteracting the harmful effects of noise and reverberation, we propose a novel deep learning paradigm called network of deep neural networks. The analysis of the original concepts was based on extensive experimental validation conducted on both real and simulated data, considering different corpora, microphone configurations, environments, noisy conditions, and ASR tasks. Comment: PhD Thesis Unitn, 201

    Multiresolution image models and estimation techniques

    Robust Distributed Multi-Source Detection and Labeling in Wireless Acoustic Sensor Networks

    The growing demand in complex signal processing methods associated with low-energy large scale wireless acoustic sensor networks (WASNs) urges the shift to a new information and communication technologies (ICT) paradigm. The emerging research perception aspires for an appealing wireless network communication where multiple heterogeneous devices with different interests can cooperate in various signal processing tasks (MDMT). Contributions in this doctoral thesis focus on distributed multi-source detection and labeling applied to audio enhancement scenarios pursuing an MDMT fashioned node-specific source-of-interest signal enhancement in WASNs. In fact, an accurate detection and labeling is a pre-requisite to pursue the MDMT paradigm where nodes in the WASN communicate effectively their sources-of-interest and, therefore, multiple signal processing tasks can be enhanced via cooperation. First, a novel framework based on a dominant source model in distributed WASNs for resolving the activity detection of multiple speech sources in a reverberant and noisy environment is introduced. A preliminary rank-one multiplicative non-negative independent component analysis (M-NICA) for unique dominant energy source extraction given associated node clusters is presented. Partitional algorithms that minimize the within-cluster mean absolute deviation (MAD) and weighted MAD objectives are proposed to determine the cluster membership of the unmixed energies, and thus establish a source specific voice activity recognition. In a second study, improving the energy signal separation to alleviate the multiple source activity discrimination task is targeted. Sparsity inducing penalties are enforced on iterative rank-one singular value decomposition layers to extract sparse right rotations. Then, sparse non-negative blind energy separation is realized using multiplicative updates. 
Hence, the multiple source detection problem is converted into a sparse non-negative source energy decorrelation. Sparsity tunes the supposedly non-active energy signatures to exactly zero-valued energies so that it is easier to identify active energies and an activity detector can be constructed in a straightforward manner. In a centralized scenario, the activity decision is controlled by a fusion center that delivers the binary source activity detection for every participating energy source. This strategy gives precise detection results for small source numbers. With a growing number of interfering sources, the distributed detection approach is more promising. Conjointly, a robust distributed energy separation algorithm for multiple competing sources is proposed. A robust and regularized $t_{\nu}M$-estimation of the covariance matrix of the mixed energies is employed. This approach yields a simple activity decision using only the robustly unmixed energy signatures of the sources in the WASN. The performance of the robust activity detector is validated with a distributed adaptive node-specific signal estimation method for speech enhancement. The latter enhances the quality and intelligibility of the signal while exploiting the accurately estimated multi-source voice decision patterns. In contrast to the original M-NICA for source separation, the extracted binary activity patterns with the robust energy separation significantly improve the node-specific signal estimation. Due to the increased computational complexity caused by the additional step of energy signal separation, a new approach to solving the detection question of multi-device multi-source networks is presented. Stability selection for iterative extraction of robust right singular vectors is considered. The sub-sampling selection technique provides transparency in properly choosing the regularization variable in the Lasso optimization problem.
In this way, the strongest sparse right singular vectors using a robust $\ell_1$-norm and stability selection are the set of basis vectors that describe the input data efficiently. Active/non-active source classification is achieved based on a robust Mahalanobis classifier. For this, a robust MM-estimator of the covariance matrix in the Mahalanobis distance is utilized. Extensive evaluation in centralized and distributed settings is performed to assess the effectiveness of the proposed approach. Thus, overcoming the computationally demanding source separation scheme is possible via exploiting robust stability selection for sparse multi-energy feature extraction. With respect to the labeling problem of various sources in a WASN, a robust approach is introduced that exploits the direction-of-arrival of the impinging source signals. A short-time Fourier transform-based subspace method estimates the angles of locally stationary wide band signals using a uniform linear array. The median of angles estimated at every frequency bin is utilized to obtain the overall angle for each participating source. The features, in this case, exploit the similarity across devices in the particular frequency bins that produce reliable direction-of-arrival estimates for each source. Reliability is defined with respect to the median across frequencies. All source-specific frequency bands that contribute to correct estimated angles are selected. A feature vector is formed for every source at each device by storing the frequency bin indices that lie within the upper and lower interval of the median absolute deviation scale of the estimated angle. Labeling is accomplished by a distributed clustering of the extracted angle-based feature vectors using consensus averaging.
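The median-based feature extraction described at the end of the abstract above can be sketched as follows. This is only one reading of the text, not the thesis code: the function name, the `scale` parameter, the use of a plain (unnormalized) median absolute deviation, and the sample angles are all assumptions, and the per-bin angle estimates are taken as given rather than computed by a subspace method.

```python
from statistics import median

def reliable_bins(angles_per_bin, scale=1.0):
    """Return the overall angle and the indices of reliable frequency bins.

    angles_per_bin: direction-of-arrival estimate (degrees) per frequency
    bin for one source at one device (assumed to be precomputed).
    scale: hypothetical multiplier on the median absolute deviation.
    """
    overall = median(angles_per_bin)
    # Median absolute deviation of the per-bin angles around the median
    mad = median([abs(a - overall) for a in angles_per_bin])
    lower, upper = overall - scale * mad, overall + scale * mad
    # Feature vector: indices of bins whose angle lies inside the interval
    bins = [k for k, a in enumerate(angles_per_bin) if lower <= a <= upper]
    return overall, bins

# Made-up per-bin estimates with two outlying bins (indices 3 and 5)
angles = [30.0, 31.0, 29.0, 75.0, 30.5, 12.0, 30.0]
overall_angle, feature = reliable_bins(angles)
```

In this toy input the two outlying bins fall outside the interval and are excluded from the feature vector, which is the behaviour the abstract attributes to the reliability criterion.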

    Adaptive Equalisation for Impulsive Noise Environments

    This thesis addresses the problem of adaptive channel equalisation in environments where the interfering noise exhibits non-Gaussian behaviour due to impulsive phenomena. The family of alpha-stable distributions has proved to be a suitable and flexible tool for modelling signals of an impulsive nature. However, non-Gaussian alpha-stable signals have infinite variance, and signal processing techniques based on second-order moments are meaningless in such environments. In order to exploit the flexibility of the stable family and still take advantage of the existing signal processing tools, a novel framework for the integration of the stable model in a communications context is proposed, based on a finite dynamic range receiver. The performance of traditional signal processing algorithms designed under the Gaussian assumption may degrade seriously in impulsive environments. When this degradation cannot be tolerated, the traditional signal processing methods must be revisited and redesigned to take the non-Gaussian noise statistics into account. In this direction, the optimum feed-forward and decision feedback Bayesian symbol-by-symbol equalisers for stable noise environments are derived. Then, new analytical tools for the evaluation of systems in infinite-variance environments are presented. For estimating the centres of the proposed Bayesian equaliser, a unified framework for a family of robust recursive linear estimation techniques is presented and the underlying relationships between them are identified. Furthermore, the direct clustering technique is studied and robust variants of the existing algorithms are proposed. A novel clustering algorithm is also derived based on robust location estimation. The problem of estimating the stable parameters has been addressed in the literature and a variety of algorithms can be found.
Some of these algorithms are assessed in terms of efficiency, simplicity and performance and the most suitable is chosen for the equalisation problem. All the building components of an adaptive Bayesian equaliser are then put together and the performance of the equaliser is evaluated experimentally. The simulation results suggest that the proposed adaptive equaliser offers a significant performance benefit compared with a traditional equaliser, designed under the Gaussian assumption. The implementation of the proposed Bayesian equaliser is simple but the computational complexity can be unaffordable. However, this thesis proposes certain approximations which enable the computationally efficient implementation of the optimum equaliser with negligible loss in performance