
    Efficient Multiband Algorithms for Blind Source Separation

    The problem of blind source separation refers to recovering original signals, called source signals, from mixed signals, called observation signals, in a reverberant environment. The mixture is a function of a sequence of original speech signals mixed in a reverberant room. The objective is to separate the mixed signals and obtain the original signals without degradation and without prior information about the features of the sources. The strategy used to achieve this objective is to use multiple bands that work at a lower sampling rate, with lower computational cost and quicker convergence than the conventional scheme. Our motivation is the competitive convergence speed reported for unequal-passbands schemes. The objective of this research is to improve unequal-passbands schemes by increasing the speed of convergence and reducing the computational cost. The first proposed work is a novel maximally decimated unequal-passbands scheme. This scheme uses multiple bands that allow it to work at a reduced sampling rate and low computational cost. An adaptation approach is derived with an adaptation step that improves the convergence speed. The performance of the proposed scheme was measured in several ways. First, the mean square errors of the various bands are measured and the results are compared to a maximally decimated equal-passbands scheme, currently the best performing method. The results show that the proposed scheme has a faster convergence rate than the maximally decimated equal-passbands scheme. Second, when the scheme is tested on white and coloured inputs with a low number of bands, it does not yield good results; but when the number of bands is increased, the speed of convergence improves. Third, the scheme is tested under rapid changes; its performance is shown to be similar to that of the equal-passbands scheme. Fourth, the scheme is also tested in a stationary state, and the experimental results confirm the theoretical work. For more challenging scenarios, an unequal-passbands scheme with over-sampled decimation is proposed; the greater the number of bands, the more efficient the separation. The results are first compared to the currently best performing method, and then an experimental comparison is made between the proposed multiband scheme and the conventional scheme. The results show that the convergence speed and the signal-to-interference ratio of the proposed scheme are higher than those of the conventional scheme, and its computational cost is lower.
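
    The thesis's unequal-passbands design is not reproduced in the abstract, but the general mechanism it builds on, subband adaptive filtering at a decimated rate, can be sketched briefly. In the following Python sketch, the equal band edges, filter order, and NLMS step size are illustrative assumptions rather than the proposed scheme:

```python
import numpy as np
from scipy.signal import firwin, lfilter

def nlms(x, d, taps=16, mu=0.5, eps=1e-8):
    """Normalized LMS: adapt a filter so that w @ x_window tracks d."""
    w = np.zeros(taps)
    e = np.zeros(len(x))
    for n in range(taps, len(x)):
        u = x[n - taps:n][::-1]              # most recent sample first
        e[n] = d[n] - w @ u                  # a-priori error
        w += mu * e[n] * u / (u @ u + eps)   # step normalized by band power
    return e

def subband_nlms(x, d, n_bands=4, order=64):
    """Split x and d into n_bands equal-width bands, decimate by n_bands,
    and adapt an independent NLMS filter per band at the reduced rate."""
    edges = np.linspace(0.0, 1.0, n_bands + 1)   # normalized to Nyquist
    errors = []
    for b in range(n_bands):
        lo, hi = edges[b], edges[b + 1]
        if b == 0:
            h = firwin(order + 1, hi)                          # lowpass band
        elif b == n_bands - 1:
            h = firwin(order + 1, lo, pass_zero=False)         # highpass band
        else:
            h = firwin(order + 1, [lo, hi], pass_zero=False)   # bandpass band
        xb = lfilter(h, 1.0, x)[::n_bands]   # analysis filtering + decimation
        db = lfilter(h, 1.0, d)[::n_bands]
        errors.append(nlms(xb, db))          # per-band error at the low rate
    return errors
```

    Because each band adapts on a shorter, spectrally flatter signal, the per-band filters converge faster and cost less per input sample than one full-rate filter, which is the effect the abstract reports.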

    Multimodal methods for blind source separation of audio sources

    This thesis focuses on enhancing the performance of frequency domain convolutive blind source separation (FDCBSS) techniques applied to the problem of separating audio sources recorded in a room environment. This challenging application is termed the cocktail party problem, and the ultimate aim would be to build a machine that matches the ability of a human being to solve this task. Human beings exploit both their eyes and their ears in solving this task, and hence they adopt a multimodal approach, i.e. they exploit both audio and video modalities. New multimodal methods for blind source separation of audio sources are therefore proposed in this work as a step towards realizing such a machine. The geometry of the room environment is initially exploited to improve the separation performance of an FDCBSS algorithm. The positions of the human speakers are monitored by video cameras, and this information is incorporated within the FDCBSS algorithm in the form of constraints added to the underlying cross-power spectral density matrix-based cost function, which measures separation performance. [Continues.]
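
    The abstract does not give the cost function itself, but a minimal sketch of a cross-power spectral density (CPSD) based separation measure may clarify what is being constrained. The array shapes are assumptions, and the video-derived geometric constraints described above are not modelled here:

```python
import numpy as np
from scipy.signal import stft

def cpsd_offdiag_cost(X, W, n_fft=512):
    """Sum over frequency bins of the off-diagonal power of the demixed
    cross-power spectral density matrices. Well-separated sources are
    mutually uncorrelated, so this cost is small for a good demixing W.
    X: (n_mics, n_samples) time-domain mixtures.
    W: (n_bins, n_srcs, n_mics) demixing matrices, one per frequency bin."""
    _, _, Z = stft(X, nperseg=n_fft)      # Z: (n_mics, n_bins, n_frames)
    Z = np.moveaxis(Z, 1, 0)              # -> (n_bins, n_mics, n_frames)
    cost = 0.0
    for Wf, Zf in zip(W, Z):
        Y = Wf @ Zf                            # demixed signals in this bin
        R = (Y @ Y.conj().T) / Zf.shape[1]     # CPSD matrix of the outputs
        cost += np.sum(np.abs(R - np.diag(np.diag(R))) ** 2)
    return cost
```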

    Subjective evaluation of auditory spatial imagery associated with decorrelated subwoofer signals

    Presented at the 8th International Conference on Auditory Display (ICAD), Kyoto, Japan, July 2-5, 2002. Although only a single subwoofer is typically used in two-channel and multichannel stereophonic sound reproduction, the use of two subwoofers enables manipulation of low-frequency interaural cross-correlation (IACC), and this manipulation is particularly effective in producing variation in auditory spatial imagery. In order to document this variation objectively, a series of listening experiments was executed using a set of stimuli generated at five correlation values and presented in two reproduction modes. Both modes used two subwoofers, but in one of the reproduction modes identical signals were applied to the two subwoofers. The results of both exploratory and confirmatory listening experiments showed that the range of variation in both perceived auditory source width (ASW) and perceived auditory source distance (ASD) is reduced when negatively correlated signals are not reproduced at low frequencies. Global dissimilarity judgments were made for this set of ten stimuli in an exploratory study designed to reveal the salient perceptual dimensions of the stimuli. A subsequent confirmatory study employed a two-alternative forced-choice task to determine how identifiably different the stimuli were with respect to the two perceptual attributes revealed in the exploratory study, ASW and ASD. The implications of these findings for loudspeaker-based spatial auditory display are discussed.
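
    The paper's stimuli were band-limited subwoofer signals, which this sketch does not attempt to reproduce; it only illustrates the standard mixing identity by which a pair of signals with a prescribed cross-correlation can be generated. The five correlation values chosen here are assumptions:

```python
import numpy as np

def correlated_noise_pair(n, rho, seed=None):
    """Two noise signals whose normalized cross-correlation is rho
    (-1 <= rho <= 1), via right = rho*a + sqrt(1 - rho^2)*b for
    independent unit-variance a and b."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal(n)
    b = rng.standard_normal(n)
    return a, rho * a + np.sqrt(1.0 - rho ** 2) * b

# e.g. a set of stimuli at five correlation values
stimuli = {rho: correlated_noise_pair(48000, rho, seed=0)
           for rho in (-1.0, -0.5, 0.0, 0.5, 1.0)}
```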

    Remote Sensing Image Fusion Using ICA and Optimized Wavelet Transform


    Sparse representation for audio noise removal using zero-zone quantizers

    In zero-zone quantization, bins around zero are quantized to a zero value. This kind of quantization can be applied to orthogonal transform coefficients to remove unwanted or redundant signal content. Transforms reveal structures and properties of a signal, and hence careful application of a zero zone over the transform coefficients leads to noise removal. In this thesis, such quantizers are applied separately over Discrete Fourier Transform and Karhunen-Loeve Transform coefficients, and the outputs are compared. Further, localizing the zero zones to certain frequencies leads to better performance in terms of noise removal. PEAQ (Perceptual Evaluation of Audio Quality) scores have been used to measure the objective quality of the denoised signal.
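
    A minimal sketch of the zero-zone idea over DFT coefficients follows. The zone width is an assumed tuning parameter, and the thesis's actual quantizer design, frequency localization, and KLT variant are not reproduced:

```python
import numpy as np

def zero_zone_dft_denoise(x, width):
    """Zero-zone quantization over DFT coefficients: coefficients whose
    magnitude falls inside the zero zone [-width, width] are set to zero,
    all others are kept, and the signal is re-synthesized."""
    X = np.fft.rfft(x)
    X[np.abs(X) < width] = 0.0         # the zero zone around the origin
    return np.fft.irfft(X, n=len(x))
```

    Small-magnitude coefficients are dominated by noise while structured signal content concentrates in a few large coefficients, which is why zeroing the zone removes noise with little damage to the signal.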

    A Primal-Dual Proximal Algorithm for Sparse Template-Based Adaptive Filtering: Application to Seismic Multiple Removal

    Unveiling meaningful geophysical information from seismic data requires dealing with both random and structured "noises". As their amplitude may be greater than that of the signals of interest (primaries), additional prior information is especially important for efficient signal separation. We address here the problem of multiple reflections, caused by wave-field bouncing between layers. Since only approximate models of these phenomena are available, we propose a flexible framework for time-varying adaptive filtering of seismic signals, using sparse representations based on inaccurate templates. We recast the joint estimation of adaptive filters and primaries in a new convex variational formulation. This approach allows us to incorporate plausible knowledge about noise statistics, data sparsity, and slow filter variation in parsimony-promoting wavelet frames. The designed primal-dual algorithm solves a constrained minimization problem that alleviates the standard regularization issue of finding hyperparameters. The approach demonstrates good performance in low signal-to-noise ratio conditions, both for simulated and real field seismic data.
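
    The paper's constrained seismic formulation with templates and wavelet frames is considerably richer than anything that fits here, but the primal-dual proximal machinery it names can be illustrated on a toy sparse problem. This is a generic Chambolle-Pock sketch under assumed step sizes, not the authors' algorithm:

```python
import numpy as np

def soft_threshold(z, t):
    """Proximity operator of t*||.||_1 (soft thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def chambolle_pock_l1(A, b, lam, n_iter=300):
    """Minimal primal-dual solver for  min_x lam*||x||_1 + 0.5*||A x - b||^2,
    alternating a dual proximal step, a primal proximal step, and an
    extrapolation step."""
    L = np.linalg.norm(A, 2)              # operator norm of A
    tau = sigma = 0.99 / L                # step sizes with tau*sigma*L^2 < 1
    x = x_bar = np.zeros(A.shape[1])
    y = np.zeros(A.shape[0])
    for _ in range(n_iter):
        y = (y + sigma * (A @ x_bar - b)) / (1.0 + sigma)       # dual prox
        x_new = soft_threshold(x - tau * (A.T @ y), tau * lam)  # primal prox
        x_bar = 2.0 * x_new - x           # extrapolation
        x = x_new
    return x
```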

    Enhancing brain-computer interfacing through advanced independent component analysis techniques

    No full text
    A brain-computer interface (BCI) is a direct communication system between a brain and an external device in which messages or commands sent by an individual do not pass through the brain’s normal output pathways but are detected through brain signals. Some severe motor impairments, such as Amyotrophic Lateral Sclerosis, head trauma, spinal injuries and other diseases, may cause patients to lose their muscle control and become unable to communicate with the outside environment. Currently no effective cure or treatment has been found for these diseases, so using a BCI system to rebuild the communication pathway becomes a possible alternative solution. Among different types of BCIs, the electroencephalogram (EEG) based BCI is becoming a popular system due to EEG’s fine temporal resolution, ease of use, portability and low set-up cost. However, EEG’s susceptibility to noise is a major obstacle to developing a robust BCI. Signal processing techniques such as coherent averaging, filtering, FFT and AR modelling are used to reduce the noise and extract components of interest. However, these methods process the data in the observed mixture domain, which mixes components of interest with noise. This limitation means that the extracted EEG signals may still contain noise residue or, conversely, that the removed noise may contain embedded EEG signal components. Independent Component Analysis (ICA), a Blind Source Separation (BSS) technique, is able to extract relevant information within noisy signals and separate the fundamental sources into independent components (ICs). The most common assumption of ICA is that the source signals are unknown and statistically independent; through this assumption, ICA is able to recover the source signals. Since the ICA concept appeared in the fields of neural networks and signal processing in the 1980s, many ICA applications in telecommunications, biomedical data analysis, feature extraction, speech separation, time-series analysis and data mining have been reported in the literature. In this thesis several ICA techniques are proposed to address two major issues for BCI applications: reducing the recording time needed in order to speed up the signal processing, and reducing the number of recording channels while improving the final classification performance, or at least keeping it the same as the current performance. These improvements would make BCI a more practical prospect for everyday use. This thesis first defines BCI and the diverse BCI models based on different control patterns. After the general idea of ICA is introduced along with some modifications to it, several new ICA approaches are proposed. The practical work in this thesis starts with preliminary analyses of the Southampton BCI pilot datasets, using basic and then advanced signal processing techniques. The proposed ICA techniques are then presented using a multi-channel event related potential (ERP) based BCI. Next, the ICA algorithm is applied to a multi-channel spontaneous activity based BCI. The final ICA approach examines the possibility of using ICA based on just one or a few channel recordings in an ERP based BCI. The novel ICA approaches for BCI systems presented in this thesis show that ICA is able to accurately and repeatedly extract the relevant information buried within noisy signals, and the signal quality is enhanced so that even a simple classifier can achieve good classification accuracy.
    In the ERP based BCI application, after multichannel ICA, data averaged over just eight epochs achieves 83.9% classification accuracy, whereas coherent averaging reaches only 32.3%. In the spontaneous activity based BCI, the multi-channel ICA algorithm effectively extracts discriminatory information from two types of single-trial EEG data; the classification accuracy is improved by about 25%, on average, compared to the performance on the unpreprocessed data. The single channel ICA technique on the ERP based BCI produces much better results than the lowpass filter, while an appropriate number of averages improves the signal-to-noise ratio of P300 activity, which helps achieve better classification. These advantages will lead to a reliable and practical BCI for use outside of the clinical laboratory.
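
    The thesis's own ICA variants and BCI pipeline are not reproduced here, but the basic ICA step it builds on can be sketched with a standard implementation. This example separates synthetic mixtures with scikit-learn's FastICA; the sources, mixing matrix, and component count are illustrative assumptions:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
sources = np.c_[np.sin(7 * t),                    # oscillatory component
                np.sign(np.sin(3 * t)),           # square-wave "artifact"
                rng.standard_normal(len(t))]      # noise source
A = rng.standard_normal((3, 3))                   # unknown mixing matrix
X = sources @ A.T                                 # observed channel mixtures

ica = FastICA(n_components=3, random_state=0)
estimated = ica.fit_transform(X)                  # recovered independent components
```

    Unlike filtering or averaging in the observed mixture domain, the recovered components can be inspected individually, so noise components can be discarded without also discarding part of the EEG signal.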

    HDR-ChipQA: No-Reference Quality Assessment on High Dynamic Range Videos

    We present a no-reference video quality model and algorithm that delivers standout performance for High Dynamic Range (HDR) videos, which we call HDR-ChipQA. HDR videos represent wider ranges of luminances, details, and colors than Standard Dynamic Range (SDR) videos. The growing adoption of HDR in massively scaled video networks has driven the need for video quality assessment (VQA) algorithms that better account for distortions on HDR content. In particular, standard VQA models may fail to capture conspicuous distortions at the extreme ends of the dynamic range, because the features that drive them may be dominated by distortions that pervade the mid-ranges of the signal. We introduce a new approach whereby a local expansive nonlinearity emphasizes distortions occurring at the higher and lower ends of the local luma range, allowing for the definition of additional quality-aware features that are computed along a separate path. These features are not HDR-specific, and also improve VQA on SDR video content, albeit to a reduced degree. We show that this preprocessing step significantly boosts the power of distortion-sensitive natural video statistics (NVS) features when used to predict the quality of HDR content. In a similar manner, we separately compute novel wide-gamut color features using the same nonlinear processing steps. We have found that our model significantly outperforms SDR VQA algorithms on the only publicly available, comprehensive HDR database, while also attaining state-of-the-art performance on SDR content.
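
    The exact nonlinearity used by HDR-ChipQA is not given in the abstract and is not reproduced here; the following sketch only shows what "expansive" means in this context. The exp-based stretch and the parameter `delta` are illustrative assumptions:

```python
import numpy as np

def expansive_nonlinearity(luma, delta=4.0):
    """Stretch values near the ends of the local luma range while leaving
    mid-range values relatively compressed, so distortions at the extremes
    of the dynamic range contribute more to downstream features."""
    lo, hi = luma.min(), luma.max()
    x = 2.0 * (luma - lo) / (hi - lo + 1e-12) - 1.0     # map to [-1, 1]
    return np.sign(x) * np.expm1(delta * np.abs(x)) / np.expm1(delta)
```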

    Optophone design: optical-to-auditory vision substitution for the blind

    An optophone is a device that turns light into sound for the benefit of blind people. The present project is intended to produce a general-purpose optophone to be worn on the head, about the house and in the street, to give the wearer a detailed description in sound of the scene he is facing. The device will therefore consist of an electronic camera, some signal-processing electronics, earphones, and a battery. The two major problems are the derivation of (a) the most suitable mapping from images to sounds, and (b) an algorithm to perform the mapping in real time on existing electronic components. This thesis concerns problem (a). Chapter 2 goes into the general scene-to-sound mapping problem in some detail and presents the work of earlier investigators. Chapter 3 discusses the design of tests to evaluate the performance of candidate mappings. A theoretical performance test (TPT) is derived. Chapter 4 applies the TPT to the most obvious mapping, the cartesian piano transform. Chapter 5 applies the TPT to a mapping based on the cosine transform. Chapter 6 attempts to derive a mapping by principal component analysis, using the inaccuracies of human sight and hearing and the statistical properties of real scenes and sounds. Chapter 7 presents a complete scheme, implemented in software, for representing digitised colour scenes by audible digitised stereo sound. Chapter 8 tries to decide how many numbers are required to specify a steady spectrum with no noticeable degradation. Chapter 9 looks at a scheme designed to produce more natural-sounding sounds related to more meaningful portions of the scene. This scheme maps windows in the scene to steady spectral patterns of short duration, the location of the window being conveyed by simulated free-field listening. Chapter 10 gives detailed recommendations as to further work.
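
    The cartesian piano transform mentioned for Chapter 4 can be sketched as follows: each image row is assigned a fixed frequency, columns are scanned left to right in time, and pixel brightness sets amplitude. The frequency range, column duration, and normalization below are illustrative assumptions, not the thesis's design:

```python
import numpy as np

def piano_transform(image, fs=44100, col_dur=0.05, f_lo=200.0, f_hi=4000.0):
    """Render a greyscale image as sound, one column at a time.
    image: (rows, cols) array of brightness values in [0, 1]."""
    n_rows, n_cols = image.shape
    freqs = np.geomspace(f_lo, f_hi, n_rows)[::-1]    # top row -> highest pitch
    t = np.arange(int(fs * col_dur)) / fs
    tones = np.sin(2 * np.pi * np.outer(freqs, t))    # one steady tone per row
    out = [tones.T @ image[:, c] for c in range(n_cols)]
    return np.concatenate(out) / n_rows               # mono audio signal
```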