
    Efficient Noise Suppression for Robust Speech Recognition

    Electrical Engineering
    This thesis addresses single-microphone noise estimation for speech recognition in noisy environments. Much research has been devoted to environmental noise estimation, but most methods require a voice activity detector (VAD) to estimate the noise characteristics accurately. I propose two approaches for efficient noise estimation without a VAD. The first improves conventional quantile-based noise estimation (QBNE) by adjusting the quantile level (QL) according to the relative amount of noise added to the target speech: two different QLs, i.e. binary levels, are assigned according to the measured statistical moment of the log-scale power spectrum at each frequency. The second approach applies a dual-mixture parametric model to compute the likelihoods of the speech and non-speech classes, using a dual Gaussian mixture model (GMM) and a Rayleigh mixture model (RMM). Under the assumption that speech is generally uncorrelated with environmental noise, the noise power spectrum can be estimated from the mixture-model parameters of the speech-absence class. I compared the proposed methods with conventional QBNE and a minimum-statistics-based method on a simple speech recognition task at various signal-to-noise ratio (SNR) levels. The experimental results show the proposed methods to be superior to the conventional ones.
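The quantile idea behind QBNE can be sketched as follows. This is a minimal, hypothetical illustration with a single fixed quantile (function and parameter names are mine, and the thesis's adaptive binary-QL scheme is not reproduced): for each frequency bin, the noise power is taken as a low quantile of the frame-wise powers, on the observation that each bin is dominated by noise for a large fraction of frames even during speech, so no VAD is needed.

```python
def qbne(power_frames, q=0.5):
    """Quantile-based noise estimation (simplified sketch).

    power_frames: list of frames, each a list of per-bin power values.
    q: quantile level in [0, 1).
    Returns one noise-power estimate per frequency bin.
    """
    n_bins = len(power_frames[0])
    noise = []
    for b in range(n_bins):
        # Sort this bin's power values across all frames and pick the
        # q-th quantile as the noise estimate for the bin.
        values = sorted(frame[b] for frame in power_frames)
        idx = min(int(q * len(values)), len(values) - 1)
        noise.append(values[idx])
    return noise
```

The adaptive scheme in the thesis would replace the fixed `q` with a per-frequency binary choice driven by a statistical moment of the log-power spectrum.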

    System Design and Motion Artifact Removal Algorithm Implementation for an Ambulatory Women's ECG Measurement System: the e-Bra System

    Cardiovascular disease (CVD) can lead to sudden cardiac death through irregularities in the cardiac signal caused by abnormalities of the blood vessels and cardiac structure. Over the last three decades there has been growing interest in cardiac disease research, and as a result the death rate from cardiac disease in men has fallen gradually, while the death rate for women due to CVD has risen in relative terms. The main reasons for this are a lack of awareness of female CVD and the fact that its symptoms differ from those of male CVD. Because CVD in women is usually accompanied by ordinary symptoms not obviously attributable to heart abnormality, such as unusual fatigue, sleep disturbances, shortness of breath, anxiety, chest discomfort, and indigestion (dyspepsia), most women with CVD do not realize that these symptoms are in fact related to it. Periodic ECG observation is therefore desirable not only for women who have been diagnosed with heart disease but also for anyone who wants to examine their heart activity. The electrocardiogram (ECG) is used to diagnose abnormalities of the heart, and among medical checkup methods for CVD, periodic ECG monitoring is a very effective way to diagnose cardiac disease and detect heart abnormality early. This dissertation proposes an effective ECG monitoring system for women that attaches to a brassiere using an augmented chest-lead attachment method. The suggested system, called the E-Bra system, consists of an ECG transmission module and a computer program, E-Bra Pro, that displays and analyzes the ECG transmitted from the module.
The ECG transmission module consists of three parts: ECG signal detection with a three-stage amplifier and two electrodes, data acquisition with an A/D converter, and data transmission over GPRS (General Packet Radio Service); it is compact enough to attach to the bottom layer of a brassiere. However, the ECG measured by the module contains not only the pure ECG components (P wave, QRS complex, and T wave) but also a motion artifact (MA) component due to subject movement, and the MA component is one cause of misdiagnosis. The main purpose of the E-Bra system is therefore to provide a reliable ECG data set of quality comparable to ECG data collected in hospital. Unfortunately, removing MA is a major challenge because its frequency range overlaps that of the pure P-QRS-T components. This dissertation proposes two motion artifact removal algorithms (MARAs), one based on an adaptive filter structure and one on independent component analysis, and evaluates their performance using correlation values and signal-to-noise ratio (SNR) values.
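An adaptive-filter MARA of the kind described can be sketched with a standard least-mean-squares (LMS) canceller. This is a generic illustration under my own assumptions, not the dissertation's actual algorithm: it presumes a motion-correlated reference signal (e.g. from an accelerometer), and all names are hypothetical.

```python
def lms_filter(reference, primary, n_taps=4, mu=0.01):
    """LMS adaptive noise cancellation sketch.

    primary:   contaminated signal (ECG + motion artifact).
    reference: signal correlated with the artifact but not the ECG.
    Returns the error signal e = primary - y, which approximates the
    clean ECG once the filter has converged.
    """
    w = [0.0] * n_taps
    cleaned = []
    for n in range(len(primary)):
        # Current tap-delay-line view of the reference signal.
        x = [reference[n - k] if n - k >= 0 else 0.0 for k in range(n_taps)]
        y = sum(wk * xk for wk, xk in zip(w, x))   # estimated artifact
        e = primary[n] - y                          # residual = ECG estimate
        # LMS weight update: steepest descent on the squared error.
        w = [wk + 2 * mu * e * xk for wk, xk in zip(w, x)]
        cleaned.append(e)
    return cleaned
```

When the primary channel is pure artifact, the residual decays toward zero as the filter converges; any ECG component uncorrelated with the reference passes through in the residual.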

    Modelling the nonstationarity of speech in the maximum negentropy beamformer

    State-of-the-art automatic speech recognition (ASR) systems can achieve very low word error rates (WERs), below 5%, on data recorded with headsets. However, in many situations, such as ASR in meetings or in the car, far-field microphones on the table, on walls or on devices such as laptops are preferable to microphones that must be worn close to the user's mouth. Unfortunately, the distance between speakers and microphones introduces significant noise and reverberation, and as a consequence the WERs of current ASR systems on such data tend to be unacceptably high (30-50% and upwards). The use of a microphone array, i.e. several microphones, can alleviate the problem somewhat by performing spatial filtering: beamforming techniques combine the sensors' outputs in a way that focuses the processing on a particular direction. Assuming that the signal of interest comes from a different direction than the noise, this can improve the signal quality and reduce the WER by filtering out sounds coming from non-relevant directions. Historically, array processing techniques developed from research on non-speech data, e.g. in the fields of sonar and radar, and as a consequence most techniques were not created specifically for beamforming in the context of ASR. While this generality can be seen as an advantage in theory, it also means that these methods ignore characteristics which could be exploited in a way that benefits ASR. An example of beamforming adapted to speech processing is the recently proposed maximum negentropy beamformer (MNB), which exploits the statistical characteristics of speech as follows. "Clean" headset speech differs from noisy or reverberant speech in its statistical distribution, which is much less Gaussian in the clean case. Since negentropy is a measure of non-Gaussianity, choosing beamformer weights that maximise the negentropy of the output leads to speech that is closer to clean speech in its distribution, which in turn has been shown to lead to improved WERs [Kumatani et al., 2009]. In this thesis several refinements of the MNB algorithm are proposed and evaluated. Firstly, a number of modifications to the original MNB configuration are proposed, based on theoretical or practical concerns; these concern the probability density function (pdf) used to model speech, the estimation of the pdf parameters, and the method of calculating the negentropy. Secondly, a further step is taken to reflect the characteristics of speech by introducing time-varying pdf parameters. The original MNB uses fixed estimates per utterance, which do not account for the nonstationarity of speech. Several time-dependent variance estimates are therefore proposed, beginning with a simple moving-average window and including the HMM-MNB, which derives the variance estimate from a set of auxiliary hidden Markov models. All beamformer algorithms presented in this thesis are evaluated through far-field ASR experiments on the Multi-Channel Wall Street Journal Audio-Visual Corpus, a database of utterances captured with real far-field sensors in a realistic acoustic environment and spoken by real speakers. While the proposed methods do not lead to an improvement in ASR performance, a more efficient MNB algorithm is developed, and it is shown that comparable results can be achieved with significantly less data than all frames of the utterance, a result of particular relevance for real-time implementations.
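The negentropy criterion at the heart of the MNB can be illustrated with Hyvärinen's standard one-unit approximation, J(y) ≈ (E[G(y)] - E[G(v)])² with G(u) = log cosh(u) and v standard normal. This is a generic sketch of the non-Gaussianity measure only, not the MNB weight optimisation itself; the Gaussian constant below is an approximation, and the samples are assumed zero-mean with unit variance.

```python
import math

def negentropy(samples, gauss_const=0.3746):
    """Approximate negentropy of zero-mean, unit-variance samples.

    Uses J(y) ~ (E[G(y)] - E[G(v)])^2 with G(u) = log cosh(u).
    gauss_const is E[G(v)] for a standard normal v (approximate value).
    Larger J means the distribution is further from Gaussian.
    """
    eg = sum(math.log(math.cosh(s)) for s in samples) / len(samples)
    return (eg - gauss_const) ** 2
```

A maximum negentropy beamformer would search for array weights whose output maximises such a measure, pushing the output distribution toward that of clean speech.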

    Informed source extraction from a mixture of sources exploiting second order temporal structure

    Extracting a specific signal from among man

    Enhancing brain-computer interfacing through advanced independent component analysis techniques

    A brain-computer interface (BCI) is a direct communication system between the brain and an external device, in which messages or commands sent by an individual do not pass through the brain's normal output pathways but are detected through brain signals. Severe motor impairments, caused for example by amyotrophic lateral sclerosis, head trauma, spinal injuries and other diseases, may lead patients to lose muscle control and become unable to communicate with the outside environment. No effective cure or treatment has yet been found for these conditions, so using a BCI system to rebuild the communication pathway is a possible alternative. Among the different types of BCI, the electroencephalogram (EEG) based BCI has become popular owing to EEG's fine temporal resolution, ease of use, portability and low set-up cost. However, EEG's susceptibility to noise is a major obstacle to developing a robust BCI. Signal processing techniques such as coherent averaging, filtering, the FFT and AR modelling are used to reduce the noise and extract components of interest, but these methods operate in the observed mixture domain, in which components of interest and noise are combined. The consequence of this limitation is that the extracted EEG signals may still contain residual noise, or, conversely, that the removed noise may still contain part of the EEG signal. Independent Component Analysis (ICA), a Blind Source Separation (BSS) technique, is able to extract relevant information from noisy signals and separate the underlying sources into independent components (ICs). The most common assumption of ICA is that the source signals are unknown and statistically independent; under this assumption, ICA is able to recover the source signals.
Since the ICA concept appeared in the fields of neural networks and signal processing in the 1980s, many ICA applications in telecommunications, biomedical data analysis, feature extraction, speech separation, time-series analysis and data mining have been reported in the literature. In this thesis several ICA techniques are proposed to address two major issues for BCI applications: reducing the recording time needed, in order to speed up signal processing, and reducing the number of recording channels, while improving the final classification performance or at least keeping it at its current level. These improvements would make BCI a more practical prospect for everyday use. The thesis first defines BCI and the various BCI models based on different control patterns. After the general idea of ICA is introduced, along with some modifications, several new ICA approaches are proposed. The practical work begins with preliminary analyses of the Southampton BCI pilot datasets, using basic and then advanced signal processing techniques. The proposed ICA techniques are then presented using a multi-channel event-related potential (ERP) based BCI. Next, the ICA algorithm is applied to a multi-channel spontaneous-activity based BCI. The final ICA approach examines the possibility of using ICA with just one or a few recording channels on an ERP-based BCI. The novel ICA approaches presented in this thesis show that ICA is able to accurately and repeatedly extract the relevant information buried within noisy signals, and that the signal quality is enhanced to the point where even a simple classifier can achieve good classification accuracy. In the ERP-based BCI application, after multichannel ICA, classification using only eight averages/epochs achieves 83.9% accuracy, whereas coherent averaging reaches only 32.3%.
In the spontaneous-activity based BCI, the multi-channel ICA algorithm effectively extracts discriminatory information from two types of single-trial EEG data, improving classification accuracy by about 25% on average compared to the performance on unpreprocessed data. The single-channel ICA technique on the ERP-based BCI produces much better results than a lowpass filter, while an appropriate number of averages improves the signal-to-noise ratio of the P300 activity, which helps to achieve better classification. These advantages point toward a reliable and practical BCI for use outside the clinical laboratory.
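The coherent-averaging baseline that ICA is compared against is simply a time-locked average over epochs. A minimal sketch (hypothetical names; epochs are assumed already aligned to the stimulus onset):

```python
def coherent_average(epochs):
    """Sample-wise mean of time-locked epochs.

    The phase-locked ERP component survives the average, while
    zero-mean uncorrelated background EEG and noise shrink roughly
    as 1/sqrt(N) with the number of epochs N.
    """
    n = len(epochs)
    length = len(epochs[0])
    return [sum(epoch[i] for epoch in epochs) / n for i in range(length)]
```

The accuracy gap quoted above (83.9% vs 32.3% with eight epochs) reflects how slowly this 1/sqrt(N) noise reduction pays off compared with separating the sources directly.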

    Voice inactivity ranking for enhancement of speech on microphone arrays

    Motivated by the problem of improving the performance of speech enhancement algorithms in non-stationary acoustic environments with low SNR, a framework is proposed for identifying signal frames of noisy speech that are unlikely to contain voice activity. Such voice-inactive frames can then be incorporated into an adaptation strategy to improve the performance of existing speech enhancement algorithms. This adaptive approach is applicable to single-channel as well as multi-channel algorithms for noisy speech; in both cases, the adaptive versions of the enhancement algorithms are observed to improve SNR levels by 20 dB, as indicated by PESQ and WER criteria. In advanced speech enhancement algorithms, it is often of interest to identify some regions of the signal that have a high likelihood of being noise only, i.e. with no speech present. This is in contrast to advanced speech recognition, speaker recognition and pitch tracking algorithms, in which we are interested in identifying all regions that have a high likelihood of containing speech as well as all regions that have a high likelihood of not containing speech; in other words, minimizing the false positive and false negative rates, respectively. In the context of speech enhancement, the identification of some speech-absent regions calls for minimizing false positives while setting an acceptable tolerance on false negatives, as determined by the performance of the enhancement algorithm. Typically, voice activity detectors (VADs) are used to identify speech-absent regions for speech enhancement. In recent years a myriad of deep neural network (DNN) based approaches have been proposed to improve the performance of VADs at low SNR levels by training on combinations of speech and noise; training on such an exhaustive dataset is combinatorially explosive.
For this dissertation, we propose a voice inactivity ranking framework, in which voice-inactive frames are identified using a machine learning (ML) approach that uses only clean speech utterances for training and is robust to high levels of noise. In the proposed framework, input frames of noisy speech are ranked by a 'voice inactivity score' to acquire definitely speech inactive (DSI) frame sequences. These DSI regions serve as a noise estimate and are used adaptively by the underlying speech enhancement algorithm to enhance speech from a speech mixture. The proposed voice-inactivity ranking framework was used to perform speech enhancement in single-channel and multi-channel systems. In the context of microphone arrays, the framework was used to determine parameters for spatial filtering using adaptive beamformers. We achieved an average word error rate (WER) improvement of 50% at SNR levels below 0 dB compared to the noisy signal, which is 7 ± 2.5% more than a framework in which a state-of-the-art VAD decision was used for spatial filtering. For monaural signals, we propose a multi-frame multiband spectral subtraction (MF-MBSS) speech enhancement system that utilizes the voice inactivity framework to compute and update the noise statistics on overlapping frequency bands. The proposed MF-MBSS achieved not only an average PESQ improvement of 16%, with a maximum improvement of 56%, compared to state-of-the-art spectral subtraction, but also a 5 ± 1.5% improvement in the WER of the spatially filtered output signal in non-stationary acoustic environments.
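The spectral subtraction underlying an MF-MBSS-style system can be sketched for a single band in its textbook form. This is a generic illustration with hypothetical names, not the dissertation's system: `alpha` is an over-subtraction factor, `beta` a spectral floor that limits musical noise, and the noise estimate would come from speech-absent (e.g. DSI) frames.

```python
def spectral_subtract(power, noise, alpha=1.0, beta=0.01):
    """Single-band spectral subtraction sketch.

    power: per-bin power spectrum of the noisy frame.
    noise: per-bin noise power estimate (e.g. from speech-absent frames).
    Subtracts alpha * noise and floors the result at beta * noise so
    that no bin goes negative or fully to zero.
    """
    return [max(p - alpha * n, beta * n) for p, n in zip(power, noise)]
```

A multiband variant would apply this per frequency band with band-specific `alpha` values, updating `noise` as new speech-absent frames are ranked.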

    Role of independent component analysis in intelligent ECG signal processing

    The electrocardiogram (ECG) reflects the activities and attributes of the human heart and reveals very important hidden information in its structure. This information is extracted by means of ECG signal analysis to gain insights that are crucial in explaining and identifying various pathological conditions. Feature extraction can be accomplished directly by an expert through visual inspection of ECGs printed on paper or displayed on a screen. However, the complexity of the signals and the time needed to inspect and analyse them manually make this a very tedious task that yields only limited descriptions, and manual ECG analysis is always prone to errors of human oversight. ECG signal processing has therefore become a prevalent and effective tool for research and clinical practice. A typical computer-based ECG analysis system includes signal preprocessing, beat detection and feature extraction stages, followed by classification. Automatic identification of arrhythmias from the ECG is an important biomedical application of pattern recognition. This thesis focuses on ECG signal processing using Independent Component Analysis (ICA), which has received increasing attention as a signal conditioning and feature extraction technique for biomedical applications. Long-term ECG monitoring is often required to identify an arrhythmia reliably. Motion-induced artefacts are particularly common in ambulatory and Holter recordings, and are difficult to remove with conventional filters because their shape resembles that of ectopic beats. Feature selection has always been an important step towards more accurate, reliable and speedy pattern recognition, and better feature spaces are also sought in ECG pattern recognition applications.
Two new algorithms are proposed, developed and validated in this thesis: one removes non-trivial noise from ECGs using ICA, and the other deploys ICA-extracted features to improve the recognition of arrhythmias. Firstly, independent component analysis was studied and found effective in this PhD project for separating out motion-induced artefacts in ECGs; the independent component corresponding to noise is then removed from the ECG according to kurtosis and correlation measurements. The second algorithm was developed for ECG feature extraction, in which ICA is used to obtain a set of features, or basis functions, of the ECG signals generated hypothetically by different parts of the heart during normal and arrhythmic cardiac cycles. ECGs are then classified based on the basis functions along with other time-domain features. Selecting the appropriate feature set for the classifier was found to be important for better performance and quicker response. Artificial neural network based pattern recognition engines perform the final classification, measuring the performance of the ICA-extracted features and the effectiveness of the ICA-based artefact reduction algorithm. The motion artefacts are effectively removed from the ECG signal, as shown by beat detection on noisy and cleaned ECG signals after ICA processing. Using the ICA-extracted feature sets, classification of ECG arrhythmias into eight classes is achieved with fewer independent components and very high classification accuracy.
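The kurtosis part of the noise-component identification can be sketched as follows. This is a hypothetical illustration (the thesis also uses a correlation measure, omitted here): spiky artifact components tend to have much higher excess kurtosis than quasi-periodic physiological components, so components above a threshold are rejected.

```python
def excess_kurtosis(x):
    """Excess kurtosis (fourth standardized moment minus 3).

    Zero for Gaussian data; large positive values indicate a spiky,
    heavy-tailed component, typical of impulsive artifacts.
    """
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / (var ** 2) - 3.0

def reject_artifact_ics(components, threshold=5.0):
    """Keep only independent components whose excess kurtosis is
    below the threshold; the rest are treated as artifacts."""
    return [c for c in components if excess_kurtosis(c) < threshold]
```

The cleaned signal would then be reconstructed from the retained components via the inverse of the ICA mixing matrix.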

    Text-independent speaker recognition

    This research presents a new text-independent speaker recognition system with multivariate tools, namely Principal Component Analysis (PCA) and Independent Component Analysis (ICA), embedded into the recognition system after the feature extraction step. The proposed approach evaluates the performance of such a system when trained and used in clean and noisy environments, with additive white Gaussian noise and convolutive noise added. Experiments were carried out to investigate the robustness of PCA and ICA in the designed approach. Applying ICA improved the performance of the speaker recognition model compared to PCA. Experimental results show that ICA enabled the extraction of higher-order statistics, thereby capturing speaker-dependent statistical cues in a text-independent recognition system, and that ICA has better decorrelation and dimension reduction properties than PCA. To simulate a multi-environment system, we trained our model such that every time a new speech signal was read, it was contaminated with a different type of noise and stored in the database. The results also show that ICA outperforms PCA under adverse conditions; this is verified by the recognition accuracy rates obtained when the designed system was tested under different training and test SNR conditions with additive white Gaussian noise and under test delay conditions with an echo effect.
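The PCA stage of such a pipeline can be sketched as follows. This is a generic, hypothetical illustration (power iteration for the leading component only, on a list of feature vectors), not the system's actual implementation:

```python
def first_principal_component(data, n_iter=200):
    """Leading PCA direction of `data` (list of feature vectors),
    found by power iteration on the sample covariance matrix."""
    d = len(data[0])
    n = len(data)
    # Center the data.
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    # Sample covariance matrix.
    cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly apply cov and renormalize.
    v = [1.0] * d
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v
```

Projecting features onto the top few such directions gives the decorrelated, reduced-dimension representation that PCA contributes; ICA would then rotate that whitened space to maximise statistical independence.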