Board #29: A Pattern Recognition Approach to Signal-to-Noise Ratio Estimation of Speech

Author: Awolumate Peter Adeyemi
Bouaynaya Nidhal
Dahm Kevin
Nazari Rouzbeh
Ramachandran Ravi
Rudy Mitchell
Thayasivam Umashanger
Publication venue: Rowan Digital Works
Publication date: 24/06/2017

A blind approach for estimating the signal-to-noise ratio (SNR) of a speech signal corrupted by additive noise is proposed. The method is based on a pattern recognition paradigm that uses various linear-prediction-based features, a vector quantizer classifier, and combination of the resulting estimates. Blind SNR estimation is very useful in biometric speaker identification systems in which a confidence metric is determined along with the speaker identity. The confidence metric is partially based on the mismatch between the training and testing conditions of the speaker identification system, and SNR estimation is very important in evaluating the degree of this mismatch. The educational impact of this project is two-fold:
1. Undergraduate students are initiated into research and development by working on a team to produce a software implementation of the SNR estimation system. The students will also evaluate the performance of the system by experimenting with different features and classifiers. The objective is to produce a paper for a refereed technical conference.
2. The students will also write a laboratory manual for a portion of this project, to be run in a junior-level signals and systems class and a senior-level class on speech processing. The objective is to produce a paper for a refereed education conference.
The learning outcomes for the students engaged in research and for the students doing the project in a class include:
• Enhanced application of math skills
• Enhanced software implementation skills
• Enhanced interest in signal processing
• Enhanced ability to analyze experimental results
• Enhanced communication skills
The assessment instruments include:
• Student surveys (target versus control group comparison that includes a statistical analysis)
• Faculty tracking of student learning outcomes based on student work
• Faculty evaluation of student work based on significant rubrics
• A concept inventory tes
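The classification step described above can be sketched as a minimum-distortion vector quantizer: one codebook per SNR class, and the class whose codebook best fits the test frames wins. This is a minimal sketch under stated assumptions, not the authors' system. The codebooks and frames below are hypothetical toy values standing in for trained codebooks of linear-prediction features, and averaging the estimates of several systems is an assumed combination rule; the abstract does not specify one.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def average_distortion(frames, codebook):
    """Mean, over all frames, of the distance to the nearest codeword."""
    return sum(min(squared_distance(f, c) for c in codebook) for f in frames) / len(frames)


def classify_snr(frames, codebooks):
    """Return the SNR label (dB) whose codebook gives the lowest average distortion."""
    return min(codebooks, key=lambda snr: average_distortion(frames, codebooks[snr]))


def combine_estimates(estimates):
    """Combine SNR estimates from several feature/classifier systems (assumed: plain mean)."""
    return sum(estimates) / len(estimates)


# Hypothetical codebooks keyed by SNR in dB; real ones would be trained per SNR level.
codebooks = {
    0: [[1.0, 1.0], [1.2, 0.8]],
    10: [[0.0, 0.0], [0.2, -0.1]],
    20: [[-1.0, -1.0]],
}
frames = [[0.1, 0.0], [0.1, -0.1]]  # hypothetical feature vectors from one utterance
estimate = classify_snr(frames, codebooks)  # nearest codebook is the 10 dB one
```

Using average distortion over all frames, rather than a per-frame vote, is one common way to score a VQ classifier on a whole utterance.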

Rowan University

Using Gaussian Mixture Model and Partial Least Squares regression classifiers for robust speaker verification with various enhancement methods

Author: Edwards Joshua Scott
Publication venue: Rowan Digital Works
Publication date: 15/03/2017

In the presence of environmental noise, the performance of speaker verification systems inevitably decreases. This thesis proposes the use of two parallel classifiers with several enhancement methods to improve the performance of a speaker verification system when noisy speech signals are used for authentication. Both classifiers are shown to achieve statistically significant performance gains when signal-to-noise ratio estimation, affine transforms, and score-level fusion of features are all applied. These enhancement methods are validated over a wide range of test conditions, from perfectly clean speech down to speech in which the noise is as loud as the speaker (0 dB SNR). After each classifier has been tuned to its best configuration, the two classifiers are also fused in different ways. Finally, the performance of the two classifiers is compared with each other and with the performance of their fusions. Fusion by adding the scores of the two classifiers is found to be the best method.
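Adding the two classifiers' scores, the fusion found best above, can be sketched as follows. This is a minimal sketch, not the thesis code: the z-normalization step is an assumption added so that GMM and PLS scores on different scales contribute comparably, and `gmm_scores`, `pls_scores`, and the decision `threshold` are hypothetical.

```python
from statistics import mean, pstdev


def z_normalize(scores):
    """Rescale scores to zero mean and unit variance (assumed normalization step)."""
    mu = mean(scores)
    sigma = pstdev(scores) or 1.0  # guard against a constant score list
    return [(s - mu) / sigma for s in scores]


def sum_fusion(gmm_scores, pls_scores):
    """Score-level fusion: add the normalized scores of the two classifiers trial by trial."""
    return [a + b for a, b in zip(z_normalize(gmm_scores), z_normalize(pls_scores))]


def accept(fused_scores, threshold=0.0):
    """Accept a verification trial when its fused score exceeds the (hypothetical) threshold."""
    return [s > threshold for s in fused_scores]


# Hypothetical scores for three verification trials from each classifier.
gmm_scores = [2.0, -1.0, 0.5]
pls_scores = [0.9, 0.1, 0.2]
decisions = accept(sum_fusion(gmm_scores, pls_scores))
```

In practice the normalization statistics would come from development data rather than the test trials themselves.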

Rowan University

Use of principal component analysis with linear predictive features in developing a blind SNR estimation system

Author: Marbach Matthew James
Publication venue: Rowan Digital Works
Publication date: 31/12/2006

Signal-to-noise ratio (SNR) is an important concept in electrical communications: it is a measurable ratio between a given transmitted signal and the inherent background noise of a transmission channel. Currently, SNR testing is primarily performed with an intrusive method that compares a corrupted signal to the original signal and scores it based on the comparison. However, this technique is inefficient and often impossible in practice because it requires the original signal for comparison. A speech signal's characteristics and properties could instead be used to develop a non-intrusive method for determining SNR, that is, a method that does not require the original clean signal. In this thesis, several extracted features were investigated to determine whether a neural network trained with data from corrupted speech signals could accurately estimate the SNR of a speech signal. A multilayer perceptron (MLP) was trained on extracted features for each decibel level from 0 dB to 30 dB, in an attempt to create 'expert classifiers' for each SNR level. This type of architecture would then have 31 independent classifiers operating together to estimate the SNR of an unknown speech signal. Principal component analysis was also implemented to reduce dimensionality and increase class discrimination. The performance of several neural network classifier structures is examined, and the overall results are analyzed to determine the optimal feature for estimating the SNR of an unknown speech signal. Decision-level fusion was the final procedure, combining the outputs of several classifier systems in an effort to reduce the estimation error.
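To make the contrast with blind estimation concrete, the intrusive measurement described above can be sketched with the standard definition SNR = 10 log10(P_signal / P_noise), taking the noise as the difference between the corrupted and clean signals. This is a minimal sketch; the function name and toy signals are hypothetical. The key point is that it cannot run without `clean`, which is exactly what a non-intrusive estimator must do without.

```python
import math


def intrusive_snr_db(clean, corrupted):
    """Intrusive SNR in dB: requires the original clean signal.

    The noise is recovered as corrupted - clean, then the ratio of
    signal energy to noise energy is expressed in decibels.
    """
    signal_power = sum(s * s for s in clean)
    noise_power = sum((x - s) ** 2 for x, s in zip(corrupted, clean))
    return 10.0 * math.log10(signal_power / noise_power)


clean = [1.0, -1.0, 1.0, -1.0]   # hypothetical clean samples
noisy = [1.1, -0.9, 0.9, -1.1]   # same samples with a +/-0.1 error on each
snr = intrusive_snr_db(clean, noisy)  # energy ratio 4 / 0.04 = 100, i.e. about 20 dB
```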

Rowan University

New Strategies for Single-channel Speech Separation

Author: Mowlaee Beikzadehmahalen Pejman
Publication venue: Institut for Elektroniske Systemer, Aalborg Universitet
Publication date: 01/01/2010

VBN

Copula-based Multimodal Data Fusion for Inference with Dependent Observations

Author: Zhang Shan
Publication venue: SURFACE at Syracuse University
Publication date: 20/12/2019

Fusing heterogeneous data from multiple modalities for inference problems has been an attractive and important topic in recent years. Multimodal fusion poses several challenges, such as data heterogeneity and data correlation. In this dissertation, we investigate inference problems with heterogeneous modalities by taking into account nonlinear cross-modal dependence. We apply copula-based methodology to characterize this dependence. In distributed detection, the goal is often to minimize the probability of detection error at the fusion center (FC) based on a fixed number of observations collected by the sensors. We design optimal detection algorithms at the FC using a fusion rule based on a regular vine copula. A regular vine copula is an extremely flexible and powerful graphical model used to characterize complex dependence among multiple modalities. The proposed approaches are theoretically justified and computationally efficient for sensor networks with a large number of sensors. With heterogeneous streaming data, the fusion methods applied to data streams should be fast enough to keep up with the high arrival rates of incoming data, while still providing highly accurate solutions to inference problems (detection, classification, or estimation). We propose a novel parallel platform, C-Storm (Copula-based Storm), which combines copula-based dependence modeling for highly accurate inference with Storm, a highly regarded parallel computing platform, for fast stream data processing. The efficacy of C-Storm is demonstrated. In this dissertation, we consider not only decision-level fusion but also fusion of heterogeneous high-level features. We investigate a supervised classification problem by fusing dependent high-level features extracted from multiple deep neural network (DNN) classifiers, using a regular vine copula to fuse these features. The efficacy of combining a model-based method with deep learning is demonstrated.
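The core idea of copula-based fusion at the FC can be sketched for two sensors: the joint likelihood under each hypothesis is the product of the marginal densities and a copula density that carries the dependence. The dissertation uses regular vine copulas; this minimal sketch swaps in the simplest case, a bivariate Gaussian copula, with Gaussian marginals. The hypotheses `h0`/`h1`, their parameters, and all function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def gaussian_copula_log_density(u1, u2, rho):
    """Log density of a bivariate Gaussian copula with correlation rho at (u1, u2)."""
    x1, x2 = STD_NORMAL.inv_cdf(u1), STD_NORMAL.inv_cdf(u2)
    one_minus = 1.0 - rho * rho
    return (-0.5 * math.log(one_minus)
            - (rho * rho * (x1 * x1 + x2 * x2) - 2.0 * rho * x1 * x2) / (2.0 * one_minus))


def copula_log_likelihood(z1, z2, marg1, marg2, rho):
    """Joint log-likelihood of two dependent observations under one hypothesis:
    log f1(z1) + log f2(z2) + log c(F1(z1), F2(z2))."""
    return (math.log(marg1.pdf(z1)) + math.log(marg2.pdf(z2))
            + gaussian_copula_log_density(marg1.cdf(z1), marg2.cdf(z2), rho))


def fuse_decide(z1, z2, h0, h1, threshold=0.0):
    """Decide H1 when the copula-based log-likelihood ratio exceeds the threshold."""
    llr = copula_log_likelihood(z1, z2, *h1) - copula_log_likelihood(z1, z2, *h0)
    return llr > threshold


# Hypothetical hypotheses: (marginal of sensor 1, marginal of sensor 2, copula correlation).
h0 = (NormalDist(0.0, 1.0), NormalDist(0.0, 1.0), 0.5)
h1 = (NormalDist(2.0, 1.0), NormalDist(2.0, 1.0), 0.5)
```

Ignoring the copula term (treating the sensors as independent) reduces this to the usual sum of marginal log-likelihood ratios, which is what copula-based rules aim to improve on when observations are dependent.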
Besides fixed-sample-size (FSS) inference problems, we study a distributed sequential detection problem with random sample size. The aim of distributed sequential detection in a non-Bayesian framework is to minimize the average detection time while satisfying pre-specified constraints on the probabilities of false alarm and missed detection. We design local memoryless truncated sequential tests and propose a copula-based sequential test at the FC. We show that, by suitably designing the local thresholds and the truncation window, the local probabilities of false alarm and missed detection of the proposed local decision rules satisfy the pre-specified error probabilities. We also show the asymptotic optimality and time efficiency of the proposed distributed sequential scheme. In large-scale sensor networks, we consider a collaborative distributed estimation problem with statistically dependent sensor observations and no FC. To achieve greater sensor transmission and estimation efficiency, we propose a two-step, cluster-based collaborative distributed estimation scheme. In the first step, sensors form dependence-driven clusters, such that sensors in the same cluster are dependent while sensors in different clusters are independent, and perform copula-based maximum a posteriori (MAP) estimation via intra-cluster collaboration. In the second step, the estimates generated in the first step are shared via inter-cluster collaboration to reach an average consensus. The efficacy of the proposed scheme is justified.
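A truncated sequential test of the kind mentioned above can be sketched with Wald's sequential probability ratio test (SPRT): accumulate log-likelihood-ratio increments until one of two thresholds is crossed, or stop at a truncation window. This is a generic minimal sketch, not the dissertation's local test design; Wald's approximate thresholds, the sign-based decision at truncation, and all names are assumptions.

```python
import math


def wald_thresholds(alpha, beta):
    """Wald's approximate SPRT thresholds on the accumulated log-likelihood ratio.

    alpha: target false-alarm probability; beta: target missed-detection probability.
    """
    upper = math.log((1.0 - beta) / alpha)   # crossing it -> decide H1
    lower = math.log(beta / (1.0 - alpha))   # crossing it -> decide H0
    return lower, upper


def truncated_sprt(llr_increments, alpha, beta, max_samples):
    """Sequential test that stops at a threshold crossing or at the truncation window.

    Returns (decision, samples_used), with decision 1 for H1 and 0 for H0.
    At truncation, the sign of the accumulated LLR decides (an assumed rule).
    """
    lower, upper = wald_thresholds(alpha, beta)
    total = 0.0
    n = 0
    for n, step in enumerate(llr_increments[:max_samples], start=1):
        total += step
        if total >= upper:
            return 1, n
        if total <= lower:
            return 0, n
    return (1 if total > 0 else 0), n


# Hypothetical per-sample LLR increments from one sensor.
decision, used = truncated_sprt([1.0, 1.0, 1.0, 1.0], alpha=0.05, beta=0.05, max_samples=10)
```

With alpha = beta = 0.05 the thresholds are about ±2.94, so three increments of 1.0 are enough to decide H1.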

Syracuse University Research Facility and Collaborative Environment

Arabic digits speech recognition and speaker identification in noisy environment using a hybrid model of VQ and GMM

Author: Frikel Miloud
Ouisaadane Abdelkbir
Safi Said
Publication venue: Universitas Ahmad Dahlan
Publication date: 01/08/2020

This paper presents an automatic speaker identification and speech recognition system for Arabic digits in noisy environments. The proposed system is able to identify a speaker after the speaker's voice has been saved in the database and noise has been added. Mel-frequency cepstral coefficients (MFCCs), considered the best approach, are used to build the program on the MATLAB platform, and vector quantization (VQ) is used to generate the codebooks. Gaussian mixture modeling (GMM) algorithms are used to generate templates for feature matching. We propose a system based on the MFCC-GMM and MFCC-VQ approaches on the one hand, and on the hybrid MFCC-VQ-GMM approach on the other, for speaker modeling. White Gaussian noise is added to the clean speech at several signal-to-noise ratio (SNR) levels to test the system in a noisy environment. The proposed system achieves good recognition rates.
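Adding white Gaussian noise at a chosen SNR, as done above to test the system, can be sketched by scaling unit-variance noise so that the noise power equals P_signal / 10^(SNR/10). This is a minimal sketch in Python, not the paper's MATLAB code; the function name, the seed, and the toy sine signal are hypothetical. Rescaling by the sample noise power makes the realized SNR match the target exactly rather than only on average.

```python
import math
import random


def add_white_gaussian_noise(clean, snr_db, seed=None):
    """Return clean + white Gaussian noise scaled to give the requested SNR in dB."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in clean]
    signal_power = sum(s * s for s in clean) / len(clean)
    noise_power = sum(n * n for n in noise) / len(noise)
    target_noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    scale = math.sqrt(target_noise_power / noise_power)
    return [s + scale * n for s, n in zip(clean, noise)]


# Hypothetical clean signal; a real test would use recorded Arabic digit utterances.
clean = [math.sin(0.05 * k) for k in range(1000)]
noisy = add_white_gaussian_noise(clean, snr_db=10.0, seed=1)
```

Calling it once per SNR level (for example, a list of levels in dB) produces the family of noisy test sets.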

TELKOMNIKA (Telecommunication Computing Electronics and Control)

UAD Journal Management System

Medical imaging analysis with artificial neural networks

Author: Jiang J.
Ren Jinchang
Trundle P.
Publication venue: Elsevier BV
Publication date: 01/12/2010

Because neural networks have been widely reported in the medical imaging research community, we provide a focused literature survey of recent neural network developments in computer-aided diagnosis, in medical image segmentation and edge detection for visual content analysis, and in medical image registration for pre-processing and post-processing. The aims are to increase awareness of how neural networks can be applied to these areas and to provide a foundation for further research and practical development. Representative techniques and algorithms are explained in detail to provide inspiring examples illustrating: (i) how a known neural network with a fixed structure and training procedure could be applied to solve a medical imaging problem; (ii) how medical images could be analysed, processed, and characterised by neural networks; and (iii) how neural networks could be extended further to solve problems relevant to medical imaging. The concluding section highlights comparisons among many neural network applications to provide a global view of computational intelligence with neural networks in medical imaging.

University of Strathclyde Institutional Repository

Surrey Research Insight

Automatic Driver Fatigue Monitoring Using Hidden Markov Models and Bayesian Networks

Author: Rashwan Abdullah
Publication venue: University of Waterloo
Publication date: 11/12/2013

The automotive industry grows each year, and the central concern for any automotive company is driver and passenger safety. Many automotive companies have developed driver assistance systems to help the driver and ensure driver safety. These systems include adaptive cruise control, lane departure warning, lane change assistance, collision avoidance, night vision, automatic parking, traffic sign recognition, and driver fatigue detection. In this thesis, we aim to build a driver fatigue detection system that advances research in this area. Vision is commonly the key component of driver fatigue detection systems; we have instead decided to investigate a different direction, examining the driver's voice, heart rate, and driving performance to assess fatigue level. The system consists of three main modules: the audio module, the heart rate and other signals module, and the Bayesian network module. The audio module analyzes an audio recording of the driver and attempts to estimate the driver's level of fatigue. A voice activity detection (VAD) module is used to extract the driver's speech from the audio recording. Mel-frequency cepstral coefficient (MFCC) features are extracted from the speech signal, and then support vector machine (SVM) and hidden Markov model (HMM) classifiers are used to detect driver fatigue. Both classifiers are tuned for best performance, and their performance is reported and compared. The heart rate and other signals module uses heart rate, steering wheel position, and the positions of the accelerator, brake, and clutch pedals to detect the level of fatigue. The sample rates of these signals are adjusted to match, allowing simple features to be extracted, and SVM and HMM classifiers are used to detect fatigue level. The performance of both classifiers is reported and compared.
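The abstract does not say which VAD algorithm extracts the driver's speech, so the following is only a minimal sketch of one simple, swapped-in technique: an energy-threshold VAD that keeps frames whose mean-square energy exceeds a fraction of the loudest frame's energy. The frame length, `threshold_ratio`, and all function names are hypothetical; the kept samples would then go to MFCC extraction.

```python
def frame_energies(samples, frame_len):
    """Mean-square energy of consecutive, non-overlapping frames."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def energy_vad(samples, frame_len, threshold_ratio=0.1):
    """Flag frames whose energy exceeds threshold_ratio times the loudest frame's energy."""
    energies = frame_energies(samples, frame_len)
    threshold = threshold_ratio * max(energies)
    return [e > threshold for e in energies]


def extract_speech(samples, frame_len, threshold_ratio=0.1):
    """Concatenate the samples of all frames flagged as speech."""
    flags = energy_vad(samples, frame_len, threshold_ratio)
    return [x
            for k, keep in enumerate(flags) if keep
            for x in samples[k * frame_len:(k + 1) * frame_len]]


# Hypothetical recording: quiet frame, loud "speech" frame, quiet frame.
samples = [0.01] * 4 + [1.0, -1.0, 1.0, -1.0] + [0.01] * 4
speech = extract_speech(samples, frame_len=4)
```

A relative threshold like this adapts to recording gain but fails in steady loud noise; in-car audio would likely need a more robust VAD.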
The ability of Bayesian networks to capture dependencies and uncertainty makes them a sound choice for performing the data fusion. Prior information (day or night driving and the previous decision) is also incorporated into the network to improve the final decision. The accuracies of the audio module and of the heart rate and other signals module are used to calculate some of the conditional probability tables (CPTs) of the Bayesian network, while the remaining CPTs are set subjectively. Inference queries are computed using the variable elimination algorithm. For time steps where the audio module's decision is absent, a window is defined and the last decision within this window is used as the current decision. A dataset was built to train and test the system. The performance of the system was assessed based on the average accuracy per second, and the total accuracy of the system is 90.5%. The experimental results show that the system is very promising. The system design can easily be extended by integrating more modules into the Bayesian network.
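The rule for missing audio decisions described above can be sketched as a hold-last-value filter over per-second decisions. This is a minimal sketch under assumptions: the window length is not given in the abstract, the labels are hypothetical, and returning `None` when no decision falls inside the window (so the Bayesian network would treat the audio node as unobserved) is an assumed behavior the abstract does not specify.

```python
def hold_last_decision(decisions, window):
    """Fill missing per-second audio decisions (None) with the most recent decision
    made at most `window` time steps earlier; otherwise leave the step missing."""
    filled = []
    last_value, last_time = None, None
    for t, d in enumerate(decisions):
        if d is not None:
            last_value, last_time = d, t
            filled.append(d)
        elif last_time is not None and t - last_time <= window:
            filled.append(last_value)
        else:
            filled.append(None)  # assumed: no evidence from the audio module
    return filled


# Hypothetical audio decisions, one per second, with gaps where no speech was detected.
audio = ["alert", None, None, None, "fatigued", None]
held = hold_last_decision(audio, window=2)
```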

University of Waterloo's Institutional Repository

Classification Models for Symmetric Key Cryptosystem Identification

Author: Kant Shri
Publication venue: Defence Scientific Information and Documentation Centre
Publication date: 23/01/2012

The present paper deals with the basic principles and theory behind prevalent classification models and their judicious application to symmetric key cryptosystem identification. These techniques have been implemented and verified on a variety of known and simulated data sets. After the techniques are established, the problems of cryptosystem identification are addressed.
Defence Science Journal, 2012, 62(1), pp. 38-45, DOI: http://dx.doi.org/10.14429/dsj.62.144

Defence Science Journal

Spectrum sensing, spectrum monitoring, and security in cognitive radios

Author: Soltanmohammadi Erfan
Publication venue: LSU Digital Commons
Publication date: 01/01/2014

Spectrum sensing is a key function of cognitive radios and is used to determine whether a primary user is present in the channel. In this dissertation, we formulate and solve the generalized likelihood ratio test (GLRT) for spectrum sensing when both the primary user transmitter and the secondary user receiver are equipped with multiple antennas. We do not assume any prior information about the channel statistics or the primary user's signal structure. Two cases are considered: when the secondary user knows the noise energy and when it does not. The final test statistics derived from the GLRT are based on the eigenvalues of the sample covariance matrix. In-band spectrum sensing in overlay cognitive radio networks requires the secondary users (SUs) to periodically suspend their communication to determine whether the primary user (PU) has started to use the channel. In contrast, with spectrum monitoring the SU can detect the emergence of the PU from its own receiver statistics, such as the receiver error count (REC). We investigate the problem of spectrum monitoring in the presence of fading, where the SU employs diversity combining to mitigate the effects of channel fading. We show that a decision statistic based on the REC alone does not perform well. We then introduce new decision statistics based on the REC and the combiner coefficients, and show that they achieve significant improvement in the case of maximal ratio combining (MRC). Next, we consider the problem of cooperative spectrum sensing in cognitive radio networks (CRNs) in the presence of misbehaving radios. We propose a novel approach based on the iterative expectation maximization (EM) algorithm to detect the presence of the primary users, to classify the cognitive radios, and to compute their detection and false alarm probabilities.
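For readers unfamiliar with the combiner referred to above, maximal ratio combining can be sketched as weighting each diversity branch by the conjugate of its channel gain, so that branch phases align and stronger branches count more. This is a minimal sketch of MRC itself, not of the dissertation's REC-based decision statistics; the channel gains and symbol are hypothetical, and the output-SNR helper assumes equal noise variance on every branch.

```python
def mrc_combine(received, channel_gains):
    """Maximal ratio combining: weight branch i by conj(h_i), then normalize by sum |h_i|^2.

    For noiseless branches r_i = h_i * s this returns exactly s.
    """
    numerator = sum(h.conjugate() * r for h, r in zip(channel_gains, received))
    energy = sum(abs(h) ** 2 for h in channel_gains)
    return numerator / energy


def mrc_output_snr(branch_snrs):
    """With equal per-branch noise, the MRC output SNR (linear scale) is the sum of branch SNRs."""
    return sum(branch_snrs)


# Hypothetical two-branch fading channel and transmitted symbol.
h = [1 + 1j, 0.5 - 0.5j]
s = 1 - 1j
r = [hi * s for hi in h]  # noiseless received samples
estimate = mrc_combine(r, h)
```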
We also consider the problem of centralized binary hypothesis testing in a cognitive radio network (CRN) consisting of multiple classes of cognitive radios, where the cognitive radios are classified according to the probability density function (PDF) of their received data (at the fusion center, FC) under each hypothesis.
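Once per-radio detection and false alarm probabilities are available (for example, from the EM approach described above), one standard way for the FC to combine hard local decisions is the Chair-Varshney log-likelihood-ratio rule. This minimal sketch swaps in that textbook rule; it is not the dissertation's test, and all values and names are hypothetical. It also shows why classifying radios matters: a radio whose detection probability is below its false alarm probability automatically receives a negative weight.

```python
import math


def chair_varshney_llr(decisions, pd, pf):
    """Log-likelihood ratio of local binary decisions at the fusion center.

    decisions[i] is radio i's decision (1 = primary user present); pd[i] and pf[i]
    are its detection and false alarm probabilities, assumed known or estimated.
    """
    llr = 0.0
    for u, d, f in zip(decisions, pd, pf):
        if u == 1:
            llr += math.log(d / f)
        else:
            llr += math.log((1.0 - d) / (1.0 - f))
    return llr


def fusion_decision(decisions, pd, pf, threshold=0.0):
    """Declare the primary user present (1) when the fused LLR exceeds the threshold."""
    return 1 if chair_varshney_llr(decisions, pd, pf) > threshold else 0


# Hypothetical radios: two reliable ones and one barely informative one.
pd = [0.9, 0.9, 0.6]
pf = [0.1, 0.1, 0.4]
```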

Louisiana State University