535 research outputs found
Recommended from our members
Evaluation and analysis of hybrid intelligent pattern recognition techniques for speaker identification
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.The rapid momentum of the technology progress in the recent years has led to a tremendous rise in the use of biometric authentication systems. The objective of this research is to investigate the problem
of identifying a speaker from its voice regardless of the content (i.e.
text-independent), and to design efficient methods of combining face and voice in producing a robust authentication system.
A novel approach towards speaker identification is developed using
wavelet analysis, and multiple neural networks including Probabilistic
Neural Network (PNN), General Regressive Neural Network (GRNN)and Radial Basis Function-Neural Network (RBF NN) with the AND
voting scheme. This approach is tested on GRID and VidTIMIT cor-pora and comprehensive test results have been validated with state-
of-the-art approaches. The system was found to be competitive and it improved the recognition rate by 15% as compared to the classical Mel-frequency Cepstral Coe±cients (MFCC), and reduced the recognition time by 40% compared to Back Propagation Neural Network (BPNN), Gaussian Mixture Models (GMM) and Principal Component Analysis (PCA).
Another novel approach using vowel formant analysis is implemented using Linear Discriminant Analysis (LDA). Vowel formant based speaker identification is best suitable for real-time implementation and requires only a few bytes of information to be stored for each speaker, making it both storage and time efficient. Tested on GRID and Vid-TIMIT, the proposed scheme was found to be 85.05% accurate when Linear Predictive Coding (LPC) is used to extract the vowel formants, which is much higher than the accuracy of BPNN and GMM. Since the proposed scheme does not require any training time other than creating a small database of vowel formants, it is faster as well. Furthermore, an increasing number of speakers makes it di±cult for BPNN and GMM to sustain their accuracy, but the proposed score-based methodology stays almost linear.
Finally, a novel audio-visual fusion based identification system is implemented using GMM and MFCC for speaker identi¯cation and PCA for face recognition. The results of speaker identification and face recognition are fused at different levels, namely the feature, score and decision levels. Both the score-level and decision-level (with OR voting) fusions were shown to outperform the feature-level fusion in terms of accuracy and error resilience. The result is in line with the distinct nature of the two modalities which lose themselves when combined at the feature-level. The GRID and VidTIMIT test results validate that
the proposed scheme is one of the best candidates for the fusion of
face and voice due to its low computational time and high recognition accuracy
One-Class Classification: Taxonomy of Study and Review of Techniques
One-class classification (OCC) algorithms aim to build classification models
when the negative class is either absent, poorly sampled or not well defined.
This unique situation constrains the learning of efficient classifiers by
defining class boundary just with the knowledge of positive class. The OCC
problem has been considered and applied under many research themes, such as
outlier/novelty detection and concept learning. In this paper we present a
unified view of the general problem of OCC by presenting a taxonomy of study
for OCC problems, which is based on the availability of training data,
algorithms used and the application domains applied. We further delve into each
of the categories of the proposed taxonomy and present a comprehensive
literature review of the OCC algorithms, techniques and methodologies with a
focus on their significance, limitations and applications. We conclude our
paper by discussing some open research problems in the field of OCC and present
our vision for future research.Comment: 24 pages + 11 pages of references, 8 figure
Wild Patterns: Ten Years After the Rise of Adversarial Machine Learning
Learning-based pattern classifiers, including deep networks, have shown
impressive performance in several application domains, ranging from computer
vision to cybersecurity. However, it has also been shown that adversarial input
perturbations carefully crafted either at training or at test time can easily
subvert their predictions. The vulnerability of machine learning to such wild
patterns (also referred to as adversarial examples), along with the design of
suitable countermeasures, have been investigated in the research field of
adversarial machine learning. In this work, we provide a thorough overview of
the evolution of this research area over the last ten years and beyond,
starting from pioneering, earlier work on the security of non-deep learning
algorithms up to more recent work aimed to understand the security properties
of deep learning algorithms, in the context of computer vision and
cybersecurity tasks. We report interesting connections between these
apparently-different lines of work, highlighting common misconceptions related
to the security evaluation of machine-learning algorithms. We review the main
threat models and attacks defined to this end, and discuss the main limitations
of current work, along with the corresponding future challenges towards the
design of more secure learning algorithms.Comment: Accepted for publication on Pattern Recognition, 201
EmoNets: Multimodal deep learning approaches for emotion recognition in video
The task of the emotion recognition in the wild (EmotiW) Challenge is to
assign one of seven emotions to short video clips extracted from Hollywood
style movies. The videos depict acted-out emotions under realistic conditions
with a large degree of variation in attributes such as pose and illumination,
making it worthwhile to explore approaches which consider combinations of
features from multiple modalities for label assignment. In this paper we
present our approach to learning several specialist models using deep learning
techniques, each focusing on one modality. Among these are a convolutional
neural network, focusing on capturing visual information in detected faces, a
deep belief net focusing on the representation of the audio stream, a K-Means
based "bag-of-mouths" model, which extracts visual features around the mouth
region and a relational autoencoder, which addresses spatio-temporal aspects of
videos. We explore multiple methods for the combination of cues from these
modalities into one common classifier. This achieves a considerably greater
accuracy than predictions from our strongest single-modality classifier. Our
method was the winning submission in the 2013 EmotiW challenge and achieved a
test set accuracy of 47.67% on the 2014 dataset
Cross-lingual dysphonic speech detection using pretrained speaker embeddings
In this study, cross-lingual binary classification and severity estimation of dysphonic speech have been carried out. Hand-crafted acoustic feature extraction is replaced by the speaker embedding techniques used in the speaker verification. Two state of art deep learning methods for speaker verification have been used: the X-vector and ECAPA-TDNN. Embeddings are extracted from speech samples in Hungarian and Dutch languages and used to train Support Vector Machine (SVM) and Support Vector Regressor (SVR) for binary classification and severity estimation, in a cross-language manner. Our results were competitive with manual feature engineering, when the models were trained on Hungarian samples and evaluated on Dutch samples in the binary classification of dysphonic speech and outperformed in estimating the severity level of dysphonic speech. Moreover, our model achieved 0.769 and 0.771 in Spearman and Pearson correlations. Also, our results in both classification and regression were superior compared to manual feature extraction technique when models were trained on Dutch samples and evaluated on Hungarian samples with only a limited number of samples are available for training. An accuracy of 86.8% was reached with features extracted from embedding methods, while the maximum accuracy using hand-crafted acoustic features was 66.8%. Overall results show that Emphasized Channel Attention, Propagation and Aggregation in Time Delay Neural Network (ECAPA-TDNN) performs better than the former X-vector in both tasks
DEVELOPMENT OF EXCITATION STRUCTURE RBF-METAMODELS OF MOVING CONCENTRIC EDDY CURRENT PROBE
Introduction. The work is devoted to metamodels creation of surface circular concentric eddy current probe. Formulation of the problem. In the problem of surface circular concentric eddy current probe synthesis in the general formulation, apriori given desired eddy currents density distribution in the control zone was used. The realization of the optimal synthesis problem involves a multiple solution to the analysis problem for each current structure of numerical calculations excitation, which are very costly in terms of computational and time costs, which makes it impossible to solve the synthesis problem in the classical formulation. By solving the critical resource intensiveness problem, there is the surrogate optimization technology using of that uses the surface circular concentric eddy current probe metamodel, which is much simpler in realization and is an approximation of the exact electrodynamic model. Goal. Creation of surface circular concentric eddy current probe RBF-metamodels, which can be used to calculate eddy currents density distribution in the control zone and suitable for use in optimal synthesis problems. Method. To develop an approximation model, a mathematical apparatus for artificial neural networks, namely, RBF–networks, has been used, whose accuracy has been increased with the help of the neural networks committee. Correction of errors in the committee was reduced by applying the bagging procedure. During the network training the regularization technique is used, which avoids re-learning the neural network. The computer experiment plan was performed using the Sobol LPt–sequences. The obtained multivariable regression model quality evaluation was performed by checking the response surface reproducibility correctness in the entire region of variables variation. Results. The modelling of eddy currents density distribution calculations on exact electrodynamic mathematical models in the experimental plan points are carried out. For the immovable and moving surface circular concentric eddy current probe, RBF–metamodels were constructed with varying spatial coordinates and radius. Scientific novelty. Software was developed for eddy currents density distribution calculation in the surface circular concentric eddy current probe control zone taking into account the speed effect on exact electrodynamic mathematical models and for forming experiment plan points using the Sobol LPt–sequences. The geometric surface circular concentric eddy current probe excitation structures models with homogeneous sensitivity for their optimal synthesis taking into account the speed effect are proposed. Improved computing technology for constructing metamodels. The RBF-metamodels of the surface circular concentric eddy current probe are built and based on the speed effect. Practical significance. The work results can be used in the surface circular concentric eddy current probe synthesis with an apriori given eddy currents density distribution in the control zone
Forecasting Financial Distress With Machine Learning – A Review
Purpose – Evaluate the various academic researches with multiple views on credit risk and artificial intelligence (AI) and their evolution.Theoretical framework – The study is divided as follows: Section 1 introduces the article. Section 2 deals with credit risk and its relationship with computational models and techniques. Section 3 presents the methodology. Section 4 addresses a discussion of the results and challenges on the topic. Finally, section 5 presents the conclusions.Design/methodology/approach – A systematic review of the literature was carried out without defining the time period and using the Web of Science and Scopus database.Findings – The application of computational technology in the scope of credit risk analysis has drawn attention in a unique way. It was found that the demand for identification and introduction of new variables, classifiers and more assertive methods is constant. The effort to improve the interpretation of data and models is intense.Research, Practical & Social implications – It contributes to the verification of the theory, providing information in relation to the most used methods and techniques, it brings a wide analysis to deepen the knowledge of the factors and variables on the theme. It categorizes the lines of research and provides a summary of the literature, which serves as a reference, in addition to suggesting future research.Originality/value – Research in the area of Artificial Intelligence and Machine Learning is recent and requires attention and investigation, thus, this study contributes to the opening of new views in order to deepen the work on this topic
Handwritten Digit Recognition and Classification Using Machine Learning
In this paper, multiple learning techniques based on Optical character recognition (OCR) for the handwritten digit recognition are examined, and a new accuracy level for recognition of the MNIST dataset is reported. The proposed framework involves three primary parts, image pre-processing, feature extraction and classification. This study strives to improve the recognition accuracy by more than 99% in handwritten digit recognition. As will be seen, pre-processing and feature extraction play crucial roles in this experiment to reach the highest accuracy
- …