2,383 research outputs found
Data compression techniques applied to high resolution high frame rate video technology
An investigation is presented of video data compression applied to microgravity space experiments using High Resolution High Frame Rate Video Technology (HHVT). An extensive survey of methods of video data compression, described in the open literature, was conducted. The survey examines compression methods employing digital computing. The results of the survey are presented. They include a description of each method and an assessment of image degradation and video data parameters. An assessment is made of present and near-term future technology for implementation of video data compression in high-speed imaging systems. Results of the assessment are discussed and summarized. The results of a study of a baseline HHVT video system, and approaches for implementation of video data compression, are presented. Case studies of three microgravity experiments are presented, and specific compression techniques and implementations are recommended.
Bayesian distance metric learning and its application in automatic speaker recognition systems
This paper proposes a state-of-the-art Automatic Speaker Recognition (ASR) system based on Bayesian distance metric learning as a feature extractor. In this modeling, I explore the constraints on the distance between modified and simplified i-vector pairs from the same speaker and from different speakers. The distance metric is approximated as a weighted covariance matrix built from the leading eigenvectors of the covariance matrix, which is used to estimate the posterior distribution of the metric. Given a speaker label, I select the data pairs of different speakers with the highest cosine scores to form a set of speaker constraints. This collection captures the most discriminative variability between speakers in the training data. This Bayesian distance learning approach achieves better performance than the most advanced methods. Furthermore, the method is insensitive to normalization, in contrast to cosine scoring, and is very effective when training data is limited. The modified supervised i-vector based ASR system is evaluated on the NIST SRE 2008 database. The best combined cosine score performance, an EER of 1.767%, was obtained using LDA200 + NCA200 + LDA200, and the best Bayes_dml performance, an EER of 1.775%, was obtained using LDA200 + NCA200 + LDA100. Bayes_dml outperforms norm-combined cosine scoring and gives the best reported result for the short2-short3 condition of the NIST SRE 2008 data.
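The abstract above relies on cosine scoring between i-vector pairs to select the most confusable different-speaker constraints. A minimal sketch of that scoring step (the vectors and dimensionality here are synthetic placeholders, not data from the paper):

```python
import numpy as np

def cosine_score(w1, w2):
    """Cosine similarity between two i-vectors (higher = more likely same speaker)."""
    return float(np.dot(w1, w2) / (np.linalg.norm(w1) * np.linalg.norm(w2)))

rng = np.random.default_rng(0)
enroll = rng.normal(size=400)                # hypothetical 400-dim i-vector
same = enroll + 0.1 * rng.normal(size=400)   # perturbed copy: same-speaker trial
diff = rng.normal(size=400)                  # independent vector: different-speaker trial

# Pairs of different speakers with the highest cosine scores are the hardest
# negatives, and form the speaker-constraint set used for metric learning.
```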
Robust Feature Sets for Implementation of Classification Machines
Classification machines have evolved considerably in recent times in the fields of engineering and the sciences. Various classification schemes have been developed, each taking into account the aspects that can be optimized to give maximum system performance. The feature set of a classifier system is very significant, since it determines the efficiency and performance of the machine. Three powerful feature sets possessing robust classifying capabilities are discussed in this paper. Features based on the Kruskal-Wallis H statistic and the F-test statistic from cepstral coefficient analysis, and on the Discrete Sine Transform, are found to be very effective for detection and classification of signals. Simulation results for a typical data set are also presented in this paper. Statistical estimators, Neural Network and Hidden Markov Model based classifiers, along with various deep learning algorithms, can be combined with these feature sets to implement an efficient classification machine. Typical results based on these feature sets are also presented for different signal sources.
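The Kruskal-Wallis H statistic mentioned above measures how well a coefficient's value separates sample groups by comparing rank sums. A self-contained sketch (not the paper's code; the two synthetic "classes" stand in for cepstral coefficient values from two signal sources):

```python
import numpy as np

def kruskal_h(*groups):
    """Kruskal-Wallis H statistic (no tie correction) for k sample groups."""
    data = np.concatenate(groups)
    ranks = np.empty(data.size)
    ranks[np.argsort(data)] = np.arange(1, data.size + 1)  # 1-based ranks
    n_total = data.size
    h, start = 0.0, 0
    for g in groups:
        r = ranks[start:start + g.size]
        h += r.sum() ** 2 / g.size        # sum of squared rank sums / group size
        start += g.size
    return 12.0 / (n_total * (n_total + 1)) * h - 3.0 * (n_total + 1)

rng = np.random.default_rng(1)
class_a = rng.normal(0.0, 1.0, 200)  # hypothetical cepstral coefficient, class A
class_b = rng.normal(1.5, 1.0, 200)  # class B, shifted mean

# A large H indicates the coefficient discriminates the classes well,
# making it a strong candidate for the classifier's feature set.
h = kruskal_h(class_a, class_b)
```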
Hardware-aware training for large-scale and diverse deep learning inference workloads using in-memory computing-based accelerators
Analog in-memory computing (AIMC) -- a promising approach for energy-efficient acceleration of deep learning workloads -- computes matrix-vector multiplications (MVMs) only approximately, due to nonidealities that are often non-deterministic or nonlinear. This can adversely impact the achievable deep neural network (DNN) inference accuracy compared to a conventional floating-point (FP) implementation. While retraining has previously been suggested to improve robustness, prior work has explored only a few DNN topologies, using disparate and overly simplified AIMC hardware models. Here, we use hardware-aware (HWA) training to systematically examine the accuracy of AIMC for multiple common artificial intelligence (AI) workloads across multiple DNN topologies, and investigate sensitivity and robustness to a broad set of nonidealities. By introducing a new and highly realistic AIMC crossbar model, we improve significantly on earlier retraining approaches. We show that many large-scale DNNs of various topologies, including convolutional neural networks (CNNs), recurrent neural networks (RNNs), and transformers, can in fact be successfully retrained to show iso-accuracy on AIMC. Our results further suggest that AIMC nonidealities that add noise to the inputs or outputs, not the weights, have the largest impact on DNN accuracy, and that RNNs are particularly robust to all nonidealities.
Comment: 35 pages, 7 figures, 5 tables
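The core idea of HWA training is to replace the exact MVM in the forward pass with a noisy one, so the network learns weights tolerant of analog readout errors. A minimal numpy sketch of such a noisy MVM (the noise model and its 5% scale are illustrative assumptions, far simpler than the crossbar model the abstract describes):

```python
import numpy as np

rng = np.random.default_rng(0)

def aimc_mvm(weights, x, out_noise=0.05):
    """Ideal MVM plus additive output noise, mimicking AIMC readout nonidealities."""
    ideal = weights @ x
    scale = out_noise * np.abs(ideal).max()  # noise scaled to the output magnitude
    return ideal + rng.normal(0.0, scale, size=ideal.shape)

W = rng.normal(size=(64, 128))
x = rng.normal(size=128)

ideal = W @ x
noisy = aimc_mvm(W, x)
# During HWA training, every layer's forward pass would call aimc_mvm instead
# of the exact product, exposing the network to output-side noise at train time.
```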
Noise-robust speaker recognition using reduced Multiconditional Gaussian Mixture models
Multiconditional modeling is widely used to create noise-robust speaker recognition systems. However, the approach is computationally intensive. An alternative is to optimize the training condition set in order to achieve maximum noise robustness while using the smallest possible number of noise conditions during training. This paper establishes the optimal conditions for a noise-robust training model by considering audio material at different sampling rates and with different coding methods. Our results demonstrate that using approximately four training noise conditions is sufficient to guarantee robust models in the 60 dB to 10 dB Signal-to-Noise Ratio (SNR) range.
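Building such a training condition set requires mixing clean speech with noise at controlled SNRs. A sketch of that preprocessing step (the tone signal and the four SNR points are illustrative, not the paper's corpus or condition set):

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so that clean + noise has the requested SNR in dB."""
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    target_p_noise = p_clean / (10.0 ** (snr_db / 10.0))
    return clean + noise * np.sqrt(target_p_noise / p_noise)

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s tone at 16 kHz
noise = rng.normal(size=16000)

# e.g. four training conditions spanning the 60 dB to 10 dB range
training_conditions = [mix_at_snr(clean, noise, snr) for snr in (60, 40, 25, 10)]
```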
A Novel Windowing Technique for Efficient Computation of MFCC for Speaker Recognition
In this paper, we propose a novel family of windowing techniques for computing Mel Frequency Cepstral Coefficients (MFCCs) for automatic speaker recognition from speech. The proposed method is based on a fundamental property of the discrete time Fourier transform (DTFT) related to differentiation in the frequency domain. A classical windowing scheme, such as the Hamming window, is modified to obtain derivatives of the discrete time Fourier transform coefficients. It is shown mathematically that the slope and phase of the power spectrum are inherently incorporated in the newly computed cepstrum. Speaker recognition systems based on our proposed family of window functions are shown to attain substantial and consistent performance improvement over the baseline single-taper Hamming window as well as a recently proposed multitaper windowing technique.
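The frequency-differentiation property the abstract invokes is DTFT{n x[n]} = j dX(w)/dw, so windowing a frame with n w[n] instead of w[n] yields (up to a constant factor) the derivative of the windowed spectrum. A minimal sketch of both spectra, using a synthetic frame and a plain Hamming taper as the illustrative baseline (not the paper's exact pipeline):

```python
import numpy as np

def framed_spectra(frame):
    """Power spectra with a standard and a time-weighted Hamming window.

    The time-weighted window n * w[n] embeds spectral-slope information in
    the resulting cepstrum via the DTFT differentiation property.
    """
    n = np.arange(frame.size)
    w = np.hamming(frame.size)
    spec = np.abs(np.fft.rfft(frame * w)) ** 2        # baseline Hamming window
    spec_d = np.abs(np.fft.rfft(frame * n * w)) ** 2  # derivative (n * w[n]) window
    return spec, spec_d

rng = np.random.default_rng(0)
frame = rng.normal(size=400)  # hypothetical 25 ms frame at 16 kHz
spec, spec_d = framed_spectra(frame)
# Taking log energies through a mel filterbank and a DCT of each spectrum
# would then give the two cepstral feature streams.
```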