
    Data compression techniques applied to high resolution high frame rate video technology

    This investigation examines video data compression applied to microgravity space experiments using High Resolution High Frame Rate Video Technology (HHVT). An extensive survey of video data compression methods described in the open literature was conducted, covering compression methods that employ digital computing. The survey results include a description of each method and an assessment of image degradation and video data parameters. Present and near-term technology for implementing video data compression in high-speed imaging systems is assessed, and the results of that assessment are discussed and summarized. The results of a study of a baseline HHVT video system, and approaches for implementing video data compression, are presented. Case studies of three microgravity experiments are presented, and specific compression techniques and implementations are recommended.

    Efficient speaker recognition for mobile devices


    Bayesian distance metric learning and its application in automatic speaker recognition systems

    This paper proposes a state-of-the-art Automatic Speaker Recognition (ASR) system that uses Bayesian distance metric learning as a feature extractor. In this model, I explore constraints on the distance between modified and simplified i-vector pairs from the same speaker and from different speakers. The distance metric is approximated by a weighted covariance matrix built from the higher eigenvectors of the covariance matrix, and this approximation is used to estimate the posterior distribution of the distance metric. Given a speaker label, I select the pair of different speakers with the highest cosine score to form a set of speaker constraints. This set captures the most discriminative variability between speakers in the training data. The Bayesian distance learning approach achieves better performance than the most advanced methods. Furthermore, it is insensitive to normalization compared with cosine scoring, and it is very effective when training data are limited. The modified supervised i-vector based ASR system is evaluated on the NIST SRE 2008 database. The best combined cosine score performance, an EER of 1.767%, is obtained using LDA200 + NCA200 + LDA200, and the best Bayes_dml performance, an EER of 1.775%, is obtained using LDA200 + NCA200 + LDA100. Bayesian_dml overcomes the combined norm of cosine scores and is the best result reported for the short2-short3 condition on NIST SRE 2008 data.
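The pair-selection step in the abstract above (for each speaker, pick the different-speaker example with the highest cosine score) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the 2-D toy vectors stand in for i-vectors, and the function names `cosine_score` and `hardest_impostor_pairs` are hypothetical.

```python
import math

def cosine_score(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def hardest_impostor_pairs(vectors, labels):
    """For each vector, find the different-speaker vector with the highest cosine score.

    Returns a list of (index, impostor_index, score) tuples.
    """
    pairs = []
    for i, (vi, li) in enumerate(zip(vectors, labels)):
        best_j, best_score = None, -2.0  # cosine scores lie in [-1, 1]
        for j, (vj, lj) in enumerate(zip(vectors, labels)):
            if lj == li:
                continue  # skip same-speaker pairs (including the vector itself)
            score = cosine_score(vi, vj)
            if score > best_score:
                best_j, best_score = j, score
        if best_j is not None:
            pairs.append((i, best_j, best_score))
    return pairs

# Toy data (assumption): two speakers, two vectors each.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.8, 0.6]]
labels = ["A", "A", "B", "B"]
pairs = hardest_impostor_pairs(vectors, labels)
```

Each returned pair is the most confusable different-speaker match for one vector, which is the kind of constraint the abstract describes as capturing the most discriminative variability.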

    Robust Feature Sets for Implementation of Classification Machines

    Classification machines have evolved considerably in recent times in engineering and the sciences. Various classification schemes have been developed, each taking into account the aspects that can be optimized to give maximum system performance. The feature set of a classifier system is very significant, since it determines the efficiency and performance of the machine. This paper discusses three powerful feature sets with robust classifying capabilities. Cepstral coefficient analysis based Kruskal-Wallis H statistic, F-test statistic, and Discrete Sine Transform based features are found to be very effective for the detection and classification of signals. Simulation results for a typical data set are also presented. Statistical estimators and Neural Network and Hidden Markov Model based classifiers, along with various deep learning algorithms, can be combined with these feature sets to implement an efficient classifying machine. Typical results based on these feature sets are also presented for different signal sources.

    Hardware-aware training for large-scale and diverse deep learning inference workloads using in-memory computing-based accelerators

    Analog in-memory computing (AIMC) -- a promising approach for energy-efficient acceleration of deep learning workloads -- computes matrix-vector multiplications (MVMs) but only approximately, due to nonidealities that often are non-deterministic or nonlinear. This can adversely impact the achievable deep neural network (DNN) inference accuracy as compared to a conventional floating point (FP) implementation. While retraining has previously been suggested to improve robustness, prior work has explored only a few DNN topologies, using disparate and overly simplified AIMC hardware models. Here, we use hardware-aware (HWA) training to systematically examine the accuracy of AIMC for multiple common artificial intelligence (AI) workloads across multiple DNN topologies, and investigate sensitivity and robustness to a broad set of nonidealities. By introducing a new and highly realistic AIMC crossbar-model, we improve significantly on earlier retraining approaches. We show that many large-scale DNNs of various topologies, including convolutional neural networks (CNNs), recurrent neural networks (RNNs), and transformers, can in fact be successfully retrained to show iso-accuracy on AIMC. Our results further suggest that AIMC nonidealities that add noise to the inputs or outputs, not the weights, have the largest impact on DNN accuracy, and that RNNs are particularly robust to all nonidealities. Comment: 35 pages, 7 figures, 5 tables
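To make the distinction between input, weight, and output noise in the abstract above concrete, here is a minimal sketch of an approximate matrix-vector multiplication. It assumes simple additive Gaussian noise, which is far cruder than the crossbar model the abstract refers to; the function names and the three `*_std` parameters are hypothetical.

```python
import random

def matvec(W, x):
    """Exact matrix-vector product for a list-of-rows matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def noisy_matvec(W, x, rng, input_std=0.0, weight_std=0.0, output_std=0.0):
    """Matrix-vector product with additive Gaussian noise at three points.

    Assumption: independent zero-mean Gaussian noise on inputs, weights,
    and outputs. Real AIMC nonidealities are more complex than this.
    """
    x_noisy = [xi + rng.gauss(0.0, input_std) for xi in x]
    W_noisy = [[w + rng.gauss(0.0, weight_std) for w in row] for row in W]
    y = matvec(W_noisy, x_noisy)
    return [yi + rng.gauss(0.0, output_std) for yi in y]

W = [[1.0, 2.0], [3.0, 4.0]]
x = [1.0, 1.0]
rng = random.Random(0)  # fixed seed so the sketch is reproducible

exact = matvec(W, x)
no_noise = noisy_matvec(W, x, rng)                    # all noise levels zero
output_noise = noisy_matvec(W, x, rng, output_std=0.1)
```

With all noise levels at zero the function reduces to the exact product, so each noise source can be switched on separately to compare its effect on the result.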

    Noise-robust speaker recognition using reduced Multiconditional Gaussian Mixture models

    Multiconditional modeling is widely used to create noise-robust speaker recognition systems. However, the approach is computationally intensive. An alternative is to optimize the set of training conditions so as to achieve maximum noise robustness with the smallest possible number of noise conditions during training. This paper establishes the optimal conditions for a noise-robust training model by considering audio material at different sampling rates and with different coding methods. Our results demonstrate that approximately four training noise conditions are sufficient to guarantee robust models in the 60 dB to 10 dB Signal-to-Noise Ratio (SNR) range.
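A training noise condition of the kind described above is usually produced by adding noise to clean speech at a chosen SNR. The sketch below shows one common way to do that scaling; it is an assumption about the general procedure, not the paper's pipeline, and the synthetic signals and function names are hypothetical.

```python
import math

def power(signal):
    """Mean power of a sequence of samples."""
    return sum(s * s for s in signal) / len(signal)

def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so that the speech-to-noise power ratio equals snr_db, then add it."""
    target_noise_power = power(speech) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / power(noise))
    return [s + gain * n for s, n in zip(speech, noise)]

def measured_snr_db(speech, mixed):
    """SNR in dB of a mixture, given the clean speech it was built from."""
    residual = [m - s for s, m in zip(speech, mixed)]
    return 10 * math.log10(power(speech) / power(residual))

# Synthetic stand-ins (assumption): a sine as "speech", a faster cosine as "noise".
speech = [math.sin(0.05 * n) for n in range(2000)]
noise = [math.cos(1.7 * n) for n in range(2000)]

# The two ends of the SNR range named in the abstract.
mixtures = {snr: mix_at_snr(speech, noise, snr) for snr in (60, 10)}
```

Calling `measured_snr_db(speech, mixtures[10])` recovers the requested level, which is a quick check that the gain was computed correctly.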

    A Novel Windowing Technique for Efficient Computation of MFCC for Speaker Recognition

    In this paper, we propose a novel family of windowing techniques for computing Mel Frequency Cepstral Coefficients (MFCC) for automatic speaker recognition from speech. The proposed method is based on a fundamental property of the discrete time Fourier transform (DTFT) related to differentiation in the frequency domain. A classical windowing scheme, such as the Hamming window, is modified to obtain derivatives of the discrete time Fourier transform coefficients. It is shown mathematically that the slope and phase of the power spectrum are inherently incorporated in the newly computed cepstrum. Speaker recognition systems based on the proposed family of window functions are shown to attain substantial and consistent performance improvement over the baseline single-tapered Hamming window as well as the recently proposed multitaper windowing technique.
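The DTFT property the abstract relies on is that multiplying a sequence by n corresponds to differentiation in frequency: if X(ω) = Σ x[n] e^{-jωn}, then dX/dω = -j · DTFT{n·x[n]}. The sketch below checks this numerically for a Hamming-windowed frame. It only illustrates the property; the exact window family proposed in the paper is not given in the abstract, and the test frame and function names are assumptions.

```python
import cmath
import math

def hamming(N):
    """Standard N-point Hamming window."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def dtft(x, omega):
    """DTFT of a finite sequence evaluated at a single frequency omega (radians/sample)."""
    return sum(xn * cmath.exp(-1j * omega * n) for n, xn in enumerate(x))

N = 32
# Synthetic frame (assumption): two sinusoids standing in for a speech frame.
frame = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(N)]
w = hamming(N)

windowed = [f * wn for f, wn in zip(frame, w)]     # frame[n] * w[n]
ramped = [n * v for n, v in enumerate(windowed)]   # frame[n] * (n * w[n])

omega, h = 0.7, 1e-5
# Spectral derivative by central finite difference.
numeric = (dtft(windowed, omega + h) - dtft(windowed, omega - h)) / (2 * h)
# The same derivative obtained from the modified window n * w[n].
via_window = -1j * dtft(ramped, omega)
```

The two values agree to within finite-difference error, so a single transform with the window n·w[n] yields the spectral derivative without any numerical differentiation.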