Discriminant Projection Representation-based Classification for Vision Recognition
Representation-based classification methods such as sparse
representation-based classification (SRC) and linear regression classification
(LRC) have attracted a lot of attention. To obtain a better
representation, this paper proposes a novel method for image recognition called
projection representation-based classification (PRC). PRC is
based on a new mathematical model, which states that the 'ideal
projection' of a sample point onto the hyper-space can be obtained by
iteratively computing the projection of the sample onto a line in the
hyper-space with a proper strategy. PRC can therefore iteratively approximate the
'ideal representation' of each subject for classification. Moreover, a
discriminant PRC (DPRC) is proposed, which obtains discriminant
information by maximizing the ratio of the between-class reconstruction error
to the within-class reconstruction error. Experimental results on five
typical databases show that the proposed PRC and DPRC are effective and
outperform other state-of-the-art methods on several vision recognition tasks.
Comment: Accepted by the Thirty-Second AAAI Conference on Artificial
Intelligence (AAAI-18)
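The idea of reaching a projection onto a subspace through repeated projections onto single lines can be illustrated with a minimal sketch. It is not the paper's PRC algorithm: the cyclic sweep order, the function names `project_by_lines` and `classify`, and the toy data are all assumptions made for illustration. The sketch projects the current residual onto one training vector at a time and assigns the sample to the class with the smallest remaining residual, in the spirit of LRC-style residual classification.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def project_by_lines(y, basis, sweeps=200):
    """Approximate the projection of y onto span(basis).

    Each step projects the current residual onto the line through one
    basis vector (cyclic order is an assumption, not the paper's strategy).
    Returns the accumulated coefficients and the final residual.
    """
    coeffs = [0.0] * len(basis)
    residual = list(y)
    for _ in range(sweeps):
        for j, b in enumerate(basis):
            step = dot(residual, b) / dot(b, b)  # projection onto the line through b
            coeffs[j] += step
            residual = [r - step * bi for r, bi in zip(residual, b)]
    return coeffs, residual


def classify(y, class_bases):
    """Pick the class whose training vectors leave the smallest residual norm."""
    errors = {}
    for label, basis in class_bases.items():
        _, residual = project_by_lines(y, basis)
        errors[label] = dot(residual, residual) ** 0.5
    return min(errors, key=errors.get), errors


# Hypothetical toy data: class "a" spans the xy-plane, class "b" the yz-plane.
class_bases = {
    "a": [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]],
    "b": [[0.0, 0.0, 1.0], [0.0, 1.0, 1.0]],
}
label, errors = classify([2.0, 1.0, 0.1], class_bases)
print(label, errors)
```

With linearly independent training vectors, the cyclic sweeps converge to the ordinary least-squares projection, so the residual norms match what a direct solver would give.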
Convolutional Neural Network and Feature Transformation for Distant Speech Recognition
In many applications, speech recognition must operate in conditions where there is some distance between the speakers and the microphones. This is called distant speech recognition (DSR). In this condition, speech recognition must deal with reverberation. Nowadays, deep learning technologies are becoming the main technologies for speech recognition. A Deep Neural Network (DNN) in hybrid with a Hidden Markov Model (HMM) is the commonly used architecture. However, this system is still not robust against reverberation. Previous studies use Convolutional Neural Networks (CNN), a variation of neural networks, to improve the robustness of speech recognition against noise. A CNN has a pooling property that is used to find local correlation between neighboring dimensions in the features. With this property, a CNN can be used for feature learning that emphasizes the information in neighboring frames. In this study we use a CNN to deal with reverberation. We also propose to apply feature transformation techniques, linear discriminant analysis (LDA) and maximum likelihood linear transformation (MLLT), to mel frequency cepstral coefficients (MFCC) before feeding them to the CNN. We argue that transforming the features can produce more discriminative features for the CNN, and hence improve the robustness of speech recognition against reverberation. Our evaluations on the Meeting Recorder Digits (MRD) subset of the Aurora-5 database confirm that the use of LDA and MLLT transformations improves the robustness of speech recognition. It achieves a 20% relative error reduction compared to a standard DNN-based speech recognizer using the same number of hidden layers.
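For readers unfamiliar with the term, a relative error reduction compares the drop in error rate to the baseline error rate. The short sketch below shows the arithmetic only; the function name and the example error rates are hypothetical and are not results from the study.

```python
def relative_error_reduction(baseline_error, new_error):
    """Fraction of the baseline error removed by the new system."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Hypothetical numbers: a baseline error rate of 10.0% reduced to 8.0%.
reduction = relative_error_reduction(10.0, 8.0)
print(f"{reduction:.0%} relative error reduction")
```

Note that a 20% relative reduction is not the same as a 20-point absolute reduction: in the hypothetical example the absolute change is only 2 percentage points.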
Discriminant linear processing of time-frequency plane
Extending previous work done on considerably smaller data sets, the paper studies linear discriminant analysis of about 30 hours of phoneme-labeled speech data in the time-frequency domain. Analysis is carried out both independently in time and frequency and jointly. Data-driven spectral bases show frequency sensitivity similar to that of human hearing. LDA-derived temporal FIR filters are consistent with temporal lateral inhibition. Considerable improvement is obtained using first temporal discriminant
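To make the link between discriminant analysis and FIR filtering concrete, here is a minimal two-class sketch, assuming each training example is a short temporal window of one feature. The discriminant vector found by Fisher's criterion then has one weight per frame, so it can be slid along a trajectory like a set of filter taps. The two-class setup, the ridge term, the helper names, and the toy data are assumptions for illustration; the study itself works with many phoneme classes and far more data.

```python
def mean(rows):
    """Column-wise mean of a list of equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def within_scatter(groups):
    """Sum of per-class scatter matrices."""
    dim = len(groups[0][0])
    s = [[0.0] * dim for _ in range(dim)]
    for rows in groups:
        m = mean(rows)
        for row in rows:
            d = [x - mi for x, mi in zip(row, m)]
            for i in range(dim):
                for j in range(dim):
                    s[i][j] += d[i] * d[j]
    return s


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fisher_filter(class0, class1, ridge=1e-6):
    """Two-class Fisher discriminant: solve (Sw + ridge*I) w = m1 - m0."""
    sw = within_scatter([class0, class1])
    for i in range(len(sw)):
        sw[i][i] += ridge  # small ridge (an assumption) keeps Sw invertible
    diff = [b - a for a, b in zip(mean(class0), mean(class1))]
    return solve(sw, diff)


def apply_fir(taps, signal):
    """Sliding dot product of the taps with the signal (no tap reversal)."""
    n = len(taps)
    return [sum(t * s for t, s in zip(taps, signal[k:k + n]))
            for k in range(len(signal) - n + 1)]


# Hypothetical 3-frame windows for two classes.
class0 = [[0.0, 0.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 1.0, 1.0]]
class1 = [[0.0, 2.0, 0.0], [1.0, 3.0, 1.0], [0.0, 3.0, 1.0], [1.0, 2.0, 0.0]]
taps = fisher_filter(class0, class1)
print(taps)
print(apply_fir(taps, [0.0, 0.0, 2.0, 3.0, 0.0, 0.0]))
```

Plain Python lists are used so the sketch has no dependencies; with NumPy the same computation reduces to a call to a linear solver.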
Deep neural networks in acoustic model
The student contacted me requesting an offer in order to enroll, and this offer responds to that request. After confirming with the Academic Secretariat that the student has been accepted at the destination, we leave the title, description, objectives, and external supervisor to be determined upon arrival at the destination.
Implement the training of a deep neural network acoustic model for speech recognition
Brain-to-text: Decoding spoken phrases from phone representations in the brain
It has long been speculated whether communication between humans and machines based on natural speech-related cortical activity is possible. Over the past decade, studies have suggested that it is feasible to recognize isolated aspects of speech from neural signals, such as auditory features, phones or one of a few isolated words. However, until now it remained an unsolved challenge to decode continuously spoken speech from the neural substrate associated with speech and language processing. Here, we show for the first time that continuously spoken speech can be decoded into the expressed words from intracranial electrocorticographic (ECoG) recordings. Specifically, we implemented a system, which we call Brain-To-Text, that models single phones, employs techniques from automatic speech recognition (ASR), and thereby transforms brain activity while speaking into the corresponding textual representation. Our results demonstrate that our system can achieve word error rates as low as 25% and phone error rates below 50%. Additionally, our approach contributes to the current understanding of the neural basis of continuous speech production by identifying those cortical regions that hold substantial information about individual phones. In conclusion, the Brain-To-Text system described in this paper represents an important step toward human-machine communication based on imagined speech
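Word error rate, the metric quoted above, is conventionally the word-level edit distance between the reference and the decoded text, divided by the number of reference words. The sketch below shows that standard definition; it is not the authors' scoring code, and the example sentences are made up.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1
            insertion = cur[j - 1] + 1
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return prev[-1] / len(ref)


# Hypothetical example: one substituted word out of four.
print(word_error_rate("the quick brown fox", "the quick brown dog"))
```

The same function gives a phone error rate if the inputs are space-separated phone sequences instead of words.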
Multi-candidate missing data imputation for robust speech recognition
The application of Missing Data Techniques (MDT) to increase the noise robustness of HMM/GMM-based large vocabulary speech recognizers is hampered by a large computational burden. The likelihood evaluations imply solving many constrained least squares (CLSQ) optimization problems. As an alternative, researchers have proposed frontend MDT or have made oversimplifying independence assumptions for the backend acoustic model. In this article, we propose a fast Multi-Candidate (MC) approach that solves the per-Gaussian CLSQ problems approximately by selecting the best from a small set of candidate solutions, which are generated as the MDT solutions on a reduced set of cluster Gaussians. Experiments show that the MC MDT runs as fast as the uncompensated recognizer while achieving the accuracy of the full backend optimization approach. The experiments also show that exploiting the more accurate acoustic model of the backend does pay off in terms of accuracy when compared to frontend MDT.
© 2012 Wang and Van hamme; licensee Springer.
Wang Y., Van hamme H., "Multi-candidate missing data imputation for robust speech recognition", EURASIP Journal on Audio, Speech, and Music Processing, vol. 17, 20 pp., 2012.
status: published
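The candidate-selection step described in the abstract can be sketched as follows: instead of solving a constrained least squares problem for each Gaussian, evaluate the quadratic cost of a few precomputed candidate vectors and keep the cheapest one. This is only a sketch under stated assumptions. The feasibility rule (reliable components equal the observation, unreliable components bounded above by it), the function names, and the toy numbers are assumptions, and the way candidates are generated from cluster Gaussians is not shown.

```python
def mahalanobis_sq(x, mean, precision):
    """Quadratic cost (x - mean)^T P (x - mean) for a full precision matrix P."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    n = len(d)
    return sum(d[i] * precision[i][j] * d[j] for i in range(n) for j in range(n))


def is_feasible(x, observed, reliable):
    """Assumed constraints: reliable components match the observation,
    unreliable ones do not exceed it."""
    return all(
        (xi == oi) if rel else (xi <= oi)
        for xi, oi, rel in zip(x, observed, reliable)
    )


def best_candidate(candidates, mean, precision, observed, reliable):
    """Return the feasible candidate with the lowest cost for this Gaussian."""
    feasible = [c for c in candidates if is_feasible(c, observed, reliable)]
    if not feasible:
        raise ValueError("no feasible candidate")
    costs = [mahalanobis_sq(c, mean, precision) for c in feasible]
    k = min(range(len(feasible)), key=costs.__getitem__)
    return feasible[k], costs[k]


# Hypothetical 2-dimensional example: component 0 is reliable, component 1 is not.
observed = [1.0, 3.0]
reliable = [True, False]
mean = [0.0, 0.0]
precision = [[2.0, -1.0], [-1.0, 2.0]]
candidates = [[1.0, 0.0], [1.0, 0.5], [1.0, 2.0]]
print(best_candidate(candidates, mean, precision, observed, reliable))
```

A full precision matrix is used on purpose: with a diagonal covariance the components decouple and the exact constrained solution is trivial, which is the kind of independence assumption the abstract calls oversimplifying.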
Localized Variable Selection with Random Forest
Due to recent advances in computer technology, the cost of collecting and storing data has dropped drastically. This makes it feasible to collect large amounts of information for each data point. This increasing trend in feature dimensionality justifies the need for research on variable selection. Random forest (RF) has demonstrated the ability to select important variables and model complex data. However, simulations confirm that in some cases it fails to detect less influential features in the presence of variables with large impacts. In this dissertation, we propose two algorithms for localized variable selection: clustering based feature selection (CBFS) and locally adjusted feature importance (LAFI). Both methods aim to find regions where the effects of weaker features can be isolated and measured. CBFS combines RF variable selection with a two-stage clustering method to detect variables whose effects can be detected only in certain regions. LAFI, on the other hand, uses a binary tree approach to split data into bins based on response variable rankings, and implements RF to find important variables in each bin. A larger LAFI value is assigned to variables that get selected in more bins. Simulations and real datasets are used to evaluate these variable selection methods. Finally, we also propose an extension to CBFS for localized prediction
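The bin-and-count structure described for LAFI can be sketched in a few lines. This is a loose illustration, not the dissertation's algorithm: recursive halving of the response ranking stands in for the binary tree, and an absolute Pearson correlation threshold is swapped in for random forest variable selection so that the sketch stays dependency-free. The function names, the threshold, and the toy data are assumptions.

```python
def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def split_into_bins(indices, depth):
    """Recursively halve a list of indices that is already sorted by response."""
    if depth == 0 or len(indices) < 4:
        return [indices]
    mid = len(indices) // 2
    return (split_into_bins(indices[:mid], depth - 1)
            + split_into_bins(indices[mid:], depth - 1))


def bin_selection_counts(features, response, depth=1, threshold=0.5):
    """Count, per feature, the number of response-ranked bins that select it.

    Selection here uses |correlation| >= threshold as a stand-in for
    random forest variable selection.
    """
    order = sorted(range(len(response)), key=response.__getitem__)
    counts = [0] * len(features[0])
    for bin_idx in split_into_bins(order, depth):
        ys = [response[i] for i in bin_idx]
        for j in range(len(counts)):
            xs = [features[i][j] for i in bin_idx]
            if abs(pearson(xs, ys)) >= threshold:
                counts[j] += 1
    return counts


# Hypothetical data: feature 0 tracks the response, feature 1 does not.
response = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
pattern = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0]
features = [[y, p] for y, p in zip(response, pattern)]
print(bin_selection_counts(features, response))
```

A feature that is selected in more bins receives a higher count, which mirrors the statement that a larger LAFI value goes to variables selected in more bins.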