
25 research outputs found

Discriminant Projection Representation-based Classification for Vision Recognition

Author: Feng Qingxiang
Zhou Yicong
Publication date: 19/11/2017

Representation-based classification methods such as sparse representation-based classification (SRC) and linear regression classification (LRC) have attracted considerable attention. To obtain a better representation, this paper proposes a novel method called projection representation-based classification (PRC) for image recognition. PRC is based on a new mathematical model. This model states that the 'ideal projection' of a sample point x onto the hyperspace H may be obtained by iteratively computing, with a proper strategy, the projection of x onto a line in H. PRC can therefore iteratively approximate the 'ideal representation' of each subject for classification. Moreover, a discriminant PRC (DPRC) is proposed. It obtains discriminant information by maximizing the ratio of the between-class reconstruction error to the within-class reconstruction error. Experimental results on five typical databases show that the proposed PRC and DPRC are effective and outperform other state-of-the-art methods on several vision recognition tasks.
Comment: Accepted by the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18).
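The iterative projection idea can be illustrated with a minimal sketch. It does not necessarily reproduce the authors' strategy, and it rests on several assumptions:

- Each class is represented by a matrix whose columns are its training samples.
- The "lines" of H are the lines through the origin spanned by those columns.
- The current residual is projected onto each line in turn, as a cyclic, Gauss-Seidel-style sweep.

Under these assumptions, the accumulated projections converge to the orthogonal projection onto the column space. LRC computes that same projection in closed form by least squares. The class with the smallest reconstruction error wins. The names `iterative_projection` and `classify` and the fixed sweep count are hypothetical.

```python
import numpy as np

def iterative_projection(x, A, n_sweeps=200):
    """Approximate the orthogonal projection of x onto span(A) by repeatedly
    projecting the current residual onto the line spanned by each column of A.
    (Hypothetical strategy: cyclic sweeps over the columns.)"""
    approx = np.zeros(x.shape, dtype=float)
    residual = x.astype(float)
    for _ in range(n_sweeps):
        for j in range(A.shape[1]):
            a = A[:, j]
            step = (residual @ a) / (a @ a)  # coefficient of the projection onto line a
            approx += step * a
            residual -= step * a
    return approx

def classify(x, class_dicts, n_sweeps=200):
    """Assign x to the class whose samples reconstruct it with the smallest error."""
    errors = [np.linalg.norm(x - iterative_projection(x, A, n_sweeps))
              for A in class_dicts]
    return int(np.argmin(errors)), errors
```

For example, given a test vector `x` and one matrix per subject, `classify(x, [A0, A1, A2])` returns the index of the winning subject and the per-class reconstruction errors. Replacing `iterative_projection` with `A @ np.linalg.lstsq(A, x, rcond=None)[0]` gives LRC's closed-form projection, which the sweeps approach as `n_sweeps` grows.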

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Convolutional Neural Network and Feature Transformation for Distant Speech Recognition

Author: Pardede Hilman F.
Sustika Rika
Yuliani Asri R.
Publication venue: 'Institute of Advanced Engineering and Science'
Publication date: 01/12/2018

In many applications, speech recognition must operate in conditions where there is some distance between the speaker and the microphone. This is called distant speech recognition (DSR). In this condition, speech recognition must deal with reverberation. Deep learning technologies are becoming the main technologies for speech recognition. A Deep Neural Network (DNN) combined with a Hidden Markov Model (HMM) in a hybrid system is the commonly used architecture. However, this system is still not robust against reverberation. Previous studies have used Convolutional Neural Networks (CNNs), a variant of neural networks, to improve the robustness of speech recognition against noise. CNNs have a pooling property that is used to find local correlations between neighboring dimensions in the features. With this property, a CNN can be used for feature learning that emphasizes the information in neighboring frames. In this study, we use a CNN to deal with reverberation. We also propose applying feature transformation techniques to the mel-frequency cepstral coefficients (MFCCs) before feeding them to the CNN. The two techniques are linear discriminant analysis (LDA) and maximum likelihood linear transformation (MLLT). We argue that transforming the features could produce more discriminative features for the CNN, and hence improve the robustness of speech recognition against reverberation. We evaluated the approach on the Meeting Recorder Digits (MRD) subset of the Aurora-5 database. The results confirm that the LDA and MLLT transformations improve the robustness of speech recognition. They yield a 20% relative error reduction compared to a standard DNN-based speech recognizer with the same number of hidden layers.
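A minimal NumPy sketch of the LDA step follows. It assumes a common front-end layout that the abstract does not state: each MFCC frame is spliced with its neighbors, and LDA is fitted on frame-level class labels such as HMM-state alignments. MLLT is omitted; it estimates a further square transform by maximum likelihood. The names `splice` and `fit_lda`, the context width, and the regularizer are hypothetical choices for illustration.

```python
import numpy as np

def splice(feats, context=4):
    """Stack each frame with its +/-context neighbours (edges padded by repetition)."""
    T = feats.shape[0]
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + T] for i in range(2 * context + 1)])

def fit_lda(X, y, out_dim, reg=1e-6):
    """Return a (dim, out_dim) LDA projection from features X and labels y."""
    d = X.shape[1]
    mu = X.mean(axis=0)
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mu)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Solve Sb v = lambda Sw v by whitening with the Cholesky factor Sw = L L^T.
    L = np.linalg.cholesky(Sw + reg * np.eye(d))
    Linv = np.linalg.inv(L)
    vals, U = np.linalg.eigh(Linv @ Sb @ Linv.T)
    order = np.argsort(vals)[::-1][:out_dim]  # largest discriminant ratios first
    return Linv.T @ U[:, order]
```

In use, `splice(mfcc, context=4) @ fit_lda(splice(mfcc, 4), labels, out_dim=40)` would give transformed features for the CNN input. The output dimension of 40 is only an illustrative value.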

IAES journal

Crossref

Institute of Advanced Engineering and Science

Feature Transformation Based on Generalization of Linear Discriminant Analysis

Author: Makoto Sakai
Norihide Kitaoka
Seiichi Nakagawa
Publication venue: 'IntechOpen'
Publication date: 01/11/2008

IntechOpen

Crossref

Discriminant linear processing of time-frequency plane

Author: Hermansky Hynek
Valente Fabio
Publication date: 11/02/2010

Extending previous work done on considerably smaller data sets, the paper studies linear discriminant analysis (LDA) of about 30 hours of phoneme-labeled speech data in the time-frequency domain. The analysis is carried out independently in time and in frequency, as well as jointly. Data-driven spectral bases show frequency sensitivity similar to that of human hearing. LDA-derived temporal FIR filters are consistent with temporal lateral inhibition. Considerable improvement is obtained using first temporal discriminant
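The claim that an LDA vector over a temporal window behaves as an FIR filter can be checked numerically. As an illustration, assume LDA is applied to fixed-length windows of one frequency band's trajectory. This is not necessarily the paper's exact setup. Under this assumption, each discriminant is a weight vector `w` over 2*half + 1 frames. Projecting every window onto `w` is the same as filtering the trajectory with an FIR filter whose impulse response is `w` reversed. The function names and the zero-padding at the edges are hypothetical choices.

```python
import numpy as np

def windows(traj, half):
    """All length-(2*half + 1) windows centred on each frame, edges zero-padded."""
    T = len(traj)
    padded = np.pad(traj, half)  # constant (zero) padding by default
    return np.stack([padded[t:t + 2 * half + 1] for t in range(T)])

def project_windows(traj, w):
    """Project every window onto a discriminant vector w (odd length assumed)."""
    half = (len(w) - 1) // 2
    return windows(traj, half) @ w

def fir_filter(traj, w):
    """The same operation as an FIR filter: convolution with reversed w."""
    return np.convolve(traj, w[::-1], mode="same")
```

Both functions return identical outputs for a trajectory at least as long as `w`. The learned LDA vector can therefore be read directly as the impulse response of a temporal filter and inspected for properties such as lateral inhibition.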

Infoscience - École polytechnique fédérale de Lausanne

Deep neural networks in acoustic model

Author: Camacho Tejedor Oriol
Publication venue: Universitat Politècnica de Catalunya
Publication date: 25/05/2016

The student contacted me requesting an offer in order to enroll, and this offer responds to that request. After confirming with the Academic Secretariat that the student has been accepted at the destination, we leave the title, description, objectives, and external supervisor to be determined once the student arrives at the destination. Implementation of the training of a deep neural network acoustic model for speech recognition.

UPCommons. Portal del coneixement obert de la UPC

Discriminant linear processing of time-frequency plane

Author: Hermansky Hynek
Valente Fabio
Publication venue: IDIAP
Publication date: 08/06/2006

Infoscience - École polytechnique fédérale de Lausanne

Effectiveness of discriminative training and feature transformation for reverberated and noisy speech

Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Crossref

Brain-to-text: Decoding spoken phrases from phone representations in the brain

Author: Adriana de Pesters
Christian Herff
Dominic Heger
Dominic Telaar
Gerwin Schalk
Peter Brunner
Tanja Schultz
Publication venue: Frontiers Media
Publication date: 01/01/2015

It has long been speculated whether communication between humans and machines based on natural speech-related cortical activity is possible. Over the past decade, studies have suggested that it is feasible to recognize isolated aspects of speech from neural signals, such as auditory features, phones or one of a few isolated words. However, until now it has remained an unsolved challenge to decode continuously spoken speech from the neural substrate associated with speech and language processing. Here, we show for the first time that continuously spoken speech can be decoded into the expressed words from intracranial electrocorticographic (ECoG) recordings. Specifically, we implemented a system, which we call Brain-To-Text, that models single phones and employs techniques from automatic speech recognition (ASR). It thereby transforms brain activity while speaking into the corresponding textual representation. Our results demonstrate that our system can achieve word error rates as low as 25% and phone error rates below 50%. Additionally, our approach contributes to the current understanding of the neural basis of continuous speech production by identifying those cortical regions that hold substantial information about individual phones. In conclusion, the Brain-To-Text system described in this paper represents an important step toward human-machine communication based on imagined speech.
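In standard ASR practice, word and phone error rates like those quoted above are computed from the Levenshtein (edit) distance. That distance, taken between the reference and the decoded token sequence, is divided by the reference length. The sketch below assumes that standard definition. The function names are hypothetical, and the code is not taken from the Brain-To-Text system.

```python
def edit_distance(ref, hyp):
    """Minimum number of substitutions, deletions and insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # delete r
                         cur[j - 1] + 1,          # insert h
                         prev[j - 1] + (r != h))  # substitute (or match at no cost)
        prev = cur
    return prev[-1]

def error_rate(ref_tokens, hyp_tokens):
    """Word error rate for word lists, phone error rate for phone lists."""
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)

print(error_rate("a b c d".split(), "a x c".split()))  # one substitution + one deletion
```

The same function serves both metrics. Only the tokenization changes: words for WER, phone symbols for the phone error rate.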

Crossref

KITopen

Frontiers - Publisher Connector

PubMed Central

Multi-candidate missing data imputation for robust speech recognition

Author: Hugo Van hamme
Yujun Wang
Publication venue: Springer Nature
Publication date: 01/01/2012

The application of Missing Data Techniques (MDT) to increase the noise robustness of HMM/GMM-based large-vocabulary speech recognizers is hampered by a large computational burden. The likelihood evaluations imply solving many constrained least squares (CLSQ) optimization problems. As an alternative, researchers have proposed frontend MDT or have made oversimplifying independence assumptions for the backend acoustic model. In this article, we propose a fast Multi-Candidate (MC) approach that solves the per-Gaussian CLSQ problems approximately. It selects the best solution from a small set of candidates, which are generated as the MDT solutions on a reduced set of cluster Gaussians. Experiments show that MC MDT runs as fast as the uncompensated recognizer while achieving the accuracy of the full backend optimization approach. The experiments also show that exploiting the more accurate acoustic model of the backend does pay off in terms of accuracy when compared to frontend MDT. © 2012 Wang and Van hamme; licensee Springer.
Wang Y., Van hamme H., "Multi-candidate missing data imputation for robust speech recognition", EURASIP Journal on Audio, Speech, and Music Processing, vol. 17, 20 pp., 2012. Status: published.
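The multi-candidate idea can be sketched for a simplified setting. The following assumptions are not taken from the article:

- Features are log-spectral vectors.
- Each Gaussian is given by a mean and a full precision matrix.
- Reliable components are fixed to the observation.
- Unreliable components are bounded above by the observed noisy value.

Under these assumptions, per-Gaussian imputation becomes a bound-constrained least squares problem, solved here with SciPy's `lsq_linear`. The MC step computes a few candidates exactly on cluster Gaussians, scores each candidate under every Gaussian, and keeps the cheapest. All function names are hypothetical.

```python
import numpy as np
from scipy.optimize import lsq_linear

def clsq_impute(y, reliable, mu, precision):
    """Exact per-Gaussian imputation (assumed model): minimise
    (x - mu)^T P (x - mu) subject to x_r = y_r and x_u <= y_u."""
    x = y.astype(float)
    u = ~reliable
    if not u.any():
        return x
    L = np.linalg.cholesky(precision)            # P = L L^T
    A = L.T[:, u]                                # acts on the unreliable components
    b = L.T @ (mu - np.where(reliable, y, 0.0))  # reliable part moved to the right side
    sol = lsq_linear(A, b, bounds=(-np.inf, y[u]))
    x[u] = sol.x
    return x

def quad(x, mu, precision):
    """Mahalanobis cost of x under one Gaussian."""
    d = x - mu
    return d @ precision @ d

def mc_impute(y, reliable, gaussians, cluster_gaussians):
    """Candidates come from a few cluster Gaussians; every Gaussian then picks
    the candidate with the lowest cost (all candidates satisfy the constraints)."""
    candidates = [clsq_impute(y, reliable, m, P) for m, P in cluster_gaussians]
    return [min(candidates, key=lambda x: quad(x, m, P)) for m, P in gaussians]
```

The cost saving comes from calling the solver only once per cluster Gaussian. Each of the many backend Gaussians then evaluates just a handful of cheap quadratic forms.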

Lirias

Springer - Publisher Connector

Recommended from our members

Localized Variable Selection with Random Forest

Author: Niyaghi Faraz
Publication venue: 'Oregon State University'

Due to recent advances in computer technology, the cost of collecting and storing data has dropped drastically. This makes it feasible to collect large amounts of information for each data point. This increasing trend in feature dimensionality justifies the need for research on variable selection. Random forest (RF) has demonstrated the ability to select important variables and model complex data. However, simulations show that, in some cases, it fails to detect less influential features in the presence of variables with large effects. In this dissertation, we propose two algorithms for localized variable selection: clustering-based feature selection (CBFS) and locally adjusted feature importance (LAFI). Both methods aim to find regions where the effects of weaker features can be isolated and measured. CBFS combines RF variable selection with a two-stage clustering method to detect variables whose effects can be detected only in certain regions. LAFI, on the other hand, uses a binary tree approach to split the data into bins based on response variable rankings, and applies RF to find important variables in each bin. Variables that are selected in more bins are assigned a larger LAFI. Simulations and real datasets are used to evaluate these variable selection methods. Finally, we also propose an extension of CBFS for localized prediction.
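A rough sketch of the LAFI procedure as summarized above follows, assuming scikit-learn's `RandomForestRegressor`. The summary does not say how "important" variables are chosen within a bin or how deep the binary tree goes. The rule "importance above the mean importance" and `depth=2` (four bins) are therefore hypothetical. The binary tree on response rankings is approximated by repeatedly halving the samples sorted by the response.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def response_bins(y, depth):
    """Recursively halve the samples by response rank, giving 2**depth index bins."""
    bins = [np.argsort(y)]
    for _ in range(depth):
        bins = [half for b in bins for half in np.array_split(b, 2)]
    return bins

def lafi_scores(X, y, depth=2, n_estimators=200, seed=0):
    """Fraction of response bins in which each variable is selected as important."""
    bins = response_bins(y, depth)
    counts = np.zeros(X.shape[1])
    for idx in bins:
        rf = RandomForestRegressor(n_estimators=n_estimators, random_state=seed)
        rf.fit(X[idx], y[idx])
        imp = rf.feature_importances_
        counts += imp > imp.mean()  # hypothetical per-bin selection rule
    return counts / len(bins)
```

Because each forest sees only one response bin, a variable that matters only for, say, large responses can be selected in that bin. It would otherwise be masked by a dominant variable in a single global forest.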

ScholarsArchive@OSU