Local Decorrelation For Improved Detection
Even with the advent of more sophisticated, data-hungry methods, boosted
decision trees remain extraordinarily successful for fast rigid object
detection, achieving top accuracy on numerous datasets. While effective, most
boosted detectors use decision trees with orthogonal (single feature) splits,
and the topology of the resulting decision boundary may not be well matched to
the natural topology of the data. Given highly correlated data, decision trees
with oblique (multiple feature) splits can be effective. Use of oblique splits,
however, comes at considerable computational expense. Inspired by recent work
on discriminative decorrelation of HOG features, we instead propose an
efficient feature transform that removes correlations in local neighborhoods.
The result is an overcomplete but locally decorrelated representation ideally
suited for use with orthogonal decision trees. In fact, orthogonal trees with
our locally decorrelated features outperform oblique trees trained over the
original features at a fraction of the computational cost. The overall
improvement in accuracy is dramatic: on the Caltech Pedestrian Dataset, we
reduce false positives nearly tenfold over the previous state-of-the-art.
Comment: To appear in Neural Information Processing Systems (NIPS), 201
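The core idea above, removing correlations in local feature neighborhoods so that orthogonal (single-feature) splits become effective, can be illustrated with a shared whitening transform estimated from local patches. This is a minimal NumPy sketch under simplifying assumptions (one shared local covariance, ZCA whitening), not the authors' implementation:

```python
import numpy as np

def local_decorrelation_filters(patches, eps=1e-5):
    """Estimate a shared decorrelation (whitening) transform from local
    feature patches. `patches` is (n_samples, k), each row a flattened
    local neighborhood of feature values."""
    patches = patches - patches.mean(axis=0, keepdims=True)
    cov = patches.T @ patches / len(patches)       # shared local covariance
    vals, vecs = np.linalg.eigh(cov)
    # ZCA whitening: rotate, rescale, rotate back -> stays in feature space
    return vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T

rng = np.random.default_rng(0)
# Synthetic correlated local features: k = 9 (a 3x3 neighborhood)
base = rng.normal(size=(10_000, 9))
mix = np.eye(9) + 0.5                              # induces correlation
patches = base @ mix
W = local_decorrelation_filters(patches)
white = (patches - patches.mean(0)) @ W
cov_w = np.cov(white.T)                            # off-diagonals near zero
```

ZCA whitening is used in this sketch because it decorrelates while staying in the original feature space, which keeps the transformed features meaningful for ordinary axis-aligned (orthogonal) tree splits.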
Machine Learning Techniques and Applications For Ground-based Image Analysis
Ground-based whole sky cameras have opened up new opportunities for
monitoring the earth's atmosphere. These cameras are an important complement to
satellite images by providing geoscientists with cheaper, faster, and more
localized data. The images captured by whole sky imagers can have high spatial
and temporal resolution, which is an important pre-requisite for applications
such as solar energy modeling, cloud attenuation analysis, local weather
prediction, etc.
Extracting valuable information from the huge amount of image data by
detecting and analyzing the various entities in these images is challenging.
However, powerful machine learning techniques have become available to aid with
the image analysis. This article provides a detailed walk-through of recent
developments in these techniques and their applications in ground-based
imaging. We aim to bridge the gap between computer vision and remote sensing
with the help of illustrative examples. We demonstrate the advantages of using
machine learning techniques in ground-based image analysis via three primary
applications -- segmentation, classification, and denoising.
A Face Recognition approach based on entropy estimate of the nonlinear DCT features in the Logarithm Domain together with Kernel Entropy Component Analysis
This paper exploits the feature extraction capabilities of the discrete
cosine transform (DCT) together with an illumination normalization approach in
the logarithm domain, which increases robustness to variations in facial
geometry and illumination. Secondly, in the same domain, entropy measures are
applied to the DCT coefficients so that maximum-entropy-preserving pixels can
be extracted as the feature vector; the informative features of a face are thus
extracted in a low-dimensional space. Finally, kernel entropy component
analysis (KECA) with an extension of arc-cosine kernels is applied to the
extracted DCT coefficients that contribute most to the entropy estimate, in
order to retain only those real kernel ECA eigenvectors associated with
eigenvalues having a high positive entropy contribution. The resulting system
was successfully tested on real image sequences and is robust to significant
partial occlusion and illumination changes, as validated by experiments on
the FERET, AR, FRAV2D, and ORL face databases. Extensive experimental
comparison demonstrates the superiority of the proposed approach with respect
to recognition accuracy; using specificity and sensitivity, we find that the
best performance is achieved when Renyi entropy is applied to the DCT
coefficients. Moreover, the proposed approach is very simple, computationally
fast, and can be implemented in any real-time face recognition system.
Comment: 9 pages. Published online August 2013 in MECS, International Journal
of Information Technology and Computer Science, 2013. arXiv admin note: text
overlap with arXiv:1112.3712 by other authors.
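As a rough illustration of the pipeline sketched above (logarithm-domain normalization, DCT, entropy-ranked coefficient selection), the following NumPy/SciPy snippet ranks DCT coefficients by a histogram-based Renyi entropy estimate across a training set and keeps the top-k as the feature vector. All function names and parameter choices (bin count, `log1p` normalization, k) are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np
from scipy.fft import dctn

def renyi_entropy(x, alpha=2.0, bins=32):
    """Renyi entropy of order alpha from a histogram estimate of x."""
    counts, _ = np.histogram(x, bins=bins)
    p = counts / counts.sum()
    p = p[p > 0]
    return np.log(np.sum(p ** alpha)) / (1.0 - alpha)

def dct_entropy_features(images, k=64):
    """Log-domain DCT features ranked by per-coefficient Renyi entropy.
    `images` is (n, h, w) with non-negative pixel values."""
    logs = np.log1p(images)                       # logarithm-domain step
    coeffs = np.stack([dctn(im, norm="ortho") for im in logs])
    flat = coeffs.reshape(len(images), -1)
    ent = np.array([renyi_entropy(flat[:, j]) for j in range(flat.shape[1])])
    keep = np.argsort(ent)[-k:]                   # maximum-entropy coefficients
    return flat[:, keep], keep

rng = np.random.default_rng(1)
faces = rng.uniform(0, 255, size=(20, 16, 16))    # stand-in for face crops
feats, idx = dct_entropy_features(faces, k=64)    # (20, 64) feature matrix
```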
Improved Frequency Modulation Features for Multichannel Distant Speech Recognition
Frequency modulation features capture the fine structure of speech formants
and constitute a beneficial supplement to the traditional energy-based
cepstral features. Improvements have been demonstrated mainly in GMM-HMM
systems for small and large vocabulary tasks. Yet, they have limited
applications in DNN-HMM systems and Distant Speech Recognition (DSR) tasks.
Herein, we elaborate on their integration within state-of-the-art front-end
schemes that include post-processing of MFCCs resulting in discriminant and
speaker adapted features of large temporal contexts. We explore 1) multichannel
demodulation schemes for multi-microphone setups, 2) richer descriptors of
frequency modulations, and 3) feature transformation and combination via
hierarchical deep networks. We present results for tandem and hybrid
recognition with GMM and DNN acoustic models, respectively. The improved
modulation features are combined efficiently with MFCCs yielding modest and
consistent improvements in multichannel distant speech recognition tasks in
reverberant and noisy environments, where recognition rates are far from human
performance.
Time–Frequency Cepstral Features and Heteroscedastic Linear Discriminant Analysis for Language Recognition
The shifted delta cepstrum (SDC) is a widely used feature extraction method for language recognition (LRE). With a high context width due to incorporation of multiple frames, SDC outperforms traditional delta and acceleration feature vectors. However, it also introduces correlation into the concatenated feature vector, which increases redundancy and may degrade the performance of backend classifiers. In this paper, we first propose a time-frequency cepstral (TFC) feature vector, which is obtained by performing a temporal discrete cosine transform (DCT) on the cepstrum matrix and selecting the transformed elements in a zigzag scan order. Beyond this, we increase discriminability through a heteroscedastic linear discriminant analysis (HLDA) on the full cepstrum matrix. By utilizing block diagonal matrix constraints, the large HLDA problem is then reduced to several smaller HLDA problems, creating a block diagonal HLDA (BDHLDA) algorithm with much lower computational complexity. The BDHLDA method is finally extended to the GMM domain, using the simpler TFC features during re-estimation to provide significantly improved computation speed. Experiments on the NIST 2003 and 2007 LRE evaluation corpora show that TFC is more effective than SDC, and that the GMM-based BDHLDA yields a lower equal error rate (EER) and minimum average cost (Cavg) than either the TFC or SDC approaches.
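The TFC construction described above, a temporal DCT over the cepstrum matrix followed by zigzag selection of the transformed elements, can be sketched as follows. The frame count, cepstral order, and k below are illustrative assumptions, not values from the paper:

```python
import numpy as np
from scipy.fft import dct

def zigzag_indices(h, w):
    """Indices of an h x w matrix in JPEG-style zigzag order: traverse
    anti-diagonals, alternating direction on odd/even diagonal sums."""
    return sorted(((i, j) for i in range(h) for j in range(w)),
                  key=lambda ij: (ij[0] + ij[1],
                                  ij[0] if (ij[0] + ij[1]) % 2 else ij[1]))

def tfc_features(cepstra, k=20):
    """Time-frequency cepstral sketch: temporal DCT of a cepstrum matrix
    (n_frames, n_ceps), then keep the first k transformed elements in
    zigzag order, concentrating on low time/quefrency indices."""
    tdct = dct(cepstra, axis=0, norm="ortho")     # DCT along the time axis
    order = zigzag_indices(*tdct.shape)
    return np.array([tdct[i, j] for i, j in order[:k]])

rng = np.random.default_rng(2)
cep = rng.normal(size=(9, 13))                    # 9 frames, 13 cepstral coefs
feat = tfc_features(cep, k=20)                    # 20-dimensional TFC vector
```

The zigzag scan plays the same role as in JPEG coding: it orders coefficients so that truncation keeps the low-order time and quefrency components, where most of the useful variation concentrates.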
Speech Recognition Front End Without Information Loss
Speech representation and modelling in high-dimensional spaces of acoustic
waveforms, or a linear transformation thereof, is investigated with the aim of
improving the robustness of automatic speech recognition to additive noise. The
motivation behind this approach is twofold: (i) the information in acoustic
waveforms that is usually removed in the process of extracting low-dimensional
features might aid robust recognition by virtue of structured redundancy
analogous to channel coding, (ii) linear feature domains allow for exact noise
adaptation, as opposed to representations that involve non-linear processing
which makes noise adaptation challenging. Thus, we develop a generative
framework for phoneme modelling in high-dimensional linear feature domains, and
use it in phoneme classification and recognition tasks. Results show that
classification and recognition in this framework perform better than analogous
PLP and MFCC classifiers below 18 dB SNR. A combination of the high-dimensional
and MFCC features at the likelihood level performs uniformly better than either
of the individual representations across all noise levels.
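The "exact noise adaptation" property of linear feature domains can be illustrated with Gaussian models: if the features are linear in the waveform, additive noise simply adds means and covariances, so a clean phoneme model can be adapted in closed form with no approximation. A minimal sketch, assuming Gaussian class and noise models (an assumption of this illustration, not a claim about the paper's exact model):

```python
import numpy as np

def adapt_gaussian(mu_x, cov_x, mu_n, cov_n):
    """Exact adaptation of a Gaussian phoneme model to additive noise in a
    linear feature domain: if y = x + n with x, n independent Gaussians,
    then y is Gaussian with summed means and summed covariances."""
    return mu_x + mu_n, cov_x + cov_n

rng = np.random.default_rng(3)
d = 4
mu_x, mu_n = rng.normal(size=d), rng.normal(size=d)
A = rng.normal(size=(d, d)); cov_x = A @ A.T + np.eye(d)   # clean-model cov
B = rng.normal(size=(d, d)); cov_n = B @ B.T + np.eye(d)   # noise cov
mu_y, cov_y = adapt_gaussian(mu_x, cov_x, mu_n, cov_n)

# Empirical check: samples of x + n match the adapted model's statistics
xs = rng.multivariate_normal(mu_x, cov_x, size=50_000)
ns = rng.multivariate_normal(mu_n, cov_n, size=50_000)
ys = xs + ns
```

By contrast, after a non-linear step such as the log-compression in MFCC extraction, additive waveform noise no longer acts additively on the features, which is exactly the difficulty the abstract alludes to.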
Updating the silent speech challenge benchmark with deep learning
The 2010 Silent Speech Challenge benchmark is updated with new results
obtained in a Deep Learning strategy, using the same input features and
decoding strategy as in the original article. A Word Error Rate of 6.4% is
obtained, compared to the published value of 17.4%. Additional results
comparing new auto-encoder-based features with the original features at reduced
dimensionality, as well as decoding scenarios on two different language models,
are also presented. The Silent Speech Challenge archive has been updated to
contain both the original and the new auto-encoder features, in addition to the
original raw data.
Comment: 25 pages, 6 page
Comparing phonemes and visemes with DNN-based lipreading
There is debate over whether phoneme or viseme units are more effective for a
lipreading system. Some studies use phoneme units even though phonemes describe
unique short sounds; other studies have tried to improve lipreading accuracy by
focusing on visemes, with varying results. We compare the performance of a
lipreading system by modeling visual speech using either 13 viseme or 38
phoneme units. We report the accuracy of our system at both word and unit
levels. The evaluation task is large vocabulary continuous speech using the
TCD-TIMIT corpus. We complete our visual speech modeling via hybrid DNN-HMMs
and our visual speech decoder is a Weighted Finite-State Transducer (WFST). We
use DCT and Eigenlips as representations of the mouth ROI image. The phoneme
lipreading system's word accuracy outperforms that of the viseme-based system.
However, the phoneme system achieves lower accuracy at the unit level, which
shows the importance of the dictionary for decoding classification outputs
into words.
Optimization of distributions differences for classification
In this paper we introduce a new classification algorithm called Optimization
of Distributions Differences (ODD). The algorithm aims to find a transformation
from the feature space to a new space where the instances in the same class are
as close as possible to one another while the gravity centers of these classes
are as far as possible from one another. This aim is formulated as a
multiobjective optimization problem that is solved by a hybrid of an
evolutionary strategy and the Quasi-Newton method. The choice of the
transformation function is flexible and could be any continuous space function.
We experiment with a linear and a non-linear transformation in this paper. We
show that the algorithm can outperform 6 other state-of-the-art classification
methods, namely naive Bayes, support vector machines, linear discriminant
analysis, multi-layer perceptrons, decision trees, and k-nearest neighbors, in
12 standard classification datasets. Our results show that the method is less
sensitive to an imbalanced number of instances than these methods. We also show
that ODD maintains its performance better than the other classification methods
on these datasets and hence offers better generalization ability.
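The two competing objectives in ODD, within-class compactness and between-class center separation under a learned transformation, can be sketched as follows for a linear map. The exact objective weighting and the hybrid evolutionary/Quasi-Newton solver are omitted, and all names here are illustrative rather than the paper's:

```python
import numpy as np

def odd_objectives(W, X, y):
    """Sketch of the two ODD objectives: after mapping Z = X @ W, return
    (1) the mean within-class spread (to be minimized) and
    (2) the mean pairwise distance between class centers (to be maximized)."""
    Z = X @ W
    classes = np.unique(y)
    centers = np.array([Z[y == c].mean(axis=0) for c in classes])
    within = np.mean([np.mean(np.linalg.norm(Z[y == c] - centers[i], axis=1))
                      for i, c in enumerate(classes)])
    pairs = [(i, j) for i in range(len(classes))
             for j in range(i + 1, len(classes))]
    between = np.mean([np.linalg.norm(centers[i] - centers[j])
                       for i, j in pairs])
    return within, between

rng = np.random.default_rng(4)
# Two well-separated synthetic classes in 3 dimensions
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(5, 1, (50, 3))])
y = np.array([0] * 50 + [1] * 50)
w_id, b_id = odd_objectives(np.eye(3), X, y)      # identity transform baseline
```

A solver in the spirit of the abstract would search over W (linear or the output of any continuous non-linear map) to drive `within` down while driving `between` up, treating the pair as a multiobjective problem.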