Local Decorrelation For Improved Detection
Even with the advent of more sophisticated, data-hungry methods, boosted
decision trees remain extraordinarily successful for fast rigid object
detection, achieving top accuracy on numerous datasets. While effective, most
boosted detectors use decision trees with orthogonal (single feature) splits,
and the topology of the resulting decision boundary may not be well matched to
the natural topology of the data. Given highly correlated data, decision trees
with oblique (multiple feature) splits can be effective. Use of oblique splits,
however, comes at considerable computational expense. Inspired by recent work
on discriminative decorrelation of HOG features, we instead propose an
efficient feature transform that removes correlations in local neighborhoods.
The result is an overcomplete but locally decorrelated representation ideally
suited for use with orthogonal decision trees. In fact, orthogonal trees with
our locally decorrelated features outperform oblique trees trained over the
original features at a fraction of the computational cost. The overall
improvement in accuracy is dramatic: on the Caltech Pedestrian Dataset, we
reduce false positives nearly tenfold over the previous state-of-the-art.
Comment: To appear in Neural Information Processing Systems (NIPS), 201
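The core idea above, removing correlations in local feature neighborhoods so that orthogonal (single-feature) splits become effective, can be illustrated with a shared whitening transform estimated from local patches. This is a minimal NumPy sketch under simplifying assumptions (one shared local covariance, ZCA whitening), not the authors' implementation:

```python
import numpy as np

def local_decorrelation_filters(patches, eps=1e-5):
    """Estimate a shared decorrelation (whitening) transform from local
    feature patches. `patches` is (n_samples, k), each row a flattened
    local neighborhood of feature values."""
    patches = patches - patches.mean(axis=0, keepdims=True)
    cov = patches.T @ patches / len(patches)       # shared local covariance
    vals, vecs = np.linalg.eigh(cov)
    # ZCA whitening: rotate, rescale, rotate back -> stays in feature space
    return vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T

rng = np.random.default_rng(0)
# Synthetic correlated local features: k = 9 (a 3x3 neighborhood)
base = rng.normal(size=(10_000, 9))
mix = np.eye(9) + 0.5                              # induces correlation
patches = base @ mix
W = local_decorrelation_filters(patches)
white = (patches - patches.mean(0)) @ W
cov_w = np.cov(white.T)                            # off-diagonals near zero
```

ZCA whitening is used in this sketch because it decorrelates while staying in the original feature space, which keeps the transformed features meaningful for ordinary axis-aligned (orthogonal) tree splits.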
Machine Learning Techniques and Applications For Ground-based Image Analysis
Ground-based whole sky cameras have opened up new opportunities for
monitoring the earth's atmosphere. These cameras are an important complement to
satellite images by providing geoscientists with cheaper, faster, and more
localized data. The images captured by whole sky imagers can have high spatial
and temporal resolution, which is an important pre-requisite for applications
such as solar energy modeling, cloud attenuation analysis, local weather
prediction, etc.
Extracting valuable information from the huge amount of image data by
detecting and analyzing the various entities in these images is challenging.
However, powerful machine learning techniques have become available to aid with
the image analysis. This article provides a detailed walk-through of recent
developments in these techniques and their applications in ground-based
imaging. We aim to bridge the gap between computer vision and remote sensing
with the help of illustrative examples. We demonstrate the advantages of using
machine learning techniques in ground-based image analysis via three primary
applications -- segmentation, classification, and denoising.
A Face Recognition approach based on entropy estimate of the nonlinear DCT features in the Logarithm Domain together with Kernel Entropy Component Analysis
This paper exploits the feature extraction capabilities of the discrete
cosine transform (DCT) together with an illumination normalization approach in
the logarithm domain, which increases robustness to variations in facial
geometry and illumination. Secondly, in the same domain, entropy measures are
applied to the DCT coefficients so that maximum-entropy-preserving pixels can
be extracted as the feature vector; the informative features of a face are thus
extracted in a low-dimensional space. Finally, kernel entropy component
analysis (KECA) with an extension of arc-cosine kernels is applied to the
extracted DCT coefficients that contribute most to the entropy estimate, in
order to retain only those real kernel ECA eigenvectors associated with
eigenvalues having a high positive entropy contribution. The resulting system
was successfully tested on real image sequences and is robust to significant
partial occlusion and illumination changes, as validated by experiments on
the FERET, AR, FRAV2D, and ORL face databases. Extensive experimental
comparison demonstrates the superiority of the proposed approach with respect
to recognition accuracy; using specificity and sensitivity, we find that the
best performance is achieved when Renyi entropy is applied to the DCT
coefficients. Moreover, the proposed approach is very simple, computationally
fast, and can be implemented in any real-time face recognition system.
Comment: 9 pages. Published online August 2013 in MECS, International Journal
of Information Technology and Computer Science, 2013. arXiv admin note: text
overlap with arXiv:1112.3712 by other authors.
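As a rough illustration of the pipeline sketched above (logarithm-domain normalization, DCT, entropy-ranked coefficient selection), the following NumPy/SciPy snippet ranks DCT coefficients by a histogram-based Renyi entropy estimate across a training set and keeps the top-k as the feature vector. All function names and parameter choices (bin count, `log1p` normalization, k) are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np
from scipy.fft import dctn

def renyi_entropy(x, alpha=2.0, bins=32):
    """Renyi entropy of order alpha from a histogram estimate of x."""
    counts, _ = np.histogram(x, bins=bins)
    p = counts / counts.sum()
    p = p[p > 0]
    return np.log(np.sum(p ** alpha)) / (1.0 - alpha)

def dct_entropy_features(images, k=64):
    """Log-domain DCT features ranked by per-coefficient Renyi entropy.
    `images` is (n, h, w) with non-negative pixel values."""
    logs = np.log1p(images)                       # logarithm-domain step
    coeffs = np.stack([dctn(im, norm="ortho") for im in logs])
    flat = coeffs.reshape(len(images), -1)
    ent = np.array([renyi_entropy(flat[:, j]) for j in range(flat.shape[1])])
    keep = np.argsort(ent)[-k:]                   # maximum-entropy coefficients
    return flat[:, keep], keep

rng = np.random.default_rng(1)
faces = rng.uniform(0, 255, size=(20, 16, 16))    # stand-in for face crops
feats, idx = dct_entropy_features(faces, k=64)    # (20, 64) feature matrix
```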
Improved Frequency Modulation Features for Multichannel Distant Speech Recognition
Frequency modulation features capture the fine structure of speech formants
and constitute a beneficial supplement to the traditional energy-based
cepstral features. Improvements have been demonstrated mainly in GMM-HMM
systems for small and large vocabulary tasks. Yet, they have limited
applications in DNN-HMM systems and Distant Speech Recognition (DSR) tasks.
Herein, we elaborate on their integration within state-of-the-art front-end
schemes that include post-processing of MFCCs resulting in discriminant and
speaker adapted features of large temporal contexts. We explore 1) multichannel
demodulation schemes for multi-microphone setups, 2) richer descriptors of
frequency modulations, and 3) feature transformation and combination via
hierarchical deep networks. We present results for tandem and hybrid
recognition with GMM and DNN acoustic models, respectively. The improved
modulation features are combined efficiently with MFCCs yielding modest and
consistent improvements in multichannel distant speech recognition tasks in
reverberant and noisy environments, where recognition rates are far from human
performance.
Time–Frequency Cepstral Features and Heteroscedastic Linear Discriminant Analysis for Language Recognition
The shifted delta cepstrum (SDC) is a widely used feature extraction method for language recognition (LRE). With a high context width due to incorporation of multiple frames, SDC outperforms traditional delta and acceleration feature vectors. However, it also introduces correlation into the concatenated feature vector, which increases redundancy and may degrade the performance of backend classifiers. In this paper, we first propose a time-frequency cepstral (TFC) feature vector, which is obtained by performing a temporal discrete cosine transform (DCT) on the cepstrum matrix and selecting the transformed elements in a zigzag scan order. Beyond this, we increase discriminability through a heteroscedastic linear discriminant analysis (HLDA) on the full cepstrum matrix. By utilizing block diagonal matrix constraints, the large HLDA problem is then reduced to several smaller HLDA problems, creating a block diagonal HLDA (BDHLDA) algorithm with much lower computational complexity. The BDHLDA method is finally extended to the GMM domain, using the simpler TFC features during re-estimation to provide significantly improved computation speed. Experiments on the NIST 2003 and 2007 LRE evaluation corpora show that TFC is more effective than SDC, and that the GMM-based BDHLDA yields a lower equal error rate (EER) and minimum average cost (Cavg) than either the TFC or SDC approaches.
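The TFC construction described above, a temporal DCT over the cepstrum matrix followed by zigzag selection of the transformed elements, can be sketched as follows. The frame count, cepstral order, and k below are illustrative assumptions, not values from the paper:

```python
import numpy as np
from scipy.fft import dct

def zigzag_indices(h, w):
    """Indices of an h x w matrix in JPEG-style zigzag order: traverse
    anti-diagonals, alternating direction on odd/even diagonal sums."""
    return sorted(((i, j) for i in range(h) for j in range(w)),
                  key=lambda ij: (ij[0] + ij[1],
                                  ij[0] if (ij[0] + ij[1]) % 2 else ij[1]))

def tfc_features(cepstra, k=20):
    """Time-frequency cepstral sketch: temporal DCT of a cepstrum matrix
    (n_frames, n_ceps), then keep the first k transformed elements in
    zigzag order, concentrating on low time/quefrency indices."""
    tdct = dct(cepstra, axis=0, norm="ortho")     # DCT along the time axis
    order = zigzag_indices(*tdct.shape)
    return np.array([tdct[i, j] for i, j in order[:k]])

rng = np.random.default_rng(2)
cep = rng.normal(size=(9, 13))                    # 9 frames, 13 cepstral coefs
feat = tfc_features(cep, k=20)                    # 20-dimensional TFC vector
```

The zigzag scan plays the same role as in JPEG coding: it orders coefficients so that truncation keeps the low-order time and quefrency components, where most of the useful variation concentrates.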
Speech Recognition Front End Without Information Loss
Speech representation and modelling in high-dimensional spaces of acoustic
waveforms, or a linear transformation thereof, is investigated with the aim of
improving the robustness of automatic speech recognition to additive noise. The
motivation behind this approach is twofold: (i) the information in acoustic
waveforms that is usually removed in the process of extracting low-dimensional
features might aid robust recognition by virtue of structured redundancy
analogous to channel coding, (ii) linear feature domains allow for exact noise
adaptation, as opposed to representations that involve non-linear processing
which makes noise adaptation challenging. Thus, we develop a generative
framework for phoneme modelling in high-dimensional linear feature domains, and
use it in phoneme classification and recognition tasks. Results show that
classification and recognition in this framework perform better than analogous
PLP and MFCC classifiers below 18 dB SNR. A combination of the high-dimensional
and MFCC features at the likelihood level performs uniformly better than either
of the individual representations across all noise levels.
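The "exact noise adaptation" property of linear feature domains can be illustrated with Gaussian models: if the features are linear in the waveform, additive noise simply adds means and covariances, so a clean phoneme model can be adapted in closed form with no approximation. A minimal sketch, assuming Gaussian class and noise models (an assumption of this illustration, not a claim about the paper's exact model):

```python
import numpy as np

def adapt_gaussian(mu_x, cov_x, mu_n, cov_n):
    """Exact adaptation of a Gaussian phoneme model to additive noise in a
    linear feature domain: if y = x + n with x, n independent Gaussians,
    then y is Gaussian with summed means and summed covariances."""
    return mu_x + mu_n, cov_x + cov_n

rng = np.random.default_rng(3)
d = 4
mu_x, mu_n = rng.normal(size=d), rng.normal(size=d)
A = rng.normal(size=(d, d)); cov_x = A @ A.T + np.eye(d)   # clean-model cov
B = rng.normal(size=(d, d)); cov_n = B @ B.T + np.eye(d)   # noise cov
mu_y, cov_y = adapt_gaussian(mu_x, cov_x, mu_n, cov_n)

# Empirical check: samples of x + n match the adapted model's statistics
xs = rng.multivariate_normal(mu_x, cov_x, size=50_000)
ns = rng.multivariate_normal(mu_n, cov_n, size=50_000)
ys = xs + ns
```

By contrast, after a non-linear step such as the log-compression in MFCC extraction, additive waveform noise no longer acts additively on the features, which is exactly the difficulty the abstract alludes to.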
Updating the silent speech challenge benchmark with deep learning
The 2010 Silent Speech Challenge benchmark is updated with new results
obtained in a Deep Learning strategy, using the same input features and
decoding strategy as in the original article. A Word Error Rate of 6.4% is
obtained, compared to the published value of 17.4%. Additional results
comparing new auto-encoder-based features with the original features at reduced
dimensionality, as well as decoding scenarios on two different language models,
are also presented. The Silent Speech Challenge archive has been updated to
contain both the original and the new auto-encoder features, in addition to the
original raw data.
Comment: 25 pages, 6 page
Comparing phonemes and visemes with DNN-based lipreading
There is debate over whether phoneme or viseme units are more effective for a
lipreading system. Some studies use phoneme units even though phonemes describe
unique short sounds; other studies have tried to improve lipreading accuracy by
focusing on visemes, with varying results. We compare the performance of a
lipreading system by modeling visual speech using either 13 viseme or 38
phoneme units. We report the accuracy of our system at both word and unit
levels. The evaluation task is large vocabulary continuous speech using the
TCD-TIMIT corpus. We complete our visual speech modeling via hybrid DNN-HMMs
and our visual speech decoder is a Weighted Finite-State Transducer (WFST). We
use DCT and Eigenlips as representations of the mouth ROI image. The phoneme
lipreading system's word accuracy outperforms that of the viseme-based system.
However, the phoneme system achieves lower accuracy at the unit level, which
shows the importance of the dictionary for decoding classification outputs
into words.
Optimization of distributions differences for classification
In this paper we introduce a new classification algorithm called Optimization
of Distributions Differences (ODD). The algorithm aims to find a transformation
from the feature space to a new space where the instances in the same class are
as close as possible to one another while the gravity centers of these classes
are as far as possible from one another. This aim is formulated as a
multiobjective optimization problem that is solved by a hybrid of an
evolutionary strategy and the Quasi-Newton method. The choice of the
transformation function is flexible and could be any continuous space function.
We experiment with a linear and a non-linear transformation in this paper. We
show that the algorithm can outperform 6 other state-of-the-art classification
methods, namely naive Bayes, support vector machines, linear discriminant
analysis, multi-layer perceptrons, decision trees, and k-nearest neighbors, in
12 standard classification datasets. Our results show that the method is less
sensitive to an imbalanced number of instances than these methods. We also show
that ODD maintains its performance better than the other classification methods
on these datasets and hence offers better generalization ability.
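The two competing objectives in ODD, within-class compactness and between-class center separation under a learned transformation, can be sketched as follows for a linear map. The exact objective weighting and the hybrid evolutionary/Quasi-Newton solver are omitted, and all names here are illustrative rather than the paper's:

```python
import numpy as np

def odd_objectives(W, X, y):
    """Sketch of the two ODD objectives: after mapping Z = X @ W, return
    (1) the mean within-class spread (to be minimized) and
    (2) the mean pairwise distance between class centers (to be maximized)."""
    Z = X @ W
    classes = np.unique(y)
    centers = np.array([Z[y == c].mean(axis=0) for c in classes])
    within = np.mean([np.mean(np.linalg.norm(Z[y == c] - centers[i], axis=1))
                      for i, c in enumerate(classes)])
    pairs = [(i, j) for i in range(len(classes))
             for j in range(i + 1, len(classes))]
    between = np.mean([np.linalg.norm(centers[i] - centers[j])
                       for i, j in pairs])
    return within, between

rng = np.random.default_rng(4)
# Two well-separated synthetic classes in 3 dimensions
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(5, 1, (50, 3))])
y = np.array([0] * 50 + [1] * 50)
w_id, b_id = odd_objectives(np.eye(3), X, y)      # identity transform baseline
```

A solver in the spirit of the abstract would search over W (linear or the output of any continuous non-linear map) to drive `within` down while driving `between` up, treating the pair as a multiobjective problem.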