865 research outputs found

    Local Decorrelation For Improved Detection

    Full text link
    Even with the advent of more sophisticated, data-hungry methods, boosted decision trees remain extraordinarily successful for fast rigid object detection, achieving top accuracy on numerous datasets. While effective, most boosted detectors use decision trees with orthogonal (single feature) splits, and the topology of the resulting decision boundary may not be well matched to the natural topology of the data. Given highly correlated data, decision trees with oblique (multiple feature) splits can be effective. Use of oblique splits, however, comes at considerable computational expense. Inspired by recent work on discriminative decorrelation of HOG features, we instead propose an efficient feature transform that removes correlations in local neighborhoods. The result is an overcomplete but locally decorrelated representation ideally suited for use with orthogonal decision trees. In fact, orthogonal trees with our locally decorrelated features outperform oblique trees trained over the original features at a fraction of the computational cost. The overall improvement in accuracy is dramatic: on the Caltech Pedestrian Dataset, we reduce false positives nearly tenfold over the previous state-of-the-art.Comment: To appear in Neural Information Processing Systems (NIPS), 201

    Machine Learning Techniques and Applications For Ground-based Image Analysis

    Full text link
    Ground-based whole sky cameras have opened up new opportunities for monitoring the earth's atmosphere. These cameras are an important complement to satellite images by providing geoscientists with cheaper, faster, and more localized data. The images captured by whole sky imagers can have high spatial and temporal resolution, which is an important pre-requisite for applications such as solar energy modeling, cloud attenuation analysis, local weather prediction, etc. Extracting valuable information from the huge amount of image data by detecting and analyzing the various entities in these images is challenging. However, powerful machine learning techniques have become available to aid with the image analysis. This article provides a detailed walk-through of recent developments in these techniques and their applications in ground-based imaging. We aim to bridge the gap between computer vision and remote sensing with the help of illustrative examples. We demonstrate the advantages of using machine learning techniques in ground-based image analysis via three primary applications -- segmentation, classification, and denoising

    A Face Recognition approach based on entropy estimate of the nonlinear DCT features in the Logarithm Domain together with Kernel Entropy Component Analysis

    Full text link
    This paper exploits the feature extraction capabilities of the discrete cosine transform (DCT) together with an illumination normalization approach in the logarithm domain that increase its robustness to variations in facial geometry and illumination. Secondly in the same domain the entropy measures are applied on the DCT coefficients so that maximum entropy preserving pixels can be extracted as the feature vector. Thus the informative features of a face can be extracted in a low dimensional space. Finally, the kernel entropy component analysis (KECA) with an extension of arc cosine kernels is applied on the extracted DCT coefficients that contribute most to the entropy estimate to obtain only those real kernel ECA eigenvectors that are associated with eigenvalues having high positive entropy contribution. The resulting system was successfully tested on real image sequences and is robust to significant partial occlusion and illumination changes, validated with the experiments on the FERET, AR, FRAV2D and ORL face databases. Experimental comparison is demonstrated to prove the superiority of the proposed approach in respect to recognition accuracy. Using specificity and sensitivity we find that the best is achieved when Renyi entropy is applied on the DCT coefficients. Extensive experimental comparison is demonstrated to prove the superiority of the proposed approach in respect to recognition accuracy. Moreover, the proposed approach is very simple, computationally fast and can be implemented in any real-time face recognition system.Comment: 9 pages,Published Online August 2013 in MECS. International Journal of Information Technology and Computer Science, 2013. arXiv admin note: text overlap with arXiv:1112.3712 by other author

    Improved Frequency Modulation Features for Multichannel Distant Speech Recognition

    Full text link
    Frequency modulation features capture the fine structure of speech formants that constitute beneficial and supplementary to the traditional energy-based cepstral features. Improvements have been demonstrated mainly in GMM-HMM systems for small and large vocabulary tasks. Yet, they have limited applications in DNN-HMM systems and Distant Speech Recognition (DSR) tasks. Herein, we elaborate on their integration within state-of-the-art front-end schemes that include post-processing of MFCCs resulting in discriminant and speaker adapted features of large temporal contexts. We explore 1) multichannel demodulation schemes for multi-microphone setups, 2) richer descriptors of frequency modulations, and 3) feature transformation and combination via hierarchical deep networks. We present results for tandem and hybrid recognition with GMM and DNN acoustic models, respectively. The improved modulation features are combined efficiently with MFCCs yielding modest and consistent improvements in multichannel distant speech recognition tasks on reverberant and noisy environments, where recognition rates are far from human performance

    Time–Frequency Cepstral Features and Heteroscedastic Linear Discriminant Analysis for Language Recognition

    Get PDF
    The shifted delta cepstrum (SDC) is a widely used feature extraction for language recognition (LRE). With a high context width due to incorporation of multiple frames, SDC outperforms traditional delta and acceleration feature vectors. However, it also introduces correlation into the concatenated feature vector, which increases redundancy and may degrade the performance of backend classifiers. In this paper, we first propose a time-frequency cepstral (TFC) feature vector, which is obtained by performing a temporal discrete cosine transform (DCT) on the cepstrum matrix and selecting the transformed elements in a zigzag scan order. Beyond this, we increase discriminability through a heteroscedastic linear discriminant analysis (HLDA) on the full cepstrum matrix. By utilizing block diagonal matrix constraints, the large HLDA problem is then reduced to several smaller HLDA problems, creating a block diagonal HLDA (BDHLDA) algorithm which has much lower computational complexity. The BDHLDA method is finally extended to the GMM domain, using the simpler TFC features during re-estimation to provide significantly improved computation speed. Experiments on NIST 2003 and 2007 LRE evaluation corpora show that TFC is more effective than SDC, and that the GMM-based BDHLDA results in lower equal error rate (EER) and minimum average cost (Cavg) than either TFC or SDC approaches

    Speech Recognition Front End Without Information Loss

    Full text link
    Speech representation and modelling in high-dimensional spaces of acoustic waveforms, or a linear transformation thereof, is investigated with the aim of improving the robustness of automatic speech recognition to additive noise. The motivation behind this approach is twofold: (i) the information in acoustic waveforms that is usually removed in the process of extracting low-dimensional features might aid robust recognition by virtue of structured redundancy analogous to channel coding, (ii) linear feature domains allow for exact noise adaptation, as opposed to representations that involve non-linear processing which makes noise adaptation challenging. Thus, we develop a generative framework for phoneme modelling in high-dimensional linear feature domains, and use it in phoneme classification and recognition tasks. Results show that classification and recognition in this framework perform better than analogous PLP and MFCC classifiers below 18 dB SNR. A combination of the high-dimensional and MFCC features at the likelihood level performs uniformly better than either of the individual representations across all noise levels

    Updating the silent speech challenge benchmark with deep learning

    Full text link
    The 2010 Silent Speech Challenge benchmark is updated with new results obtained in a Deep Learning strategy, using the same input features and decoding strategy as in the original article. A Word Error Rate of 6.4% is obtained, compared to the published value of 17.4%. Additional results comparing new auto-encoder-based features with the original features at reduced dimensionality, as well as decoding scenarios on two different language models, are also presented. The Silent Speech Challenge archive has been updated to contain both the original and the new auto-encoder features, in addition to the original raw data.Comment: 25 pages, 6 page

    New Parallel Models for Face Recognition

    Get PDF

    Comparing phonemes and visemes with DNN-based lipreading

    Full text link
    There is debate if phoneme or viseme units are the most effective for a lipreading system. Some studies use phoneme units even though phonemes describe unique short sounds; other studies tried to improve lipreading accuracy by focusing on visemes with varying results. We compare the performance of a lipreading system by modeling visual speech using either 13 viseme or 38 phoneme units. We report the accuracy of our system at both word and unit levels. The evaluation task is large vocabulary continuous speech using the TCD-TIMIT corpus. We complete our visual speech modeling via hybrid DNN-HMMs and our visual speech decoder is a Weighted Finite-State Transducer (WFST). We use DCT and Eigenlips as a representation of mouth ROI image. The phoneme lipreading system word accuracy outperforms the viseme based system word accuracy. However, the phoneme system achieved lower accuracy at the unit level which shows the importance of the dictionary for decoding classification outputs into words

    Optimization of distributions differences for classification

    Full text link
    In this paper we introduce a new classification algorithm called Optimization of Distributions Differences (ODD). The algorithm aims to find a transformation from the feature space to a new space where the instances in the same class are as close as possible to one another while the gravity centers of these classes are as far as possible from one another. This aim is formulated as a multiobjective optimization problem that is solved by a hybrid of an evolutionary strategy and the Quasi-Newton method. The choice of the transformation function is flexible and could be any continuous space function. We experiment with a linear and a non-linear transformation in this paper. We show that the algorithm can outperform 6 other state-of-the-art classification methods, namely naive Bayes, support vector machines, linear discriminant analysis, multi-layer perceptrons, decision trees, and k-nearest neighbors, in 12 standard classification datasets. Our results show that the method is less sensitive to the imbalanced number of instances comparing to these methods. We also show that ODD maintains its performance better than other classification methods in these datasets, hence, offers a better generalization ability
    • …
    corecore