
    Band-pass filtering of the time sequences of spectral parameters for robust wireless speech recognition

    In this paper we address the problem of automatic speech recognition (ASR) when wireless speech communication systems are involved. In this context, three main sources of distortion should be considered: the acoustic environment, speech coding, and transmission errors. Whilst the first has already received a lot of attention, the last two deserve further investigation in our opinion. We have found that band-pass filtering of the recognition features improves ASR performance when distortions due to these particular communication systems are present. Furthermore, we have evaluated two alternative configurations at different bit error rates (BER) typical of these channels: band-pass filtering the LP-MFCC parameters and a modification of RASTA-PLP using a sharper low-pass section perform consistently better than LP-MFCC and RASTA-PLP, respectively.
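As a rough illustration of what band-pass filtering the time sequence of a spectral parameter means, the sketch below filters each feature coefficient along the frame axis. The abstract does not give the filters the authors used, so the coefficients here are an assumption: they follow the classic RASTA-style filter (a zero-DC-gain numerator with a single pole at 0.98), and the function names `bandpass_trajectory` and `bandpass_features` are hypothetical.

```python
def bandpass_trajectory(x, pole=0.98):
    """Band-pass filter one feature trajectory (one coefficient over frames).

    Assumption: classic RASTA-style coefficients, not the paper's filters.
    The numerator sums to zero, so constant (DC) offsets are removed; the
    single pole provides the low-pass section.
    """
    b = [0.2, 0.1, 0.0, -0.1, -0.2]  # FIR (high-pass) part
    y = []
    for n in range(len(x)):
        # Causal FIR part; samples before the start are treated as zero.
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        # First-order recursive part.
        if n > 0:
            acc += pole * y[n - 1]
        y.append(acc)
    return y


def bandpass_features(frames, pole=0.98):
    """Apply the filter independently to every coefficient of a feature matrix.

    frames: list of per-frame feature vectors (e.g. cepstral coefficients).
    Returns a list of filtered frames with the same shape.
    """
    if not frames:
        return []
    dims = len(frames[0])
    filtered = [
        bandpass_trajectory([frame[d] for frame in frames], pole)
        for d in range(dims)
    ]
    return [[filtered[d][t] for d in range(dims)] for t in range(len(frames))]
```

A constant offset in a coefficient, such as one introduced by a fixed channel, decays towards zero in the filtered trajectory, which is the usual motivation for this kind of processing. Lowering `pole` changes the low-pass section; which setting helps at a given BER is an empirical question this sketch does not answer.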

    Robust Adaptive Median Binary Pattern for noisy texture classification and retrieval

    Texture is an important cue for different computer vision tasks and applications. Local Binary Pattern (LBP) is considered one of the best yet efficient texture descriptors. However, LBP has some notable limitations, most notably its sensitivity to noise. In this paper, we address these limitations by introducing a novel texture descriptor, Robust Adaptive Median Binary Pattern (RAMBP). RAMBP is based on a classification process for noisy pixels, an adaptive analysis window, scale analysis, and comparison of image region medians. The proposed method handles highly noisy textured images and increases the discriminative properties by capturing microstructure and macrostructure texture information. The proposed method has been evaluated on popular texture datasets for classification and retrieval tasks, and under different high-noise conditions. Without any training or prior knowledge of the noise type, RAMBP achieved the best classification compared to state-of-the-art techniques. It scored more than 90% under 50% impulse noise density, more than 95% under Gaussian-noised textures with standard deviation σ = 5, and more than 99% under Gaussian-blurred textures with standard deviation σ = 1.25. The proposed method yielded competitive results and high performance as one of the best descriptors in noise-free texture classification. Furthermore, RAMBP also showed high performance for the problem of noisy texture retrieval, providing high recall and precision scores for textures with high levels of noise.
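To make the noise sensitivity of LBP concrete, the sketch below computes a basic 8-neighbour binary pattern for one pixel, thresholding either against the centre pixel (plain LBP) or against the median of the 3x3 patch. This is not RAMBP: the abstract does not specify its noisy-pixel classification, adaptive window, or scale analysis, so the median option is only a simplified stand-in, and the name `binary_pattern` and the bit ordering are assumptions.

```python
from statistics import median

# Clockwise 8-neighbourhood starting at the top-left pixel (assumed ordering).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def binary_pattern(img, r, c, use_median=False):
    """Return the 8-bit pattern code of pixel (r, c) in a 2-D list `img`.

    use_median=False: threshold is the centre pixel (plain LBP).
    use_median=True:  threshold is the median of the 3x3 patch, a simplified
                      illustration of median-based thresholding, not RAMBP.
    The pixel must not lie on the image border.
    """
    neighbours = [img[r + dr][c + dc] for dr, dc in OFFSETS]
    if use_median:
        threshold = median(neighbours + [img[r][c]])
    else:
        threshold = img[r][c]
    code = 0
    for bit, value in enumerate(neighbours):
        if value >= threshold:
            code |= 1 << bit
    return code


clean = [[10, 20, 30], [80, 45, 40], [70, 60, 50]]
noisy = [[10, 20, 30], [80, 255, 40], [70, 60, 50]]  # impulse at the centre

print(binary_pattern(clean, 1, 1), binary_pattern(noisy, 1, 1))
print(binary_pattern(clean, 1, 1, use_median=True),
      binary_pattern(noisy, 1, 1, use_median=True))
```

In this toy patch a single impulse at the centre changes the plain LBP code, while the median-thresholded code stays the same, since one outlier barely moves the median of nine values.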

    Learning An Invariant Speech Representation

    Recognition of speech, and in particular the ability to generalize and learn from small sets of labelled examples like humans do, depends on an appropriate representation of the acoustic input. We formulate the problem of finding robust speech features for supervised learning with small sample complexity as a problem of learning representations of the signal that are maximally invariant to intraclass transformations and deformations. We propose an extension of a theory for unsupervised learning of invariant visual representations to the auditory domain and empirically evaluate its validity for voiced speech sound classification. Our version of the theory requires the memory-based, unsupervised storage of acoustic templates, such as specific phones or words, together with all the transformations of each that normally occur. A quasi-invariant representation for a speech segment can be obtained by projecting it to each template orbit, i.e., the set of transformed signals, and computing the associated one-dimensional empirical probability distributions. The computations can be performed by modules of filtering and pooling, and extended to hierarchical architectures. In this paper, we apply a single-layer, multicomponent representation for phonemes and demonstrate improved accuracy and decreased sample complexity for vowel classification compared to standard spectral, cepstral and perceptual features. Comment: CBMM Memo No. 022, 5 pages, 2 figures
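The projection-and-pooling step described in this abstract can be sketched in a few lines. The example assumes the stored transformations of a template are its circular shifts, which is a toy choice rather than the transformations used in the memo, and it pools the projections into a fixed-range histogram as the empirical distribution. The names `orbit` and `signature` and the parameters `bins`, `lo`, and `hi` are hypothetical.

```python
def orbit(template):
    """All circular shifts of a template (assumed stand-in for its stored transformations)."""
    n = len(template)
    return [template[k:] + template[:k] for k in range(n)]


def signature(signal, template, bins=5, lo=-10.0, hi=10.0):
    """Histogram of the projections of `signal` onto every element of the template orbit.

    Filtering step: dot product of the signal with each transformed template.
    Pooling step:   normalized histogram of those dot products over [lo, hi].
    """
    hist = [0] * bins
    transformed = orbit(template)
    for g_t in transformed:
        p = sum(a * b for a, b in zip(signal, g_t))
        # Map the projection to a bin index, clamping values outside [lo, hi].
        idx = int((p - lo) / (hi - lo) * bins)
        idx = max(0, min(idx, bins - 1))
        hist[idx] += 1
    return [count / len(transformed) for count in hist]


template = [1, 0, -1, 0, 2, 0]
signal = [3, 1, 0, -2, 1, 0]
shifted = signal[2:] + signal[:2]  # the same signal, circularly shifted

print(signature(signal, template) == signature(shifted, template))
```

Shifting the signal only permutes the set of projections onto the orbit, so the histogram is unchanged. That permutation argument is the sense in which the representation is invariant to the transformation group; for transformations that do not form an exact group, the invariance is only approximate.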

    Focusing on the Big Picture: Insights into a Systems Approach to Deep Learning for Satellite Imagery

    Deep learning tasks are often complicated and require a variety of components working together efficiently to perform well. Because these tasks are often large in scale, it is necessary to iterate quickly in order to try a variety of methods and to find and fix bugs. While participating in IARPA's Functional Map of the World challenge, we identified challenges along the entire deep learning pipeline and found various solutions to them. In this paper, we present the performance, engineering, and deep learning considerations involved in processing and modeling data, as well as the underlying infrastructure considerations that support large-scale deep learning tasks. We also discuss insights and observations with regard to satellite imagery and deep learning for image classification. Comment: Accepted to IEEE Big Data 201