4,140 research outputs found

    Robust ASR using Support Vector Machines

    Get PDF
    The improved theoretical properties of Support Vector Machines with respect to other machine learning alternatives due to their max-margin training paradigm have led us to suggest them as a good technique for robust speech recognition. However, important shortcomings have had to be circumvented, the most important being the normalisation of the time duration of different realisations of the acoustic speech units. In this paper, we have compared two approaches in noisy environments: first, a hybrid HMM–SVM solution where a fixed number of frames is selected by means of an HMM segmentation and second, a normalisation kernel called Dynamic Time Alignment Kernel (DTAK) first introduced in Shimodaira et al. [Shimodaira, H., Noma, K., Nakai, M., Sagayama, S., 2001. Support vector machine with dynamic time-alignment kernel for speech recognition. In: Proc. Eurospeech, Aalborg, Denmark, pp. 1841–1844] and based on DTW (Dynamic Time Warping). Special attention has been paid to the adaptation of both alternatives to noisy environments, comparing two types of parameterisations and performing suitable feature normalisation operations. The results show that the DTA Kernel provides important advantages over the baseline HMM system in medium to bad noise conditions, also outperforming the results of the hybrid system.Publicad

    Optoelectronic Reservoir Computing

    Get PDF
    Reservoir computing is a recently introduced, highly efficient bio-inspired approach for processing time dependent data. The basic scheme of reservoir computing consists of a non linear recurrent dynamical system coupled to a single input layer and a single output layer. Within these constraints many implementations are possible. Here we report an opto-electronic implementation of reservoir computing based on a recently proposed architecture consisting of a single non linear node and a delay line. Our implementation is sufficiently fast for real time information processing. We illustrate its performance on tasks of practical importance such as nonlinear channel equalization and speech recognition, and obtain results comparable to state of the art digital implementations.Comment: Contains main paper and two Supplementary Material

    Reservoir computing using a delayed feedback system: towards photonic implementations

    Get PDF
    Delayed feedback systems are known to exhibit a rich dynamical behavior, showing a wide variety of dynamical regimes. We use this richness to implement reservoir computing, a processing concept in machine learning. In this paper we demonstrate the proof of principle on an electronic system, however the approach is readily transferable to photonics, promising fast and computationally efficient all-optical processing. Using only one single node with delayed feedback instead of an entire network of nodes, we succeed in obtaining state-of-the-art results on benchmarks such as spoken digit recognition and system identification

    A hierarchy of recurrent networks for speech recognition

    Get PDF
    Generative models for sequential data based on directed graphs of Restricted Boltzmann Machines (RBMs) are able to accurately model high dimensional sequences as recently shown. In these models, temporal dependencies in the input are discovered by either buffering previous visible variables or by recurrent connections of the hidden variables. Here we propose a modification of these models, the Temporal Reservoir Machine (TRM). It utilizes a recurrent artificial neural network (ANN) for integrating information from the input over time. This information is then fed into a RBM at each time step. To avoid difficulties of recurrent network learning, the ANN remains untrained and hence can be thought of as a random feature extractor. Using the architecture of multi-layer RBMs (Deep Belief Networks), the TRMs can be used as a building block for complex hierarchical models. This approach unifies RBM-based approaches for sequential data modeling and the Echo State Network, a powerful approach for black-box system identification. The TRM is tested on a spoken digits task under noisy conditions, and competitive performances compared to previous models are observed

    Applying Data Augmentation to Handwritten Arabic Numeral Recognition Using Deep Learning Neural Networks

    Full text link
    Handwritten character recognition has been the center of research and a benchmark problem in the sector of pattern recognition and artificial intelligence, and it continues to be a challenging research topic. Due to its enormous application many works have been done in this field focusing on different languages. Arabic, being a diversified language has a huge scope of research with potential challenges. A convolutional neural network model for recognizing handwritten numerals in Arabic language is proposed in this paper, where the dataset is subject to various augmentation in order to add robustness needed for deep learning approach. The proposed method is empowered by the presence of dropout regularization to do away with the problem of data overfitting. Moreover, suitable change is introduced in activation function to overcome the problem of vanishing gradient. With these modifications, the proposed system achieves an accuracy of 99.4\% which performs better than every previous work on the dataset.Comment: 5 pages, 6 figures, 3 table
    • …
    corecore