
    Multi-View Networks For Multi-Channel Audio Classification

    In this paper we introduce the idea of multi-view networks for sound classification with multiple sensors. We show how one can build a multi-channel sound recognition model trained on a fixed number of channels and deploy it in scenarios with an arbitrary (and potentially dynamically changing) number of input channels without a degradation in performance. We demonstrate that at inference time the model can safely be given all available channels, since it is able to ignore noisy information and exploit new information better than standard baseline approaches. The model is evaluated both in an anechoic environment and in rooms generated by a room acoustics simulator. We demonstrate that it generalizes to unseen numbers of channels as well as to unseen room geometries.
    Comment: 5 pages, 7 figures, Accepted to ICASSP 201
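The channel-count invariance described in this abstract can be illustrated with a minimal sketch. This is not the paper's architecture; all layer sizes, names, and the max-pooling aggregation are illustrative assumptions. The idea shown is that each channel is encoded with shared weights and the per-channel embeddings are pooled across the channel axis, so the same trained model accepts any number of input channels at inference time.

```python
# Minimal sketch (not the paper's exact model): a channel-count-invariant
# classifier that encodes every input channel with shared weights and then
# pools across channels, so inference works for any number of microphones.
import torch
import torch.nn as nn


class MultiViewClassifier(nn.Module):
    def __init__(self, n_classes: int, feat_dim: int = 64):
        super().__init__()
        # Shared single-channel encoder applied independently to every channel.
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=9, stride=4), nn.ReLU(),
            nn.Conv1d(32, feat_dim, kernel_size=9, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # -> (batch * channels, feat_dim, 1)
        )
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, samples); channels may differ from training time.
        b, c, t = x.shape
        z = self.encoder(x.reshape(b * c, 1, t)).reshape(b, c, -1)
        # Max-pool over the channel axis: extra channels add information,
        # and the pooled feature size never depends on the channel count.
        z, _ = z.max(dim=1)
        return self.classifier(z)


if __name__ == "__main__":
    model = MultiViewClassifier(n_classes=10)
    print(model(torch.randn(2, 4, 16000)).shape)  # e.g. trained with 4 channels
    print(model(torch.randn(2, 7, 16000)).shape)  # deployed with 7 channels
```

The channel-wise pooling is the design choice that lets the number of views vary: any permutation-invariant reduction (max, mean, attention) over the channel axis would serve the same purpose.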

    Deep Spiking Neural Network model for time-variant signals classification: a real-time speech recognition approach

    Speech recognition has become an important task for improving the human-machine interface. Given the limitations of current automatic speech recognition systems, such as non-real-time cloud-based solutions and their power demands, recent interest in neural networks and bio-inspired systems has motivated the implementation of new techniques. Among them, the combination of spiking neural networks and neuromorphic auditory sensors offers an alternative way to carry out human-like speech processing. In this approach, a spiking convolutional neural network model was implemented in which the connection weights were obtained by training a convolutional neural network with specific activation functions on firing-rate-based static images built from the spiking information produced by a neuromorphic cochlea. The system was trained and tested on a large dataset containing "left" and "right" speech commands, achieving 89.90% accuracy. A novel spiking neural network model is proposed to adapt the network trained on static images to a non-static processing approach, making it possible to classify audio signals and time series in real time.
    Ministerio de Economía y Competitividad TEC2016-77785-
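A minimal sketch of the rate-based ANN-to-SNN idea mentioned in this abstract follows. It is not the authors' implementation; the weights, constants, and reset-by-subtraction scheme are illustrative assumptions. It shows why a network trained on firing-rate images can be run with spiking neurons: over many time steps the firing rate of an integrate-and-fire neuron driven by a trained ReLU layer's weights approximates that layer's analog activation.

```python
# Sketch: reuse weights from a trained ReLU layer inside integrate-and-fire
# (IF) spiking neurons and compare firing rates with the analog activations.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.3, size=(8, 16))      # weights taken from a trained layer
b = rng.normal(scale=0.1, size=8)
x = rng.random(16)                           # input firing rates in [0, 1]

relu_out = np.maximum(W @ x + b, 0.0)        # analog (ANN) activation

# Spiking simulation: drive IF neurons with the same weighted input each step.
T, threshold = 1000, 1.0
v = np.zeros(8)                              # membrane potentials
spike_count = np.zeros(8)
for _ in range(T):
    v += W @ x + b                           # integrate constant-rate input
    fired = v >= threshold
    spike_count += fired                     # count one spike per firing neuron
    v[fired] -= threshold                    # reset by subtraction

# The IF rate approximates min(activation / threshold, 1 spike per step),
# i.e. the ReLU output up to the saturation imposed by the time resolution.
print("ReLU activations :", np.round(relu_out, 3))
print("IF firing rates  :", np.round(spike_count / T, 3))
```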

    On the application of reservoir computing networks for noisy image recognition

    Reservoir Computing Networks (RCNs) are a special type of single-layer recurrent neural network in which the input and recurrent connections are randomly generated and only the output weights are trained. Besides their ability to process temporal information, the key strengths of RCNs are easy training and robustness against noise. Recently, we introduced a simple strategy for tuning the parameters of RCNs, and an evaluation in the domain of noise-robust speech recognition showed that this method is effective. The aim of this work is to extend that study to the field of image processing, showing that the proposed parameter tuning procedure is equally valid there and confirming that RCNs are apt at temporal modeling and robust with respect to noise. In particular, we investigate the potential of RCNs to achieve competitive performance on the well-known MNIST dataset by following the aforementioned parameter optimization strategy. Moreover, we obtain good noise-robust recognition by using such a network to denoise images before supplying them to a recognizer trained solely on clean images. The experiments demonstrate that the proposed RCN-based handwritten digit recognizer achieves an error rate of 0.81 percent on the clean MNIST test data and that the proposed RCN-based denoiser effectively reduces the error rate under various types of noise. (c) 2017 Elsevier B.V. All rights reserved.
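A minimal echo-state-network sketch of the RCN setup described in this abstract is given below. The reservoir size, leak rate, spectral radius, and the toy one-step-prediction task are assumptions for illustration, not the paper's experiments. It shows the defining property the abstract relies on: the input and recurrent weights stay random and fixed, and only the linear readout is trained, here with ridge regression.

```python
# Sketch of a reservoir computing network (echo state network flavor):
# random fixed reservoir, trained linear readout.
import numpy as np

rng = np.random.default_rng(1)
n_in, n_res, leak, rho = 1, 200, 0.3, 0.9

W_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in))   # random, never trained
W = rng.uniform(-0.5, 0.5, size=(n_res, n_res))     # random, never trained
W *= rho / np.max(np.abs(np.linalg.eigvals(W)))     # rescale spectral radius

u = np.sin(0.1 * np.arange(4000))[:, None]          # toy input sequence
target = np.roll(u, -1, axis=0)                     # predict the next sample

# Run the reservoir and collect states (discard an initial washout period).
x = np.zeros(n_res)
states = np.zeros((len(u), n_res))
for t in range(len(u)):
    x = (1 - leak) * x + leak * np.tanh(W_in @ u[t] + W @ x)
    states[t] = x
washout = 100
X, Y = states[washout:], target[washout:]

# Train only the readout weights with ridge regression.
ridge = 1e-6
W_out = np.linalg.solve(X.T @ X + ridge * np.eye(n_res), X.T @ Y)

pred = X @ W_out
print("readout MSE:", float(np.mean((pred - Y) ** 2)))
```

Because only the readout is a learned linear map, training reduces to a single regularized least-squares solve, which is what makes RCNs cheap to train compared with fully trained recurrent networks.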