
    Deep Spiking Neural Network model for time-variant signals classification: a real-time speech recognition approach

    Speech recognition has become an important task for improving the human-machine interface. Given the limitations of current automatic speech recognition systems, such as non-real-time cloud-based processing and high power demands, recent interest in neural networks and bio-inspired systems has motivated the implementation of new techniques. Among them, the combination of spiking neural networks and neuromorphic auditory sensors offers an alternative way to carry out human-like speech processing. In this approach, a spiking convolutional neural network model was implemented, in which the connection weights were calculated by training a convolutional neural network with specific activation functions on firing-rate-based static images built from the spiking information obtained from a neuromorphic cochlea. The system was trained and tested on a large dataset containing "left" and "right" speech commands, achieving 89.90% accuracy. A novel spiking neural network model has been proposed to adapt the network trained on static images to a non-static processing approach, making it possible to classify audio signals and time series in real time. Ministerio de Economía y Competitividad TEC2016-77785-
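    The training step described above can be illustrated with a minimal sketch: spike trains from the cochlea sensor are accumulated into firing-rate images, and a small Keras CNN is trained on them. The image size, layer configuration and helper names below are illustrative assumptions, not the authors' actual model.

```python
# Hypothetical sketch: train a conventional CNN on firing-rate "images"
# (spike counts per cochlea channel per time bin), as the abstract describes.
# Dataset shapes and layer sizes are assumptions, not the published model.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

N_CHANNELS, N_BINS = 64, 64  # assumed cochlea channels x time bins

def spikes_to_rate_image(spike_times, spike_channels, duration):
    """Accumulate cochlea spikes into a 2D firing-rate image (channel x time bin)."""
    img = np.zeros((N_CHANNELS, N_BINS), dtype=np.float32)
    bins = np.minimum((np.asarray(spike_times) / duration * N_BINS).astype(int),
                      N_BINS - 1)
    np.add.at(img, (np.asarray(spike_channels, dtype=int), bins), 1.0)
    return img / img.max() if img.max() > 0 else img

model = models.Sequential([
    layers.Input(shape=(N_CHANNELS, N_BINS, 1)),
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),
    layers.Dense(2, activation="softmax"),   # "left" vs. "right"
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(x_train, y_train, ...)  # x_train: firing-rate images, y_train: 0/1 labels
```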

    Speech driven user interface for an intelligent house : a thesis presented in partial fulfilment of the requirements for the degree of Master of Engineering in Information Engineering at Massey University, Albany, New Zealand

    The speech-driven user interface for an intelligent house is one of a number of graduate research projects at Massey University and is part of the 'Smart House' project. This thesis details the development of a control system whose inputs are speech signals rather than manual controls. The control system consists of several subsystems, including speech recognition, command generation, signal transmission, signal reception and command manipulation. The completed speech-driven user interface should operate in conjunction with the real-time microphone-array beamformer and the personal identity recognition system that were developed concurrently with this project. The speech recognition and command generation subsystems are implemented on a PC, whereas the signal transmission, signal reception and command manipulation subsystems are designed at the embedded-board level. The remote controller can control several electrical appliances, such as a TV and a CD player, and can switch and dim the lights.
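    As a rough illustration of the command-generation and transmission steps, the sketch below maps a recognised phrase to a byte command and sends it to an embedded controller over a serial link. The device names, command codes and serial settings are invented for the example and are not taken from the thesis.

```python
# Hypothetical sketch of the "command generation" step: map a recognised
# phrase to a byte command for the embedded controller. Device names,
# command codes and the serial settings are illustrative assumptions.
import serial  # pyserial

COMMANDS = {
    "tv on":     b"\x01\x01",
    "tv off":    b"\x01\x00",
    "light on":  b"\x02\x01",
    "light off": b"\x02\x00",
    "dim light": b"\x02\x02",
}

def send_command(phrase: str, port: str = "/dev/ttyS0") -> bool:
    """Transmit the command for a recognised phrase, if one exists."""
    code = COMMANDS.get(phrase.lower().strip())
    if code is None:
        return False
    with serial.Serial(port, baudrate=9600, timeout=1) as link:
        link.write(code)
    return True
```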

    Real-Time Statistical Speech Translation

    This research investigates statistical machine translation approaches for translating speech automatically in real time. Such systems can be used in a pipeline with speech recognition and synthesis software in order to produce a real-time voice communication system between speakers of different languages. We obtained three main data sets from spoken proceedings that represent three different types of human speech. The TED, Europarl and OPUS parallel text corpora were used as the basis for training language models and for developmental tuning and testing of the translation system. We also conducted experiments involving part-of-speech tagging, compound splitting, linear language model interpolation, truecasing and morphosyntactic analysis. We evaluated the effects of a variety of data preparations on the translation results using the BLEU, NIST, METEOR and TER metrics, and tried to determine which metric is most suitable for the Polish-English (PL-EN) language pair. Comment: machine translation, Polish-English
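    A minimal sketch of the metric-based evaluation mentioned above, using the sacrebleu package for BLEU and TER; the sentences are placeholder data rather than the TED, Europarl or OPUS test sets used in the study.

```python
# Score a set of candidate translations against references with BLEU and TER.
# The hypotheses and references below are placeholder examples.
import sacrebleu

hypotheses = ["the meeting will start at nine",
              "we discussed the budget yesterday"]
references = [["the meeting starts at nine",
               "we discussed the budget yesterday"]]  # one reference stream

bleu = sacrebleu.corpus_bleu(hypotheses, references)
ter = sacrebleu.corpus_ter(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}, TER = {ter.score:.2f}")
```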

    Convolutional Neural Networks for Speech Controlled Prosthetic Hands

    Speech recognition is one of the key topics in artificial intelligence, as speech is one of the most common forms of human communication. Researchers have developed many speech-controlled prosthetic hands in the past decades, utilizing conventional speech recognition systems that combine neural networks and hidden Markov models. Recent advancements in general-purpose graphics processing units (GPGPUs) enable intelligent devices to run deep neural networks in real time. Thus, state-of-the-art speech recognition systems have rapidly shifted from the paradigm of composite subsystem optimization to the paradigm of end-to-end optimization. However, a low-power embedded GPGPU cannot run these speech recognition systems in real time. In this paper, we show the development of deep convolutional neural networks (CNNs) for speech control of prosthetic hands that run in real time on an NVIDIA Jetson TX2 developer kit. First, the device captures speech and converts it into 2D features (such as a spectrogram). The CNN receives the 2D features and classifies the hand gestures. Finally, the hand gesture classes are sent to the prosthetic hand motion control system. The whole system is written in Python with Keras, a deep learning library that has a TensorFlow backend. Our experiments demonstrate 91% accuracy and a 2 ms running time for classifying hand gestures (as text output) from speech commands, which can be used to control prosthetic hands in real time. Comment: 2019 First International Conference on Transdisciplinary AI (TransAI), Laguna Hills, California, USA, 2019, pp. 35-4
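    The pipeline described above (speech, 2D spectrogram features, CNN, gesture class) can be sketched as follows. The feature extraction via librosa, the layer sizes and the number of gesture classes are assumptions for illustration and do not reproduce the paper's exact network.

```python
# Hedged sketch of the described pipeline: speech -> 2D spectrogram -> CNN
# -> gesture class. Feature and layer sizes are assumptions.
import numpy as np
import librosa
import tensorflow as tf
from tensorflow.keras import layers, models

N_MELS, N_FRAMES, N_GESTURES = 40, 32, 5   # assumed feature/classifier sizes

def speech_to_features(audio: np.ndarray, sr: int = 16000) -> np.ndarray:
    """Convert a short speech command into a fixed-size log-mel spectrogram."""
    mel = librosa.feature.melspectrogram(y=audio, sr=sr, n_mels=N_MELS)
    logmel = librosa.power_to_db(mel)
    pad = max(0, N_FRAMES - logmel.shape[1])
    logmel = np.pad(logmel, ((0, 0), (0, pad)))[:, :N_FRAMES]
    return logmel[..., np.newaxis].astype(np.float32)

model = models.Sequential([
    layers.Input(shape=(N_MELS, N_FRAMES, 1)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),
    layers.Dense(N_GESTURES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# gesture_id = int(np.argmax(model.predict(features[np.newaxis])))
```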

    A convolutional neural-network model of human cochlear mechanics and filter tuning for real-time applications

    Auditory models are commonly used as feature extractors for automatic speech recognition systems or as front-ends for robotics, machine-hearing and hearing-aid applications. Although auditory models can capture the biophysical and nonlinear properties of human hearing in great detail, these biophysical models are computationally expensive and cannot be used in real-time applications. We present a hybrid approach in which convolutional neural networks are combined with computational neuroscience to yield a real-time, end-to-end model of human cochlear mechanics, including level-dependent filter tuning (CoNNear). The CoNNear model was trained on acoustic speech material, and its performance and applicability were evaluated using (unseen) sound stimuli commonly employed in cochlear mechanics research. The CoNNear model accurately simulates human cochlear frequency selectivity and its dependence on sound intensity, an essential quality for robust speech intelligibility at negative speech-to-background-noise ratios. The CoNNear architecture is based on parallel and differentiable computations and can achieve real-time human-like performance. These unique CoNNear features will enable the next generation of human-like machine-hearing applications.
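    In the spirit of the approach above, a waveform-in, multi-channel-out 1D convolutional model might look like the sketch below. Layer counts, filter sizes and the number of cochlear output channels are assumptions and do not reproduce the published CoNNear architecture.

```python
# Rough sketch of an audio-in, multi-channel-out 1D convolutional model
# (waveform -> per-channel cochlear response). All sizes are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

N_SAMPLES, N_COCHLEAR_CHANNELS = 2048, 201   # assumed frame length / channels

inp = layers.Input(shape=(N_SAMPLES, 1))
x = layers.Conv1D(64, 16, strides=2, padding="same", activation="tanh")(inp)
x = layers.Conv1D(128, 16, strides=2, padding="same", activation="tanh")(x)
x = layers.Conv1DTranspose(128, 16, strides=2, padding="same", activation="tanh")(x)
x = layers.Conv1DTranspose(64, 16, strides=2, padding="same", activation="tanh")(x)
out = layers.Conv1D(N_COCHLEAR_CHANNELS, 1, padding="same")(x)  # per-sample channel outputs
model = models.Model(inp, out)
model.compile(optimizer="adam", loss="mse")  # trained against biophysical-model targets
```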

    Evolutionary Speech Recognition

    Automatic speech recognition systems are becoming ever more common and are increasingly deployed in highly variable acoustic conditions and used by very different speakers. These systems, generally designed in a laboratory, must therefore be robust in order to provide optimal performance in real situations. This article explores the possibility of gaining robustness by designing speech recognition systems able to modify themselves in real time in order to adapt to changes in the acoustic environment. As a starting point, the adaptive capacities of living organisms were considered in relation to their environment. Analogues of these mechanisms were then applied to automatic speech recognition systems. It thus appeared worthwhile to design a system that adapts to changing acoustic conditions so as to remain effective regardless of its conditions of use.
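    The adaptive principle can be illustrated with a toy evolutionary loop: a small population of front-end parameter sets is mutated, and the variants that score best on recently observed audio are retained. The parameters and fitness function below are placeholders, not the article's actual system.

```python
# Toy illustration of evolutionary adaptation to changing acoustic conditions.
# The parameter names and the fitness function are placeholder assumptions.
import random

def fitness(params, recent_audio_batch):
    # Placeholder: a real system would measure recognition accuracy or
    # confidence on the most recent audio. Here it is a dummy score.
    return -abs(params["preemphasis"] - 0.97)

def evolve(population, recent_audio_batch, n_keep=4, sigma=0.05):
    """Keep the best parameter sets and refill the population with mutated copies."""
    scored = sorted(population, key=lambda p: fitness(p, recent_audio_batch),
                    reverse=True)
    survivors = scored[:n_keep]
    children = [
        {k: v + random.gauss(0.0, sigma) for k, v in random.choice(survivors).items()}
        for _ in range(len(population) - n_keep)
    ]
    return survivors + children

# population = [{"preemphasis": 0.9, "noise_floor_db": -60.0} for _ in range(12)]
# population = evolve(population, latest_audio)   # re-run as conditions change
```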

    Parallel Algorithms for Isolated and Connected Word Recognition

    For years researchers have worked toward finding a way to allow people to talk to machines in the same manner a person communicates with another person. This verbal man-to-machine interface, called speech recognition, can be grouped into three types: isolated word recognition, connected word recognition, and continuous speech recognition. Isolated word recognizers recognize single words with distinctive pauses before and after them. Continuous speech recognizers recognize speech spoken as one person speaks to another, continuously and without pauses. Connected word recognition is an extension of isolated word recognition which recognizes groups of words spoken continuously. A group of words must have distinctive pauses before and after it, and the number of words in a group is limited to some small value (typically less than six). If these types of recognition systems are to be successful in the real world, they must be speaker independent and support a large vocabulary. They must also be able to recognize the speech input accurately and in real time. Currently there is no system which can meet all of these criteria, because a vast amount of computation is needed. This report examines the use of parallel processing to reduce the computation time for speech recognition. Two different types of parallel architectures are considered here: the Single Instruction stream - Multiple Data stream (SIMD) machine and the VLSI processor array. The SIMD machine is chosen for its flexibility, which makes it a good candidate for testing new speech recognition algorithms. The VLSI processor array is selected as being well suited to a dedicated recognition system because of its simple processors and fixed interconnections. This report presents designs of SIMD systems and VLSI processor arrays for both isolated and connected word recognition systems. These architectures are evaluated and contrasted in terms of the number of processors needed, the interprocessor connections required, and the "power" each processor needs to achieve real-time recognition. The results show that an SIMD machine using 100 processors, each built around an MC68000, can recognize isolated words in real time using a 20 kHz sampling rate and a 1,000-word vocabulary.
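    A rough modern analogue of the parallelism discussed above is to distribute template comparisons for an unknown word across worker processes. Dynamic time warping is assumed here as the per-template computation, which is typical of isolated-word recognizers of that era but is not quoted from the report.

```python
# Compare an input word's feature sequence against reference templates in
# parallel, one comparison per worker task. DTW is an assumed per-word match.
import numpy as np
from multiprocessing import Pool

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Classic O(len(a)*len(b)) dynamic time warping distance."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

def score_template(args):
    word, template, features = args
    return word, dtw_distance(features, template)

def recognise(features, templates, workers=4):
    """templates: dict mapping word -> reference feature sequence."""
    jobs = [(word, tmpl, features) for word, tmpl in templates.items()]
    with Pool(workers) as pool:
        scores = pool.map(score_template, jobs)
    return min(scores, key=lambda s: s[1])[0]   # best-matching word
```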