23,576 research outputs found

Does signal reduction imply predictive coding in models of spoken word recognition?

Author: Brodbeck Christian
Li Monica Y. C.
Luthra Sahil
Magnuson James S.
You Heejo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2021

Published online: 14 April 2021. Pervasive behavioral and neural evidence for predictive processing has led to claims that language processing depends upon predictive coding. Formally, predictive coding is a computational mechanism where only deviations from top-down expectations are passed between levels of representation. In many cognitive neuroscience studies, a reduction of signal for expected inputs is taken as being diagnostic of predictive coding. In the present work, we show that despite not explicitly implementing prediction, the TRACE model of speech perception exhibits this putative hallmark of predictive coding, with reductions in total lexical activation, total lexical feedback, and total phoneme activation when the input conforms to expectations. These findings may indicate that interactive activation is functionally equivalent or approximant to predictive coding or that caution is warranted in interpreting neural signal reduction as diagnostic of predictive coding. This research was supported by NSF 1754284, NSF IGERT 1144399, and NSF NRT 1747486 (PI: J.S.M.). This research was also supported in part by the Basque Government through the BERC 2018-2021 program, and by the Agencia Estatal de Investigación through BCBL Severo Ochoa excellence accreditation SEV-2015-0490. S.L. was supported by an NSF Graduate Research Fellowship
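The formal definition in this abstract (only deviations from top-down expectations are passed between levels) can be illustrated with a minimal sketch. This is not the TRACE model and not the authors' simulation; the three-element vectors and the function names are hypothetical, chosen only to show why an expected input yields a smaller passed-on signal under such a scheme.

```python
def prediction_error(observed, predicted):
    """Element-wise deviation of the input from the top-down expectation."""
    return [o - p for o, p in zip(observed, predicted)]


def signal_magnitude(values):
    """Total absolute signal that would be passed to the next level."""
    return sum(abs(v) for v in values)


# Hypothetical expectation over three candidate phonemes (assumed values).
prediction = [0.9, 0.1, 0.0]
expected_input = [1.0, 0.0, 0.0]    # input that matches the expectation
unexpected_input = [0.0, 0.0, 1.0]  # input that violates the expectation

expected_signal = signal_magnitude(prediction_error(expected_input, prediction))
unexpected_signal = signal_magnitude(prediction_error(unexpected_input, prediction))

# The expected input produces the smaller error signal.
print(expected_signal < unexpected_signal)
```

The abstract's point is that a model without any such explicit error computation can show the same signal reduction, so the reduction alone may not identify the mechanism.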

Archivo Digital para la Docencia y la Investigación

PubMed Central

Speech Recognition of Isolated Arabic words via using Wavelet Transformation and Fuzzy Neural Network

Author: Al-Irhayim Yusra Faisal
Hussein Maher Khalaf
Publication venue: The International Institute for Science, Technology and Education (IISTE)
Publication date: 27/03/2016

In this paper, two new feature extraction methods are presented for speech recognition. The first method uses a combination of the linear predictive coding (LPC) technique and the skewness equation. The second (WLPCC) uses a combination of LPC, the discrete wavelet transform (DWT), and cepstrum analysis. The objective of this method is to enhance performance by introducing more features from the signal. A neural network (NN) and a neuro-fuzzy network are used for classification. Test results show that the WLPCC method for feature extraction, combined with the neuro-fuzzy network for classification, had the highest recognition rate for both the trained and non-trained data. The proposed system was built in MATLAB, and the data comprise ten isolated Arabic words (الله، محمد، خديجة، ياسين، يتكلم، الشارقة، لندن، يسار، يمين، أحزان) from fifteen male speakers. The recognition rate is 97.8% for trained data and 81.1% for non-trained data. Keywords: Speech Recognition, Feature Extraction, Linear Predictive Coding (LPC), Neural Network, Fuzzy networ
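As a rough illustration of the first method's ingredients, the sketch below computes LPC coefficients for one frame with the autocorrelation method (Levinson-Durbin recursion) and appends the frame's skewness. The abstract does not say how the two are combined, so the concatenation in `frame_features`, the population form of skewness, and all function names are assumptions, not the authors' MATLAB implementation.

```python
import math


def autocorrelation(frame, max_lag):
    """Return autocorrelation values r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc(frame, order):
    """Predictor coefficients a[1..order] via Levinson-Durbin.

    Convention: x[n] is approximated by sum_k a[k] * x[n - k].
    """
    r = autocorrelation(frame, order)
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err == 0.0:  # silent frame: leave remaining coefficients at zero
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient for this order
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]


def skewness(frame):
    """Population skewness: third central moment over variance^1.5."""
    n = len(frame)
    mean = sum(frame) / n
    m2 = sum((x - mean) ** 2 for x in frame) / n
    m3 = sum((x - mean) ** 3 for x in frame) / n
    return 0.0 if m2 == 0.0 else m3 / m2 ** 1.5


def frame_features(frame, order):
    """Hypothetical feature vector: LPC coefficients plus skewness."""
    return lpc(frame, order) + [skewness(frame)]


print(frame_features([0.0, 1.0, 0.5, -0.3, -0.8, 0.2, 0.9, 0.1], 2))
```

In practice a frame would usually be windowed before the autocorrelation step; that is omitted here to keep the sketch short.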

International Institute for Science, Technology and Education (IISTE): E-Journals

Pengenalan Sinyal Ucapan Angka dengan Menggunakan Jaringan Syaraf Tiruan Long Short-Term Memory [Spoken Digit Signal Recognition Using a Long Short-Term Memory Artificial Neural Network]

Author: Achmad Rawangga Yogaswara
Publication venue: Universitas Telkom
Publication date: 01/01/2007

ABSTRACT: This final project builds a spoken-digit signal recognition system using a Long Short-Term Memory (LSTM) neural network on a personal computer. The spoken digits follow a predetermined isolated-digit pattern. Voice parameters are extracted with the Linear Predictive Coding (LPC) method, and the LPC output is then processed by the LSTM neural network to perform recognition. Forty (40) voice samples from the speakers are used as input for training the neural network, and another forty (40) voice samples are used for testing it. Keywords: speech number signal recognition, neural networks, long short-ter

Open Library

Speaker Independent Speech Recognition Using Neural Network

Author: Tan Chin Luh
Publication venue
Publication date: 01/12/2004

Despite the advances of the last few decades, automatic speech recognition (ASR) remains a challenging task when systems are applied in the real world. The differing requirements of various applications drive researchers to look for more effective approaches for each one. Artificial neural networks (ANN) have been proposed as a classification tool to increase system reliability. This project studies the use of neural networks for speaker-independent isolated word recognition on small vocabularies and proposes a method that uses a simple MLP as the speech recognizer. The approach overcomes the current limitation of MLPs in selecting the input buffer size by proposing a frame selection method. Linear predictive coding (LPC) is applied at an early stage to represent the speech signal in frames. Features from the selected frames are used to train the multilayer perceptron (MLP) feedforward back-propagation (FFBP) neural network during the training stage. The same routine is applied to the speech signal during the recognition stage, and the unknown test pattern is classified as the nearest pattern. In short, the selected frames represent the local features of the speech signal, and all of them contribute to the global similarity for the whole speech signal. The analysis, design, and development of the PC-based voice dialling system are carried out in MATLAB®

Universiti Putra Malaysia Institutional Repository

Digit recognition using neural networks

Author: Jantan Adznan
Tan Chin Luh
Publication venue: Faculty of Computer Science and Information Technology, University of Malaya
Publication date: 01/01/2004

This paper investigates the use of feed-forward multi-layer perceptrons trained by back-propagation in speech recognition. It also proposes an automatic technique for both training and recognition. The use of neural networks for speaker-independent isolated word recognition on small vocabularies is studied, and an automated system covering the training stage through the recognition stage, with no need for manual cropping of speech signals, is developed to evaluate the performance of the automatic speech recognition (ASR) system. Linear predictive coding (LPC) is applied at an early stage to represent the speech signal in frames. Features from the selected frames are used to train multilayer perceptrons (MLP) using back-propagation. The same routine is applied to the speech signal during the recognition stage, and unknown test patterns are classified as the nearest patterns. In short, the selected frames represent the local features of the speech signal, and all of them contribute to the global similarity for the whole speech signal. The analysis, design, and development of the automated system are done in MATLAB, in which a speaker-independent isolated-word digit recogniser is developed

Universiti Putra Malaysia Institutional Repository

A Novel Approach for Multilingual Speech Recognition with Back Propagation Artificial Neural Network

Author: Rajat Haldar, Dr. Pankaj Kumar Mishra
Publication venue: 'Auricle Technologies, Pvt., Ltd.'
Publication date: 31/05/2016

“Speech Recognition” of audio signals is important for telecommunication, language identification, and speaker verification. Robust speech recognition can be applied to the automation of houses, offices, and telecommunication services. In this paper, speech recognition and language identification are performed for Bengali, Chhattisgarhi, English, and Hindi speech signals. The Bengali, Chhattisgarhi, English, and Hindi speech signals are “Ekhone Tumi Jao”, “Ae Bar Teha Ja”, “Now This Time You Go”, and “Ab Is Bar tum Jao”, respectively. The method is applied in two phases: in the first phase, speech recognition and language identification are performed with a Back Propagation Artificial Neural Network (BPANN), and in the second phase they are performed with a combination of the Particle Swarm Optimization (PSO) feature selection technique and BPANN. Mel Frequency Cepstral Coefficients (MFCC) and Linear Predictive Coding (LPC) are used for feature extraction; they are the most widely used feature extraction methods. BPANN is a feed-forward neural network that propagates the error signal backward to modify the weights; the error signal arises when the actual output value differs from the target output value. System accuracy and performance are measured in terms of “Recognition Rate” and the amount of error. Multilingual speech recognition and language identification with the PSO feature selection technique gives a better recognition rate than without the PSO feature selection technique
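The weight-modification step described in this abstract (the error between target and actual output is propagated backward to adjust the weights) can be sketched for a tiny feed-forward network. The 2-2-1 layout without bias terms, the sigmoid activation, the learning rate, and the initial weights are all assumptions for illustration; the abstract does not specify the BPANN architecture, and the PSO feature selection stage is not shown.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    """Forward pass: input -> hidden layer -> single output unit."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, output


def train_step(x, target, w_hidden, w_out, lr=0.5):
    """One back-propagation update for the squared error 0.5 * (target - output)^2."""
    hidden, output = forward(x, w_hidden, w_out)
    error = target - output  # error signal at the output
    delta_out = error * output * (1.0 - output)
    # Propagate the output delta back through the (old) output weights.
    delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(w_out, hidden)]
    new_w_out = [w + lr * delta_out * h for w, h in zip(w_out, hidden)]
    new_w_hidden = [[w + lr * d * xi for w, xi in zip(row, x)]
                    for row, d in zip(w_hidden, delta_hidden)]
    return new_w_hidden, new_w_out, error


# Hypothetical starting weights and a single training example.
w_hidden = [[0.1, -0.2], [0.3, 0.4]]
w_out = [0.5, -0.5]
features, target = [1.0, 0.5], 1.0

for _ in range(100):
    w_hidden, w_out, error = train_step(features, target, w_hidden, w_out)
print(abs(error))
```

In a recognizer of the kind described, `features` would be an MFCC or LPC vector and there would be one output unit per word or language class.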

International Journal on Recent and Innovation Trends in Computing and Communication

Wavenet based low rate speech coding

Author: Kleijn W. Bastiaan
Lim Felicia S. C.
Luebs Alejandro
Skoglund Jan
Stimberg Florian
Walters Thomas C.
Wang Quan
Publication venue
Publication date: 01/12/2017

Traditional parametric coding of speech facilitates low rate but provides poor reconstruction quality because of the inadequacy of the model used. We describe how a WaveNet generative speech model can be used to generate high quality speech from the bit stream of a standard parametric coder operating at 2.4 kb/s. We compare this parametric coder with a waveform coder based on the same generative model and show that approximating the signal waveform incurs a large rate penalty. Our experiments confirm the high performance of the WaveNet based coder and show that the speech produced by the system is able to additionally perform implicit bandwidth extension and does not significantly impair recognition of the original speaker for the human listener, even when that speaker has not been used during the training of the generative model. Comment: 5 pages, 2 figures

arXiv.org e-Print Archive

Crossref