1,058 research outputs found

Ultimate Trends in Integrated Systems to Enhance Automatic Speech Recognition Performance

Author: C. Durá
Publication venue: 'IntechOpen'
Publication date: 01/11/2008

IntechOpen

Crossref

An experimental DSP-based tactile hearing aid : a feasibility study

Author: Mathijssen R.W.M.
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/1991

Repository TU/e

Pure OAI Repository

FPGA Implementation of Spectral Subtraction for In-Car Speech Enhancement and Recognition

Author: Deo Kapeel
Kleinschmidt Tristan
Mason Michael
Whittington Jim
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2008

Speech recognition in noisy environments requires speech enhancement algorithms to improve recognition performance. Deploying these enhancement techniques requires significant engineering to ensure the algorithms are realisable in electronic hardware. This paper describes the design decisions and process to port the popular spectral subtraction algorithm to a Virtex-4 field-programmable gate array (FPGA) device. Resource analysis shows the final design uses only 13% of the total available FPGA resources. The waveforms and spectrograms presented support the validity of the proposed FPGA design.

Queensland University of Technology ePrints Archive

An efficient implementation of lattice-ladder multilayer perceptrons in field programmable gate arrays

Author: Sledevič Tomyslav
Publication date: 05/05/2016

Implementing electronic systems efficiently means balancing conflicting requirements: growing computational volumes and faster data exchange come with rising energy consumption, which forces researchers not only to optimize the algorithm but also to implement it quickly in specialized hardware. This work therefore tackles the problem of efficient and straightforward implementation of real-time electronic intelligent systems on a field-programmable gate array (FPGA). The object of research is specialized FPGA intellectual property (IP) cores that operate in real time. The thesis investigates the following main aspects of the research object: implementation criteria and techniques. The aim of the thesis is to optimize the FPGA implementation process for a selected class of dynamic artificial neural networks. To solve the stated problem and reach this goal, the following main tasks are formulated: rationalize the selection of a class of Lattice-Ladder Multi-Layer Perceptron (LLMLP) and of its electronic intelligent system test-bed, a speaker-dependent Lithuanian speech recognizer, to be created and investigated; develop a dedicated technique for implementing the LLMLP class on FPGA, based on specialized efficiency criteria for circuitry synthesis; and develop and experimentally confirm the efficiency of the optimized FPGA IP cores used in the Lithuanian speech recognizer. The dissertation contains an introduction, four chapters, and general conclusions. The first chapter presents the fundamentals of computer-aided design, artificial neural networks, and speech recognition implementation on FPGA. The second chapter proposes the efficiency criteria and technique for implementing LLMLP IP cores, enabling multi-objective optimization of throughput, LLMLP complexity, and resource utilization. Data flow graphs are applied to optimize the LLMLP computations, and an optimized neuron processing element is proposed. The third chapter develops and analyzes the IP cores for feature extraction and comparison in the Lithuanian speech recognizer. The fourth chapter is devoted to experimental verification of the numerous LLMLP IP cores developed. Experiments measured isolated-word recognition accuracy and speed for different speakers, signal-to-noise ratios, feature extraction methods, and accelerated comparison methods. The main results of the thesis were published in 12 scientific publications: eight in peer-reviewed scientific journals, four of which are indexed in the Thomson Reuters Web of Science database, and four in conference proceedings. The results were presented at 17 scientific conferences.

Vilniaus Gedimino Technikos Universitetas: VGTU Talpykla / Vilnius Gediminas Technical University: VGTU Repository

Deep Spiking Neural Network model for time-variant signals classification: a real-time speech recognition approach

Author: Davidson Simón
Domínguez Morales Juan Pedro
Furber Steve B.
Gutiérrez Galán Daniel
James Robert
Jiménez Fernández Ángel Francisco
Liu Qian
Publication venue: IEEE Computer Society
Publication date: 01/01/2018

Speech recognition has become an important task for improving the human-machine interface. Given the limitations of current automatic speech recognition systems, such as non-real-time cloud-based solutions or power demand, recent interest in neural networks and bio-inspired systems has motivated the implementation of new techniques. Among them, a combination of spiking neural networks and neuromorphic auditory sensors offers an alternative for carrying out the human-like speech processing task. In this approach, a spiking convolutional neural network model was implemented, in which the connection weights were calculated by training a convolutional neural network with specific activation functions, using firing-rate-based static images built from the spiking information obtained from a neuromorphic cochlea. The system was trained and tested with a large dataset that contains "left" and "right" speech commands, achieving 89.90% accuracy. A novel spiking neural network model has been proposed to adapt the network trained with static images to a non-static processing approach, making it possible to classify audio signals and time series in real time. Ministerio de Economía y Competitividad TEC2016-77785-

idUS. Depósito de Investigación Universidad de Sevilla

Learning and Production of Movement Sequences: Behavioral, Neurophysiological, and Modeling Perspectives

Author: Averbeck Bruno
Bullock Daniel
Page Michael
Rhodes Bradley
Verwey Willem
Publication venue: Boston University Center for Adaptive Systems and Department of Cognitive and Neural Systems
Publication date: 01/12/2003

A growing wave of behavioral studies, using a wide variety of paradigms that were introduced or greatly refined in recent years, has generated a new wealth of parametric observations about serial order behavior. What was a mere trickle of neurophysiological studies has grown to a steadier stream of probes of neural sites and mechanisms underlying sequential behavior. Moreover, simulation models of serial behavior generation have begun to open a channel to link cellular dynamics with cognitive and behavioral dynamics. Here we summarize the major results from prominent sequence learning and performance tasks, namely immediate serial recall, typing, 2XN, discrete sequence production, and serial reaction time. These populate a continuum from higher to lower degrees of internal control of sequential organization. The main movement classes covered are speech and keypressing, both involving small-amplitude movements that are very amenable to parametric study. A brief synopsis of classes of serial order models, vis-à-vis the detailing of major effects found in the behavioral data, leads to a focus on competitive queuing (CQ) models. Recently, the many behavioral predictive successes of CQ models have been joined by successful prediction of distinctively patterned electrophysiological recordings in prefrontal cortex, wherein parallel activation dynamics of multiple neural ensembles strikingly match the parallel dynamics predicted by CQ theory. An extended CQ simulation model, the N-STREAMS neural network model, is then examined to highlight issues in ongoing attempts to accommodate a broader range of behavioral and neurophysiological data within a CQ-consistent theory. Important contemporary issues, such as the nature of working memory representations for sequential behavior and the development and role of chunks in hierarchical control, are prominent throughout. Defense Advanced Research Projects Agency/Office of Naval Research (N00014-95-1-0409); National Institute of Mental Health (R01 DC02852

Boston University Institutional Repository (OpenBU)

Perceptual adaptation by normally hearing listeners to a simulated "hole" in hearing

Author: Andrew Faulkner
Bench R. J.
Faulkner A.
Matthew W. Smith
Moore B. C. J.
Murray N.
Publication venue: ACOUSTICAL SOC AMER AMER INST PHYSICS
Publication date: 01/12/2006

Simulations of cochlear implants have demonstrated that the deleterious effects of a frequency misalignment between analysis bands and characteristic frequencies at basally shifted simulated electrode locations are significantly reduced with training. However, a distortion of frequency-to-place mapping may also arise due to a region of dysfunctional neurons that creates a "hole" in the tonotopic representation. This study simulated a 10 mm hole in the mid-frequency region. Noise-band processors were created with six output bands (three apical and three basal to the hole). The spectral information that would have been represented in the hole was either dropped or reassigned to bands on either side. Such reassignment preserves information but warps the place code, which may in itself impair performance. Normally hearing subjects received three hours of training in two reassignment conditions. Speech recognition improved considerably with training. Scores were much lower in a baseline (untrained) condition where information from the hole region was dropped. A second group of subjects trained in this dropped condition did show some improvement; however, scores after training were significantly lower than in the reassignment conditions. These results are consistent with the view that speech processors should present the most informative frequency range irrespective of frequency misalignment. © 2006 Acoustical Society of America

Crossref

UCL Discovery

Low-resource Multi-task Audio Sensing for Mobile and Embedded Devices via Shared Deep Neural Network Representations

Author: Bhattacharya S
Georgiev P
Lane N
Mascolo C
Publication venue: Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT)
Publication date: 11/09/2017

Continuous audio analysis from embedded and mobile devices is an increasingly important application domain. More and more, appliances like the Amazon Echo, along with smartphones and watches, and even research prototypes seek to perform multiple discriminative tasks simultaneously from ambient audio; for example, monitoring background sound classes (e.g., music or conversation), recognizing certain keywords (‘Hey Siri’ or ‘Alexa’), or identifying the user and her emotion from speech. The use of deep learning algorithms typically provides state-of-the-art model performances for such general audio tasks. However, the large computational demands of deep learning models are at odds with the limited processing, energy and memory resources of mobile, embedded and IoT devices. In this paper, we propose and evaluate a novel deep learning modeling and optimization framework that specifically targets this category of embedded audio sensing tasks. Although the supported tasks are simpler than the task of speech recognition, this framework aims at maintaining accuracies in predictions while minimizing the overall processor resource footprint. The proposed model is grounded in multi-task learning principles to train shared deep layers and exploits, as input layer, only statistical summaries of audio filter banks to further lower computations. We find that for embedded audio sensing tasks our framework is able to maintain similar accuracies, which are observed in comparable deep architectures that use single-task learning and typically more complex input layers. Most importantly, on an average, this approach provides almost a 2.1× reduction in runtime, energy, and memory for four separate audio sensing tasks, assuming a variety of task combinations. Microsoft Researc

Apollo (Cambridge)

CUED - Cambridge University Engineering Department

Distant Speech Recognition Using Multiple Microphones in Noisy and Reverberant Environments

Author: Runer Hanna
Publication venue: Lunds universitet/Institutionen för elektro- och informationsteknik
Publication date: 01/01/2015

Silicon Technologies for Speaker Independent Speech Processing and Recognition Systems in Noisy Environments

Author: Arun Selvaraj
Karthikeyan Natarajan
Mala John
Publication venue: 'IntechOpen'
Publication date: 01/11/2008

IntechOpen

Crossref