1,058 research outputs found

    An experimental DSP-based tactile hearing aid : a feasibility study

    Get PDF

    FPGA Implementation of Spectral Subtraction for In-Car Speech Enhancement and Recognition

    Get PDF
    The use of speech recognition in noisy environments requires the use of speech enhancement algorithms in order to improve recognition performance. Deploying these enhancement techniques requires significant engineering to ensure algorithms are realisable in electronic hardware. This paper describes the design decisions and process to port the popular spectral subtraction algorithm to a Virtex-4 field-programmable gate array (FPGA) device. Resource analysis shows the final design uses only 13% of the total available FPGA resources. Waveforms and spectrograms presented support the validity of the proposed FPGA design

    An efficient implementation of lattice-ladder multilayer perceptrons in field programmable gate arrays

    Get PDF
    The implementation efficiency of electronic systems is a combination of conflicting requirements, as increasing volumes of computations, accelerating the exchange of data, at the same time increasing energy consumption forcing the researchers not only to optimize the algorithm, but also to quickly implement in a specialized hardware. Therefore in this work, the problem of efficient and straightforward implementation of operating in a real-time electronic intelligent systems on field-programmable gate array (FPGA) is tackled. The object of research is specialized FPGA intellectual property (IP) cores that operate in a real-time. In the thesis the following main aspects of the research object are investigated: implementation criteria and techniques. The aim of the thesis is to optimize the FPGA implementation process of selected class dynamic artificial neural networks. In order to solve stated problem and reach the goal following main tasks of the thesis are formulated: rationalize the selection of a class of Lattice-Ladder Multi-Layer Perceptron (LLMLP) and its electronic intelligent system test-bed – a speaker dependent Lithuanian speech recognizer, to be created and investigated; develop dedicated technique for implementation of LLMLP class on FPGA that is based on specialized efficiency criteria for a circuitry synthesis; develop and experimentally affirm the efficiency of optimized FPGA IP cores used in Lithuanian speech recognizer. The dissertation contains: introduction, four chapters and general conclusions. The first chapter reveals the fundamental knowledge on computer-aideddesign, artificial neural networks and speech recognition implementation on FPGA. In the second chapter the efficiency criteria and technique of LLMLP IP cores implementation are proposed in order to make multi-objective optimization of throughput, LLMLP complexity and resource utilization. The data flow graphs are applied for optimization of LLMLP computations. The optimized neuron processing element is proposed. The IP cores for features extraction and comparison are developed for Lithuanian speech recognizer and analyzed in third chapter. The fourth chapter is devoted for experimental verification of developed numerous LLMLP IP cores. The experiments of isolated word recognition accuracy and speed for different speakers, signal to noise ratios, features extraction and accelerated comparison methods were performed. The main results of the thesis were published in 12 scientific publications: eight of them were printed in peer-reviewed scientific journals, four of them in a Thomson Reuters Web of Science database, four articles – in conference proceedings. The results were presented in 17 scientific conferences

    Deep Spiking Neural Network model for time-variant signals classification: a real-time speech recognition approach

    Get PDF
    Speech recognition has become an important task to improve the human-machine interface. Taking into account the limitations of current automatic speech recognition systems, like non-real time cloud-based solutions or power demand, recent interest for neural networks and bio-inspired systems has motivated the implementation of new techniques. Among them, a combination of spiking neural networks and neuromorphic auditory sensors offer an alternative to carry out the human-like speech processing task. In this approach, a spiking convolutional neural network model was implemented, in which the weights of connections were calculated by training a convolutional neural network with specific activation functions, using firing rate-based static images with the spiking information obtained from a neuromorphic cochlea. The system was trained and tested with a large dataset that contains ”left” and ”right” speech commands, achieving 89.90% accuracy. A novel spiking neural network model has been proposed to adapt the network that has been trained with static images to a non-static processing approach, making it possible to classify audio signals and time series in real time.Ministerio de Economía y Competitividad TEC2016-77785-

    Learning and Production of Movement Sequences: Behavioral, Neurophysiological, and Modeling Perspectives

    Full text link
    A growing wave of behavioral studies, using a wide variety of paradigms that were introduced or greatly refined in recent years, has generated a new wealth of parametric observations about serial order behavior. What was a mere trickle of neurophysiological studies has grown to a more steady stream of probes of neural sites and mechanisms underlying sequential behavior. Moreover, simulation models of serial behavior generation have begun to open a channel to link cellular dynamics with cognitive and behavioral dynamics. Here we summarize the major results from prominent sequence learning and performance tasks, namely immediate serial recall, typing, 2XN, discrete sequence production, and serial reaction time. These populate a continuum from higher to lower degrees of internal control of sequential organization. The main movement classes covered are speech and keypressing, both involving small amplitude movements that are very amenable to parametric study. A brief synopsis of classes of serial order models, vis-à-vis the detailing of major effects found in the behavioral data, leads to a focus on competitive queuing (CQ) models. Recently, the many behavioral predictive successes of CQ models have been joined by successful prediction of distinctively patterend electrophysiological recordings in prefrontal cortex, wherein parallel activation dynamics of multiple neural ensembles strikingly matches the parallel dynamics predicted by CQ theory. An extended CQ simulation model-the N-STREAMS neural network model-is then examined to highlight issues in ongoing attemptes to accomodate a broader range of behavioral and neurophysiological data within a CQ-consistent theory. Important contemporary issues such as the nature of working memory representations for sequential behavior, and the development and role of chunks in hierarchial control are prominent throughout.Defense Advanced Research Projects Agency/Office of Naval Research (N00014-95-1-0409); National Institute of Mental Health (R01 DC02852

    Perceptual adaptation by normally hearing listeners to a simulated "hole" in hearing

    Get PDF
    Simulations of cochlear implants have demonstrated that the deleterious effects of a frequency misalignment between analysis bands and characteristic frequencies at basally shifted simulated electrode locations are significantly reduced with training. However, a distortion of frequency-to-place mapping may also arise due to a region of dysfunctional neurons that creates a "hole" in the tonotopic representation. This study simulated a 10 mm hole in the mid-frequency region. Noise-band processors were created with six output bands (three apical and three basal to the hole). The spectral information that would have been represented in the hole was either dropped or reassigned to bands on either side. Such reassignment preserves information but warps the place code, which may in itself impair performance. Normally hearing subjects received three hours of training in two reassignment conditions. Speech recognition improved considerably with training. Scores were much lower in a baseline (untrained) condition where information from the hole region was dropped. A second group of subjects trained in this dropped condition did show some improvement; however, scores after training were significantly lower than in the reassignment conditions. These results are consistent with the view that speech processors should present the most informative frequency range irrespective of frequency misalignment. 0 2006 Acoustical Society of America

    Distant Speech Recognition Using Multiple Microphones in Noisy and Reverberant Environments

    Get PDF
    corecore