134 research outputs found

An efficient implementation of lattice-ladder multilayer perceptrons in field programmable gate arrays

Author: Sledevič Tomyslav
Publication date: 05/05/2016

The implementation efficiency of electronic systems combines conflicting requirements: growing volumes of computation and faster data exchange, together with rising energy consumption, force researchers not only to optimize the algorithm but also to implement it quickly in specialized hardware. This work therefore tackles the problem of efficient and straightforward implementation of real-time electronic intelligent systems on a field-programmable gate array (FPGA). The object of research is specialized FPGA intellectual property (IP) cores that operate in real time. The thesis investigates the following main aspects of the research object: implementation criteria and techniques. The aim of the thesis is to optimize the FPGA implementation process for a selected class of dynamic artificial neural networks. To solve the stated problem and reach this goal, the following main tasks are formulated: rationalize the selection of a class of Lattice-Ladder Multi-Layer Perceptron (LLMLP) and of its electronic intelligent system test-bed, a speaker-dependent Lithuanian speech recognizer, to be created and investigated; develop a dedicated technique for implementing the LLMLP class on FPGA, based on specialized efficiency criteria for circuitry synthesis; and develop and experimentally confirm the efficiency of the optimized FPGA IP cores used in the Lithuanian speech recognizer. The dissertation contains an introduction, four chapters, and general conclusions. The first chapter presents the fundamentals of computer-aided design, artificial neural networks, and speech recognition implementation on FPGA. The second chapter proposes efficiency criteria and a technique for implementing LLMLP IP cores, enabling multi-objective optimization of throughput, LLMLP complexity, and resource utilization. Data flow graphs are applied to optimize the LLMLP computations, and an optimized neuron processing element is proposed. The third chapter develops and analyzes IP cores for feature extraction and comparison in the Lithuanian speech recognizer. The fourth chapter is devoted to experimental verification of the numerous LLMLP IP cores developed. Experiments measured isolated-word recognition accuracy and speed for different speakers, signal-to-noise ratios, feature extraction methods, and accelerated comparison methods. The main results of the thesis were published in 12 scientific publications: eight in peer-reviewed scientific journals, four of them indexed in the Thomson Reuters Web of Science database, and four in conference proceedings. The results were presented at 17 scientific conferences.

Vilniaus Gedimino Technikos Universitetas: VGTU Talpykla / Vilnius Gediminas Technical University: VGTU Repository

Algorithm of Abnormal Audio Recognition Based on Improved MFCC

Authors: Cao Xiaoli, He Lingling, Xie Chuan
Publication venue: Published by Elsevier Ltd.
Publication date: 31/12/2012

Feature extraction has a great effect on training and recognition in an audio recognition system. The MFCC algorithm is a typical feature extraction method with stable performance and a high recognition rate. Because MFCC requires a large amount of computation, an improved algorithm, MFCC_E, is introduced. MFCC_E reduces the computation by 50% compared with the standard MFCC algorithm, which makes hardware implementation easier. The experimental results indicate that MFCC_E and MFCC have roughly the same recognition rate, yet the computational complexity of MFCC_E is much smaller.

Real time speaker recognition using MFCC and VQ

Author: G Arun Rajsekhar
Publication date: 01/01/2008

Speaker recognition is the process of automatically recognizing who is speaking on the basis of individual information contained in speech waves. It is one of the most useful biometric recognition techniques in a world where insecurity is a major threat. Many organizations, such as banks, institutions, and industries, currently use this technology to provide greater security for their vast databases. Speaker recognition mainly involves two modules: feature extraction and feature matching. Feature extraction extracts a small amount of data from the speaker’s voice signal that can later be used to represent that speaker. Feature matching is the procedure that identifies the unknown speaker by comparing the features extracted from his or her voice input with those already stored in our speech database. In feature extraction we compute the Mel Frequency Cepstrum Coefficients (MFCC), which are based on the known variation of the human ear’s critical bandwidths with frequency; these are vector quantized using the LBG algorithm, resulting in a speaker-specific codebook. In feature matching we compute the VQ distortion between the input utterance of an unknown speaker and the codebooks stored in our database. Based on this VQ distortion we decide whether to accept or reject the unknown speaker’s identity. The system I implemented in my work is 80% accurate in recognizing the correct speaker. In the second phase we implement real-time speaker recognition using MFCC and VQ on a TMS320C6713 DSP board. We analyze the workload and identify the most time-consuming operations.

ethesis@nitr
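
The feature-matching step in the speaker-recognition abstract above can be illustrated with a minimal sketch. It is not the thesis's code: the function names, the use of squared Euclidean distance, and the averaging over frames are assumptions chosen for illustration, and the codebooks are taken as already trained (for example by LBG).

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vq_distortion(features: List[Vector], codebook: List[Vector]) -> float:
    """Average distance from each feature vector to its nearest codeword."""
    if not features or not codebook:
        raise ValueError("features and codebook must be non-empty")
    total = sum(min(squared_distance(f, c) for c in codebook) for f in features)
    return total / len(features)


def best_match(features: List[Vector],
               codebooks: Dict[str, List[Vector]]) -> Tuple[str, float]:
    """Return the (hypothetical) speaker label whose codebook gives the lowest distortion."""
    scores = {name: vq_distortion(features, cb) for name, cb in codebooks.items()}
    name = min(scores, key=scores.get)
    return name, scores[name]
```

An accept/reject decision would then compare the returned distortion with a threshold; the abstract does not state how that threshold is chosen.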

Efficient audio signal processing for embedded systems

Author: Chiu Leung Kin
Publication venue: Georgia Institute of Technology
Publication date: 21/05/2012

We investigated two design strategies that would allow us to efficiently process audio signals on embedded systems such as mobile phones and portable electronics. In the first strategy, we exploit properties of the human auditory system to process audio signals. We designed a sound enhancement algorithm to make piezoelectric loudspeakers sound "richer" and "fuller," using a combination of bass extension and dynamic range compression. We also developed an audio energy reduction algorithm for loudspeaker power management by suppressing signal energy below the masking threshold. In the second strategy, we use low-power analog circuits to process the signal before digitizing it. We designed an analog front-end for sound detection and implemented it on a field programmable analog array (FPAA). The sound classifier front-end can be used in a wide range of applications because programmable floating-gate transistors are employed to store classifier weights. Moreover, we incorporated a feature selection algorithm to simplify the analog front-end. A machine learning algorithm, AdaBoost, is used to select the most relevant features for a particular sound detection application. We also designed the circuits to implement the AdaBoost-based analog classifier.

PhD. Committee Chair: Anderson, David; Committee Member: Hasler, Jennifer; Committee Member: Hunt, William; Committee Member: Lanterman, Aaron; Committee Member: Minch, Bradle

A simple statistical speech recognition of mandarin monosyllables

Authors: Chung-Bow Lee, Shui-Ching Chang, Tze Fen Li
Publication date: 24/04/2020

Each Mandarin syllable is represented by a sequence of vectors of linear predictive coding cepstra (LPCC). Since all syllables have a simple phonetic structure, in our speech recognition we partition the sequence of LPCC vectors of each syllable into equal segments and average the LPCC vectors in each segment. The mean LPCC vectors are used as the feature of a syllable. Our simple feature does not need the time-consuming and complicated nonlinear contraction and expansion used by dynamic time warping. We propose several probability distributions for the feature values. A simplified Bayes decision rule is used to classify Mandarin syllables. For speaker-independent Mandarin digits, the recognition rate is 98.6% if a normal distribution is used for the feature values and 98.1% if an exponential distribution is used for the absolute values of the features. The feature proposed in this paper to represent a syllable is the simplest one, much easier to extract than any other known feature. The computation for feature extraction and classification is much faster and more accurate than with the HMM method or any other known technique.

CiteSeerX
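
The segment-averaging feature described in the Mandarin monosyllable abstract above can be sketched as follows. This is an illustrative reading, not the authors' implementation: the function name `segment_means` is hypothetical, and the way boundaries are placed when the frame count is not divisible by the number of segments (integer division) is an assumption the abstract does not specify.

```python
from typing import List, Sequence


def segment_means(frames: List[Sequence[float]], num_segments: int) -> List[List[float]]:
    """Split a sequence of LPCC vectors into roughly equal segments and average each.

    Assumption: boundaries fall at floor(s * n / num_segments), so segment
    lengths differ by at most one frame when n is not divisible.
    """
    n = len(frames)
    if num_segments <= 0 or n < num_segments:
        raise ValueError("need at least one frame per segment")
    dim = len(frames[0])
    means = []
    for s in range(num_segments):
        start = s * n // num_segments
        end = (s + 1) * n // num_segments
        segment = frames[start:end]
        # Component-wise mean of the vectors in this segment.
        means.append([sum(f[d] for f in segment) / len(segment) for d in range(dim)])
    return means
```

Whatever the utterance length, the result has a fixed size (`num_segments` vectors), which is what lets the feature avoid time alignment such as dynamic time warping.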

System-on-Chip Design for Audio Processing

Author: Bhushan Ravi Kant
Publication date: 02/06/2015

Nowadays a System-on-Chip (SoC) is present in every electronic system. The popularity of SoCs rests on higher performance, reduced size, lower power consumption, and shorter time to market through design reuse. Device scaling has enabled SoCs to integrate more functionality into a single chip, so the complexity of a system such as an audio processing system is no longer a barrier for the SoC designer. Speaker recognition/verification is one biometric application for preventing identity fraud. It is suitable for real-time scenarios and for remote recognition over the phone. In this project, I designed an SoC system for audio processing on an Altera DE2 board, an FPGA platform, which automatically verifies or recognizes the speaker’s identity. Mel Frequency Cepstral Coefficients (MFCC) are used for feature extraction from the voice signal. A large sample of extracted features is used to train the system with a backpropagation neural network. After training, speaker verification is done in real time by first extracting the speaker’s voice features, applying the trained network to the extracted features, and comparing the result with the stored database. Experimental results show that the designed system is able to verify a person’s identity.

ethesis@nitr

SPEECH RECOGNITION FOR CONNECTED WORD USING CEPSTRAL AND DYNAMIC TIME WARPING ALGORITHMS

Author: MUDA LINDASALWA
Publication date: 01/09/2014

Speech recognition, or a speech recognizer (SR), has become an important tool for people with physical disabilities when operating Home Automation (HA) appliances. This technology is expected to improve the daily life of the elderly and the disabled so that they remain in control of their lives and continue to live independently, to learn, and to stay involved in social life. The goal of the research is to address the constraints of current Malay SR, which is still in its infancy, with limited research on Malay words, especially for HA applications. Since most previous work was confined to wired microphones, the use of a wireless microphone is an important area of this research. Research was carried out to develop an SR word model for five (5) Malay words and five (5) English words as commands to activate and deactivate home appliances.

UTPedia

Development of a sensory substitution API

Author: Martinez Marco
Publication venue: Colorado State University. Libraries
Publication date: 01/01/2018

2018 Summer. Includes bibliographical references.

Sensory substitution, the practice of mapping information from one sensory modality to another, has been shown to be a viable technique for non-invasive sensory replacement and augmentation. With the rise in popularity, ubiquity, and capability of mobile devices and wearable electronics, sensory substitution research has seen a resurgence in recent years. Because mobile and wearable electronics offer standard features such as Bluetooth, multicore processing, and audio recording, these devices can be used to drive sensory substitution systems. There is therefore a need for a flexible, extensible software package capable of performing the required real-time data processing for sensory substitution on modern mobile devices. The primary contribution of this thesis is the development and release of an open-source Application Programming Interface (API) capable of managing an audio stream from the source of sound to a sensory stimulus interface on the body. The API (named Tactile Waves) is written in the Java programming language and packaged as both a Java library (JAR) and an Android library (AAR). The development and design of the library are presented, and its primary functions are explained. Implementation details for each primary function are discussed. All processing routines are evaluated for performance to ensure real-time capability, and the results are summarized. Finally, future improvements to the library and additional applications of sensory substitution are proposed.

Mountain Scholar (Digital Collections of Colorado and Wyoming)

Secure Speech Biometric Templates

Author: Inthavisas Keerati
Publication venue: Lehigh Preserve

Lehigh University: Lehigh Preserve

Computation of the one-dimensional unwrapped phase

Author: Karam Zahi Nadim
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2006

Thesis (S.M.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006. Includes bibliographical references (p. 101-102). "Cepstrum bibliography" (p. 67-100).

In this thesis, the computation of the unwrapped phase of the discrete-time Fourier transform (DTFT) of a one-dimensional finite-length signal is explored. The phase of the DTFT is not unique and may contain discontinuities that are integer multiples of 2π. The unwrapped phase is the instance of the phase function chosen to ensure continuity. This thesis presents existing algorithms for computing the unwrapped phase and discusses their strengths and weaknesses. Two composite algorithms are then proposed that build on the existing ones, combining their strengths while avoiding their weaknesses. The core of the proposed methods is based on recent advances in polynomial factoring. The proposed methods are implemented and compared with the existing ones.

by Zahi Nadim Karam. S.M.
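
To make the notion of unwrapping in the abstract above concrete, here is a minimal sketch of the basic sample-to-sample rule: each jump between neighbouring phase samples is reduced to the interval [-π, π) before being accumulated. This is only the simplest textbook approach, not one of the composite algorithms the thesis proposes. It assumes the true phase changes by less than π between adjacent samples, and the function name `unwrap` is chosen here for illustration.

```python
import math
from typing import List, Sequence


def unwrap(phases: Sequence[float]) -> List[float]:
    """Remove 2*pi jumps from sampled phase values (assumes dense enough sampling)."""
    if not phases:
        return []
    unwrapped = [phases[0]]
    for previous, current in zip(phases, phases[1:]):
        delta = current - previous
        # Map the raw difference into [-pi, pi) so that integer multiples
        # of 2*pi introduced by wrapping are discarded.
        delta = (delta + math.pi) % (2 * math.pi) - math.pi
        unwrapped.append(unwrapped[-1] + delta)
    return unwrapped
```

When the sampling is too coarse for the assumption to hold, this rule silently picks the wrong multiple of 2π.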