166 research outputs found

    Parallel Algorithms for Isolated and Connected Word Recognition

    For years, researchers have sought a way for people to talk to machines in the same manner that one person communicates with another. This verbal man-machine interface, called speech recognition, can be grouped into three types: isolated word recognition, connected word recognition, and continuous speech recognition. Isolated word recognizers recognize single words with distinctive pauses before and after them. Continuous speech recognizers recognize speech as one person speaks to another, continuously and without pauses. Connected word recognition is an extension of isolated word recognition that recognizes groups of words spoken continuously. A group of words must have distinctive pauses before and after it, and the number of words in a group is limited to some small value (typically fewer than six). If these types of recognition systems are to be successful in the real world, they must be speaker independent and support a large vocabulary. They must also recognize the speech input accurately and in real time. Currently, no system meets all of these criteria because a vast amount of computation is needed. This report examines the use of parallel processing to reduce the computation time for speech recognition. Two types of parallel architecture are considered: the Single Instruction stream - Multiple Data stream (SIMD) machine and the VLSI processor array. The SIMD machine is chosen for its flexibility, which makes it a good candidate for testing new speech recognition algorithms. The VLSI processor array is selected as a good fit for a dedicated recognition system because of its simple processors and fixed interconnections. The report designs SIMD systems and VLSI processor arrays for both isolated and connected word recognition. These architectures are evaluated and contrasted in terms of the number of processors needed, the interprocessor connections required, and the "power" each processor needs to achieve real-time recognition. The results show that an SIMD machine using 100 processors, each with an MC68000 processor, can recognize isolated words in real time using a 20 kHz sampling rate and a 1,000-word vocabulary.
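The figures in this abstract suggest a simple data-parallel layout: with a 1,000-word vocabulary and 100 processors, each processor could hold 10 reference words, find its best local match, and pass that on to a final reduction. The Python sketch below illustrates that idea only. It is an assumption, not the report's design: `partition`, `recognize_partitioned`, and the `score` callback are hypothetical names, and the processors are simulated one after another rather than run in parallel.

```python
def partition(vocabulary, num_processors):
    """Deal the vocabulary out round-robin, one chunk per (simulated) processor."""
    return [vocabulary[p::num_processors] for p in range(num_processors)]


def recognize_partitioned(vocabulary, num_processors, score):
    """Return the word with the lowest score.

    Each chunk stands in for one processor's local memory. Every processor
    finds its own best candidate, then a final reduction picks the overall
    winner. `score` maps a word to a distance (lower is better).
    """
    local_bests = []
    for chunk in partition(vocabulary, num_processors):
        if chunk:  # a processor may be idle if there are more processors than words
            local_bests.append(min((score(word), word) for word in chunk))
    # Global reduction over the per-processor winners.
    return min(local_bests)[1]
```

With 1,000 words and 100 processors, every chunk holds exactly 10 words, so each simulated processor performs 10 comparisons per utterance instead of 1,000.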

    An efficient implementation of lattice-ladder multilayer perceptrons in field programmable gate arrays

    The implementation efficiency of electronic systems is shaped by conflicting requirements: growing volumes of computation and faster data exchange come with rising energy consumption, which forces researchers not only to optimize the algorithm but also to implement it quickly in specialized hardware. This work therefore tackles the problem of efficient and straightforward implementation of real-time electronic intelligent systems on a field-programmable gate array (FPGA). The object of research is specialized FPGA intellectual property (IP) cores that operate in real time. The thesis investigates the following main aspects of the research object: implementation criteria and techniques. The aim of the thesis is to optimize the FPGA implementation process of a selected class of dynamic artificial neural networks. To solve the stated problem and reach this goal, the following main tasks are formulated: justify the selection of the Lattice-Ladder Multi-Layer Perceptron (LLMLP) class and of its electronic intelligent system test-bed, a speaker-dependent Lithuanian speech recognizer, to be created and investigated; develop a dedicated technique for implementing the LLMLP class on FPGA, based on specialized efficiency criteria for circuit synthesis; and develop and experimentally confirm the efficiency of the optimized FPGA IP cores used in the Lithuanian speech recognizer. The dissertation contains an introduction, four chapters, and general conclusions. The first chapter presents the fundamentals of computer-aided design, artificial neural networks, and speech recognition implementation on FPGA. In the second chapter, efficiency criteria and a technique for implementing LLMLP IP cores are proposed in order to perform multi-objective optimization of throughput, LLMLP complexity, and resource utilization. Data flow graphs are applied to optimize the LLMLP computations, and an optimized neuron processing element is proposed. The IP cores for feature extraction and comparison are developed for the Lithuanian speech recognizer and analyzed in the third chapter. The fourth chapter is devoted to experimental verification of the numerous LLMLP IP cores developed. Experiments on isolated word recognition accuracy and speed were performed for different speakers, signal-to-noise ratios, feature extraction methods, and accelerated comparison methods. The main results of the thesis were published in 12 scientific publications: eight in peer-reviewed scientific journals, four of which are indexed in the Thomson Reuters Web of Science database, and four in conference proceedings. The results were presented at 17 scientific conferences.

    Realization and design of a pilot assist decision-making system based on speech recognition

    A system based on speech recognition is proposed to assist pilot decision-making. It is built on a HIL aircraft simulation platform and uses the SPCE061A microcontroller as the central processor to achieve better reliability and cost-effectiveness. LPCC (linear predictive cepstral coding) and DTW (Dynamic Time Warping) are applied to isolated-word speech recognition to reduce the amount of calculation and improve real-time performance. In addition, PWM (Pulse Width Modulation) regulation is adopted to control each control surface by speech and thus assist the pilot in making decisions. Trials show that the system achieves satisfactory speech recognition accuracy and control performance. More importantly, the paper offers a creative idea for intelligent human-computer interaction and for applications of speech recognition in aviation control. The system is also easy to extend and apply. Comment: 10 pages, 8 figures
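For readers unfamiliar with DTW-based isolated-word recognition, a minimal Python sketch follows. It shows the textbook dynamic-programming recurrence and nearest-template classification, not the paper's SPCE061A implementation. The names `dtw_distance` and `recognize` are hypothetical, and the feature vectors are assumed to be already extracted (for example LPCC frames), represented here as plain tuples of floats.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of feature vectors.

    cost[i][j] is the cheapest alignment of the first i frames of `a` with
    the first j frames of `b`; each step may advance either sequence or both.
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            cost[i][j] = local + min(
                cost[i - 1][j],      # frame of `a` repeated against b[j-1]
                cost[i][j - 1],      # frame of `b` repeated against a[i-1]
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


def recognize(utterance, templates):
    """Return the label of the stored template closest to the utterance."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))
```

An utterance that is a time-stretched copy of a template aligns with zero cost, which is why DTW tolerates variations in speaking rate for isolated words.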

    An optimization framework for fixed-point digital signal processing.

    Lam Yuet Ming. Thesis (M.Phil.)--Chinese University of Hong Kong, 2003. Includes bibliographical references (leaves 80-86). Abstracts in English and Chinese.
    Chapter 1, Introduction: 1.1 Motivation (1.1.1 Difficulties of fixed-point design; 1.1.2 Why still fixed-point?; 1.1.3 Difficulties of converting floating-point to fixed-point; 1.1.4 Why wordlength optimization?); 1.2 Objectives; 1.3 Contributions; 1.4 Thesis Organization.
    Chapter 2, Review: 2.1 Introduction; 2.2 Simulation approach to address quantization issue; 2.3 Analytical approach to address quantization issue; 2.4 Implementation of speech systems; 2.5 Discussion; 2.6 Summary.
    Chapter 3, Fixed-point arithmetic background: 3.1 Introduction; 3.2 Fixed-point representation; 3.3 Fixed-point addition/subtraction; 3.4 Fixed-point multiplication; 3.5 Fixed-point division; 3.6 Summary.
    Chapter 4, Fixed-point class implementation: 4.1 Introduction; 4.2 Fixed-point simulation using overloading; 4.3 Fixed-point class implementation (4.3.1 Fixed-point object declaration; 4.3.2 Overload the operators; 4.3.3 Arithmetic operations; 4.3.4 Automatic monitoring of dynamic range; 4.3.5 Automatic calculation of quantization error; 4.3.6 Array supporting; 4.3.7 Cosine calculation); 4.4 Summary.
    Chapter 5, Speech recognition background: 5.1 Introduction; 5.2 Isolated word recognition system overview; 5.3 Linear predictive coding processor (5.3.1 The LPC model; 5.3.2 The LPC processor); 5.4 Vector quantization; 5.5 Hidden Markov model; 5.6 Summary.
    Chapter 6, Optimization: 6.1 Introduction; 6.2 Simplex Method (6.2.1 Initialization; 6.2.2 Reflection; 6.2.3 Expansion; 6.2.4 Contraction; 6.2.5 Stop); 6.3 One-dimensional optimization approach (6.3.1 One-dimensional optimization approach; 6.3.2 Search space reduction; 6.3.3 Speeding up convergence); 6.4 Summary.
    Chapter 7, Word Recognition System Design Methodology: 7.1 Introduction; 7.2 Framework design (7.2.1 Fixed-point class; 7.2.2 Fixed-point application; 7.2.3 Optimizer); 7.3 Speech system implementation (7.3.1 Model training; 7.3.2 Simulate the isolated word recognition system; 7.3.3 Hardware cost model; 7.3.4 Cost function; 7.3.5 Fraction size optimization; 7.3.6 One-dimensional optimization); 7.4 Summary.
    Chapter 8, Results: 8.1 Model training; 8.2 Simplex method optimization (8.2.1 Simulation platform; 8.2.2 System level optimization; 8.2.3 LPC processor optimization; 8.2.4 One-dimensional optimization); 8.3 Speeding up the optimization convergence; 8.4 Optimization criteria; 8.5 Summary.
    Chapter 9, Conclusion: 9.1 Search space reduction; 9.2 Speeding up the searching; 9.3 Optimization criteria; 9.4 Flexibility of the framework design; 9.5 Further development.
    Bibliography.
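The thesis outline mentions a fixed-point class built with operator overloading that automatically calculates quantization error. A minimal Python sketch of that general idea is given below. It is an illustration under stated assumptions, not the thesis's implementation: the class name `Fixed`, its attributes, and the truncating multiply are all hypothetical choices, and dynamic-range monitoring, division, and overflow handling are left out.

```python
class Fixed:
    """Signed fixed-point value with `frac_bits` fractional bits (hypothetical class).

    `raw` is the stored integer; `ideal` carries the floating-point reference
    result so the accumulated quantization error can be inspected at any time.
    """

    def __init__(self, value, frac_bits):
        self.frac_bits = frac_bits
        self.raw = round(value * (1 << frac_bits))  # quantize to the fixed-point grid
        self.ideal = float(value)

    @classmethod
    def _from_raw(cls, raw, frac_bits, ideal):
        obj = cls.__new__(cls)  # bypass __init__ so `raw` is not re-quantized
        obj.frac_bits, obj.raw, obj.ideal = frac_bits, raw, ideal
        return obj

    def _check(self, other):
        if self.frac_bits != other.frac_bits:
            raise ValueError("operands must share the same fraction size")

    def __float__(self):
        return self.raw / (1 << self.frac_bits)

    def __add__(self, other):
        self._check(other)
        return Fixed._from_raw(self.raw + other.raw, self.frac_bits,
                               self.ideal + other.ideal)

    def __mul__(self, other):
        self._check(other)
        # The integer product has 2 * frac_bits fractional bits; shifting right
        # restores the format and truncates (rounds toward negative infinity).
        raw = (self.raw * other.raw) >> self.frac_bits
        return Fixed._from_raw(raw, self.frac_bits, self.ideal * other.ideal)

    @property
    def error(self):
        """Absolute difference between the fixed-point value and the float reference."""
        return abs(float(self) - self.ideal)
```

For example, with 4 fractional bits, `Fixed(0.75, 4) * Fixed(0.3, 4)` stores 0.1875 against a floating-point reference of about 0.225. A wordlength optimizer of the kind the outline describes would trade this error against hardware cost when choosing the fraction size.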

    Speech Recognition

    Chapters in the first part of the book cover the essential speech processing techniques for building robust automatic speech recognition systems: the representation of speech signals and methods for speech feature extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition: speaker identification and tracking, prosody modeling in emotion-detection systems, and applications that operate in real-world environments, such as mobile communication services and smart homes.

    Efficient audio signal processing for embedded systems

    We investigated two design strategies for efficiently processing audio signals on embedded systems such as mobile phones and portable electronics. In the first strategy, we exploit properties of the human auditory system to process audio signals. We designed a sound enhancement algorithm that makes piezoelectric loudspeakers sound "richer" and "fuller" using a combination of bass extension and dynamic range compression. We also developed an audio energy reduction algorithm for loudspeaker power management that suppresses signal energy below the masking threshold. In the second strategy, we use low-power analog circuits to process the signal before digitizing it. We designed an analog front-end for sound detection and implemented it on a field programmable analog array (FPAA). The sound classifier front-end can be used in a wide range of applications because programmable floating-gate transistors store the classifier weights. Moreover, we incorporated a feature selection algorithm to simplify the analog front-end: the AdaBoost machine learning algorithm selects the most relevant features for a particular sound detection application. We also designed the circuits that implement the AdaBoost-based analog classifier. PhD. Committee Chair: Anderson, David; Committee Member: Hasler, Jennifer; Committee Member: Hunt, William; Committee Member: Lanterman, Aaron; Committee Member: Minch, Bradle

    An overview of artificial intelligence and robotics. Volume 1: Artificial intelligence. Part B: Applications

    Artificial Intelligence (AI) is an emerging technology that has recently attracted considerable attention, and many applications are now under development. This report, Part B of a three-part report on AI, presents overviews of the key application areas: Expert Systems, Computer Vision, Natural Language Processing, Speech Interfaces, and Problem Solving and Planning. It covers the basic approaches to such systems, the state of the art, existing systems, and future trends and expectations.

    Evaluation of preprocessors for neural network speaker verification
