52 research outputs found

    Parallel Algorithms for Isolated and Connected Word Recognition

    For years, researchers have sought a way for people to talk to machines in the same manner that one person communicates with another. This verbal man-machine interface, called speech recognition, can be grouped into three types: isolated word recognition, connected word recognition, and continuous speech recognition. Isolated word recognizers recognize single words with distinct pauses before and after them. Continuous speech recognizers recognize speech as one person speaks it to another, continuously and without pauses. Connected word recognition is an extension of isolated word recognition that recognizes groups of words spoken continuously. A group of words must have distinct pauses before and after it, and the number of words in a group is limited to some small value (typically fewer than six). If these types of recognition systems are to be successful in the real world, they must be speaker independent and support a large vocabulary. They must also recognize the speech input accurately and in real time. Currently no system meets all of these criteria, because of the vast amount of computation required. This report examines the use of parallel processing to reduce the computation time for speech recognition. Two types of parallel architecture are considered: the Single Instruction stream - Multiple Data stream (SIMD) machine and the VLSI processor array. The SIMD machine is chosen for its flexibility, which makes it a good candidate for testing new speech recognition algorithms. The VLSI processor array is selected as a good fit for a dedicated recognition system because of its simple processors and fixed interconnections. The report designs SIMD systems and VLSI processor arrays for both isolated and connected word recognition. These architectures are evaluated and contrasted in terms of the number of processors needed, the interprocessor connections required, and the "power" each processor needs to achieve real-time recognition. The results show that an SIMD machine using 100 processors, each with an MC68000 processor, can recognize isolated words in real time using a 20 kHz sampling rate and a 1,000 word vocabulary
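As a rough illustration of the data-parallel idea in the abstract above, the sketch below deals a vocabulary out across processing elements, lets each one score only its own slice of reference templates against the input, and then reduces the partial results to a global best match. The distance measure (mean squared difference between fixed-length feature vectors), the function names, and the toy data are all assumptions made for illustration; the report's actual matching algorithm is not given in this listing.

```python
from typing import Dict, List, Sequence, Tuple

Template = Sequence[float]


def distance(a: Template, b: Template) -> float:
    """Toy distance (an assumption): mean squared difference of two vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def partition(words: List[str], n_pe: int) -> List[List[str]]:
    """Deal the vocabulary round-robin onto n_pe processing elements."""
    return [words[i::n_pe] for i in range(n_pe)]


def local_best(slice_words: List[str],
               templates: Dict[str, Template],
               utterance: Template) -> Tuple[float, str]:
    """Work done by one processing element: score only its own slice."""
    return min((distance(templates[w], utterance), w) for w in slice_words)


def recognize(templates: Dict[str, Template],
              utterance: Template,
              n_pe: int) -> str:
    """Return the vocabulary word whose template is closest to the input."""
    slices = [s for s in partition(sorted(templates), n_pe) if s]
    # On a SIMD machine every element would run local_best in lockstep;
    # this loop only emulates that sequentially.
    partial = [local_best(s, templates, utterance) for s in slices]
    # Global reduction: keep the smallest distance over all elements.
    return min(partial)[1]


templates = {
    "yes": [1.0, 0.0, 0.0],
    "no": [0.0, 1.0, 0.0],
    "stop": [0.0, 0.0, 1.0],
    "go": [1.0, 1.0, 0.0],
}
best = recognize(templates, [0.9, 0.1, 0.0], n_pe=2)
```

With the figures quoted in the abstract, `partition` would give each of 100 processors 10 of the 1,000 vocabulary words, so the per-processor matching load shrinks by the same factor.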

    An efficient implementation of lattice-ladder multilayer perceptrons in field programmable gate arrays

    The implementation efficiency of electronic systems is a combination of conflicting requirements: growing volumes of computation and faster data exchange, together with rising energy consumption, force researchers not only to optimize the algorithm but also to implement it quickly in specialized hardware. This work therefore tackles the problem of efficient and straightforward implementation of real-time electronic intelligent systems on a field-programmable gate array (FPGA). The object of research is specialized FPGA intellectual property (IP) cores that operate in real time. The thesis investigates the following main aspects of the research object: implementation criteria and techniques. The aim of the thesis is to optimize the FPGA implementation process for a selected class of dynamic artificial neural networks. To solve the stated problem and reach this goal, the following main tasks are formulated: to justify the selection of the Lattice-Ladder Multi-Layer Perceptron (LLMLP) class and of its electronic intelligent system test-bed, a speaker-dependent Lithuanian speech recognizer, to be created and investigated; to develop a dedicated technique for implementing the LLMLP class on FPGA, based on specialized efficiency criteria for circuitry synthesis; and to develop and experimentally confirm the efficiency of the optimized FPGA IP cores used in the Lithuanian speech recognizer. The dissertation contains an introduction, four chapters, and general conclusions. The first chapter presents the fundamentals of computer-aided design, artificial neural networks, and speech recognition implementation on FPGA. The second chapter proposes efficiency criteria and a technique for implementing LLMLP IP cores, with the goal of multi-objective optimization of throughput, LLMLP complexity, and resource utilization. Data flow graphs are applied to optimize the LLMLP computations, and an optimized neuron processing element is proposed. The third chapter develops and analyzes the IP cores for feature extraction and comparison in the Lithuanian speech recognizer. The fourth chapter is devoted to experimental verification of the numerous LLMLP IP cores developed. Experiments measured isolated word recognition accuracy and speed for different speakers, signal-to-noise ratios, feature extraction methods, and accelerated comparison methods. The main results of the thesis were published in 12 scientific publications: eight in peer-reviewed scientific journals, four of them indexed in the Thomson Reuters Web of Science database, and four articles in conference proceedings. The results were presented at 17 scientific conferences

    Abusing Hardware Race Conditions for High Throughput Energy Efficient Computation

    We propose a novel computing approach, called "Race Logic", which uses a new data representation to accelerate a broad class of optimization problems, such as those solved by dynamic programming algorithms. The core idea of Race Logic is to deliberately engineer race conditions in a circuit to perform useful computation. In Race Logic, information is represented as a timing delay rather than as logic levels (as is done in conventional logic). Computations can then be performed by observing the relative propagation times of signals injected into a configurable circuit (i.e. the outcome of races through the circuit). In this dissertation I will introduce race-based computation and discuss multiple VLSI implementations. We begin by considering a synchronous approach, which uses simple clocked delay elements. Though this synchronous implementation outperforms highly optimized conventional implementations of the well-studied DNA sequence alignment problem, its third-order energy scaling with problem size and the limited dynamic range of its timing delays are major pitfalls. Next, in the search for energy efficiency, we study asynchronous designs in order to understand the performance trade-offs and applicability of this new architecture. Finally, I will present the results of a prototype asynchronous Race Logic chip and demonstrate that race-based computations can align up to 10 million 50-symbol-long DNA sequences per second, about 2-3 orders of magnitude faster than state-of-the-art general-purpose computing systems
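To make the "value as arrival time" idea above concrete, the following software sketch simulates a race through an alignment grid for two strings. A signal is injected at one corner, every edge delays it by a fixed amount, and each node fires on the first signal to reach it, so the time at which the far corner fires is the cost of the cheapest alignment path. The specific delay values (`match=1`, `mismatch=2`, `indel=1`), the function name, and the event-queue simulation are assumptions for illustration; this is not the circuit or the scoring scheme described in the dissertation.

```python
import heapq


def race_align(a: str, b: str, match: int = 1, mismatch: int = 2,
               indel: int = 1) -> int:
    """Return the time at which the first signal reaches the far corner.

    Grid node (i, j) means "i symbols of a and j symbols of b consumed".
    All delays must be positive so that the first arrival is final.
    """
    goal = (len(a), len(b))
    fired = {}                     # node -> time of its first (winning) arrival
    events = [(0, (0, 0))]         # signal injected at the origin at t = 0
    while events:
        t, (i, j) = heapq.heappop(events)
        if (i, j) in fired:
            continue               # a faster signal already won at this node
        fired[(i, j)] = t
        if (i, j) == goal:
            return t
        if i < len(a):             # skip a symbol of a
            heapq.heappush(events, (t + indel, (i + 1, j)))
        if j < len(b):             # skip a symbol of b
            heapq.heappush(events, (t + indel, (i, j + 1)))
        if i < len(a) and j < len(b):
            delay = match if a[i] == b[j] else mismatch
            heapq.heappush(events, (t + delay, (i + 1, j + 1)))
    raise RuntimeError("goal node was never reached")


identical = race_align("ACTG", "ACTG")
one_change = race_align("ACTG", "ACGG")
```

Keeping only the earliest arrival at a node corresponds to taking a minimum over the incoming paths, which is why this kind of race maps naturally onto shortest-path style dynamic programming.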

    A 6 mW, 5,000-Word Real-Time Speech Recognizer Using WFST Models

    We describe an IC that provides a local speech recognition capability for a variety of electronic devices. We start with a generic speech decoder architecture that is programmable with industry-standard WFST and GMM speech models. Algorithm and architectural enhancements are incorporated in order to achieve real-time performance amid system-level constraints on internal memory size and external memory bandwidth. A 2.5 × 2.5 mm test chip implementing this architecture was fabricated using a 65 nm process. The chip performs a 5,000-word recognition task in real time with a 13.0% word error rate, 6.0 mW core power consumption, and a search efficiency of approximately 16 nJ per hypothesis. Quanta Computer (Firm); Irwin Mark Jacobs and Joan Klein Jacobs Presidential Fellowshi

    Applications and implementation of neuro-connectionist architectures.

    by H.S. Ng. Thesis (M.Phil.)--Chinese University of Hong Kong, 1996. Includes bibliographical references (leaves 91-97).
    Contents:
    Chapter 1. Introduction: 1.1 Introduction; 1.2 Neuro-connectionist Network
    Chapter 2. Related Works: 2.1 Introduction; 2.1.1 Kruskal's Algorithm; 2.1.2 Prim's algorithm; 2.1.3 Sollin's algorithm; 2.1.4 Bellman-Ford algorithm; 2.1.5 Floyd-Warshall algorithm
    Chapter 3. Binary Relation Inference Network and Path Problems: 3.1 Introduction; 3.2 Topology; 3.3 Network structure; 3.3.1 Single-destination BRIN architecture; 3.3.2 Comparison between all-pair BRIN and single-destination BRIN; 3.4 Path Problems and BRIN Solution; 3.4.1 Minimax path problems; 3.4.2 BRIN solution
    Chapter 4. Analog and Voltage-mode Approach: 4.1 Introduction; 4.2 Analog implementation; 4.3 Voltage-mode approach; 4.3.1 The site function; 4.3.2 The unit function; 4.3.3 The computational unit; 4.4 Conclusion
    Chapter 5. Current-mode Approach: 5.1 Introduction; 5.2 Current-mode approach for analog VLSI Implementation; 5.2.1 Site and Unit output function; 5.2.2 Computational unit; 5.2.3 A complete network; 5.3 Conclusion
    Chapter 6. Neural Network Compensation for Optimization Circuit: 6.1 Introduction; 6.2 A Neuro-connectionist Architecture for error correction; 6.2.1 Linear Relationship; 6.2.2 Output Deviation of Computational Unit; 6.3 Experimental Results; 6.3.1 Training Phase; 6.3.2 Generalization Phase; 6.4 Conclusion
    Chapter 7. Precision-limited Analog Neural Network Compensation: 7.1 Introduction; 7.2 Analog Neural Network hardware; 7.3 Integration of analog neural network compensation of connectionist network for general path problems; 7.4 Experimental Results; 7.4.1 Convergence time; 7.4.2 The accuracy of the system; 7.5 Conclusion
    Chapter 8. Transitive Closure Problems: 8.1 Introduction; 8.2 Different ways of implementation of BRIN for transitive closure; 8.2.1 Digital Implementation; 8.2.2 Analog Implementation; 8.3 Transitive Closure Problem; 8.3.1 A special case of maximum spanning tree problem; 8.3.2 Analog approach solution for transitive closure problem; 8.3.3 Current-mode approach solution for transitive closure problem; 8.4 Comparisons between the different forms of implementation of BRIN for transitive closure; 8.4.1 Convergence Time; 8.4.2 Circuit complexity; 8.5 Discussion
    Chapter 9. Critical path problems: 9.1 Introduction; 9.2 Problem statement and single-destination BRIN solution; 9.3 Analog implementation; 9.3.1 Separated building block; 9.3.2 Combined building block; 9.4 Current-mode approach; 9.4.1 Site function, unit output function and a completed network; 9.5 Conclusion
    Chapter 10. Conclusions: 10.1 Summary of Achievements; 10.2 Future development; 10.2.1 Application for financial problems; 10.2.2 Fabrication of VLSI Implementation; 10.2.3 Actual prototyping of Analog Integrated Circuits for critical path and transitive closure problems; 10.2.4 Other implementation platform; 10.2.5 On-line update of routing table inside the router for network communication using BRIN; 10.2.6 Other BRIN's applications
    Bibliography


    Circuit paradigm in the 21

    reviewe

    Speech Recognition

    Chapters in the first part of the book cover all the essential speech processing techniques for building robust, automatic speech recognition systems: the representation for speech signals and the methods for speech-features extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition for speaker identification and tracking, for prosody modeling in emotion-detection systems and in other speech processing applications that are able to operate in real-world environments, like mobile communication services and smart homes

    On the synthesis and processing of high quality audio signals by parallel computers

    This work concerns the application of new computer architectures to the creation and manipulation of high-quality audio bandwidth signals. The configuration of both the hardware and software in such systems falls under consideration in the three major sections which present increasing levels of algorithmic concurrency. In the first section, the programs which are described are distributed in identical copies across an array of processing elements; these programs run autonomously, generating data independently, but with control parameters peculiar to each copy: this type of concurrency is referred to as isonomic. The central section presents a structure which distributes tasks across an arbitrary network of processors; the flow of control in such a program is quasi-indeterminate, and controlled on a demand basis by the rate of completion of the slave tasks and their irregular interaction with the master. Whilst that interaction is, in principle, deterministic, it is also data-dependent; the dynamic nature of task allocation demands that no a priori knowledge of the rate of task completion be required. This type of concurrency is called dianomic. Finally, an architecture is described which will support a very high level of algorithmic concurrency. The programs which make efficient use of such a machine are designed not by considering flow of control, but by considering flow of data. Each atomic algorithmic unit is made as simple as possible, which results in the extensive distribution of a program over very many processing elements. Programs designed by considering only the optimum data exchange routes are said to exhibit systolic concurrency. Often neglected in the study of system design are those provisions necessary for practical implementations. 
It was intended to provide users with useful application programs in fulfilment of this study; the target group is electroacoustic composers, who use digital signal processing techniques in the context of musical composition. Some of the algorithms in use in this field are highly complex, often requiring a quantity of processing for each sample which exceeds that currently available even from very powerful computers. Consequently, applications tend to operate not in 'real-time' (where the output of a system responds to its input apparently instantaneously), but by the manipulation of sounds recorded digitally on a mass storage device. The first two sections adopt existing, public-domain software, and seek to increase its speed of execution significantly by parallel techniques, with the minimum compromise of functionality and ease of use. Those chosen are the general-purpose direct synthesis program CSOUND, from M.I.T., and a stand-alone phase vocoder system from the C.D.P. In each case, the desired aim is achieved: to increase speed of execution by two orders of magnitude over the systems currently in use by composers. This requires substantial restructuring of the programs, and careful consideration of the best computer architectures on which they are to run concurrently. The third section examines the rationale behind the use of computers in music, and begins with the implementation of a sophisticated electronic musical instrument capable of a degree of expression at least equal to its acoustic counterparts. It seems that the flexible control of such an instrument demands a greater computing resource than the sound synthesis part. 
A machine has been constructed with the intention of enabling the 'gestural capture' of performance information in real-time; the structure of this computer, which has one hundred and sixty high-performance microprocessors running in parallel, is expounded; and the systolic programming techniques required to take advantage of such an array are illustrated in the Occam programming language
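The first two styles of concurrency named in the abstract above can be loosely sketched in a few lines. In the isonomic function, identical copies of one routine run independently and differ only in a per-copy control parameter; in the dianomic function, a master submits tasks of unequal size and collects results in whatever order the workers finish. The sine-block routine, the thread pool, and every name and parameter value here are assumptions chosen for illustration; the thesis itself works with CSOUND, a phase vocoder, and Occam on dedicated parallel hardware, none of which this sketch reproduces.

```python
import math
from concurrent.futures import ThreadPoolExecutor, as_completed


def sine_block(freq_hz, n_samples=64, rate_hz=8000):
    """The routine every worker runs; only the arguments differ per copy."""
    return [math.sin(2 * math.pi * freq_hz * n / rate_hz)
            for n in range(n_samples)]


def isonomic_mix(freqs):
    """Identical copies with per-copy parameters, summed sample by sample."""
    with ThreadPoolExecutor(max_workers=max(1, len(freqs))) as pool:
        voices = list(pool.map(sine_block, freqs))
    return [sum(samples) for samples in zip(*voices)]


def dianomic_farm(tasks, n_workers=3):
    """Master farms out tasks; results are gathered in completion order."""
    results = {}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        pending = {pool.submit(sine_block, f, n): (f, n) for f, n in tasks}
        for done in as_completed(pending):
            # No assumption is made about which task finishes first.
            results[pending[done]] = done.result()
    return results


mix = isonomic_mix([440.0, 880.0])
farm = dianomic_farm([(220.0, 8), (440.0, 16), (660.0, 4)])
```

Python threads are used here only to show the control structure; for CPU-bound synthesis they would not give the kind of speed-up the thesis reports on its multiprocessor machines.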