
    Parallel Viterbi Decoding Implementation by Multi-microprocessors

    The Viterbi algorithm is a well-established technique for channel and source decoding in high-performance digital communication systems. However, excessive time consumption makes it difficult to design an efficient high-speed decoder for practical applications. The central unit of a Viterbi decoder is a data-dependent feedback loop which performs an add-compare-select (ACS) operation. This nonlinear recurrence is the bottleneck for a high-speed parallel implementation. This paper describes the implementation of a parallel Viterbi algorithm on multiple microprocessors. Internal computations are performed in a parallel fashion. The use of microprocessors allows a low-cost implementation with moderate complexity. An organization network, separate memory blocks, and dedicated programs provide proper operation. For a fixed processing speed of the given hardware, parallel Viterbi decoding allows a linear speed-up in the throughput rate in exchange for a linear increase in hardware complexity.
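
    As a rough illustration of the ACS recurrence described above, the following C sketch performs one trellis step for a convolutional code; the state count and butterfly connectivity are assumptions for illustration, not details taken from the paper.

        #include <stdint.h>

        #define NUM_STATES 64  /* e.g. a constraint-length-7 convolutional code (assumed) */

        /* One add-compare-select (ACS) step: for each trellis state, add a
         * branch metric to each predecessor's path metric, compare the two
         * candidates, and select the survivor. This data-dependent feedback
         * is the recurrence the abstract identifies as the bottleneck. */
        void acs_step(const uint32_t path_metric[NUM_STATES],
                      const uint32_t branch_metric[NUM_STATES][2],
                      uint32_t next_metric[NUM_STATES],
                      uint8_t survivor[NUM_STATES])
        {
            for (int s = 0; s < NUM_STATES; s++) {
                /* Shift-register trellis: state s is reached from two
                 * predecessors (assumed connectivity). */
                int p0 = s >> 1;
                int p1 = (s >> 1) | (NUM_STATES >> 1);
                uint32_t m0 = path_metric[p0] + branch_metric[s][0]; /* add */
                uint32_t m1 = path_metric[p1] + branch_metric[s][1]; /* add */
                survivor[s] = (m1 < m0);                  /* compare */
                next_metric[s] = survivor[s] ? m1 : m0;   /* select */
            }
        }

    Parallelizing this loop over states is straightforward; the harder problem the paper addresses is that successive trellis steps depend on each other through next_metric.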

    Speech Recognition on an FPGA Using Discrete and Continuous Hidden Markov Models

    Speech recognition is a computationally demanding task, particularly the stage that uses Viterbi decoding to convert pre-processed speech data into words or sub-word units. Any device that can reduce the load on, for example, a PC’s processor is advantageous. Hence we present FPGA implementations of the decoder based on discrete and on continuous hidden Markov models (HMMs) representing monophones, and demonstrate that the discrete version can process speech at nearly 5,000 times real time using just 12% of the slices of a Xilinx Virtex XCV1000, but with a lower recognition rate than the continuous implementation, which is 75 times faster than real time and occupies 45% of the same device.
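
    For context, the core of the discrete-HMM Viterbi recursion that such a decoder implements in hardware can be sketched in C as below; the state and codebook sizes are hypothetical placeholders, not figures from the paper.

        #include <float.h>

        #define N_STATES  5    /* states per monophone HMM (assumed) */
        #define N_SYMBOLS 256  /* codebook size for discrete observations (assumed) */

        /* One frame of the log-domain Viterbi recursion:
         * delta[j] = max_i(delta_prev[i] + log_trans[i][j]) + log_emit[j][obs] */
        void viterbi_frame(const double delta_prev[N_STATES],
                           const double log_trans[N_STATES][N_STATES],
                           const double log_emit[N_STATES][N_SYMBOLS],
                           int obs,
                           double delta[N_STATES],
                           int backptr[N_STATES])
        {
            for (int j = 0; j < N_STATES; j++) {
                double best = -DBL_MAX;
                int arg = 0;
                for (int i = 0; i < N_STATES; i++) {
                    double cand = delta_prev[i] + log_trans[i][j];
                    if (cand > best) { best = cand; arg = i; }
                }
                delta[j] = best + log_emit[j][obs];
                backptr[j] = arg;  /* traced back later to recover the best sequence */
            }
        }

    A continuous-HMM version replaces the log_emit table lookup with a Gaussian mixture evaluation per frame, which accounts for much of the speed gap reported above.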

    Domain specific high performance reconfigurable architecture for a communication platform


    NASA Space Engineering Research Center for VLSI System Design

    This annual report outlines the activities of the past year at the NASA SERC on VLSI Design. Highlights for this year include the following: a significant breakthrough was achieved in utilizing commercial IC foundries to produce flight electronics; the first two flight-qualified chips were designed, fabricated, and tested, and are now being delivered into NASA flight systems; and a new technology transfer mechanism has been established to transfer VLSI advances into NASA and commercial systems.

    Hardware simulation of Ku-band spacecraft receiver and bit synchronizer, volume 1

    A hardware simulation that emulates an automatically acquiring, transmit/receive, spread-spectrum communication and tracking system, developed for use in future NASA programs involving digital communications, is considered. The system architecture and the tradeoff analysis that led to the selection of the simulated system are presented.

    CHANNEL CODING TECHNIQUES FOR A MULTIPLE TRACK DIGITAL MAGNETIC RECORDING SYSTEM

    In magnetic recording, greater areal bit-packing densities are achieved by increasing track density (reducing the space between, and the width of, the recording tracks) and/or by reducing the wavelength of the recorded information. This leads to a requirement for higher-precision tape transport mechanisms and dedicated coding circuitry. A TMS32010 digital signal processor is applied to a standard low-cost, low-precision, multiple-track, compact cassette tape recording system. Advanced signal processing and coding techniques are employed to maximise recording density and to compensate for the mechanical deficiencies of this system. Parallel software encoding/decoding algorithms have been developed for several run-length-limited modulation codes. The results for a peak-detection system show that Bi-Phase L code can be reliably employed up to a data rate of 5 kbits/second/track. Development of a second system, employing a TMS32025 and sampling detection, permitted the use of adaptive equalisation to slim the readback pulse. Application of conventional read equalisation techniques, which oppose inter-symbol interference, resulted in a 30% increase in performance.

    Further investigation shows that greater linear recording densities can be achieved by employing partial response signalling and maximum likelihood detection. Partial response signalling schemes use controlled inter-symbol interference to increase recording density at the expense of a multi-level readback waveform, which incurs an increased noise penalty. Maximum likelihood sequence detection employs soft decisions on the readback waveform to recover this loss. The associated modulation coding techniques required for optimised operation of such a system are discussed.

    Two-dimensional run-length-limited (d, ky) modulation codes provide a further means of increasing storage capacity in multi-track recording systems. For example, the code rate of a single-track run-length-limited code with constraints (1, 3), such as Miller code, can be increased by over 25% when using a 4-track two-dimensional code with the same d constraint and with the k constraint satisfied across a number of parallel channels. The k constraint along an individual track, kx, can be increased without loss of clock synchronisation, since the clocking information derived from frequent signal transitions can be sub-divided across a number, y, of parallel tracks in terms of a ky constraint. This permits more code words to be generated for a given (d, k) constraint in two dimensions than is possible in one dimension. This coding technique is furthered by the development of a reverse enumeration scheme based on the trellis description of the (d, ky) constraints. The application of a two-dimensional code to a high linear density system employing extended class IV partial response signalling and maximum likelihood detection is proposed. Finally, additional coding constraints to improve spectral response and error performance are discussed.

    Hewlett Packard, Computer Peripherals Division (Bristol)
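
    As a brief aside, a run-length-limited (d, k) constraint requires that consecutive ones in the recorded sequence be separated by at least d and at most k zeros. A minimal checker in C (the NRZI convention, where a one marks a signal transition, is an assumption for illustration):

        #include <stdbool.h>
        #include <stddef.h>

        /* Returns true if every run of zeros between ones has length in [d, k],
         * and no run after the first one exceeds k (so clocking is never lost). */
        bool satisfies_dk(const int *bits, size_t n, int d, int k)
        {
            int zeros = 0;
            bool seen_one = false;
            for (size_t i = 0; i < n; i++) {
                if (bits[i] == 1) {
                    if (seen_one && zeros < d)
                        return false;        /* transitions too close together */
                    seen_one = true;
                    zeros = 0;
                } else if (++zeros > k && seen_one) {
                    return false;            /* too long without a transition */
                }
            }
            return true;
        }

    Under the two-dimensional (d, ky) scheme described above, the k check would instead be satisfied jointly across y parallel tracks, which is what frees up additional code words along each individual track.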

    Design of Special Function Units in Modern Microprocessors

    Today’s computing systems demand high performance for applications such as cloud computing, web-based search engines, network applications, and social media tasks. Such applications make extensive use of hashing and arithmetic operations in their computation. In this thesis, we explore the use of new special function units (SFUs) for modern microprocessors to accelerate such workloads.

    First, we design an SFU for hashing. Hashing can reduce the complexity of search and lookup from O(p) to O(p/n), where n bins are used and p items are being processed. In modern microprocessors, hashing is done in software. In our work, we propose a novel hardware hash unit design for use in modern microprocessors. Since the hash unit is designed at the hardware level, our approach obtains several advantages. First, a hardware hash unit executes a single hash instruction to perform a hash operation, whereas software hashing compiles a hash operation into multiple instructions, degrading performance. Second, software hashing stores hash data in DRAM (hash operation entries may also be stored in one of the cache levels), while a hardware hash unit stores hash data in a dedicated memory module (a hardware hash table), which improves performance. Third, today’s operating systems execute multiple applications (processes) in parallel, which entails high memory utilization; the resulting context switches between processes cause many cache misses. In a hardware hash unit, cache misses are reduced significantly by the dedicated memory module. These advantages all reduce power consumption and increase overall system performance significantly, with a minimal increase in the microprocessor’s die area. We evaluate our hardware hash unit and compare its performance with software-based hashing. We start by evaluating our design at the micro-architecture level in terms of system performance; we then design it at the circuit level to obtain the area overhead, and we analyze its power and delay for each hash operation. These results are compared with a traditional hashing implementation. We also present an FPGA-based coprocessor for hash unit acceleration, applied to a virus-checking application.

    Second, we present an SFU to speed up arithmetic operations, which we call a programmable arithmetic unit (PAU). In modern microprocessors, applications that require heavy arithmetic computation are executed in software. To improve the performance of such computations, the PAU provides a partially reconfigurable methodology for arithmetic applications. The PAU consists of a set of IP blocks connected to a reconfigurable FPGA controller via a fast mesh-based interconnect. The IP blocks can be any arithmetic IP, such as adders, subtractors, multipliers, comparators, and sign-extension units, and the PAU can have one or more copies of the same IP block (for example, 5 adders and 7 multipliers). The FPGA controller is an on-chip FPGA-based reconfigurable control fabric that enables different arithmetic applications to be embedded on the PAU; it is programmed per application, and its reconfigurable logic is based on a LUT-based design like a traditional FPGA. The FPGA controller and the IP blocks in the PAU communicate via a high-speed ring data fabric. In our work, we use the PAU as an SFU in modern microprocessors and compare the performance of different hardware-based arithmetic applications on the PAU with software-based implementations in modern microprocessors.
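
    To make the O(p) to O(p/n) claim concrete, a software chained hash table with n bins bounds the expected probe chain at p/n entries. A minimal C sketch follows (the bin count, hash function, and key type are illustrative assumptions, not the hardware design from the thesis):

        #include <stdlib.h>

        #define N_BINS 1024  /* n bins: expected chain length falls to p / N_BINS */

        struct entry {
            unsigned key;
            int value;
            struct entry *next;
        };

        static struct entry *bins[N_BINS];

        /* Simple multiplicative hash (illustrative choice). */
        static unsigned hash(unsigned key) { return (key * 2654435761u) % N_BINS; }

        /* Lookup walks only one bin's chain: O(p/n) expected, versus O(p) for
         * scanning all p items. Per the abstract, the hardware hash unit
         * exposes this whole sequence as a single hash instruction. */
        int *lookup(unsigned key)
        {
            for (struct entry *e = bins[hash(key)]; e != NULL; e = e->next)
                if (e->key == key)
                    return &e->value;
            return NULL;
        }

        void insert(unsigned key, int value)
        {
            struct entry *e = malloc(sizeof *e);  /* error check omitted in sketch */
            e->key = key;
            e->value = value;
            e->next = bins[hash(key)];
            bins[hash(key)] = e;
        }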

    VLSI Architectures and Rapid Prototyping Testbeds for Wireless Systems

    The rapid evolution of wireless access is creating an ever-changing variety of standards for indoor and outdoor environments. The real-time processing demands of wireless data rates in excess of 100 Mbps pose a challenging problem for architecture design and verification. In this paper, we consider current trends in VLSI architecture and in rapid prototyping testbeds to evaluate these systems. The key phases in multi-standard system design and prototyping include: algorithm mapping to parallel architectures, based on the real-time data and sampling rates and the resulting area, time, and power complexity; configurable mappings and design exploration, based on heterogeneous architectures consisting of DSPs, programmable application-specific instruction-set processors (ASIPs), and co-processors; and verification and testbed integration, based on prototype implementation on programmable devices and integration with RF units.

    Nokia Foundation Fellowship; Nokia Corporation; National Instruments; National Science Foundation