294 research outputs found
Parallel Viterbi Decoding Implementation by Multi-microprocessors
The Viterbi algorithm is a well-established technique for channel and source decoding in high performance digital communication systems. However, excessive time consumption makes it difficult to design an efficient highspeed decoder for practical application. The central unit of a Viterbi decoder is a data-dependent feedback loop which performs an add-compare-select (ACS) operation. This nonlinear recurrence is the bottleneck for a high-speed parallel implementation. This paper describes the implementation of parallel Viterbi algorithm by multi-microprocessors. Internal computations are performed in a parallel fashion. The use of microprocessors allows low-cost implementation with moderate complexity. An organization network, separate memory blocks and programs provide proper operation. For a fixed processing speed of given hardware parallel Viterbi decoding allows a linear speed up in the throughput rate by a linear increase in hardware complexity
Speech Recognition on an FPGA Using Discrete and Continuous Hidden Markov Models
Speech recognition is a computationally demanding task, particularly the stage which uses Viterbi decoding for converting pre-processed speech data into words or sub-word units. Any device that can reduce the load on, for example, a PC’s processor, is advantageous. Hence we present FPGA implementations of the decoder based alternately on discrete and continuous hidden Markov models (HMMs) representing monophones, and demonstrate that the discrete version can process speech nearly 5,000 times real time, using just 12% of the slices of a Xilinx Virtex XCV1000, but with a lower recognition rate than the continuous implementation, which is 75 times faster than real time, and occupies 45% of the same device
NASA Space Engineering Research Center for VLSI System Design
This annual report outlines the activities of the past year at the NASA SERC on VLSI Design. Highlights for this year include the following: a significant breakthrough was achieved in utilizing commercial IC foundries for producing flight electronics; the first two flight qualified chips were designed, fabricated, and tested and are now being delivered into NASA flight systems; and a new technology transfer mechanism has been established to transfer VLSI advances into NASA and commercial systems
Hardware simulation of Ku-band spacecraft receiver and bit synchronizer, volume 1
A hardware simulation which emulates an automatically acquiring transmit receive spread spectrum communication and tracking system and developed for use in future NASA programs involving digital communications is considered. The system architecture and tradeoff analysis that led to the selection of the system to be simulated is presented
CHANNEL CODING TECHNIQUES FOR A MULTIPLE TRACK DIGITAL MAGNETIC RECORDING SYSTEM
In magnetic recording greater area) bit packing densities are achieved through increasing
track density by reducing space between and width of the recording tracks, and/or
reducing the wavelength of the recorded information. This leads to the requirement of
higher precision tape transport mechanisms and dedicated coding circuitry.
A TMS320 10 digital signal processor is applied to a standard low-cost, low precision,
multiple-track, compact cassette tape recording system. Advanced signal processing and
coding techniques are employed to maximise recording density and to compensate for
the mechanical deficiencies of this system. Parallel software encoding/decoding
algorithms have been developed for several Run-Length Limited modulation codes. The
results for a peak detection system show that Bi-Phase L code can be reliably employed
up to a data rate of 5kbits/second/track. Development of a second system employing a
TMS32025 and sampling detection permitted the utilisation of adaptive equalisation to
slim the readback pulse. Application of conventional read equalisation techniques, that
oppose inter-symbol interference, resulted in a 30% increase in performance.
Further investigation shows that greater linear recording densities can be achieved by
employing Partial Response signalling and Maximum Likelihood Detection. Partial
response signalling schemes use controlled inter-symbol interference to increase
recording density at the expense of a multi-level read back waveform which results in an
increased noise penalty. Maximum Likelihood Sequence detection employs soft
decisions on the readback waveform to recover this loss. The associated modulation
coding techniques required for optimised operation of such a system are discussed.
Two-dimensional run-length-limited (d, ky) modulation codes provide a further means of
increasing storage capacity in multi-track recording systems. For example the code rate
of a single track run length-limited code with constraints (1, 3), such as Miller code, can
be increased by over 25% when using a 4-track two-dimensional code with the same d
constraint and with the k constraint satisfied across a number of parallel channels. The k
constraint along an individual track, kx, can be increased without loss of clock
synchronisation since the clocking information derived by frequent signal transitions
can be sub-divided across a number of, y, parallel tracks in terms of a ky constraint. This
permits more code words to be generated for a given (d, k) constraint in two dimensions
than is possible in one dimension. This coding technique is furthered by development of
a reverse enumeration scheme based on the trellis description of the (d, ky) constraints.
The application of a two-dimensional code to a high linear density system employing
extended class IV partial response signalling and maximum likelihood detection is
proposed. Finally, additional coding constraints to improve spectral response and error
performance are discussed.Hewlett Packard, Computer Peripherals Division (Bristol
Design of Special Function Units in Modern Microprocessors
Today’s computing systems demand high performance for applications such as cloud computing, web-based search engines, network applications, and social media tasks. Such software applications involve an extensive use of hashing and arithmetic operations in their computation. In this thesis, we explore the use of new special function units (SFUs) for modern microprocessors, to accelerate such workloads. First, we design an SFU for hashing. Hashing can reduce the complexity of search and lookup from O(p) to O(p/n), where n bins are used and p items are being processed. In modern microprocessors, hashing is done in software. In our work, we propose a novel hardware hash unit design for use in modern microprocessors. Since the hash unit is designed at the hardware level, several advantages are obtained by our approach. First, a hardware-based hash unit executes a single hash instruction to perform a hash operation. In a software-based hashing in modern microprocessors, a hash operation is compiled into multiple instructions, thereby degrading performance. Second, software-based hashing stores hash data in a DRAM (also, hash operation entries can be stored in one of the cache levels). In a hardware-based hash unit, hash data is stored in a dedicated memory module (a hardware hash table), which improves performance. Third, today’s operating systems execute multiple applications (processes) in parallel, which entail high memory utilization. Hence the operating systems require many context switching between different processes, which results in many cache misses. In a hardware-based hash unit, the cache misses is reduced significantly using the dedicated memory module (hash table). These advantages all reduce the power consumption and increase the overall system performance significantly with a minimal increase in the microprocessor’s die area. We evaluate our hardware-based hash unit and compare its performance with software-based hashing. We start by evaluating our design approach at the micro-architecture level in terms of system performance. After that, we design our approach at the circuit level design to obtain the area overhead. Also, we analyze our design’s power and delay for each hash operation. These results are compared with a traditional hashing implementation.
Then, we present an FPGA-based coprocessor for hash unit acceleration, applied to a virus checking application. Second, we present an SFU to speed up arithmetic operations. We call this arithmetic SFU a programmable arithmetic unit (PAU). In modern microprocessors, applications that require heavy arithmetic computations are done in software. To improve the performance for such computations, we present a programmable arithmetic unit (PAU), a partially reconfigurable methodology for arithmetic applications. The PAU consists of a set of IP blocks connected to a reconfigurable FPGA controller via a fast mesh-based interconnect. The IP blocks in the PAU can be any IP block such as adders, subtractors, multipliers, comparators and sign extension units. The PAU can have one or more copies of the same IP block (for example, 5 adders and 7 multipliers). The FPGA controller is an on-chip FPGA-based reconfigurable control fabric. The FPGA controller enables different arithmetic applications to be embedded on the PAU. The FPGA controller is programmed for different applications. The reconfigurable logic is based on a LUT-based design like a traditional FPGA. The FPGA controller and the IP blocks in the PAU communicate via a high speed ring data fabric. In our work, we use the PAU as an SFU in modern microprocessors. We compare the performance of different hardware-based arithmetic applications in the PAU with software-based implementations in modern microprocessors
VLSI Architectures and Rapid Prototyping Testbeds for Wireless Systems
The rapid evolution of wireless access is creating an ever changing variety of standards for indoor and outdoor environments. The real-time processing demands of wireless data rates in excess of 100 Mbps is a challenging problem for
architecture design and verification. In this paper, we consider current trends in VLSI architecture and in rapid prototyping testbeds to evaluate these systems. The key phases in multi-standard system design and prototyping
include: Algorithm Mapping to Parallel Architectures – based on the real-time data and sampling rate and the resulting area, time and power complexity; Configurable Mappings and Design Exploration – based on heterogeneous architectures consisting of DSP, programmable application-specific instruction (ASIP) processors, and co-processors; and Verification and Testbed Integration
– based on prototype implementation on programmable devices and integration with RF units.Nokia Foundation FellowshipNokia CorporationNational InstrumentsNational Science Foundatio
- …