161 research outputs found

    VLSI architectures for high speed Fourier transform processing

    Get PDF

    FPGA Frequency Domain Based Gps Coarse Acquisition Processor using FFT

    Get PDF
    The Global Positioning System or GPS is a satellite based technology that has gained widespread use worldwide in civilian and military applications. Direct Sequence Spread spectrum (DSSS) is the method whereby the data transmitted by the satellite and received by user is kept secure, low power and relatively noise-immune. The first step required in the GPS operation is to perform a lock on the incoming signal, both with respect to time synchronization and frequency resolution. Because of the need for reduced time to lock and also reduced hardware, algorithms based in the frequency domain have been developed. These algorithms take advantage of the time to frequency matrix operation known as the fast Fourier transform or FFT. For this thesis, a Direct Sequence Spread Spectrum Coarse Acquisition code processor based on the FFT was implemented in VHDL and targeted to a Xilinx Virtex –II Pro Field Programmable Gate Array (FPGA). The use of the FFT allows simultaneous lock on coarse acquisition (C/A) code and carrier frequency. Because of hardware limitations, a novel technique of sub-sampling is used in this system to obtain data block sizes that match hardware limitations. In addition, design challenges related to scheduling and timing were addressed, allowing a system with 19 pipeline stages to be built. The system, which fits on a Xilinx Virtex-II pro XC2VP70 FPGA, uses 10 ms of data to perform the lock with 5.5 ms of processing time at 100 MHz and theoretically can operate on signals 20 db below the noise floor

    Designing Flexible, Energy Efficient and Secure Wireless Solutions for the Internet of Things

    Full text link
    The Internet of Things (IoT) is an emerging concept where ubiquitous physical objects (things) consisting of sensor, transceiver, processing hardware and software are interconnected via the Internet. The information collected by individual IoT nodes is shared among other often heterogeneous devices and over the Internet. This dissertation presents flexible, energy efficient and secure wireless solutions in the IoT application domain. System design and architecture designs are discussed envisioning a near-future world where wireless communication among heterogeneous IoT devices are seamlessly enabled. Firstly, an energy-autonomous wireless communication system for ultra-small, ultra-low power IoT platforms is presented. To achieve orders of magnitude energy efficiency improvement, a comprehensive system-level framework that jointly optimizes various system parameters is developed. A new synchronization protocol and modulation schemes are specified for energy-scarce ultra-small IoT nodes. The dynamic link adaptation is proposed to guarantee the ultra-small node to always operate in the most energy efficiency mode, given an operating scenario. The outcome is a truly energy-optimized wireless communication system to enable various new applications such as implanted smart-dust devices. Secondly, a configurable Software Defined Radio (SDR) baseband processor is designed and shown to be an efficient platform on which to execute several IoT wireless standards. It is a custom SIMD execution model coupled with a scalar unit and several architectural optimizations: streaming registers, variable bitwidth, dedicated ALUs, and an optimized reduction network. Voltage scaling and clock gating are employed to further reduce the power, with a more than a 100% time margin reserved for reliable operation in the near-threshold region. Two upper bound systems are evaluated. A comprehensive power/area estimation indicates that the overhead of realizing SDR flexibility is insignificant. The benefit of baseband SDR is quantified and evaluated. To further augment the benefits of a flexible baseband solution and to address the security issue of IoT connectivity, a light-weight Galois Field (GF) processor is proposed. This processor enables both energy-efficient block coding and symmetric/asymmetric cryptography kernel processing for a wide range of GF sizes (2^m, m = 2, 3, ..., 233) and arbitrary irreducible polynomials. Program directed connections among primitive GF arithmetic units enable dynamically configured parallelism to efficiently perform either four-way SIMD GF operations, including multiplicative inverse, or a long bit-width GF product in a single cycle. This demonstrates the feasibility of a unified architecture to enable error correction coding flexibility and secure wireless communication in the low power IoT domain.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/137164/1/yajchen_1.pd

    Hardware acceleration of the trace transform for vision applications

    Get PDF
    Computer Vision is a rapidly developing field in which machines process visual data to extract meaningful information. Digitised images in their pixels and bits serve no purpose of their own. It is only by interpreting the data, and extracting higher level information that a scene can be understood. The algorithms that enable this process are often complex, and data-intensive, limiting the processing rate when implemented in software. Hardware-accelerated implementations provide a significant performance boost that can enable real- time processing. The Trace Transform is a newly proposed algorithm that has been proven effective in image categorisation and recognition tasks. It is flexibly defined allowing the mathematical details to be tailored to the target application. However, it is highly computationally intensive, which limits its applications. Modern heterogeneous FPGAs provide an ideal platform for accelerating the Trace transform for real-time performance, while also allowing an element of flexibility, which highly suits the generality of the Trace transform. This thesis details the implementation of an extensible Trace transform architecture for vision applications, before extending this architecture to a full flexible platform suited to the exploration of Trace transform applications. As part of the work presented, a general set of architectures for large-windowed median and weighted median filters are presented as required for a number of Trace transform implementations. Finally an acceleration of Pseudo 2-Dimensional Hidden Markov Model decoding, usable in a person detection system, is presented. Such a system can be used to extract frames of interest from a video sequence, to be subsequently processed by the Trace transform. All these architectures emphasise the need for considered, platform-driven design in achieving maximum performance through hardware acceleration

    Application-specific instruction set processor for speech recognition.

    Get PDF
    Cheung Man Ting.Thesis (M.Phil.)--Chinese University of Hong Kong, 2005.Includes bibliographical references (leaves 69-71).Abstracts in English and Chinese.Chapter 1 --- Introduction --- p.1Chapter 1.1 --- The Emergence of ASIP --- p.1Chapter 1.1.1 --- Related Work --- p.3Chapter 1.2 --- Motivation --- p.6Chapter 1.3 --- ASIP Design Methodologies --- p.7Chapter 1.4 --- Fundamentals of Speech Recognition --- p.8Chapter 1.5 --- Thesis outline --- p.10Chapter 2 --- Automatic Speech Recognition --- p.11Chapter 2.1 --- Overview of ASR system --- p.11Chapter 2.2 --- Theory of Front-end Feature Extraction --- p.12Chapter 2.3 --- Theory of HMM-based Speech Recognition --- p.14Chapter 2.3.1 --- Hidden Markov Model (HMM) --- p.14Chapter 2.3.2 --- The Typical Structure of the HMM --- p.14Chapter 2.3.3 --- Discrete HMMs and Continuous HMMs --- p.15Chapter 2.3.4 --- The Three Basic Problems for HMMs --- p.17Chapter 2.3.5 --- Probability Evaluation --- p.18Chapter 2.4 --- The Viterbi Search Engine --- p.19Chapter 2.5 --- Isolated Word Recognition (IWR) --- p.22Chapter 3 --- Design of ASIP Platform --- p.24Chapter 3.1 --- Instruction Fetch --- p.25Chapter 3.2 --- Instruction Decode --- p.26Chapter 3.3 --- Datapath --- p.29Chapter 3.4 --- Register File Systems --- p.30Chapter 3.4.1 --- Memory Hierarchy --- p.30Chapter 3.4.2 --- Register File Organization --- p.31Chapter 3.4.3 --- Special Registers --- p.34Chapter 3.4.4 --- Address Generation --- p.34Chapter 3.4.5 --- Load and Store --- p.36Chapter 4 --- Implementation of Speech Recognition on ASIP --- p.37Chapter 4.1 --- Hardware Architecture Exploration --- p.37Chapter 4.1.1 --- Floating Point and Fixed Point --- p.37Chapter 4.1.2 --- Multiplication and Accumulation --- p.38Chapter 4.1.3 --- Pipelining --- p.41Chapter 4.1.4 --- Memory Architecture --- p.43Chapter 4.1.5 --- Saturation Logic --- p.44Chapter 4.1.6 --- Specialized Addressing Modes --- p.44Chapter 4.1.7 --- Repetitive Operation --- p.47Chapter 4.2 --- Software Algorithm Implementation --- p.49Chapter 4.2.1 --- Implementation Using Base Instruction Set --- p.49Chapter 4.2.2 --- Implementation Using Refined Instruction Set --- p.54Chapter 5 --- Simulation Results --- p.56Chapter 6 --- Conclusions and Future Work --- p.60Appendices --- p.62Chapter A --- Base Instruction Set --- p.62Chapter B --- Special Registers --- p.65Chapter C --- Chip Microphotograph of ASIP --- p.67Chapter D --- The Testing Board of ASIP --- p.68Bibliography --- p.6

    Domain-specific and reconfigurable instruction cells based architectures for low-power SoC

    Get PDF

    Simplified vector-thread architectures for flexible and efficient data-parallel accelerators

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010.This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.Cataloged from student submitted PDF version of thesis.Includes bibliographical references (p. 165-170).This thesis explores a new approach to building data-parallel accelerators that is based on simplifying the instruction set, microarchitecture, and programming methodology for a vector-thread architecture. The thesis begins by categorizing regular and irregular data-level parallelism (DLP), before presenting several architectural design patterns for data-parallel accelerators including the multiple-instruction multiple-data (MIMD) pattern, the vector single-instruction multiple-data (vector-SIMD) pattern, the single-instruction multiple-thread (SIMT) pattern, and the vector-thread (VT) pattern. Our recently proposed VT pattern includes many control threads that each manage their own array of microthreads. The control thread uses vector memory instructions to efficiently move data and vector fetch instructions to broadcast scalar instructions to all microthreads. These vector mechanisms are complemented by the ability for each microthread to direct its own control flow. In this thesis, I introduce various techniques for building simplified instances of the VT pattern. I propose unifying the VT control-thread and microthread scalar instruction sets to simplify the microarchitecture and programming methodology. I propose a new single-lane VT microarchitecture based on minimal changes to the vector-SIMD pattern.(cont.) Single-lane cores are simpler to implement than multi-lane cores and can achieve similar energy efficiency. This new microarchitecture uses control processor embedding to mitigate the area overhead of single-lane cores, and uses vector fragments to more efficiently handle both regular and irregular DLP as compared to previous VT architectures. I also propose an explicitly data-parallel VT programming methodology that is based on a slightly modified scalar compiler. This methodology is easier to use than assembly programming, yet simpler to implement than an automatically vectorizing compiler. To evaluate these ideas, we have begun implementing the Maven data-parallel accelerator. This thesis compares a simplified Maven VT core to MIMD, vector-SIMD, and SIMT cores. We have implemented these cores with an ASIC methodology, and I use the resulting gate-level models to evaluate the area, performance, and energy of several compiled microbenchmarks. This work is the first detailed quantitative comparison of the VT pattern to other patterns. My results suggest that future data-parallel accelerators based on simplified VT architectures should be able to combine the energy efficiency of vector-SIMD accelerators with the flexibility of MIMD accelerators.by Christopher Francis Batten.Ph.D

    FPGA Frequency Domain Based GPS Coarse Acquisition Processor Using FFT

    Get PDF

    Vector Disparity Sensor with Vergence Control for Active Vision Systems

    Get PDF
    This paper presents an architecture for computing vector disparity for active vision systems as used on robotics applications. The control of the vergence angle of a binocular system allows us to efficiently explore dynamic environments, but requires a generalization of the disparity computation with respect to a static camera setup, where the disparity is strictly 1-D after the image rectification. The interaction between vision and motor control allows us to develop an active sensor that achieves high accuracy of the disparity computation around the fixation point, and fast reaction time for the vergence control. In this contribution, we address the development of a real-time architecture for vector disparity computation using an FPGA device. We implement the disparity unit and the control module for vergence, version, and tilt to determine the fixation point. In addition, two on-chip different alternatives for the vector disparity engines are discussed based on the luminance (gradient-based) and phase information of the binocular images. The multiscale versions of these engines are able to estimate the vector disparity up to 32 fps on VGA resolution images with very good accuracy as shown using benchmark sequences with known ground-truth. The performances in terms of frame-rate, resource utilization, and accuracy of the presented approaches are discussed. On the basis of these results, our study indicates that the gradient-based approach leads to the best trade-off choice for the integration with the active vision system
    • 

    corecore