Hardware Implementation of Iterative Projection-Aggregation Decoding of Reed-Muller Codes
In this work, we present a simplification and a corresponding hardware
architecture for hard-decision recursive projection-aggregation (RPA) decoding
of Reed-Muller (RM) codes. In particular, we transform the recursive structure
of RPA decoding into a simpler and iterative structure with minimal
error-correction degradation. Our simulation results for RM(7,3) show that the
proposed simplification has a small error-correcting performance degradation
(0.005 in terms of channel crossover probability) while reducing the average
number of computations by up to 40%. In addition, we describe the first fully
parallel hardware architecture for simplified RPA decoding. We present FPGA
implementation results for an RM(6,3) code on a Xilinx Virtex-7 FPGA showing
that our proposed architecture achieves a throughput of 171 Mbps at a frequency
of 80 MHz
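For orientation, the code parameters behind the RM(7,3) and RM(6,3) results above can be computed directly. The sketch below is an illustration, not part of the cited work: it assumes the RM(m, r) convention (block length 2^m, order r), and the function name rm_parameters is hypothetical.

```python
from math import comb


def rm_parameters(m: int, r: int) -> tuple[int, int, int]:
    """Return (length n, dimension k, minimum distance d) of RM(m, r).

    Assumes the RM(m, r) convention: m variables, order r.
    """
    if not 0 <= r <= m:
        raise ValueError("order r must satisfy 0 <= r <= m")
    n = 2 ** m                                  # block length
    k = sum(comb(m, i) for i in range(r + 1))   # number of information bits
    d = 2 ** (m - r)                            # minimum Hamming distance
    return n, k, d


if __name__ == "__main__":
    for m, r in [(7, 3), (6, 3)]:
        n, k, d = rm_parameters(m, r)
        print(f"RM({m},{r}): n={n}, k={k}, d={d}, rate={k / n:.3f}")
```

Under that convention, RM(7,3) is a length-128 code carrying 64 information bits, and RM(6,3) is a length-64 code carrying 42 information bits.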
Parallel algorithms and architectures for low power video decoding
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010. Cataloged from PDF version of thesis. Includes bibliographical references (p. 197-204). Parallelism coupled with voltage scaling is an effective approach to achieve high processing performance with low power consumption. This thesis presents parallel architectures and algorithms designed to deliver the power and performance required for current and next generation video coding. Coding efficiency, area cost and scalability are also addressed. First, a low power video decoder is presented for the current state-of-the-art video coding standard H.264/AVC. Parallel architectures are used along with voltage scaling to deliver high definition (HD) decoding at low power levels. Additional architectural optimizations such as reducing memory accesses and multiple frequency/voltage domains are also described. An H.264/AVC Baseline decoder test chip was fabricated in 65-nm CMOS. It can operate at 0.7 V for HD (720p, 30 fps) video decoding with a measured power of 1.8 mW. The highly scalable decoder can trade off power and performance across a >100x range. Second, this thesis demonstrates how serial algorithms, such as Context-based Adaptive Binary Arithmetic Coding (CABAC), can be redesigned for parallel architectures to enable high throughput with low coding efficiency cost. A parallel algorithm called the Massively Parallel CABAC (MP-CABAC) is presented that uses syntax element partitions and interleaved entropy slices to achieve better throughput-coding efficiency and throughput-area tradeoffs than H.264/AVC. The parallel algorithm also improves scalability by providing a third dimension to trade off coding efficiency for power and performance. Finally, joint algorithm-architecture optimizations are used to increase performance and reduce area with almost no coding penalty.
The MP-CABAC is mapped to a highly parallel architecture with 80 parallel engines, which together deliver >10x higher throughput than existing H.264/AVC CABAC implementations. An MP-CABAC test chip was fabricated in 65-nm CMOS to demonstrate the power-performance-coding efficiency tradeoff. by Vivienne Sze. Ph.D.
Hardware acceleration of the trace transform for vision applications
Computer Vision is a rapidly developing field in which machines process visual data to extract meaningful information. Digitised images in their pixels and bits serve no purpose of their own. It is only by interpreting the data and extracting higher level information that a scene can be understood. The algorithms that enable this process are often complex and data-intensive, limiting the processing rate when implemented in software. Hardware-accelerated implementations provide a significant performance boost that can enable real-time processing. The Trace transform is a newly proposed algorithm that has been proven effective in image categorisation and recognition tasks. It is flexibly defined, allowing the mathematical details to be tailored to the target application. However, it is highly computationally intensive, which limits its applications. Modern heterogeneous FPGAs provide an ideal platform for accelerating the Trace transform for real-time performance, while also allowing an element of flexibility, which highly suits the generality of the Trace transform. This thesis details the implementation of an extensible Trace transform architecture for vision applications, before extending this architecture to a full flexible platform suited to the exploration of Trace transform applications. As part of the work presented, a general set of architectures for large-windowed median and weighted median filters is presented, as required for a number of Trace transform implementations. Finally, an acceleration of Pseudo 2-Dimensional Hidden Markov Model decoding, usable in a person detection system, is presented. Such a system can be used to extract frames of interest from a video sequence, to be subsequently processed by the Trace transform. All these architectures emphasise the need for considered, platform-driven design in achieving maximum performance through hardware acceleration
FPGA-based architectures for next generation communications networks
This engineering doctorate concerns the application of Field Programmable Gate Array (FPGA) technology to some of the challenges faced in the design of next generation communications networks. The growth and convergence of such networks has fuelled demand for higher bandwidth systems, and a requirement to support a diverse range of payloads across the network span.
The research which follows focuses on the development of FPGA-based architectures for two important paradigms in contemporary networking - Forward Error Correction and Packet Classification. The work seeks to combine analysis of the underlying algorithms and mathematical techniques which drive these applications, with an informed approach to the design of efficient FPGA-based circuits
Predictable multi-processor system on chip design for multimedia applications
The design of multimedia systems has become increasingly complex due to consumer requirements. Consumers demand the functionalities offered by a huge desktop from these systems. Many of these systems are mobile. Therefore, power consumption and size of these devices should be small. These systems are increasingly becoming multi-processor based (MPSoCs) for reasons of power and performance. Applications execute on these systems in different combinations, also known as use-cases. Applications may have different performance requirements in each use-case. Currently, verification of all these use-cases takes the bulk of the design effort. There is a need for analysis-based techniques so that the platforms have a predictable behaviour and in turn provide guarantees on performance without expending precious man hours on verification. In this dissertation, techniques and architectures have been developed to design and manage these multi-processor based systems efficiently. The dissertation presents predictable architectural components for MPSoCs, a predictable MPSoC design strategy, an automatic platform synthesis tool, a run-time system and an MPSoC simulation technique. The introduction of predictability helps in rapid design of MPSoC platforms. Chapter 1 of the thesis studies the trends in modern multimedia applications and processor architectures. The chapter further highlights the problems in the design of MPSoC platforms and emphasizes the need for predictable design techniques. Predictable design techniques require predictable application and architectural components. The chapter further elaborates on Synchronous Data Flow Graphs, which are used to model the applications throughout this thesis. The chapter presents the architecture template used in this thesis and lists the contributions of the thesis. One of the contributions of this thesis is the design of a predictable component called communication assist.
Chapter 2 of the thesis describes the architecture of this communication assist. The communication assist presented in this thesis not only decouples the communication from computation but also provides timing guarantees. Based on this communication assist, an MPSoC platform generation technique has been presented that can design MPSoC platforms capable of satisfying the throughput constraints of multiple applications in all use-cases. The technique is presented in Chapter 3. The design strategy uses three simple steps for platform design. In the first step it finds the required number of processors. The second step minimizes the communication interconnect between the processors and the third step minimizes the communication memory requirement of the platform. Further in Chapter 4, a tool has been developed to generate CA-based platforms for FPGAs. The output of this tool can be used to synthesize platforms on real hardware with the help of FPGA synthesis tools. The applications executing on these platforms often exhibit dynamism e.g. variation in task execution times and change in application throughput requirements. Further, new applications may often be added by consumers at run-time. Resource managers have been presented in literature to handle such dynamic situations. However, the scalability of these resource managers becomes an issue with the increase in number of processors and applications. Chapter 5 presents distributed run-time resource management techniques. Two versions of distributed resource managers have been presented which are scalable with the number of applications and processors. MPSoC platforms for real-time applications are designed assuming worst-case task execution times. It is known that the difference between average-case and worst-case behaviour can be quite large. Therefore, knowing the average case performance is also important for the system designer, and software simulation is often employed to estimate this. 
However, simulation in software is slow and does not scale with the number of applications and processing elements. In Chapter 6, a fast and scalable simulation methodology is introduced that can simulate the execution of multiple applications on an MPSoC platform. It is based on parallel execution of SDF (Synchronous Data Flow) models of applications. The simulation methodology uses Parallel Discrete Event Simulation (PDES) primitives and is termed "Smart Conservative PDES". The methodology generates a parallel simulator which is synthesizable on FPGAs. The framework can also be used to model dynamic arbitration policies which are difficult to analyse using models. The generated platform is also useful in carrying out Design Space Exploration, as shown in the thesis. Finally, Chapter 7 summarizes the main findings and (practical) implications of the studies described in previous chapters of this dissertation. Using the contributions mentioned in the thesis, a designer can design and implement predictable multiprocessor based systems capable of satisfying throughput constraints of multiple applications in a given set of use-cases, and employ resource management strategies to deal with dynamism in the applications. The chapter also describes the main limitations of this dissertation and makes suggestions for future research
Glucose-powered neuroelectronics
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 157-164). A holy grail of bioelectronics is to engineer biologically implantable systems that can be embedded without disturbing their local environments, while harvesting from their surroundings all of the power they require. As implantable electronic devices become increasingly prevalent in scientific research and in the diagnosis, management, and treatment of human disease, there is correspondingly increasing demand for devices with unlimited functional lifetimes that integrate seamlessly with their hosts in these two ways. This thesis presents significant progress toward establishing the feasibility of one such system: a brain-machine interface powered by a bioimplantable fuel cell that harvests energy from extracellular glucose in the cerebrospinal fluid surrounding the brain. The first part of this thesis describes a set of biomimetic algorithms and low-power circuit architectures for decoding electrical signals from ensembles of neurons in the brain. The decoders are intended for use in the context of neural rehabilitation, to provide paralyzed or otherwise disabled patients with instantaneous, natural, thought-based control of robotic prosthetic limbs and other external devices. This thesis presents a detailed discussion of the decoding algorithms, descriptions of the low-power analog and digital circuit architectures used to implement the decoders, and results validating their performance when applied to decode real neural data. A major constraint on brain-implanted electronic devices is the requirement that they consume and dissipate very little power, so as not to damage surrounding brain tissue.
The systems described here address that constraint, computing in the style of biological neural networks, and using arithmetic-free, purely logical primitives to establish universal computing architectures for neural decoding. The second part of this thesis describes the development of an implantable fuel cell powered by extracellular glucose at concentrations such as those found in the cerebrospinal fluid surrounding the brain. The theoretical foundations, details of design and fabrication, mechanical and electrochemical characterization, as well as in vitro performance data for the fuel cell are presented. by Benjamin Isaac Rapoport. Ph.D.
A SIMD architecture for hard real-time systems
Emerging safety-critical systems require high-performance data-parallel architectures and, problematically, ones that can guarantee tight and safe worst-case execution times. Given the complexity of existing architectures like GPUs, it is unlikely that sufficiently accurate models and algorithms for timing analysis will emerge in the foreseeable future. This motivates a clean-slate approach to designing a real-time data-parallel architecture.
In this work I present Sim-D: a wide-SIMD architecture for hard real-time systems. Similar to GPUs, Sim-D performs hardware strip-mining to schedule the work for a compute kernel in entities called work-groups. Sim-D schedules the work for each work-group as a sequence of uninterruptible access and execute program phases, interleaving the phases of two work-groups. By providing performance isolation between the memory and compute resources, the execution time of each phase can be tightly bounded through static analysis.
I present a predictable closed-page DRAM controller that processes requests for large 1D- and 2D blocks of data, as well as indirect indexed transfers. These large transfers coalesce the data requests of a whole work-group. For a linear 4KiB transfer over a 64-bit data bus, the utilisation provably exceeds 78% for DDR4-3200AA DRAM. For 2D blocks, a well-chosen tiling configuration can achieve near-similar efficiency. I show that bounds on the execution time of indexed transfers are pessimistic by nature, but propose a novel snoopy indexed transfer mechanism that permits more reasonable bounds when the buffer size is limited.
Finally, I present a worst-case execution time calculation algorithm for Sim-D. This algorithm is paired with two hardware work-group scheduling policies that deterministically reduce run-time variance. The worst-case execution time analysis algorithm combines static control flow analysis with a simulation-based cost model for execution and DRAM transfers. Its key novelty is the addition of a stage that considers work-group scheduling effects. I show that the work-group scheduling policies degrade performance on average by 8.9%, but permit the calculation of worst-case execution time bounds that are tight within 14.3% on average for benchmarks that avoid inefficient indexed transfers
Primitives and design of the intelligent pixel multimedia communicator
Communication systems are an ever more essential component of our modern global society. Mobile communications systems are still in a state of rapid advancement and growth. Technology is constantly evolving at a rapid pace in ever more diverse areas, and the emerging mobile multimedia based communication systems offer new challenges for both current and future technologies. To realise the full potential of mobile multimedia communication systems there is a need to explore new options to solve some of the fundamental problems facing the technology. In particular, the complexity of such a system within an infrastructure framework that is inherently limited by its power sources and has very restricted transmission bandwidth demands new methodologies and approaches
The 1991 3rd NASA Symposium on VLSI Design
Papers from the symposium are presented from the following sessions: (1) featured presentations 1; (2) very large scale integration (VLSI) circuit design; (3) VLSI architecture 1; (4) featured presentations 2; (5) neural networks; (6) VLSI architectures 2; (7) featured presentations 3; (8) verification 1; (9) analog design; (10) verification 2; (11) design innovations 1; (12) asynchronous design; and (13) design innovations 2