Proceedings of the 5th International Workshop on Reconfigurable Communication-centric Systems on Chip 2010 - ReCoSoC'10 - May 17-19, 2010 Karlsruhe, Germany. (KIT Scientific Reports ; 7551)
ReCoSoC is intended to be an annual meeting to present and discuss accumulated expertise and state-of-the-art research on SoC-related topics through plenary invited papers and posters. The workshop aims to provide a prospective view of tomorrow's challenges in the multibillion-transistor era, taking into account the emerging techniques and architectures that explore the synergy between flexible on-chip communication and system reconfigurability.
H-SIMD machine : configurable parallel computing for data-intensive applications
This dissertation presents a hierarchical single-instruction multiple-data (H-SIMD) configurable computing architecture to facilitate the efficient execution of data-intensive applications on field-programmable gate arrays (FPGAs). H-SIMD targets data-intensive applications for FPGA-based system designs. The H-SIMD machine is associated with a hierarchical instruction set architecture (HISA) which is developed for each application. The main objectives of this work are to facilitate ease of program development and high performance through ease of scheduling operations and overlapping communications with computations.
The H-SIMD machine is composed of the host, FPGA and nano-processor layers. They execute host SIMD instructions (HSIs), FPGA SIMD instructions (FSIs) and nano-processor instructions (NPIs), respectively. All HISA layers distinguish between communication and computation instructions. The H-SIMD machine also employs a memory switching scheme to bridge the large bandwidth gaps that are omnipresent in configurable systems. To showcase the proposed high-performance approach, the conditions to fully overlap communications with computations are investigated for important applications. The building blocks of the H-SIMD machine, such as high-performance and area-efficient register files, are presented in detail. The H-SIMD machine hierarchy is implemented on a host Dell workstation and the Annapolis Wildstar II FPGA board. Significant speedups have been achieved for matrix multiplication (MM), the 2-dimensional discrete cosine transform (2D DCT) and the 2-dimensional fast Fourier transform (2D FFT), which are used widely in science and engineering.
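The idea of overlapping communications with computations can be illustrated with a generic double-buffering timing model. This is only a sketch under stated assumptions, not the dissertation's own analysis: the function names, the uniform per-block transfer time `t_comm` and the uniform per-block compute time `t_comp` are all hypothetical.

```python
def sequential_time(n_blocks: int, t_comm: int, t_comp: int) -> int:
    """Total time when each block is transferred and then computed, with no overlap."""
    return n_blocks * (t_comm + t_comp)


def overlapped_time(n_blocks: int, t_comm: int, t_comp: int) -> int:
    """Total time with two memory banks (assumed model): while one bank is
    being computed on, the next block is transferred into the other bank."""
    if n_blocks == 0:
        return 0
    # The first transfer and the last computation cannot be hidden;
    # every step in between costs the slower of the two activities.
    return t_comm + (n_blocks - 1) * max(t_comm, t_comp) + t_comp


def fully_overlapped(t_comm: int, t_comp: int) -> bool:
    """In this model, transfers are completely hidden once computing a block
    takes at least as long as transferring the next one."""
    return t_comp >= t_comm
```

In this model, with 4 blocks, a transfer time of 2 and a compute time of 3, the sequential schedule takes 20 time units and the overlapped one takes 14; only the first transfer remains exposed.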
In another FPGA-based programming paradigm, a high-level language (here ANSI C) can be used to program the FPGAs in a mode similar to that of the H-SIMD machine, in that it also tries to minimize the effect of overheads. More specifically, a multi-threaded overlapping scheme is proposed to reduce as much as possible, or even completely hide, runtime FPGA reconfiguration overheads. Nevertheless, although the HLL-enabled reconfigurable machine allows software developers to customize FPGA functions easily, special architecture techniques are needed to achieve high performance without a significant penalty in area and clock frequency. Two important high-performance applications, matrix multiplication and image edge detection, are tested on the SRC-6 reconfigurable machine. The implemented algorithms are able to exploit the available data parallelism with independent functional units and application-specific cache support. Relevant performance and design tradeoffs are analyzed.
The Design of a System Architecture for Mobile Multimedia Computers
This chapter discusses the system architecture of a portable computer, called the Mobile Digital Companion, which supports energy-efficient handling of multimedia applications. Because battery life is limited and battery weight is an important factor for the size and the weight of the Mobile Digital Companion, energy management plays a crucial role in the architecture. As the Companion must remain usable in a variety of environments, it has to be flexible and adaptable to various operating conditions. The Mobile Digital Companion has an unconventional architecture that saves energy by using system decomposition at different levels of the architecture and exploits locality of reference with dedicated, optimised modules. The approach is based on dedicated functionality and the extensive use of energy reduction techniques at all levels of system design. The system has an architecture with a general-purpose processor accompanied by a set of heterogeneous autonomous programmable modules, each providing an energy-efficient implementation of dedicated tasks. A reconfigurable internal communication network switch exploits locality of reference and eliminates wasteful data copies.
Embedded electronic systems driven by run-time reconfigurable hardware
Abstract
This doctoral thesis addresses the design of embedded electronic systems based on run-time reconfigurable hardware technology (available through SRAM-based FPGA/SoC devices), aimed at enhancing human quality of life. The work investigates the system architecture and the reconfiguration engine that gives the FPGA the capability of dynamic partial reconfiguration, in order to synthesize, by means of hardware/software co-design, a given application partitioned into processing tasks that are multiplexed in time and space, thereby optimizing its physical implementation (silicon area, processing time, complexity, flexibility, functional density, cost and power consumption) in comparison with alternatives based on static hardware (MCU, DSP, GPU, ASSP, ASIC, etc.). The design flow of this technology is evaluated through the prototyping of several engineering applications (control systems, mathematical coprocessors, complex image processors, etc.), showing a level of maturity high enough for its exploitation in industry.
Dynamically reconfigurable bio-inspired hardware
During the last several years, reconfigurable computing devices have experienced an impressive development in their resource availability, speed, and configurability. Currently, commercial FPGAs offer the possibility of self-reconfiguring by partially modifying their configuration bitstream, providing high architectural flexibility while guaranteeing high performance. These configurability features have received special interest from computer architects: one can find several reconfigurable coprocessor architectures for cryptographic algorithms, image processing, automotive applications, and different general-purpose functions. On the other hand, there is bio-inspired hardware, a large research field that takes inspiration from living beings in order to design hardware systems, and which includes diverse topics: evolvable hardware, neural hardware, cellular automata, and fuzzy hardware, among others. Living beings are well known for their high adaptability to environmental changes, featuring very flexible adaptations at several levels. Bio-inspired hardware systems require such flexibility to be provided by the hardware platform on which the system is implemented. In general, bio-inspired hardware has been implemented on both custom and commercial hardware platforms. The custom platforms are specifically designed to support bio-inspired hardware systems, typically featuring special cellular architectures and enhanced reconfigurability capabilities, such as partial and dynamic reconfigurability. These aspects are highly valued because they provide the performance and the high architectural flexibility required by bio-inspired systems. However, the limited availability and very high cost of such custom devices make them accessible to only a few research groups.
Even though some commercial FPGAs provide enhanced reconfigurability features such as partial and dynamic reconfiguration, their utilization is still in its early stages and they are not well supported by FPGA vendors, which makes them difficult to use in existing bio-inspired systems. In this thesis, I present a set of architectures, techniques, and methodologies for benefiting from the configurability advantages of current commercial FPGAs in the design of bio-inspired hardware systems. Among the presented architectures there are neural networks, spiking neuron models, fuzzy systems, cellular automata and random Boolean networks. For these architectures, I propose several adaptation techniques for parametric and topological adaptation, such as Hebbian learning, evolutionary and co-evolutionary algorithms, and particle swarm optimization. Finally, as a case study I consider the implementation of bio-inspired hardware systems on two platforms: YaMoR (Yet another Modular Robot) and ROPES (Reconfigurable Object for Pervasive Systems); the development of both platforms was co-supervised in the framework of this thesis.
Floating-point support for reconfigurable computing architectures
Thesis (Ph.D.) -- Seoul National University Graduate School: Department of Electrical and Computer Engineering, 2014. 2. Kiyoung Choi. With a huge increase in demand for various kinds of compute-intensive applications in electronic systems, researchers have focused on coarse-grained reconfigurable architectures because of their advantages: high performance and flexibility. In addition, supporting floating-point operations on coarse-grained reconfigurable architectures has become essential as demand grows for applications that include floating-point operations, such as multimedia processing, 3D graphics, augmented reality, and object recognition.
This thesis presents FloRA, a coarse-grained reconfigurable architecture with floating-point support. The two-dimensional array of integer processing elements in FloRA is configured at run-time to perform floating-point operations as well as integer operations. More specifically, each floating-point operation is performed by two integer processing elements, one for the mantissa and the other for the exponent. For the chip fabricated in a 130 nm process, the total area overhead due to the additional hardware for floating-point operations is about 7.4% compared to the previous architecture, which does not support floating-point operations. The fabricated chip runs at a 125 MHz clock frequency with a 1.2 V power supply. Experiments show an 11.6x speedup on average compared to an ARM9 with a vector floating-point unit, for integer-only benchmark programs as well as programs containing floating-point operations. Compared with other similar approaches including XPP and Butter, the proposed architecture shows much higher performance for integer applications, while maintaining about half the performance of Butter for floating-point applications.
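The split of one floating-point operation into a mantissa part and an exponent part can be sketched in software using integer arithmetic only. The following is an illustrative single-precision multiply under simplifying assumptions (normal, nonzero operands, no overflow or underflow, truncation instead of rounding). It is not FloRA's actual datapath, and the helper names `unpack`, `pack` and `fmul` are hypothetical.

```python
import struct


def unpack(x: float):
    """Split a float32 into sign, biased exponent and 24-bit mantissa
    (with the hidden leading 1 restored; normal numbers only)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exp = (bits >> 23) & 0xFF
    man = (bits & 0x7FFFFF) | 0x800000
    return sign, exp, man


def pack(sign: int, exp: int, man: int) -> float:
    """Reassemble a float32 from its fields, dropping the hidden bit."""
    bits = (sign << 31) | (exp << 23) | (man & 0x7FFFFF)
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def fmul(a: float, b: float) -> float:
    sa, ea, ma = unpack(a)
    sb, eb, mb = unpack(b)

    # Work that an "exponent" integer unit could do: add exponents, remove one bias.
    exp = ea + eb - 127

    # Work that a "mantissa" integer unit could do: 24x24-bit integer multiply.
    prod = ma * mb  # lies in [2**46, 2**48)

    # Normalization couples the two halves: a carry out of the mantissa
    # product increments the exponent.
    if prod & (1 << 47):
        man = prod >> 24
        exp += 1
    else:
        man = prod >> 23

    return pack(sa ^ sb, exp, man)
```

The normalization step is where the two halves must exchange data, which gives a rough feel for the data dependency between split integer operations that such an approach has to manage.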
This thesis also proposes novel techniques to enhance utilization of integer units for high-throughput floating-point operations on CGRA.
The approach to implementing floating-point operations on CGRA presented in this thesis enables floating-point functionality with less area overhead than the traditional approach of employing separate floating-point units (FPUs). However, the total latency of a floating-point operation is larger than in the traditional approach, and the data dependency between the split integer operations limits how well the integer functional units can be utilized within an operation. To overcome this inefficiency, two techniques are proposed in this thesis. One is overlapping two distinct floating-point operations, which increases the utilization of the integer functional units in the architecture: integer functional units left free by one floating-point operation can be used for another. The other is forwarding between two data-dependent floating-point operations, which decreases the effective latency of the floating-point operations. The basic idea is to remove unnecessary calculations, such as formatting, that are normally performed between two data-dependent floating-point operations. To implement overlapping and forwarding, the FSMs and control paths in each PE are modified and temporal/communication registers are added. Lightweight sub-modules, such as increment units and registers for intermediate values, are added to resolve resource conflicts.
Experiments are conducted with several arithmetic functions that are widely used in floating-point applications. The base architecture and the new architecture implementing the proposed techniques are compared in terms of throughput and area overhead. The experimental results show that the proposed techniques increase throughput by 33.9% on average with a 20.9% area overhead.
Abstract
Contents
List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 Target Architecture
2.1 Overall Architecture
2.2 Reconfigurable Computing Module
Chapter 3 Design of Floating-Point Operations
3.1 Floating-point Numbers
3.1.1 Representation of floating-point numbers
3.1.2 Floating-point operations
3.2 FPU-PE Cluster
3.2.1 Construction of FPU-PE Cluster
3.2.2 Construction of Array of FPU-PE Clusters
3.2.3 Comparing Different FPU-PE Clusters
3.3 Implementation of Multi-Cycle Operations
3.4 Implementation of Floating-Point Operations
3.5 Implementation of Floating-Point Operations Using Shared Modules
Chapter 4 Chip Implementation
4.1 Specification of Chip Implementation
4.2 Experimental Setup
4.3 Experimental Results
4.3.1 Performance Comparison
4.3.2 Power Consumption Comparison
Chapter 5 Comparison with Other Architectures
5.1 Preparation for the comparison
5.2 Comparison with PACT XPP
5.3 Comparison with Butter Architecture
5.4 Implication of the proposed architecture
Chapter 6 Enhancement Techniques
6.1 Introduction
6.2 Conventional Approach
6.2.1 Base Architecture
6.2.2 Utilization of Floating-Point Operations
6.3 Proposed Enhancement Techniques
6.3.1 Overlapping Technique
6.3.2 Forwarding Technique
6.4 Experiments
6.4.1 Performance Comparison
6.4.2 Hardware Cost of the Proposed Techniques
6.4.3 Utilization Enhancement by the Proposed Techniques
6.5 Comparison with Other Architecture
Chapter 7 Conclusion
Bibliography
Abstract in Korean
Acknowledgments
Using Fine Grain Approaches for highly reliable Design of FPGA-based Systems in Space
The use of SRAM-based FPGAs in space missions is increasingly being considered because of their flexibility and reprogrammability. A challenge is the devices' sensitivity to radiation effects, which has increased in modern architectures due to smaller CMOS structures. This work proposes fault tolerance methodologies that are based on a fine-grain view of modern reconfigurable architectures. The focus is on the challenges of mitigating SEUs in SRAM-based FPGAs, which can result in critical situations.
MURAC: A unified machine model for heterogeneous computers
Includes bibliographical references. Heterogeneous computing enables the performance and energy advantages of multiple distinct processing architectures to be efficiently exploited within a single machine. These systems are capable of delivering large performance increases by matching the applications to architectures that are most suited to them. The Multiple Runtime-reconfigurable Architecture Computer (MURAC) model has been proposed to tackle the problems commonly found in the design and usage of these machines. This model presents a system-level approach that creates a clear separation of concerns between the system implementer and the application developer. The three key concepts that make up the MURAC model are a unified machine model, a unified instruction stream and a unified memory space. A simple programming model built upon these abstractions provides the user application with a consistent interface for interacting with the underlying machine. This programming model simplifies application partitioning between hardware and software and allows the easy integration of different execution models within the single control flow of a mixed-architecture application. The theoretical and practical trade-offs of the proposed model have been explored through the design of several systems. An instruction-accurate system simulator has been developed that supports the simulated execution of mixed-architecture applications. An embedded System-on-Chip implementation has been used to measure the overhead in hardware resources required to support the model, which was found to be minimal. An implementation of the model within an operating system on a tightly-coupled reconfigurable processor platform has been created. This implementation is used to extend the software scheduler to allow for the full support of mixed-architecture applications in a multitasking environment. Different scheduling strategies have been tested using this scheduler for mixed-architecture applications.
The design and implementation of these systems has shown that a unified abstraction model for heterogeneous computers provides important usability benefits to system and application designers. These benefits are achieved by presenting a consistent view of the multiple different architectures to the operating system and user applications. This allows designers to focus on achieving their performance and efficiency goals by gaining the benefits of different execution models during runtime, without the complex implementation details of system-level synchronisation and coordination.
A Survey of FPGA Optimization Methods for Data Center Energy Efficiency
This article surveys the academic literature on field-programmable gate
arrays (FPGAs) and their use for energy-efficient acceleration in data
centers. The goal is to critically present the existing FPGA energy
optimization techniques and discuss how they can be applied to such
systems. To do so, the article explores current energy trends and their
projection to the future with particular attention to the requirements set out
by the European Code of Conduct for Data Center Energy Efficiency. The article
then proposes a complete analysis of over ten years of research in energy
optimization techniques, classifying them by purpose, method of application,
and impacts on the sources of consumption. Finally, we conclude with the
challenges and possible innovations we expect for this sector.
Comment: Accepted for publication in IEEE Transactions on Sustainable
Computing