Pipeline synthesis and optimization for reconfigurable custom computing machines
This paper presents a pipeline synthesis and optimization technique
for high-level language programming of reconfigurable Custom
Computing Machines. The circuit synthesis generates hardware
accelerators from a sequential program which exploit the
reconfigurable hardware's parallelism. Program loops are transformed
to structural hardware specifications. The optimization algorithm
uses integer linear programming to balance and pipeline the
circuit's registers. This global optimization determines the minimal
number of flip-flops necessary for an optimal pipeline throughput.
It also considers the irregular flip-flop distribution on FPGAs.
Standard interface circuitry and a runtime system provide the
connection between the accelerator unit and its host computer. An
integrated compiler invokes the synthesis and produces a program
which downloads, calls and controls its hardware accelerators
automatically.
Just-in-time Hardware generation for abstracted reconfigurable computing
This thesis addresses the use of reconfigurable hardware in computing platforms, in order to harness the performance benefits of dedicated hardware whilst maintaining the flexibility associated with software. Although the reconfigurable computing concept is not new, the low-level nature of the supporting tools normally used, together with the consequent limited level of abstraction and resultant lack of backwards compatibility, has prevented the widespread adoption of this technology. In addition, bandwidth and architectural limitations have seriously constrained the potential improvements in performance. A review of existing approaches and tool flows is conducted to highlight the current problems being faced in this field. The objective of the work presented in this thesis is to introduce a radically new approach to reconfigurable computing tool flows. The runtime-based tool flow introduces complete abstraction between the application developer and the underlying hardware. This new technique eliminates the ease-of-use and backwards compatibility issues that have plagued the reconfigurable computing concept, and could pave the way for viable mainstream reconfigurable computing platforms. An easy-to-use, cycle-accurate behavioural modelling system is also presented, which was used extensively during the early exploration of new concepts and architectures. Some performance improvements produced by the new reconfigurable computing tool flow, when applied to both a MIPS-based embedded platform and the Cray XD1, are also presented. These results are then analyzed and the hardware and software factors affecting the performance increases that were obtained are discussed, together with potential techniques that could be used to further increase the performance of the system.
Lastly, a heterogeneous computing concept is proposed, in which a computer system containing multiple types of computational resource is envisaged, each having its own strengths and weaknesses (e.g. DSPs, CPUs, FPGAs). A revolutionary new method of fully exploiting the potential of such a system, whilst maintaining scalability, backwards compatibility, and ease of use, is also presented.
C-to-VHDL compilation of loop code with data reuse (Compilación C a VHDL de códigos de bucles con reuso de datos)
In this project, a source-to-source compiler named CtoVHDL was developed that translates C loops into VHDL. The translation produces a hardware accelerator that can execute the loop on an FPGA. The generated hardware accelerators perform as many operations as possible simultaneously and, in addition, avoid memory accesses by reusing data.
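The data-reuse idea mentioned in this abstract can be illustrated with a small C loop. This is only a sketch under assumptions: the function names window_sum_naive and window_sum_reuse and the 3-element window are hypothetical, chosen for illustration, and nothing here is taken from CtoVHDL or its generated VHDL. The first loop reloads three array elements per iteration; the second carries two of them over in local variables (which would map to registers in a hardware accelerator), so each iteration reads memory only once.

```c
#include <stddef.h>

/* Naive 3-element window sum: every iteration reloads three array
 * elements, two of which were already read in the previous iteration. */
void window_sum_naive(const int *in, int *out, size_t n)
{
    for (size_t i = 0; i + 2 < n; i++) {
        out[i] = in[i] + in[i + 1] + in[i + 2];
    }
}

/* Same result with data reuse: the two overlapping values are kept in
 * locals, so each iteration performs a single new memory read. */
void window_sum_reuse(const int *in, int *out, size_t n)
{
    if (n < 3) {
        return;              /* no complete window to process */
    }
    int a = in[0];
    int b = in[1];
    for (size_t i = 0; i + 2 < n; i++) {
        int c = in[i + 2];   /* the only memory read in this iteration */
        out[i] = a + b + c;
        a = b;               /* shift the window by one element */
        b = c;
    }
}
```

Both functions write n - 2 results for an input of n elements, so the output buffer needs at least that many entries.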
Generic low power reconfigurable distributed arithmetic processor
Higher performance, lower cost, ever-smaller integrated circuit components, and
higher packaging density of chips are ongoing goals of the microelectronic and computer
industry. As these goals are being achieved, however, power consumption and flexibility are
increasingly becoming bottlenecks that need to be addressed with the new technology in Very
Large-Scale Integration (VLSI) design.
Modern systems require more energy to support the computational capability that growing
requirements demand, and these requirements drive changes in standards not only for
audio and video broadcasting but also for communication, such as wireless connections
and network protocols. High flexibility and low power consumption pull in opposite
directions, but combining them in one system is the ultimate goal of designers.
A generic domain-specific low-power reconfigurable processor for the distributed
arithmetic algorithm is presented in this dissertation. This domain-specific reconfigurable
processor features high efficiency in terms of area, power and delay, approaching the
performance of an ASIC design while retaining the flexibility of programmable platforms.
The architecture not only supports the typical distributed arithmetic algorithms found
in most still picture compression standards and video conferencing standards, but
can also implement other distributed arithmetic algorithms found in
digital signal processing, telecommunication protocols and automatic control.
In this processor, a simple reconfigurable low-power control unit is implemented with
good performance in area, power and timing. The generic nature of the architecture
makes it applicable to any small or medium-sized finite state machine; such machines
serve as control units that implement complex system behaviour and are found in almost all
engineering disciplines. Furthermore, to map target applications efficiently onto the
proposed architecture, a new algorithm is introduced that searches for the best set of
common shared terms, keeping the area and power consumption of the implementation
low. The software implementation of this algorithm is presented; it can be used
not only for the architecture proposed in this dissertation but also for any
implementation of adder-based distributed arithmetic algorithms. In addition, several
low-power design techniques are applied in the architecture, such as an asymmetric design
style that includes asymmetric interconnection arrangement, asymmetric PTB selection
and asymmetric mapping of basic computing units. Together these design techniques achieve
substantial power savings. It is believed that they can be extended to other
low-power designs and architectures.
The processor presented in this dissertation can be used to implement complex,
high-performance distributed arithmetic algorithms for communication and image processing
applications at low cost in area and power compared with traditional
methods.
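For readers unfamiliar with distributed arithmetic, the sketch below shows the classic lookup-table formulation of an inner product, which needs no general multiplier. It is an illustration under assumptions, not the adder-based architecture or the common-term search algorithm of this dissertation: the names da_build_lut and da_inner_product, the four coefficients and the unsigned 8-bit inputs are all hypothetical choices made for the example.

```c
#include <stdint.h>

#define DA_TAPS 4   /* illustrative number of coefficients */
#define DA_BITS 8   /* illustrative input word length (unsigned inputs) */

/* Precompute the table: entry a holds the sum of those coefficients
 * whose index bit is set in a. */
void da_build_lut(const int32_t coeff[DA_TAPS], int32_t lut[1 << DA_TAPS])
{
    for (unsigned a = 0; a < (1u << DA_TAPS); a++) {
        int32_t s = 0;
        for (unsigned k = 0; k < DA_TAPS; k++) {
            if (a & (1u << k)) {
                s += coeff[k];
            }
        }
        lut[a] = s;
    }
}

/* Inner product sum_k coeff[k] * x[k], evaluated one bit plane at a
 * time: bit b of every input forms a table address, and the looked-up
 * partial sum is weighted by 2^b (a plain shift in hardware). */
int32_t da_inner_product(const int32_t lut[1 << DA_TAPS],
                         const uint8_t x[DA_TAPS])
{
    int32_t acc = 0;
    for (unsigned b = 0; b < DA_BITS; b++) {
        unsigned addr = 0;
        for (unsigned k = 0; k < DA_TAPS; k++) {
            addr |= ((x[k] >> b) & 1u) << k;
        }
        acc += lut[addr] * (int32_t)(1 << b);
    }
    return acc;
}
```

The table grows as 2^DA_TAPS, which is one reason adder-based variants that share common terms, as discussed in the abstract, are attractive for larger transforms.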
Design and implementation of a dynamic reconfiguration system (Conception et mise en oeuvre d'un système de reconfiguration dynamique)
Dynamic reconfiguration of partially reconfigurable FPGAs -- Dynamic reconfiguration using conventional FPGAs -- Tools -- Development of new FPGAs or dynamically reconfigurable systems -- Studies and analyses of the efficiency of dynamic reconfiguration -- Description of the board's operation before the implementation of dynamic reconfiguration -- The JTAG link, IEEE 1149.1 boundary-scan protocol -- Desired features and anticipated difficulties -- Hardware implementation -- Software implementation