91 research outputs found
Rule-based Power-balanced VLIW Instruction Scheduling with Uncertainty
Abstract. Power-balanced instruction scheduling for Very Long Instruction Word (VLIW) processors is an optimization problem that requires a good instruction-level power model for the target processor. Conventionally, these power models are deterministic. In reality, however, there will always be some degree of imprecision involved. For power-critical applications, it is desirable to find an optimal schedule that minimizes the effects of these uncertainties. The scheduling algorithm must also be computationally efficient to be practical for use in compilers. In this paper, we propose a rule-based genetic algorithm to efficiently solve the optimization problem of power-balanced VLIW instruction scheduling with uncertainties in the power consumption model. We theoretically prove that our rule-based genetic algorithm can produce schedules as good as those of the existing algorithms proposed for this problem. Furthermore, its computational efficiency is significantly improved.
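To make the objective concrete, the following is a minimal sketch of what "power-balanced under uncertainty" could mean. It is not the paper's rule-based genetic algorithm. It assumes (hypothetically) that imbalance is measured as the variance of per-cycle power, that uncertainty is represented by a small set of alternative power tables, and that data dependencies between instructions are ignored. All instruction names and power values are invented for illustration.

```python
from statistics import pvariance


def cycle_power(schedule, power, n_cycles):
    """Sum the power of all instructions issued in each cycle."""
    totals = [0.0] * n_cycles
    for instr, cycle in schedule.items():
        totals[cycle] += power[instr]
    return totals


def imbalance(schedule, power, n_cycles):
    """Population variance of per-cycle power; 0.0 means a flat profile."""
    return pvariance(cycle_power(schedule, power, n_cycles))


def worst_case_imbalance(schedule, scenarios, n_cycles):
    """Largest imbalance over all power scenarios (assumed uncertainty model)."""
    return max(imbalance(schedule, p, n_cycles) for p in scenarios)


# Hypothetical power tables: a nominal model and one perturbed variant.
scenarios = [
    {"add": 1.0, "mul": 3.0, "ld": 2.0, "st": 2.0},
    {"add": 1.2, "mul": 3.5, "ld": 1.8, "st": 2.1},
]

# Two candidate schedules mapping instruction -> issue cycle.
clustered = {"add": 0, "mul": 0, "ld": 0, "st": 1}
spread = {"add": 0, "mul": 0, "ld": 1, "st": 1}

candidates = {"clustered": clustered, "spread": spread}
best = min(candidates, key=lambda k: worst_case_imbalance(candidates[k], scenarios, 2))
print(best)
```

A search procedure such as a genetic algorithm would generate and compare many candidate schedules against an objective of this kind, subject to the dependency and resource constraints omitted here.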
Fast thread communication and synchronization mechanisms for a scalable single chip multiprocessor
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1998. Includes bibliographical references (p. 159-163). By Stephen William Keckler. Ph.D.
Just-in-time Hardware generation for abstracted reconfigurable computing
This thesis addresses the use of reconfigurable hardware in computing platforms, in order to harness the performance benefits of dedicated hardware whilst maintaining the flexibility associated with software. Although the reconfigurable computing concept is not new, the low-level nature of the supporting tools normally used, together with the consequent limited level of abstraction and resultant lack of backwards compatibility, has prevented the widespread adoption of this technology. In addition, bandwidth and architectural limitations have seriously constrained the potential improvements in performance. A review of existing approaches and tool flows is conducted to highlight the current problems being faced in this field. The objective of the work presented in this thesis is to introduce a radically new approach to reconfigurable computing tool flows. The runtime-based tool flow introduces complete abstraction between the application developer and the underlying hardware. This new technique eliminates the ease-of-use and backwards-compatibility issues that have plagued the reconfigurable computing concept, and could pave the way for viable mainstream reconfigurable computing platforms. An easy-to-use, cycle-accurate behavioural modelling system is also presented, which was used extensively during the early exploration of new concepts and architectures. Some performance improvements produced by the new reconfigurable computing tool flow, when applied to both a MIPS-based embedded platform and the Cray XD1, are also presented. These results are then analyzed, and the hardware and software factors affecting the performance increases that were obtained are discussed, together with potential techniques that could be used to further increase the performance of the system.
Lastly, a heterogeneous computing concept is proposed, in which a computer system containing multiple types of computational resource is envisaged, each having its own strengths and weaknesses (e.g. DSPs, CPUs, FPGAs). A revolutionary new method of fully exploiting the potential of such a system, whilst maintaining scalability, backwards compatibility, and ease of use, is also presented.
High level compilation for gate reconfigurable architectures
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2001. Includes bibliographical references (p. 205-215). A continuing exponential increase in the number of programmable elements is turning management of gate-reconfigurable architectures as "glue logic" into an intractable problem; it is past time to raise this abstraction level. The physical hardware in gate-reconfigurable architectures is all low level - individual wires, bit-level functions, and single bit registers - hence one should look to the fetch-decode-execute machinery of traditional computers for higher level abstractions. Ordinary computers have machine-level architectural mechanisms that interpret instructions - instructions that are generated by a high-level compiler. Efficiently moving up to the next abstraction level requires leveraging these mechanisms without introducing the overhead of machine-level interpretation. In this dissertation, I solve this fundamental problem by specializing architectural mechanisms with respect to input programs. This solution is the key to efficient compilation of high-level programs to gate reconfigurable architectures. My approach to specialization includes several novel techniques. I develop, with others, extensive bitwidth analyses that apply to registers, pointers, and arrays. I use pointer analysis and memory disambiguation to target devices with blocks of embedded memory. My approach to memory parallelization generates a spatial hierarchy that enables easier-to-synthesize logic state machines with smaller circuits and no long wires. My space-time scheduling approach integrates the techniques of high-level synthesis with the static routing concepts developed for single-chip multiprocessors. Using DeepC, a prototype compiler demonstrating my thesis, I compile a new benchmark suite to Xilinx Virtex FPGAs.
Resulting performance is comparable to a custom MIPS processor, with smaller area (40 percent on average), higher evaluation speeds (2.4x), and lower energy (18x) and energy-delay (45x). Specialization of advanced mechanisms results in additional speedup, scaling with hardware area, at the expense of power. For comparison, I also target IBM's standard cell SA-27E process and the RAW microprocessor. Results include sensitivity analysis to the different mechanisms specialized and a grand comparison between alternate targets. By Jonathan William Babb. Ph.D.
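The idea behind a bitwidth analysis can be illustrated with a minimal sketch based on generic interval (value-range) propagation. This is an assumed textbook-style technique, not the analysis implemented in DeepC. The `Range` type and the helper names are hypothetical, and only non-negative integer ranges are handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """Closed interval [lo, hi] of non-negative integer values."""
    lo: int
    hi: int

    def bits(self):
        """Unsigned bits needed to hold every value in the range."""
        return max(1, self.hi.bit_length())


def add(a, b):
    """Range of a + b when a and b lie in the given ranges."""
    return Range(a.lo + b.lo, a.hi + b.hi)


def mul(a, b):
    """Range of a * b for non-negative ranges."""
    return Range(a.lo * b.lo, a.hi * b.hi)


# Hypothetical example: a loop index i in [0, 99] addressing 4-byte words
# in an array that starts at a small base offset.
i = Range(0, 99)
word = Range(4, 4)
base = Range(16, 16)
address = add(base, mul(i, word))

print(i.bits(), address.bits())
```

Under these assumptions the index needs far fewer bits than a default 32-bit register, which is the kind of saving a hardware-targeting compiler can turn into narrower datapaths.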
TUKUTURI: eine dynamisch selbstrekonfigurierbare Softcore Prozessorarchitektur (TUKUTURI: a dynamically self-reconfigurable soft-core processor architecture)
The design of digital signal processing systems confronts developers with
steadily growing challenges, driven by the increasing complexity of
applications and the corresponding increase in performance required of
embedded systems. Besides performance, a further aspect is flexibility, which
allows applications and algorithms to be changed even after a system has
shipped.
This flexibility can be achieved, on the one hand, by using FPGAs, which allow
the hardware to be reconfigured. On the other hand, processor-based systems
can be used, which provide flexibility through programmability.
Application-specific adaptations of the processor architecture and a high
degree of parallel data processing, for example through VLIW processors, are
means of achieving high performance.
The subject of this thesis is the investigation of a design process for
application-specific processor systems. It is based on a flexible SIMD-VLIW
processor that can be configured extensively and extended with additional
hardware modules. To explore the design space, tools are provided for
analyzing processor configurations in real applications, and methods for
automatically adapting the architecture on the basis of these analysis results
are investigated. Because of its combinatorial complexity, the compilation of
applications for VLIW architectures is usually performed with static
heuristics, which can make optimal adaptation to flexible architectures
difficult. Dynamic methods for code generation based on evolutionary
algorithms are therefore investigated here.
Implementing the architecture as a soft core on an FPGA additionally offers
the possibility of dynamically adapting the hardware at run time. These
possibilities and their influence on the performance of the processor systems
are also investigated.
The analysis of the design process in an exemplary image-based object
recognition application, and the comparison with implementations on a MIPS
soft core and a VLIW DSP, show the suitability of the methods for adapting
soft-core processors and of the EA-based compilation of applications. Dynamic
hardware reconfiguration at run time can be used without loss of performance
while reducing the area required for the hardware.
- …