91 research outputs found

    Rule-based Power-balanced VLIW Instruction Scheduling with Uncertainty

    Abstract. Power-balanced instruction scheduling for Very Long Instruction Word (VLIW) processors is an optimization problem that requires a good instruction-level power model for the target processor. Conventionally, these power models are deterministic. In reality, however, some degree of imprecision is always involved. For power-critical applications, it is desirable to find an optimal schedule that minimizes the effects of these uncertainties. The scheduling algorithm must also be computationally efficient to be practical for use in compilers. In this paper, we propose a rule-based genetic algorithm to efficiently solve the optimization problem of power-balanced VLIW instruction scheduling with uncertainties in the power consumption model. We theoretically prove that our rule-based genetic algorithm can produce schedules as good as those of the existing algorithms proposed for this problem. Furthermore, its computational efficiency is significantly improved.
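As a rough illustration of what "power-balanced" can mean, the sketch below scores a schedule by how far each cycle's estimated power strays from the schedule's mean, and then re-scores it at the two ends of an uncertainty interval. This is not the paper's rule-based genetic algorithm. The cost function, the interval representation of uncertainty, the instruction names, and the power values are all assumptions made for this example, and data dependencies between instructions are ignored.

```python
from statistics import mean

def cycle_power(schedule, power):
    """Sum the estimated power of the instructions issued in each cycle."""
    return [sum(power[op] for op in bundle) for bundle in schedule]

def power_deviation(schedule, power):
    """Sum of squared deviations of per-cycle power from the schedule mean."""
    per_cycle = cycle_power(schedule, power)
    avg = mean(per_cycle)
    return sum((p - avg) ** 2 for p in per_cycle)

def corner_deviation(schedule, power_bounds):
    """Crude uncertainty proxy (assumption): score the schedule with every
    instruction at its lower bound, then at its upper bound, and keep the
    larger score. This is not a true worst case over the intervals."""
    low = {op: bounds[0] for op, bounds in power_bounds.items()}
    high = {op: bounds[1] for op, bounds in power_bounds.items()}
    return max(power_deviation(schedule, low), power_deviation(schedule, high))

# Hypothetical per-instruction power estimates (arbitrary units).
power = {"add": 1.0, "mul": 3.0, "ld": 2.0, "st": 2.0}
# Hypothetical (lower, upper) bounds around those estimates.
power_bounds = {"add": (0.8, 1.2), "mul": (2.5, 3.5),
                "ld": (1.8, 2.2), "st": (1.8, 2.2)}

# Two 2-cycle schedules for a 2-issue machine; each inner list is one VLIW bundle.
unbalanced = [["mul", "ld"], ["add", "st"]]
balanced = [["mul", "add"], ["ld", "st"]]

print(power_deviation(unbalanced, power))
print(power_deviation(balanced, power))
print(corner_deviation(balanced, power_bounds)
      < corner_deviation(unbalanced, power_bounds))
```

A search procedure such as a genetic algorithm would use a score of this kind as its fitness function while also respecting dependency and resource constraints, which this sketch leaves out.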

    Instruction scheduling in micronet-based asynchronous ILP processors

    Architecture design of video processing systems on a chip

    Performance Aspects of Synthesizable Computing Systems

    Automatic synthesis of reconfigurable instruction set accelerators

    Fast thread communication and synchronization mechanisms for a scalable single chip multiprocessor

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1998. Includes bibliographical references (p. 159-163). By Stephen William Keckler.

    Just-in-time Hardware generation for abstracted reconfigurable computing

    This thesis addresses the use of reconfigurable hardware in computing platforms, in order to harness the performance benefits of dedicated hardware whilst maintaining the flexibility associated with software. Although the reconfigurable computing concept is not new, the low-level nature of the supporting tools normally used, together with the consequent limited level of abstraction and resultant lack of backwards compatibility, has prevented the widespread adoption of this technology. In addition, bandwidth and architectural limitations have seriously constrained the potential improvements in performance. A review of existing approaches and tool flows is conducted to highlight the current problems faced in this field. The objective of the work presented in this thesis is to introduce a radically new approach to reconfigurable computing tool flows. The runtime-based tool flow introduces complete abstraction between the application developer and the underlying hardware. This new technique eliminates the ease-of-use and backwards-compatibility issues that have plagued the reconfigurable computing concept, and could pave the way for viable mainstream reconfigurable computing platforms. An easy-to-use, cycle-accurate behavioural modelling system is also presented, which was used extensively during the early exploration of new concepts and architectures. Some performance improvements produced by the new reconfigurable computing tool flow, when applied to both a MIPS-based embedded platform and the Cray XD1, are also presented. These results are then analyzed, and the hardware and software factors affecting the performance increases obtained are discussed, together with potential techniques that could be used to further increase the performance of the system.
    Lastly, a heterogeneous computing concept is proposed, in which a computer system containing multiple types of computational resource is envisaged, each type having its own strengths and weaknesses (e.g. DSPs, CPUs, FPGAs). A revolutionary new method of fully exploiting the potential of such a system, whilst maintaining scalability, backwards compatibility, and ease of use, is also presented.

    High level compilation for gate reconfigurable architectures

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2001. Includes bibliographical references (p. 205-215). By Jonathan William Babb.
    A continuing exponential increase in the number of programmable elements is turning management of gate-reconfigurable architectures as "glue logic" into an intractable problem; it is past time to raise this abstraction level. The physical hardware in gate-reconfigurable architectures is all low level (individual wires, bit-level functions, and single-bit registers), hence one should look to the fetch-decode-execute machinery of traditional computers for higher-level abstractions. Ordinary computers have machine-level architectural mechanisms that interpret instructions, and those instructions are generated by a high-level compiler. Efficiently moving up to the next abstraction level requires leveraging these mechanisms without introducing the overhead of machine-level interpretation. In this dissertation, I solve this fundamental problem by specializing architectural mechanisms with respect to input programs. This solution is the key to efficient compilation of high-level programs to gate-reconfigurable architectures. My approach to specialization includes several novel techniques. I develop, with others, extensive bitwidth analyses that apply to registers, pointers, and arrays. I use pointer analysis and memory disambiguation to target devices with blocks of embedded memory. My approach to memory parallelization generates a spatial hierarchy that enables easier-to-synthesize logic state machines with smaller circuits and no long wires. My space-time scheduling approach integrates the techniques of high-level synthesis with the static routing concepts developed for single-chip multiprocessors. Using DeepC, a prototype compiler demonstrating my thesis, I compile a new benchmark suite to Xilinx Virtex FPGAs.
    Resulting performance is comparable to a custom MIPS processor, with smaller area (40 percent on average), higher evaluation speeds (2.4x), and lower energy (18x) and energy-delay (45x). Specialization of advanced mechanisms results in additional speedup, scaling with hardware area, at the expense of power. For comparison, I also target IBM's standard cell SA-27E process and the RAW microprocessor. Results include sensitivity analysis to the different mechanisms specialized and a grand comparison between alternate targets.

    TUKUTURI: eine dynamisch selbstrekonfigurierbare Softcore Prozessorarchitektur (TUKUTURI: a dynamically self-reconfigurable softcore processor architecture)

    Designing digital signal processing systems confronts developers with steadily growing challenges, caused by the increasing complexity of applications and the corresponding need for higher performance in embedded systems. A further aspect besides performance is flexibility, which allows applications and algorithms to be changed even after a system has been delivered. This can be achieved, on the one hand, by using FPGAs, which allow the hardware to be reconfigured. On the other hand, processor-based systems can be used, which provide flexibility through programmability. Application-specific adaptations of the processor architecture and a high degree of parallel data processing, for example through VLIW processors, are means of achieving high performance. The subject of this thesis is the investigation of a design process for application-specific processor systems. The process is based on a flexible SIMD-VLIW processor that can be configured extensively and extended with additional hardware modules. To explore the design space, tools are provided for analyzing processor configurations in real applications, and methods for automatically adapting the architecture based on these analysis results are investigated. Because of its combinatorial complexity, compilation of applications for VLIW architectures is usually performed with static heuristics, which can make optimal adaptation to flexible architectures difficult. Therefore, dynamic code generation methods based on evolutionary algorithms (EA) are investigated here. Implementing the architecture as a softcore on an FPGA additionally offers the possibility of dynamically adapting the hardware at run time. These possibilities and their influence on the performance of the processor systems are also investigated.
    The analysis of the design process in an exemplary image-based object recognition application, and the comparison with implementations on a MIPS softcore and a VLIW DSP, demonstrate the suitability of the methods for adapting softcore processors and of EA-based compilation of applications. Dynamic hardware reconfiguration at run time can be used with reduced hardware area requirements and without loss of performance.