
    DEMAND-DRIVEN EXECUTION USING FUTURE GATED SINGLE ASSIGNMENT FORM

    This dissertation presents a novel execution model called Demand-Driven Execution (DDE), which executes a program starting from its outputs and progressing towards its inputs. The approach differs significantly from prior demand-driven reduction machines in that it executes programs written in an imperative language under the demand-driven paradigm while extracting both instruction-level and data-level parallelism. The execution model relies on an executable Single Assignment Form that serves both as the compiler's internal representation and as the Instruction Set Architecture (ISA) of the machine. This work develops the instruction set architecture, the programming language pragmatics, and the microarchitecture for the demand-driven execution paradigm.
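
    As a concrete illustration of the model the abstract describes, the following Python sketch (names like Node and demand are assumed for illustration; this is not the dissertation's Future Gated Single Assignment form or its ISA) evaluates a single-assignment dataflow graph demand-first: execution starts at the output value and pulls operands back toward the inputs, computing each value at most once.

    class Node:
        # One single-assignment value in the dataflow graph.
        def __init__(self, op, *operands):
            self.op = op              # function producing this value
            self.operands = operands  # values this one demands
            self.value = None
            self.done = False

        def demand(self):
            # Pull semantics: operands are computed only when demanded,
            # and each value is computed at most once (single assignment).
            if not self.done:
                args = [n.demand() for n in self.operands]
                self.value = self.op(*args)
                self.done = True
            return self.value

    # Program inputs.
    a = Node(lambda: 3)
    b = Node(lambda: 4)
    # Intermediate values, each assigned exactly once.
    s = Node(lambda x, y: x + y, a, b)
    p = Node(lambda x, y: x * y, a, b)
    # Execution starts at the output and pulls toward the inputs.
    out = Node(lambda x, y: x - y, s, p)
    print(out.demand())  # prints -5

    Independent demands (here s and p) are naturally concurrent, which is one plausible source of the instruction- and data-level parallelism the abstract mentions; code unreachable from an output is simply never demanded.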

    Implementation of a general purpose dataflow multiprocessor

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1988. GRSN 409671. Includes bibliographical references (leaves 151-155). By Gregory Michael Papadopoulos, Ph.D.

    Dynamic dependency analysis of ordinary programs


    Hardware design of task superscalar architecture

    Exploiting concurrency to achieve greater performance is a difficult and important challenge for current high-performance systems. Although the theory is plain, the complexity of traditional parallel programming models in most cases prevents the programmer from harvesting that performance. Several partitioning granularities have been proposed to better exploit concurrency at the task level. In this vein, dynamic software task-management systems, such as task-based dataflow programming models, apply dataflow principles to improve task-level parallelism and overcome the limitations of static task management. These models schedule computation and data implicitly and use tasks instead of instructions as the basic unit of work, thereby relieving the programmer of explicitly managing parallelism. While these programming models share conceptual similarities with the well-known out-of-order superscalar pipelines (e.g., dynamic data-dependency analysis and dataflow scheduling), they rely on software-based dependency analysis, which is inherently slow and limits their scalability when task granularity is fine and the number of tasks is large.

    This problem grows with the number of available cores: to keep all the cores busy and accelerate overall application performance, the application must be partitioned into more and smaller tasks. Task scheduling (i.e., the creation and management of the execution of tasks) in software introduces overheads and thus becomes increasingly inefficient as the core count rises. In contrast, a hardware scheduling solution can achieve greater speed-ups, as a hardware task scheduler requires fewer cycles than the software version to dispatch a task.

    The Task Superscalar is a hybrid dataflow/von Neumann architecture that exploits the task-level parallelism of the program. It combines the effectiveness of out-of-order processors with the task abstraction, thereby providing a unified management layer for CMPs that effectively employs processors as functional units. The Task Superscalar had previously been implemented in software, with limited parallelism and high memory consumption inherent to a software implementation. In this thesis, a Hardware Task Superscalar architecture is designed for integration into a future high-performance computer, with the ability to exploit fine-grained task parallelism. The main contributions of this thesis are: (1) a design of the operational flow of the Task Superscalar architecture, adapted and improved for hardware implementation; (2) an HDL prototype for latency exploration; (3) a full cycle-accurate simulator of the Hardware Task Superscalar, based on the previously obtained latencies; (4) a full design-space exploration of the Task Superscalar component configuration (number and size) for systems with different numbers of processing elements (cores); (5) a comparison with a software implementation of a real task-based programming-model runtime using real benchmarks; and (6) an exploration of the hardware resource usage of the selected configurations.
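
    To make the dependency-analysis bottleneck concrete, here is a minimal Python sketch of the bookkeeping such task-based dataflow runtimes perform in software (the submit/reads/writes interface is assumed for illustration, not the actual runtime's API): the scheduler records the last writer of each datum, turns reads of pending data into dependency edges, and dispatches tasks dataflow-style once their producers finish.

    from collections import defaultdict

    class Scheduler:
        def __init__(self):
            self.last_writer = {}             # datum -> task that produces it
            self.pending = defaultdict(int)   # task -> unmet dependency count
            self.waiters = defaultdict(list)  # producer -> dependent tasks
            self.ready = []                   # tasks with no unmet dependencies

        def submit(self, task, reads=(), writes=()):
            # A read of a datum with a pending producer adds one true
            # (read-after-write) dependency edge, analogous to how an
            # out-of-order pipeline's rename stage detects dependencies.
            for d in reads:
                producer = self.last_writer.get(d)
                if producer is not None:
                    self.pending[task] += 1
                    self.waiters[producer].append(task)
            for d in writes:
                self.last_writer[d] = task
            if self.pending[task] == 0:
                self.ready.append(task)

        def run(self):
            # Dataflow dispatch: run ready tasks, then wake their dependents.
            while self.ready:
                task = self.ready.pop()
                task()
                for dep in self.waiters.pop(task, []):
                    self.pending[dep] -= 1
                    if self.pending[dep] == 0:
                        self.ready.append(dep)

    sched = Scheduler()
    sched.submit(lambda: print("produce x"), writes=["x"])
    sched.submit(lambda: print("consume x"), reads=["x"])
    sched.run()

    The sketch tracks only true dependencies and assumes all tasks are submitted before run(); a real runtime also resolves anti- and output dependencies via renaming. Every per-task step above costs software cycles per task, which is exactly the overhead the thesis argues for moving into a hardware pipeline.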

    Stream Objects : dynamically-segmented scalable media over the Internet

    Thesis (M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1996. Includes bibliographical references (p. 90). By Steven Niemczyk, M.Eng.