19 research outputs found

    MURAC: A unified machine model for heterogeneous computers

    Get PDF
    Includes bibliographical referencesHeterogeneous computing enables the performance and energy advantages of multiple distinct processing architectures to be efficiently exploited within a single machine. These systems are capable of delivering large performance increases by matching the applications to architectures that are most suited to them. The Multiple Runtime-reconfigurable Architecture Computer (MURAC) model has been proposed to tackle the problems commonly found in the design and usage of these machines. This model presents a system-level approach that creates a clear separation of concerns between the system implementer and the application developer. The three key concepts that make up the MURAC model are a unified machine model, a unified instruction stream and a unified memory space. A simple programming model built upon these abstractions provides a consistent interface for interacting with the underlying machine to the user application. This programming model simplifies application partitioning between hardware and software and allows the easy integration of different execution models within the single control ow of a mixed-architecture application. The theoretical and practical trade-offs of the proposed model have been explored through the design of several systems. An instruction-accurate system simulator has been developed that supports the simulated execution of mixed-architecture applications. An embedded System-on-Chip implementation has been used to measure the overhead in hardware resources required to support the model, which was found to be minimal. An implementation of the model within an operating system on a tightly-coupled reconfigurable processor platform has been created. This implementation is used to extend the software scheduler to allow for the full support of mixed-architecture applications in a multitasking environment. Different scheduling strategies have been tested using this scheduler for mixed-architecture applications. The design and implementation of these systems has shown that a unified abstraction model for heterogeneous computers provides important usability benefits to system and application designers. These benefits are achieved through a consistent view of the multiple different architectures to the operating system and user applications. This allows them to focus on achieving their performance and efficiency goals by gaining the benefits of different execution models during runtime without the complex implementation details of the system-level synchronisation and coordination

    Parallel architectures and runtime systems co-design for task-based programming models

    Get PDF
    The increasing parallelism levels in modern computing systems has extolled the need for a holistic vision when designing multiprocessor architectures taking in account the needs of the programming models and applications. Nowadays, system design consists of several layers on top of each other from the architecture up to the application software. Although this design allows to do a separation of concerns where it is possible to independently change layers due to a well-known interface between them, it is hampering future systems design as the Law of Moore reaches to an end. Current performance improvements on computer architecture are driven by the shrinkage of the transistor channel width, allowing faster and more power efficient chips to be made. However, technology is reaching physical limitations were the transistor size will not be able to be reduced furthermore and requires a change of paradigm in systems design. This thesis proposes to break this layered design, and advocates for a system where the architecture and the programming model runtime system are able to exchange information towards a common goal, improve performance and reduce power consumption. By making the architecture aware of runtime information such as a Task Dependency Graph (TDG) in the case of dataflow task-based programming models, it is possible to improve power consumption by exploiting the critical path of the graph. Moreover, the architecture can provide hardware support to create such a graph in order to reduce the runtime overheads and making possible the execution of fine-grained tasks to increase the available parallelism. Finally, the current status of inter-node communication primitives can be exposed to the runtime system in order to perform a more efficient communication scheduling, and also creates new opportunities of computation and communication overlap that were not possible before. An evaluation of the proposals introduced in this thesis is provided and a methodology to simulate and characterize the application behavior is also presented.El aumento del paralelismo proporcionado por los sistemas de c贸mputo modernos ha provocado la necesidad de una visi贸n hol铆stica en el dise帽o de arquitecturas multiprocesador que tome en cuenta las necesidades de los modelos de programaci贸n y las aplicaciones. Hoy en d铆a el dise帽o de los computadores consiste en diferentes capas de abstracci贸n con una interfaz bien definida entre ellas. Las limitaciones de esta aproximaci贸n junto con el fin de la ley de Moore limitan el potencial de los futuros computadores. La mayor铆a de las mejoras actuales en el dise帽o de los computadores provienen fundamentalmente de la reducci贸n del tama帽o del canal del transistor, lo cual permite chips m谩s r谩pidos y con un consumo eficiente sin apenas cambios fundamentales en el dise帽o de la arquitectura. Sin embargo, la tecnolog铆a actual est谩 alcanzando limitaciones f铆sicas donde no ser谩 posible reducir el tama帽o de los transistores motivando as铆 un cambio de paradigma en la construcci贸n de los computadores. Esta tesis propone romper este dise帽o en capas y abogar por un sistema donde la arquitectura y el sistema de tiempo de ejecuci贸n del modelo de programaci贸n sean capaces de intercambiar informaci贸n para alcanzar una meta com煤n: La mejora del rendimiento y la reducci贸n del consumo energ茅tico. Haciendo que la arquitectura sea consciente de la informaci贸n disponible en el modelo de programaci贸n, como puede ser el grafo de dependencias entre tareas en los modelos de programaci贸n dataflow, es posible reducir el consumo energ茅tico explotando el camino critico del grafo. Adem谩s, la arquitectura puede proveer de soporte hardware para crear este grafo con el objetivo de reducir el overhead de construir este grado cuando la granularidad de las tareas es demasiado fina. Finalmente, el estado de las comunicaciones entre nodos puede ser expuesto al sistema de tiempo de ejecuci贸n para realizar una mejor planificaci贸n de las comunicaciones y creando nuevas oportunidades de solapamiento entre c贸mputo y comunicaci贸n que no eran posibles anteriormente. Esta tesis aporta una evaluaci贸n de todas estas propuestas, as铆 como una metodolog铆a para simular y caracterizar el comportamiento de las aplicacionesPostprint (published version

    New FPGA design tools and architectures

    Get PDF

    A DYNAMIC HETEROGENEOUS MULTI-CORE ARCHITECTURE

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH

    Dynamically reconfigurable architecture for embedded computer vision systems

    Get PDF
    The objective of this research work is to design, develop and implement a new architecture which integrates on the same chip all the processing levels of a complete Computer Vision system, so that the execution is efficient without compromising the power consumption while keeping a reduced cost. For this purpose, an analysis and classification of different mathematical operations and algorithms commonly used in Computer Vision are carried out, as well as a in-depth review of the image processing capabilities of current-generation hardware devices. This permits to determine the requirements and the key aspects for an efficient architecture. A representative set of algorithms is employed as benchmark to evaluate the proposed architecture, which is implemented on an FPGA-based system-on-chip. Finally, the prototype is compared to other related approaches in order to determine its advantages and weaknesses
    corecore