63,684 research outputs found

    Reducing Timing Interferences in Real-Time Applications Running on Multicore Architectures

    Get PDF
    We introduce a unified wcet analysis and scheduling framework for real-time applications deployed on multicore architectures. Our method does not follow a particular programming model, meaning that any piece of existing code (in particular legacy) can be re-used, and aims at reducing automatically the worst-case number of timing interferences between tasks. Our method is based on the notion of Time Interest Points (tips), which are instructions that can generate and/or suffer from timing interferences. We show how such points can be extracted from the binary code of applications and selected prior to performing the wcet analysis. We then represent real-time tasks as sequences of time intervals separated by tips, and schedule those tasks so that the overall makespan (including the potential timing penalties incurred by interferences) is minimized. This scheduling phase is performed using an Integer Linear Programming (ilp) solver. Preliminary results on state-of-the-art benchmarks show promising results and pave the way for future extensions of the model and optimizations

    Coarse-grained reconfigurable array architectures

    Get PDF
    Coarse-Grained Reconfigurable Array (CGRA) architectures accelerate the same inner loops that benefit from the high ILP support in VLIW architectures. By executing non-loop code on other cores, however, CGRAs can focus on such loops to execute them more efficiently. This chapter discusses the basic principles of CGRAs, and the wide range of design options available to a CGRA designer, covering a large number of existing CGRA designs. The impact of different options on flexibility, performance, and power-efficiency is discussed, as well as the need for compiler support. The ADRES CGRA design template is studied in more detail as a use case to illustrate the need for design space exploration, for compiler support and for the manual fine-tuning of source code

    Diluting the Scalability Boundaries: Exploring the Use of Disaggregated Architectures for High-Level Network Data Analysis

    Get PDF
    Traditional data centers are designed with a rigid architecture of fit-for-purpose servers that provision resources beyond the average workload in order to deal with occasional peaks of data. Heterogeneous data centers are pushing towards more cost-efficient architectures with better resource provisioning. In this paper we study the feasibility of using disaggregated architectures for intensive data applications, in contrast to the monolithic approach of server-oriented architectures. Particularly, we have tested a proactive network analysis system in which the workload demands are highly variable. In the context of the dReDBox disaggregated architecture, the results show that the overhead caused by using remote memory resources is significant, between 66\% and 80\%, but we have also observed that the memory usage is one order of magnitude higher for the stress case with respect to average workloads. Therefore, dimensioning memory for the worst case in conventional systems will result in a notable waste of resources. Finally, we found that, for the selected use case, parallelism is limited by memory. Therefore, using a disaggregated architecture will allow for increased parallelism, which, at the same time, will mitigate the overhead caused by remote memory.Comment: 8 pages, 6 figures, 2 tables, 32 references. Pre-print. The paper will be presented during the IEEE International Conference on High Performance Computing and Communications in Bangkok, Thailand. 18 - 20 December, 2017. To be published in the conference proceeding

    A Power-Aware Framework for Executing Streaming Programs on Networks-on-Chip

    Get PDF
    Nilesh Karavadara, Simon Folie, Michael Zolda, Vu Thien Nga Nguyen, Raimund Kirner, 'A Power-Aware Framework for Executing Streaming Programs on Networks-on-Chip'. Paper presented at the Int'l Workshop on Performance, Power and Predictability of Many-Core Embedded Systems (3PMCES'14), Dresden, Germany, 24-28 March 2014.Software developers are discovering that practices which have successfully served single-core platforms for decades do no longer work for multi-cores. Stream processing is a parallel execution model that is well-suited for architectures with multiple computational elements that are connected by a network. We propose a power-aware streaming execution layer for network-on-chip architectures that addresses the energy constraints of embedded devices. Our proof-of-concept implementation targets the Intel SCC processor, which connects 48 cores via a network-on- chip. We motivate our design decisions and describe the status of our implementation

    Programming MPSoC platforms: Road works ahead

    Get PDF
    This paper summarizes a special session on multicore/multi-processor system-on-chip (MPSoC) programming challenges. The current trend towards MPSoC platforms in most computing domains does not only mean a radical change in computer architecture. Even more important from a SW developer´s viewpoint, at the same time the classical sequential von Neumann programming model needs to be overcome. Efficient utilization of the MPSoC HW resources demands for radically new models and corresponding SW development tools, capable of exploiting the available parallelism and guaranteeing bug-free parallel SW. While several standards are established in the high-performance computing domain (e.g. OpenMP), it is clear that more innovations are required for successful\ud deployment of heterogeneous embedded MPSoC. On the other hand, at least for coming years, the freedom for disruptive programming technologies is limited by the huge amount of certified sequential code that demands for a more pragmatic, gradual tool and code replacement strategy
    corecore