    Worst-Case Execution Time Analysis of Predicated Architectures

    The time-predictable design of computer architectures for use in (hard) real-time systems is becoming increasingly important, due to the growing complexity of modern computer architectures. The design of predictable processor pipelines has recently received considerable attention. The goal here is to find a trade-off between predictability and computing power. Branches and jumps are particularly problematic for high-performance processors. For one, branches are executed late in the pipeline. This leads either to high branch penalties (flushing) or to complex software/hardware techniques (branch predictors). Another side effect of branches is that they make it difficult to exploit instruction-level parallelism due to control dependencies. Predicated computer architectures allow a predicate to be attached to the instructions in a program. An instruction is then only executed when the predicate evaluates to true and otherwise behaves like a simple nop instruction. Predicates can thus be used to convert control dependencies into data dependencies, which helps to address both of the aforementioned problems. A downside of predicated instructions is that they complicate the precise worst-case execution time (WCET) analysis of programs that use them. Predicated memory accesses, for instance, may or may not have an impact on the processor's cache and thus need to be considered by the cache analysis. Predication potentially has an impact on all analysis phases of a WCET analysis tool. We thus explore a preprocessing step that explicitly unfolds the control-flow graph, which allows us to apply standard analyses that are themselves not aware of predication.
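
The conversion of a control dependency into a data dependency described in this abstract can be illustrated with a small sketch. It is not taken from the paper: the function names and the `guarded` helper are hypothetical, and Python only models the semantics of a predicated instruction (take effect if the predicate holds, otherwise act as a nop).

```python
def guarded(pred, new_value, old_value):
    """Model of one predicated instruction (hypothetical helper).

    If the predicate is true the instruction writes new_value;
    otherwise it behaves like a nop and the old value survives.
    """
    return new_value if pred else old_value


def abs_diff_branching(a, b):
    # Control dependency: which subtraction runs is decided by a branch.
    if a > b:
        result = a - b
    else:
        result = b - a
    return result


def abs_diff_predicated(a, b):
    # If-converted form: straight-line code, no branch in the "program".
    # Both subtractions are issued; the predicate p selects which one
    # takes effect, so the dependency on the comparison is a data dependency.
    p = a > b                                # predicate definition
    result = 0
    result = guarded(p, a - b, result)       # (p)  result = a - b
    result = guarded(not p, b - a, result)   # (!p) result = b - a
    return result
```

Both variants compute the same value. The difficulty the abstract points to is that an analysis unaware of predication sees every guarded instruction as if it always executed, which is why unfolding the predicated code back into an explicit control-flow graph lets standard analyses be reused.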

    Empowering parallel computing with field programmable gate arrays

    After more than 30 years, reconfigurable computing has grown from a concept to a mature field of science and technology. The cornerstone of this evolution is the field programmable gate array, a building block enabling the configuration of a custom hardware architecture. The departure from static von Neumann-like architectures opens the way to eliminating the instruction overhead and optimizing execution speed and power consumption. FPGAs now live in a growing ecosystem of development tools, enabling software programmers to map algorithms directly onto hardware. Applications abound in many directions, including data centers, IoT, AI, image processing and space exploration. The increasing success of FPGAs is largely due to an improved toolchain with solid high-level synthesis support as well as better integration with processor and memory systems. On the other hand, long compile times and complex design exploration remain areas for improvement. In this paper we address the evolution of FPGAs towards advanced multi-functional accelerators, discuss different programming models and their HLS language implementations, as well as high-performance tuning of FPGAs integrated into a heterogeneous platform. We pinpoint fallacies and pitfalls, and identify opportunities for language enhancements and architectural refinements.

    Taking advantage of hybrid systems for sparse direct solvers via task-based runtimes

    The ongoing hardware evolution exhibits an escalation in the number, as well as in the heterogeneity, of computing resources. The pressure to maintain reasonable levels of performance and portability forces application developers to leave the traditional programming paradigms and explore alternative solutions. PaStiX is a parallel sparse direct solver, based on a dynamic scheduler for modern hierarchical manycore architectures. In this paper, we study the benefits and limits of replacing the highly specialized internal scheduler of the PaStiX solver with two generic runtime systems: PaRSEC and StarPU. The task graph of the factorization step is made available to the two runtimes, giving them the opportunity to process and optimize its traversal in order to maximize the algorithm's efficiency on the targeted hardware platform. A comparative study of the performance of the PaStiX solver on top of its native internal scheduler, PaRSEC, and StarPU, on different execution environments, is performed. The analysis highlights that these generic task-based runtimes achieve results comparable to the application-optimized embedded scheduler on homogeneous platforms. Furthermore, they are able to significantly speed up the solver on heterogeneous environments by taking advantage of the accelerators while hiding the complexity of their efficient manipulation from the programmer. Comment: Heterogeneity in Computing Workshop (2014)