
    Experiments with parallel algorithms for combinatorial problems

    In the last decade many models for parallel computation have been proposed and many parallel algorithms have been developed. However, few of these models have been realized, and most of these algorithms are supposed to run on idealized, unrealistic parallel machines. The parallel machines constructed so far each use a simple model of parallel computation; therefore, not every existing parallel machine is equally well suited to each type of algorithm. Adapting a given algorithm to a specific parallel architecture may severely increase the complexity of the algorithm or severely obscure its essence. Little is known about the performance of standard combinatorial algorithms on existing parallel machines. In this paper we present computational results concerning the solution of knapsack, shortest-path and change-making problems by branch-and-bound, dynamic-programming, and divide-and-conquer algorithms on the ICL-DAP (a SIMD computer), the Manchester dataflow machine and the CDC-CYBER-205 (a pipeline computer).
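    As a point of reference for the problem classes named above, the change-making problem has a standard sequential dynamic-programming formulation. The sketch below is a generic textbook version, not the parallel algorithm evaluated in the paper; the function name and coin values are invented for illustration.

```python
def min_coins(coins, amount):
    """Fewest coins summing to `amount` via dynamic programming, or None."""
    INF = float("inf")
    best = [0] + [INF] * amount          # best[a] = fewest coins for value a
    for a in range(1, amount + 1):
        for c in coins:
            if c <= a and best[a - c] + 1 < best[a]:
                best[a] = best[a - c] + 1
    return None if best[amount] == INF else best[amount]

print(min_coins([1, 5, 10, 25], 63))   # 6 coins: 25 + 25 + 10 + 1 + 1 + 1
```

    The inner table is what a parallel version would have to distribute: each row `best[a]` depends only on earlier rows, which is the dependency structure a SIMD or pipeline machine can exploit.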

    Application specific dataflow machine construction for programming FPGAs via Lucent

    Field Programmable Gate Arrays (FPGAs) have the potential to accelerate specific HPC codes. However, even with the advent of High Level Synthesis (HLS), which enables FPGA programmers to write code in C or C++, programming such devices still requires considerable expertise. Much of this is due to the fact that these architectures are founded on dataflow rather than the Von Neumann abstraction of CPUs or GPUs. Programming FPGAs via imperative languages is therefore not optimal, and can result in very significant performance differences between the first and final versions of algorithms on dataflow architectures, with the steps in between often not obvious and requiring considerable expertise. In this position paper we argue that languages built upon dataflow principles should be exploited to enable fast-by-construction codes for FPGAs; this is akin to the programmer adopting the abstraction of developing a bespoke dataflow machine specialised for their application. It is our belief that much can be learnt from the generation of dataflow languages that gained popularity in the 1970s and 1980s for programming general-purpose dataflow machines, and we introduce Lucent, a modern derivative of Lucid, as a vehicle to explore this hypothesis. The idea behind Lucent is to provide high programmer productivity and performance for FPGAs by giving developers the most suitable language-level abstractions. The focus of Lucent is very much on supporting the acceleration of HPC kernels, rather than embedded electronics and the circuit level, and we provide a brief overview of the language driven by examples. Comment: Accepted at the LATTE (Languages, Tools, and Techniques for Accelerator Design) ASPLOS workshop.
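    Lucent's own syntax is not reproduced in this abstract, but its ancestor Lucid defines programs as equations over infinite streams, using operators such as `fby` ("followed by") and `next`. As a rough illustration only (the Python function names `fby`, `nxt` and `fib` are ours, not Lucent's), the classic Lucid stream definition `fib = 0 fby 1 fby (fib + next fib)` can be emulated with lazy generators:

```python
from itertools import islice, tee

def fby(first, rest):
    """Lucid's 'followed by': emit `first`, then the stream `rest`."""
    yield first
    yield from rest

def nxt(stream):
    """Lucid's 'next': drop the first element of a stream."""
    it = iter(stream)
    next(it)
    yield from it

def fib():
    # fib = 0 fby 1 fby (fib + next fib)   -- the stream defined recursively
    a, b = tee(fib())                      # two lazy views of the same stream
    yield from fby(0, fby(1, (x + y for x, y in zip(a, nxt(b)))))

print(list(islice(fib(), 8)))   # [0, 1, 1, 2, 3, 5, 8, 13]
```

    The appeal of this style for FPGAs is that such equations describe a network of streams directly, which maps naturally onto the pipelined dataflow hardware HLS tools must otherwise infer from imperative loops.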

    Nové knihy (New Books)


    Computer architecture for efficient algorithmic executions in real-time systems: New technology for avionics systems and advanced space vehicles

    Improvements and advances in computer architecture now provide innovative technology for recasting traditional sequential solutions into high-performance, low-cost parallel systems with increased performance. Research conducted on the development of a specialized computer architecture for the real-time algorithmic execution of an avionics guidance and control problem is described. A comprehensive treatment of both the hardware and software structures of a customized computer which performs real-time computation of guidance commands with updated estimates of target motion and time-to-go is presented. An optimal, real-time allocation algorithm was developed which maps the algorithmic tasks onto the processing elements; this allocation is based on critical path analysis. The final stage is the design and development of hardware structures suitable for efficient execution of the allocated task graph. The processing element is designed for rapid execution of the allocated tasks, and fault tolerance is a key feature of the overall architecture. Parallel numerical integration techniques, task definitions, and allocation algorithms are discussed. The parallel implementation is analytically verified and experimental results are presented. The design of the data-driven computer architecture, customized for the execution of the particular algorithm, is discussed.
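    The critical-path analysis the allocation step relies on reduces to a longest-path computation over the duration-weighted task DAG. The sketch below is a minimal generic version; the task names and durations are invented for illustration and are not taken from the paper.

```python
# Hypothetical task graph: name -> (duration, successors); not from the paper.
tasks = {
    "sense":     (2, ["estimate"]),
    "estimate":  (4, ["guidance"]),
    "guidance":  (3, ["command"]),
    "integrate": (5, ["command"]),
    "command":   (1, []),
}

def critical_path_length(tasks):
    """Longest duration-weighted path through the DAG (memoised DFS)."""
    memo = {}
    def longest(node):
        if node not in memo:
            dur, succs = tasks[node]
            memo[node] = dur + max((longest(s) for s in succs), default=0)
        return memo[node]
    return max(longest(n) for n in tasks)

print(critical_path_length(tasks))   # 10: sense -> estimate -> guidance -> command
```

    Tasks on the critical path bound the achievable cycle time, so an allocator assigns them to the fastest processing elements first and schedules off-path tasks (here, `integrate`) into the remaining slack.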

    Parallel functional programming for message-passing multiprocessors

    We propose a framework for the evaluation of implicitly parallel functional programs on message-passing multiprocessors, with special emphasis on the issue of load bounding. The model is based on a new encoding of the lambda-calculus in Milner's pi-calculus and combines lazy evaluation and eager (parallel) evaluation in the same framework. The pi-calculus encoding serves as the specification of a more concrete compilation scheme mapping a simple functional language into a message-passing, parallel program. We show how and under which conditions we can guarantee successful load bounding based on this compilation scheme. Finally we discuss the architectural requirements for a machine to support our model efficiently, and we present a simple RISC-style processor architecture which meets those criteria.
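    The core idea of load bounding is that a parallel (eager) branch is only sparked while a global load limit permits it, with the evaluator otherwise falling back to sequential evaluation. The sketch below illustrates that general idea only; it is a generic threads-and-semaphore analogue, not the paper's pi-calculus encoding or compilation scheme, and all names in it are ours.

```python
import threading

# Generic sketch of load bounding (not the paper's scheme): spark a parallel
# branch only while the load bound permits; otherwise evaluate sequentially.
load = threading.BoundedSemaphore(4)     # at most 4 sparked threads at once

def par_sum(xs):
    """Divide-and-conquer sum with bounded parallelism."""
    if len(xs) <= 8:
        return sum(xs)
    mid = len(xs) // 2
    if load.acquire(blocking=False):     # under the bound: spark a thread
        left = []
        t = threading.Thread(target=lambda: left.append(par_sum(xs[:mid])))
        t.start()
        right = par_sum(xs[mid:])
        t.join()
        load.release()
        return left[0] + right
    return par_sum(xs[:mid]) + par_sum(xs[mid:])   # over the bound: sequential

print(par_sum(list(range(100))))   # 4950
```

    The point of bounding the load this way is that parallelism degrades gracefully: once the machine is saturated, further work costs no more than the sequential program would.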