Experiments with parallel algorithms for combinatorial problems
In the last decade many models for parallel computation have been proposed and many
parallel algorithms have been developed. However, few of these models have been realized
and most of these algorithms are supposed to run on idealized, unrealistic parallel machines.
The parallel machines constructed so far all use a simple model of parallel computation.
Therefore, not every existing parallel machine is equally well suited for each type of
algorithm. The adaptation of a certain algorithm to a specific parallel architecture may
severely increase the complexity of the algorithm or severely obscure its essence.
Little is known about the performance of some standard combinatorial algorithms on
existing parallel machines. In this paper we present computational results concerning the
solution of knapsack, shortest paths and change-making problems by branch and bound,
dynamic programming, and divide and conquer algorithms on the ICL-DAP (an SIMD computer),
the Manchester dataflow machine, and the CDC-CYBER-205 (a pipeline computer).
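The abstract above names knapsack among the problems solved by dynamic programming. As a point of reference, here is a minimal sequential sketch of the standard 0/1 knapsack DP; this is ordinary textbook code, not the paper's parallel ICL-DAP, dataflow, or CYBER-205 implementations:

```python
def knapsack(values, weights, capacity):
    """0/1 knapsack via dynamic programming: best[c] is the maximum
    value achievable with total weight at most c."""
    best = [0] * (capacity + 1)
    for v, w in zip(values, weights):
        # Iterate capacities downwards so each item is used at most once.
        for c in range(capacity, w - 1, -1):
            best[c] = max(best[c], best[c - w] + v)
    return best[capacity]

print(knapsack([60, 100, 120], [10, 20, 30], 50))  # 220 (items 2 and 3)
```

The inner loop is the data-parallel core: for a fixed item, all capacities can in principle be updated simultaneously, which is what makes this recurrence a natural candidate for SIMD machines like the DAP.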
Application specific dataflow machine construction for programming FPGAs via Lucent
Field Programmable Gate Arrays (FPGAs) have the potential to accelerate
specific HPC codes. However, even with the advent of High Level Synthesis (HLS),
which enables FPGA programmers to write code in C or C++, programming such
devices still requires considerable expertise. Much of this is due to the fact
that these architectures are founded on dataflow rather than the Von Neumann
abstraction of CPUs or GPUs. Thus programming FPGAs via imperative languages is
not optimal and can result in very significant performance differences between
the first and final versions of an algorithm on dataflow architectures, with the
steps in between often not obvious and requiring considerable expertise.
In this position paper we argue that languages built upon dataflow principles
should be exploited to enable fast-by-construction codes for FPGAs, and this is
akin to the programmer adopting the abstraction of developing a bespoke
dataflow machine specialised for their application. It is our belief that much
can be learnt from the generation of dataflow languages that gained popularity
in the 1970s and 1980s around programming general purpose dataflow machines,
and we introduce Lucent, a modern derivative of Lucid, as a
vehicle to explore this hypothesis. The idea behind Lucent is to provide high
programmer productivity and performance for FPGAs by giving developers the most
suitable language level abstractions. The focus of Lucent is very much to
support the acceleration of HPC kernels, rather than the embedded electronics
and circuit level, and we provide a brief overview of the language driven by
examples.
Comment: Accepted at the LATTE (Languages, Tools, and Techniques for
Accelerator Design) ASPLOS workshop.
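Lucid, which the abstract names as Lucent's ancestor, treats a program as a set of equations over synchronized streams of values. As a rough analogy only (this is ordinary Python, not Lucid or Lucent syntax), generators can mimic that stream view:

```python
import itertools

def nats():
    """The Lucid-style stream n = 0 fby (n + 1): 0, 1, 2, ..."""
    n = 0
    while True:
        yield n
        n += 1

def running_sum(xs):
    """The stream s = x fby (s + next x): a running total of xs,
    one value per 'clock tick'."""
    s = 0
    for x in xs:
        s += x
        yield s

# Force the first six ticks of the (conceptually infinite) stream.
print(list(itertools.islice(running_sum(nats()), 6)))  # [0, 1, 3, 6, 10, 15]
```

The appeal for FPGAs is that such equations describe what flows between operators rather than a sequence of instructions, so they map onto hardware pipelines more directly than imperative loops do.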
Computer architecture for efficient algorithmic executions in real-time systems: New technology for avionics systems and advanced space vehicles
Improvements and advances in the development of computer architecture now provide innovative technology for recasting traditional sequential solutions into high-performance, low-cost parallel systems that increase system performance. Research conducted in the development of a specialized computer architecture for the real-time algorithmic execution of an avionics guidance and control problem is described. A comprehensive treatment of both the hardware and software structures of a customized computer which performs real-time computation of guidance commands with updated estimates of target motion and time-to-go is presented. An optimal, real-time allocation algorithm was developed which maps the algorithmic tasks onto the processing elements; this allocation is based on critical path analysis. The final stage is the design and development of hardware structures suitable for the efficient execution of the allocated task graph. The processing element is designed for rapid execution of the allocated tasks, and fault tolerance is a key feature of the overall architecture. Parallel numerical integration techniques, task definitions, and allocation algorithms are discussed. The parallel implementation is analytically verified and the experimental results are presented. The design of the data-driven computer architecture, customized for the execution of the particular algorithm, is discussed.
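The allocation described above rests on critical path analysis of the task graph. A minimal sketch of that analysis (the task names and durations here are invented for illustration and are not taken from the paper):

```python
def critical_path_lengths(tasks, deps):
    """tasks: {name: duration}; deps: {name: [predecessors]}.
    Returns the earliest finish time of each task; the largest of
    these is the critical-path length, a lower bound on the makespan
    achievable by any parallel schedule."""
    finish = {}

    def earliest_finish(t):
        if t not in finish:
            start = max((earliest_finish(p) for p in deps.get(t, [])), default=0)
            finish[t] = start + tasks[t]
        return finish[t]

    for t in tasks:
        earliest_finish(t)
    return finish

# Hypothetical guidance-loop fragment: sense -> estimate -> guide, plus logging.
tasks = {"sense": 2, "estimate": 3, "guide": 4, "log": 1}
deps = {"estimate": ["sense"], "guide": ["estimate"], "log": ["sense"]}
f = critical_path_lengths(tasks, deps)
print(max(f.values()))  # 9: the sense -> estimate -> guide chain
```

Tasks on the critical path are the ones an allocator must place so that no processing element or communication delay stretches that chain.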
Parallel functional programming for message-passing multiprocessors
We propose a framework for the evaluation of implicitly parallel functional programs on message-passing multiprocessors, with special emphasis on the issue of load bounding. The model is based on a new encoding of the lambda-calculus in Milner's pi-calculus and combines lazy evaluation and eager (parallel) evaluation in the same framework. The pi-calculus encoding serves as the specification of a more concrete compilation scheme mapping a simple functional language into a message-passing, parallel program. We show how and under which conditions we can guarantee successful load bounding based on this compilation scheme. Finally, we discuss the architectural requirements for a machine to support our model efficiently and present a simple RISC-style processor architecture which meets those criteria.
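Load bounding, as described in this abstract, limits how much parallelism is spawned at once. The following is a loose Python sketch of that one idea using threads (an invented analogy, not the paper's pi-calculus-based compilation scheme): spawn a task in parallel while the count of outstanding tasks is below a bound, otherwise evaluate it eagerly in place.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class LoadBoundedEvaluator:
    """Sketch of load bounding: spawn in parallel only while the number
    of outstanding tasks is below `bound`; otherwise evaluate in-line."""

    def __init__(self, bound=4):
        self.bound = bound
        self.active = 0
        self.lock = threading.Lock()
        self.pool = ThreadPoolExecutor(max_workers=bound)

    def evaluate(self, thunk):
        """Returns a zero-argument 'force' function for thunk's value."""
        with self.lock:
            spawn = self.active < self.bound
            if spawn:
                self.active += 1
        if spawn:
            fut = self.pool.submit(thunk)
            fut.add_done_callback(lambda _: self._release())
            return fut.result          # force later; blocks if still running
        return lambda v=thunk(): v     # load too high: evaluated in place

    def _release(self):
        with self.lock:
            self.active -= 1

ev = LoadBoundedEvaluator(bound=2)
forces = [ev.evaluate(lambda i=i: i * i) for i in range(8)]
print([f() for f in forces])  # [0, 1, 4, 9, 16, 25, 36, 49]
```

The point of the bound is the one the abstract makes: unbounded eager evaluation can flood a message-passing machine with tasks, so the runtime falls back to sequential evaluation once enough parallel work is already in flight.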