3 research outputs found

    A study of memory-aware scheduling in message driven parallel programs

    Abstract: This paper presents a simple but powerful memory-aware scheduling mechanism that adaptively schedules tasks in a message-driven distributed-memory parallel program. The scheduler adapts its behavior whenever memory usage exceeds a threshold by scheduling tasks known to reduce memory usage. The usefulness of the scheduler and its low overhead are demonstrated in the context of an LU matrix factorization program. In the LU program, only a single additional line of code is required to make use of the new general-purpose memory-aware scheduling mechanism. Without memory-aware scheduling, the LU program can run only with small problem sizes; with memory-aware scheduling, the program scales to larger problem sizes.
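The threshold behavior described in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the class `MemoryAwareScheduler`, the `mem_delta` field, the task names, and the numbers are all hypothetical, and a single-process priority queue stands in for a message-driven runtime. It only shows the idea that, once tracked memory exceeds a threshold, tasks known to reduce memory usage are run first.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count


@dataclass(order=True)
class Task:
    priority: int                          # lower value runs first in normal mode
    seq: int                               # tie-breaker that keeps submission order
    name: str = field(compare=False)
    mem_delta: int = field(compare=False)  # assumed: memory allocated (+) or freed (-)


class MemoryAwareScheduler:
    """Hypothetical sketch: prefer memory-reducing tasks above a threshold."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.used = 0      # tracked memory usage
        self.peak = 0      # highest usage observed so far
        self._queue = []   # heap ordered by (priority, seq)
        self._seq = count()

    def submit(self, name, priority, mem_delta):
        heapq.heappush(self._queue, Task(priority, next(self._seq), name, mem_delta))

    def _pop_next(self):
        # Above the threshold, pick the best-priority task that frees memory.
        if self.used > self.threshold:
            reducers = [t for t in self._queue if t.mem_delta < 0]
            if reducers:
                task = min(reducers)
                self._queue.remove(task)
                heapq.heapify(self._queue)  # restore the heap after removal
                return task
        # Otherwise (or if nothing frees memory) fall back to plain priority order.
        return heapq.heappop(self._queue)

    def run(self):
        order = []
        while self._queue:
            task = self._pop_next()
            self.used += task.mem_delta
            self.peak = max(self.peak, self.used)
            order.append(task.name)
        return order


def demo():
    sched = MemoryAwareScheduler(threshold=100)
    sched.submit("alloc_a", 0, +80)
    sched.submit("alloc_b", 1, +80)
    sched.submit("alloc_c", 2, +80)
    sched.submit("free_a", 3, -80)
    sched.submit("free_b", 4, -80)
    return sched.run(), sched.peak
```

In this toy run, plain priority order would execute all three allocations first and peak at 240 units; with the threshold check, `free_a` is pulled ahead of `alloc_c`, so the peak stays at 160.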

    An Implementation of a Three Dimensional Computational Pipeline with Minimal Latency and Maximum Throughput for LU Factorization Using Field Programmable Gate Arrays

    Traditionally, computationally intensive algebraic functions such as LU factorization are solved using complex systems such as supercomputers, parallel processing systems, and non-dedicated computing clusters. While these solutions are adequate for some problems, they typically suffer from classic parallel processing issues such as communication overhead, complex scheduling algorithms, and cost. Moreover, they are not feasible for embedded applications. Extremely high-performance solutions are sometimes implemented using costly, custom hardware such as Application Specific Integrated Circuits (ASICs). Unfortunately, the design, implementation, and verification of ASICs have become cost-prohibitive, and such solutions are feasible only if the end design is to be manufactured in very high volumes. As a result, many proposed architectures to solve specific problems lie dormant because they are simply too expensive to realize. In recent years, advancements in Field Programmable Gate Array (FPGA) technology have allowed engineers to map complex algorithms to logic gates while achieving performance similar to ASIC technology. This thesis demonstrates the feasibility of implementing, on FPGAs, a three-dimensional pipeline designed to solve LU factorization, based on an architecture proposed nearly 10 years ago, when a technology to implement such an architecture either did not exist or was too costly.