4 research outputs found

    Automatic Performance Optimization on Heterogeneous Computer Systems using Manycore Coprocessors

    Get PDF
    Emerging computer architectures and advanced computing technologies, such as Intel’s Many Integrated Core (MIC) Architecture and graphics processing units (GPU), provide a promising solution to employ parallelism for achieving high performance, scalability and low power consumption. As a result, accelerators have become a crucial part in developing supercomputers. Accelerators usually equip with different types of cores and memory. It will compel application developers to reach challenging performance goals. The added complexity has led to the development of task-based runtime systems, which allow complex computations to be expressed as task graphs, and rely on scheduling algorithms to perform load balancing between all resources of the platforms. Developing good scheduling algorithms, even on a single node, and analyzing them can thus have a very high impact on the performance of current HPC systems. Load balancing strategies, at different levels, will be critical to obtain an effective usage of the heterogeneous hardware and to reduce the impact of communication on energy and performance. Implementing efficient load balancing algorithms, able to manage heterogeneous hardware, can be a challenging task, especially when a parallel programming model for distributed memory architecture. In this paper, we presents several novel runtime approaches to determine the optimal data and task partition on heterogeneous platforms, targeting the Intel Xeon Phi accelerated heterogeneous systems

    The Design and Implementation of a High-Performance Polynomial System Solver

    Get PDF
    This thesis examines the algorithmic and practical challenges of solving systems of polynomial equations. We discuss the design and implementation of triangular decomposition to solve polynomials systems exactly by means of symbolic computation. Incremental triangular decomposition solves one equation from the input list of polynomials at a time. Each step may produce several different components (points, curves, surfaces, etc.) of the solution set. Independent components imply that the solving process may proceed on each component concurrently. This so-called component-level parallelism is a theoretical and practical challenge characterized by irregular parallelism. Parallelism is not an algorithmic property but rather a geometrical property of the particular input system’s solution set. Despite these challenges, we have effectively applied parallel computing to triangular decomposition through the layering and cooperation of many parallel code regions. This parallel computing is supported by our generic object-oriented framework based on the dynamic multithreading paradigm. Meanwhile, the required polynomial algebra is sup- ported by an object-oriented framework for algebraic types which allows type safety and mathematical correctness to be determined at compile-time. Our software is implemented in C/C++ and have extensively tested the implementation for correctness and performance on over 3000 polynomial systems that have arisen in practice. The parallel framework has been re-used in the implementation of Hensel factorization as a parallel pipeline to compute roots of a polynomial with multivariate power series coefficients. Hensel factorization is one step toward computing the non-trivial limit points of quasi-components
    corecore