
4 research outputs found

Parallelizing nested loops on the Intel Xeon Phi on the example of the dense WZ factorization

Authors: Beata Bylina; Jarosław Bylina
Publication venue: Polish Information Processing Society PTI
Publication date: 01/10/2016

Crossref

Directory of Open Access Journals

Automatic Performance Optimization on Heterogeneous Computer Systems using Manycore Coprocessors

Author: Lai Chenggang
Publication venue: ScholarWorks@UARK
Publication date: 01/12/2018

Emerging computer architectures and advanced computing technologies, such as Intel's Many Integrated Core (MIC) Architecture and graphics processing units (GPUs), offer a promising way to exploit parallelism to achieve high performance, scalability, and low power consumption. As a result, accelerators have become a crucial part of supercomputer development. Accelerators are usually equipped with different types of cores and memory, which compels application developers to meet challenging performance goals. The added complexity has led to the development of task-based runtime systems, which allow complex computations to be expressed as task graphs and rely on scheduling algorithms to balance load across all resources of the platform. Developing good scheduling algorithms, even on a single node, and analyzing them can thus have a very high impact on the performance of current HPC systems. Load balancing strategies at different levels will be critical to use heterogeneous hardware effectively and to reduce the impact of communication on energy and performance. Implementing efficient load balancing algorithms that can manage heterogeneous hardware is a challenging task, especially when using a parallel programming model for distributed-memory architectures. In this paper, we present several novel runtime approaches to determine the optimal data and task partition on heterogeneous platforms, targeting Intel Xeon Phi accelerated heterogeneous systems.

ScholarWorks@UARK

UARK (University of Arkansas)

The Design and Implementation of a High-Performance Polynomial System Solver

Author: Brandt Alexander
Publication venue: Scholarship@Western
Publication date: 02/08/2022

This thesis examines the algorithmic and practical challenges of solving systems of polynomial equations. We discuss the design and implementation of triangular decomposition to solve polynomial systems exactly by means of symbolic computation. Incremental triangular decomposition solves one equation from the input list of polynomials at a time. Each step may produce several different components (points, curves, surfaces, etc.) of the solution set. Independent components imply that the solving process may proceed on each component concurrently. This so-called component-level parallelism is a theoretical and practical challenge characterized by irregular parallelism. Parallelism is not an algorithmic property but rather a geometrical property of the particular input system's solution set. Despite these challenges, we have effectively applied parallel computing to triangular decomposition through the layering and cooperation of many parallel code regions. This parallel computing is supported by our generic object-oriented framework based on the dynamic multithreading paradigm. Meanwhile, the required polynomial algebra is supported by an object-oriented framework for algebraic types which allows type safety and mathematical correctness to be determined at compile-time. Our software is implemented in C/C++, and we have extensively tested the implementation for correctness and performance on over 3000 polynomial systems that have arisen in practice. The parallel framework has been re-used in the implementation of Hensel factorization as a parallel pipeline to compute roots of a polynomial with multivariate power series coefficients. Hensel factorization is one step toward computing the non-trivial limit points of quasi-components.

Scholarship@Western

Nonconvex optimization for improved exploitation of gradient sparsity in CT image reconstruction

Authors: Chartrand Rick; Jørgensen Jakob Sauer; Pan Xiaochuan; Sidky Emil Y.
Publication venue: University of Southern California
Publication date: 01/01/2013

Online Research Database In Technology