On The Parallelization Of Integer Polynomial Multiplication
With the advent of hardware accelerator technologies such as multi-core processors and GPUs, much effort has been made to take advantage of these architectures by designing parallel algorithms. Achieving this goal requires considering both algebraic complexity and parallelism, while also making efficient use of memory traffic and caches and reducing overheads in the implementations.
Polynomial multiplication is at the core of many algorithms in symbolic computation, such as real root isolation, which is our main application here.
In this thesis, we first investigate the multiplication of dense univariate polynomials with integer coefficients targeting multi-core processors. Some of the proposed methods are based on well-known serial classical algorithms, whereas a novel algorithm is designed to make efficient use of the targeted hardware. Experimentation confirms our theoretical analysis.
Second, we report on the first implementation of subproduct tree techniques on many-core architectures. These techniques are essentially another application of polynomial multiplication, but over a prime field; they are used in multi-point evaluation and interpolation of polynomials with coefficients in a prime field.
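To make the baseline concrete, the serial classical (schoolbook) algorithm that the thesis's parallel methods build on can be sketched as follows. This is an illustrative sketch with Python integer coefficients, not the thesis's actual implementation:

```python
def poly_mul(a, b):
    """Schoolbook multiplication of dense univariate integer polynomials.

    a[i] is the coefficient of x**i. Each output coefficient
    c[k] = sum_{i+j=k} a[i]*b[j] can be computed independently,
    which is what makes this loop nest a natural target for
    multi-core parallelization.
    """
    if not a or not b:
        return []
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c

# (1 + 2x)(1 + 3x + x^2) = 1 + 5x + 7x^2 + 2x^3
print(poly_mul([1, 2], [1, 3, 1]))  # [1, 5, 7, 2]
```

The quadratic cost of this method is why asymptotically faster algorithms (e.g. FFT-based ones over a prime field, as in the subproduct tree techniques above) matter at scale.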
parMERASA Multi-Core Execution of Parallelised Hard Real-Time Applications Supporting Analysability
Engineers who design hard real-time embedded systems express a need for several times the performance available today, while keeping safety as the major criterion. A breakthrough in performance is expected from parallelizing hard real-time applications and running them on an embedded multi-core processor, which enables combining the requirement for high performance with timing-predictable execution. parMERASA will provide a timing-analyzable system of parallel hard real-time applications running on a scalable multi-core processor. parMERASA goes one step beyond mixed-criticality demands: it targets future complex control algorithms by parallelizing hard real-time programs to run on predictable multi-/many-core processors. We aim to achieve a breakthrough in techniques for parallelization of industrial hard real-time programs, provide hard real-time support in system software, WCET analysis and verification tools for multi-cores, and techniques for predictable multi-core designs with up to 64 cores.
Breadth-first search for social network graphs on heterogeneous platforms
Breadth-First Search (BFS) is the core of many graph analysis algorithms, and it is useful in many problems including social network analysis, computer network analysis, and data organization; however, due to its irregular behavior, its parallel implementation is very challenging. Several approaches implement efficient BFS algorithms on multi-core architectures and on graphics processors, but an efficient implementation of BFS for heterogeneous systems is even more complicated, as distributing the work among the main cores and the accelerators becomes a big challenge.
As part of this work, we have assessed different heterogeneous shared-memory architectures (from high-end processors to embedded mobile processors, both comprising a multi-core CPU and an integrated GPU) and implemented different approaches to perform BFS. This work introduces three heterogeneous approaches for BFS: Selective, Concurrent, and Async. Contributions of this work include both the analysis of BFS performance on heterogeneous platforms, as well as an in-depth analysis of social network graphs and their implications for the BFS algorithm.
The results show that BFS is very input-dependent, and that the structure of the graph is one of the prime factors to analyze in order to develop good and scalable algorithms. The results also show that heterogeneous platforms can accelerate even irregular algorithms, reaching speed-ups of 2.2x in the best case. It is also shown how the different system configurations and capabilities impact performance, and how the shared-memory system can reach bandwidth limitations that prevent performance improvements despite higher utilization of the resources.
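The level-synchronous, frontier-based formulation that parallel BFS implementations typically build on can be sketched as follows. This is an illustrative serial version, not the paper's implementation; in the parallel setting, all vertices of one frontier are expanded concurrently:

```python
def bfs_levels(adj, source):
    """Level-synchronous BFS over an adjacency-list graph.

    Returns a dict mapping each reachable vertex to its BFS distance.
    The frontier / next-frontier structure is what heterogeneous
    implementations split between CPU cores and the GPU: every vertex
    in `frontier` can be expanded in parallel within one level.
    """
    dist = {source: 0}
    frontier = [source]
    level = 0
    while frontier:
        level += 1
        next_frontier = []
        for u in frontier:                 # parallelizable loop
            for v in adj.get(u, ()):
                if v not in dist:          # needs atomic test-and-set in parallel
                    dist[v] = level
                    next_frontier.append(v)
        frontier = next_frontier
    return dist

# Diamond graph: 0 -> {1, 2} -> 3
print(bfs_levels({0: [1, 2], 1: [3], 2: [3], 3: []}, 0))  # {0: 0, 1: 1, 2: 1, 3: 2}
```

The irregularity mentioned above shows up here as wildly varying frontier sizes and degree distributions, which is exactly what makes load balancing across CPU and GPU hard on social network graphs.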
StochSoCs: High performance biocomputing simulations for large scale Systems Biology
The stochastic simulation of large-scale biochemical reaction networks is of great importance for systems biology, since it enables the study of inherently stochastic biological mechanisms at the whole-cell scale. Stochastic Simulation Algorithms (SSA) allow us to simulate the dynamic behavior of complex kinetic models, but their high computational cost makes them very slow for many realistic-size problems. We present a pilot service, named WebStoch, developed in the context of our StochSoCs research project, which allows life scientists with no high-performance computing expertise to perform, over the internet, stochastic simulations of large-scale biological network models described in the SBML standard format. Biomodels submitted to the service are parsed automatically and then placed for parallel execution on distributed worker nodes. The workers are implemented using multi-core and many-core processors, or FPGA accelerators, and can handle the simulation of thousands of stochastic repetitions of complex biomodels, with possibly thousands of reactions and interacting species. Using benchmark LCSE biomodels, whose workload can be scaled on demand, we demonstrate linear speedup and more than two orders of magnitude higher throughput than existing serial simulators.
Comment: The 2017 International Conference on High Performance Computing & Simulation (HPCS 2017), 8 pages
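The thousands of stochastic repetitions that the worker nodes execute in parallel are independent trajectories of an SSA. As an illustration (the abstract does not specify the variant; Gillespie's direct method is sketched here as the canonical choice, with a hypothetical reaction encoding, not the service's actual code):

```python
import random

def ssa_direct(state, reactions, t_end, seed=1):
    """One trajectory of Gillespie's direct-method SSA.

    `reactions` is a list of (propensity_fn, state_change) pairs, where
    state_change maps species names to integer count deltas. Workers
    would run many independent calls of this loop, one per repetition.
    """
    rng = random.Random(seed)
    state = dict(state)
    t = 0.0
    while t < t_end:
        props = [fn(state) for fn, _ in reactions]
        a0 = sum(props)
        if a0 <= 0.0:
            break                        # no reaction can fire
        t += rng.expovariate(a0)         # exponential waiting time
        if t >= t_end:
            break
        r, acc = rng.random() * a0, 0.0  # pick reaction proportional to propensity
        for p, (_, delta) in zip(props, reactions):
            acc += p
            if r < acc:
                for species, d in delta.items():
                    state[species] += d
                break
    return state

# Hypothetical toy model: degradation A -> 0 at rate 0.5 per molecule
final = ssa_direct({"A": 100}, [(lambda s: 0.5 * s["A"], {"A": -1})], t_end=10.0)
```

Because trajectories share no state, this workload is embarrassingly parallel, which is why multi-core, many-core, and FPGA workers all scale well on it.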