28,746 research outputs found

    On The Parallelization Of Integer Polynomial Multiplication

    Get PDF
    With the advent of hardware accelerator technologies, multi-core processors and GPUs, much effort for taking advantage of those architectures by designing parallel algorithms has been made. To achieve this goal, one needs to consider both algebraic complexity and parallelism, plus making efficient use of memory traffic, cache, and reducing overheads in the implementations. Polynomial multiplication is at the core of many algorithms in symbolic computation such as real root isolation which will be our main application for now. In this thesis, we first investigate the multiplication of dense univariate polynomials with integer coefficients targeting multi-core processors. Some of the proposed methods are based on well-known serial classical algorithms, whereas a novel algorithm is designed to make efficient use of the targeted hardware. Experimentation confirms our theoretical analysis. Second, we report on the first implementation of subproduct tree techniques on many-core architectures. These techniques are basically another application of polynomial multiplication, but over a prime field. This technique is used in multi-point evaluation and interpolation of polynomials with coefficients over a prime field

    parMERASA Multi-Core Execution of Parallelised Hard Real-Time Applications Supporting Analysability

    Get PDF
    International audienceEngineers who design hard real-time embedded systems express a need for several times the performance available today while keeping safety as major criterion. A breakthrough in performance is expected by parallelizing hard real-time applications and running them on an embedded multi-core processor, which enables combining the requirements for high-performance with timing-predictable execution. parMERASA will provide a timing analyzable system of parallel hard real-time applications running on a scalable multicore processor. parMERASA goes one step beyond mixed criticality demands: It targets future complex control algorithms by parallelizing hard real-time programs to run on predictable multi-/many-core processors. We aim to achieve a breakthrough in techniques for parallelization of industrial hard real-time programs, provide hard real-time support in system software, WCET analysis and verification tools for multi-cores, and techniques for predictable multi-core designs with up to 64 cores

    Breadth-first search for social network graphs on heterogenous platforms

    Get PDF
    Breadth-First Search (BFS) is the core of many graph analysis algorithms, and it is useful in many problems including social network, computer network analysis, and data organization, but, due to its irregular behav- ior, its parallel implementation is very challenging. There are several approaches that implement efficient algorithms for BFS in multicore architectures and in Graphics Processors, but an efficient implementation of BFS for heterogeneous systems is even more complicated, as the task of distributing the work among the main cores and the accelerators becomes a big challenge. As part of this work, we have assessed different heterogenous shared-memory architectures (from high- end processors to embedded mobile processors, both composed by a multi-core CPU and an integrated GPU) and implemented different approaches to perform BFS. This work introduces three heterogeneous approaches for BFS: Selective, Concurrent, and Async. Contributions of this work includes both the analysis of BFS performance on Heterogenous platforms, as well as in depth analysis of social network graphs and its implications on the BFS algorithm. The results show that BFS is very input dependent, and that the structure of the graph is one of the prime factors to analyze in order to develop good and scalable algorithms. The results also show that heterogenous platforms can provide acceleration to even irregular algorithms, reaching speed-ups of 2.2x in the best case. It is also shown how the different system configurations and capabilities impact the performance and how the shared-memory system can reach bandwidth limitations that prevent performance improvements despite having higher utilization of the resources

    StochSoCs: High performance biocomputing simulations for large scale Systems Biology

    Full text link
    The stochastic simulation of large-scale biochemical reaction networks is of great importance for systems biology since it enables the study of inherently stochastic biological mechanisms at the whole cell scale. Stochastic Simulation Algorithms (SSA) allow us to simulate the dynamic behavior of complex kinetic models, but their high computational cost makes them very slow for many realistic size problems. We present a pilot service, named WebStoch, developed in the context of our StochSoCs research project, allowing life scientists with no high-performance computing expertise to perform over the internet stochastic simulations of large-scale biological network models described in the SBML standard format. Biomodels submitted to the service are parsed automatically and then placed for parallel execution on distributed worker nodes. The workers are implemented using multi-core and many-core processors, or FPGA accelerators that can handle the simulation of thousands of stochastic repetitions of complex biomodels, with possibly thousands of reactions and interacting species. Using benchmark LCSE biomodels, whose workload can be scaled on demand, we demonstrate linear speedup and more than two orders of magnitude higher throughput than existing serial simulators.Comment: The 2017 International Conference on High Performance Computing & Simulation (HPCS 2017), 8 page
    • …
    corecore