Search CORE

688 research outputs found

Taking advantage of hybrid systems for sparse direct solvers via task-based runtimes

Author: Bosilca George
Faverge Mathieu
Lacoste Xavier
Ramet Pierre
Thibault Samuel
Publication venue
Publication date: 06/01/2014
Field of study

The ongoing hardware evolution exhibits an escalation in the number, as well as in the heterogeneity, of computing resources. The pressure to maintain reasonable levels of performance and portability forces application developers to leave the traditional programming paradigms and explore alternative solutions. PaStiX is a parallel sparse direct solver, based on a dynamic scheduler for modern hierarchical manycore architectures. In this paper, we study the benefits and limits of replacing the highly specialized internal scheduler of the PaStiX solver with two generic runtime systems: PaRSEC and StarPU. The tasks graph of the factorization step is made available to the two runtimes, providing them the opportunity to process and optimize its traversal in order to maximize the algorithm efficiency for the targeted hardware platform. A comparative study of the performance of the PaStiX solver on top of its native internal scheduler, PaRSEC, and StarPU frameworks, on different execution environments, is performed. The analysis highlights that these generic task-based runtimes achieve comparable results to the application-optimized embedded scheduler on homogeneous platforms. Furthermore, they are able to significantly speed up the solver on heterogeneous environments by taking advantage of the accelerators while hiding the complexity of their efficient manipulation from the programmer.Comment: Heterogeneity in Computing Workshop (2014

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

Combining high productivity and high performance in image processing using single assignment C on multi-core CPUs and many-core GPUs

Author: Bernecky R.
Grelck C.
Guo J.
Haslinger P.
Korzeniowski F
Moser B.
Scholz S.-B.
Wieser V.
Publication venue: 'SPIE-Intl Soc Optical Eng'
Publication date: 01/01/2012
Field of study

International Migration, Integration and Social Cohesion online publications

Parallelizing BZip2 : a Case Study in Multicore Software Engineering

Author: Jannesari Ali
Pankratius Victor
Tichy Walter F.
Publication venue: Universität Karlsruhe (TH)
Publication date: 01/01/2008
Field of study

KITopen

The Servet 3.0 benchmark suite: characterization of network performance degradation

Author: Expósito Roberto R.
González-Domínguez Jorge
López Taboada Guillermo
Martín María J.
Touriño Juan
Publication venue: 'Elsevier BV'
Publication date: 01/01/2013
Field of study

This is a post-peer-review, pre-copyedit version of an article published in Computers & Electrical Engineering. The final authenticated version is available online at: https://doi.org/10.1016/j.compeleceng.2013.08.012.[Abstract] Servet is a suite of benchmarks focused on extracting a set of parameters with high influence on the overall performance of multicore clusters. These parameters can be used to optimize the performance of parallel applications by adapting part of their behavior to the characteristics of the machine. Up to now the tool considered network bandwidth as constant and independent of the communication pattern. Nevertheless, the inter-node communication bandwidth decreases on modern large supercomputers depending on the number of cores per node that simultaneously access the network and on the distance between the communicating nodes. This paper describes two new benchmarks that improve Servet by characterizing the network performance degradation depending on these factors. This work also shows the experimental results of these benchmarks on a Cray XE6 supercomputer and some examples of how real parallel codes can be optimized by using the information about network degradation.Ministerio de Ciencia e Innovación; TIN2010-16735Ministerio de Educación; AP2008-01578Ministerio de Educación; AP2010-4348European Commision; HPC-Europa2 Programme; 22839

Repositorio da Universidade da Coruña

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Model-Based Performance Prediction for Concurrent Software on Multicore Architectures

Author: Frank Markus Kilian
Publication venue: KIT Scientific Publishing
Publication date: 01/01/2022
Field of study

Model-based performance prediction is a well-known concept to ensure the quality of software.Current approaches are based on a single-metric model, which leads to inaccurate predictions for modern architectures. This thesis presents a multi-strategies approach to extend performance prediction models to support multicore architectures.We implemented the strategies into Palladio and significantly increased the performance prediction power

KITopen

Directory of Open Access Books (DOAB)

Finding Best Compiler Options for Critical Software Using Parallel Algorithms

Author: E Alba
N Mladenovic
RM Stallman
Teodor Gabriel Crainic
TS Kumar
Publication venue
Publication date: 21/11/2018
Field of study

The efficiency of a software piece is a key factor for many systems. Real-time programs, critical software, device drivers, kernel OS functions and many other software pieces which are executed thousands or even millions of times per day require a very efficient execution. How this software is built can significantly affect the run time for these programs, since the context is that of compile-once/run-many. In this sense, the optimization flags used during the compilation time are a crucial element for this goal and they could make a big difference in the final execution time. In this paper, we use parallel metaheuristic techniques to automatically decide which optimization flags should be activated during the compilation on a set of benchmarking programs. The using the appropriate flag configuration is a complex combinatorial problem, but our approach is able to adapt the flag tuning to the characteristics of the software, improving the final run times with respect to other spread practicesThis research has been partially funded by the Spanish MINECO and FEDER projects (TIN2014-57341-R (http://moveon.lcc.uma.es), TIN2016-81766-REDT (http://cirti.es), and TIN2017-88213-R (http://6city.lcc.uma.es). It is also funded by Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech

Crossref

Repositorio Institucional Universidad de Málaga

ElasticSimMATE: a Fast and Accurate gem5 Trace-Driven Simulator for Multicore Systems

Author: Bruguier Florent
Gamatié Abdoulaye
Nocua Alejandro
Sassatelli Gilles
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 12/07/2017
Field of study

International audienceMulticore system analysis requires efficient solutions for architectural parameter and scalability exploration. Long simulation time is the main drawback of current simulation approaches. In order to reduce the simulation time while keeping the accuracy levels, trace-driven simulation approaches have been developed. However, existing approaches do not allow multicore exploration or do not capture the behavior of multi-threaded programs. Based on the gem5 simulator, we developed a novel synchronization mechanism for multicore analysis based on the trace collection of synchronization events, instruction and dependencies. It allows efficient architectural parameter and scalability exploration with acceptable simulation speed and accuracy

Crossref

Efficient and Reasonable Object-Oriented Concurrency

Author: Meyer Bertrand
Nanz Sebastian
West Scott
Publication venue
Publication date: 27/07/2015
Field of study

Making threaded programs safe and easy to reason about is one of the chief difficulties in modern programming. This work provides an efficient execution model for SCOOP, a concurrency approach that provides not only data race freedom but also pre/postcondition reasoning guarantees between threads. The extensions we propose influence both the underlying semantics to increase the amount of concurrent execution that is possible, exclude certain classes of deadlocks, and enable greater performance. These extensions are used as the basis an efficient runtime and optimization pass that improve performance 15x over a baseline implementation. This new implementation of SCOOP is also 2x faster than other well-known safe concurrent languages. The measurements are based on both coordination-intensive and data-manipulation-intensive benchmarks designed to offer a mixture of workloads.Comment: Proceedings of the 10th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE '15). ACM, 201

arXiv.org e-Print Archive

CiteSeerX

Crossref

BigDataBench: a Big Data Benchmark Suite from Internet Services

Author: Gao Wanling
He Yongqiang
Jia Zhen
Li Xiaona
Lu Gang
Luo Chunjie
Qiu Bizhu
Shi Yingjie
Wang Lei
Yang Qiang
Zhan Jianfeng
Zhan Kent
Zhang Shujie
Zheng Chen
Zhu Yuqing
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 22/02/2014
Field of study

As architecture, systems, and data management communities pay greater attention to innovative big data systems and architectures, the pressure of benchmarking and evaluating these systems rises. Considering the broad use of big data systems, big data benchmarks must include diversity of data and workloads. Most of the state-of-the-art big data benchmarking efforts target evaluating specific types of applications or system software stacks, and hence they are not qualified for serving the purposes mentioned above. This paper presents our joint research efforts on this issue with several industrial partners. Our big data benchmark suite BigDataBench not only covers broad application scenarios, but also includes diverse and representative data sets. BigDataBench is publicly available from http://prof.ict.ac.cn/BigDataBench . Also, we comprehensively characterize 19 big data workloads included in BigDataBench with varying data inputs. On a typical state-of-practice processor, Intel Xeon E5645, we have the following observations: First, in comparison with the traditional benchmarks: including PARSEC, HPCC, and SPECCPU, big data applications have very low operation intensity; Second, the volume of data input has non-negligible impact on micro-architecture characteristics, which may impose challenges for simulation-based big data architecture research; Last but not least, corroborating the observations in CloudSuite and DCBench (which use smaller data inputs), we find that the numbers of L1 instruction cache misses per 1000 instructions of the big data applications are higher than in the traditional benchmarks; also, we find that L3 caches are effective for the big data applications, corroborating the observation in DCBench.Comment: 12 pages, 6 figures, The 20th IEEE International Symposium On High Performance Computer Architecture (HPCA-2014), February 15-19, 2014, Orlando, Florida, US

arXiv.org e-Print Archive

Crossref

HPCmatlab: A Framework for Fast Prototyping of Parallel Applications in Matlab

Author: Dave Mukul
Guo Xinchen
Sayeed Mohamed
Publication venue: The Author(s). Published by Elsevier B.V.
Publication date: 31/12/2016
Field of study

AbstractThe HPCmatlab framework has been developed for Distributed Memory Programming in Matlab/Octave using the Message Passing Interface (MPI). The communication routines in the MPI library are implemented using MEX wrappers. Point-to-point, collective as well as one-sided communication is supported. Benchmarking results show better performance than the Mathworks Distributed Computing Server. HPCmatlab has been used to successfully parallelize and speed up Matlab applications developed for scientific computing. The application results show good scalability, while preserving the ease of programmability. HPCmatlab also enables shared memory programming using Pthreads and Parallel I/O using the ADIOS package

Elsevier - Publisher Connector