
    Energy-efficiency evaluation of Intel KNL for HPC workloads

    Energy consumption is increasingly becoming a limiting factor in the design of faster large-scale parallel systems, and the development of energy-efficient and energy-aware applications is today a relevant issue for HPC code-developer communities. In this work we focus on the energy performance of the Knights Landing (KNL) Xeon Phi, the latest many-core architecture processor introduced by Intel into the HPC market. We consider the 64-core Xeon Phi 7230 and analyze its energy performance using both the on-chip MCDRAM and the regular DDR4 system memory as the main storage for the application data domain. As a benchmark application we use a Lattice Boltzmann code heavily optimized for this architecture and implemented using different memory data layouts to store its lattice. We then assess the energy consumption using different memory data layouts, kinds of memory (DDR4 or MCDRAM), and numbers of threads per core.
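
    As an illustration of the data layouts such a study compares, the sketch below contrasts an array-of-structures (AoS) and a structure-of-arrays (SoA) storage scheme for the lattice populations; the parameters NPOP, NX and NY are hypothetical, and the MCDRAM placement note in the comments describes common practice on flat-mode KNL (numactl or the memkind library), not the paper's actual code.

        // Hedged sketch: two common ways to store lattice Boltzmann populations,
        // as typically compared in data-layout studies on KNL. NPOP, NX, NY are
        // hypothetical parameters, not taken from the paper.
        #include <cstddef>
        #include <vector>

        constexpr std::size_t NPOP = 37;          // populations per site (assumed)
        constexpr std::size_t NX = 256, NY = 256; // lattice size (assumed)

        // Array of Structures (AoS): all populations of one site are contiguous.
        struct SiteAoS { double f[NPOP]; };
        using LatticeAoS = std::vector<SiteAoS>;  // NX * NY sites

        // Structure of Arrays (SoA): each population index is stored as a
        // contiguous plane, which usually vectorizes better on wide-SIMD cores.
        struct LatticeSoA { std::vector<double> f[NPOP]; };

        int main() {
            LatticeAoS aos(NX * NY);
            LatticeSoA soa;
            for (auto& plane : soa.f) plane.assign(NX * NY, 0.0);
            // On a flat-mode KNL the same allocations can be placed in MCDRAM
            // rather than DDR4, e.g. by running under numactl --membind or by
            // allocating through the memkind library's high-bandwidth heap;
            // both are common practice, not necessarily the paper's setup.
            return 0;
        }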

    A PETSc parallel-in-time solver based on MGRIT algorithm

    We address the development of a modular implementation of the MGRIT (MultiGrid-In-Time) algorithm to solve linear and nonlinear systems that arise from the discretization of evolutionary models with a parallel-in-time approach, in the context of the PETSc (Portable, Extensible Toolkit for Scientific Computation) library. Our aim is to make it possible to predict the performance gain achievable when using the MGRIT approach instead of the Time Stepping (TS) integrator. To this end, we analyze the performance parameters of the algorithm that provide, a priori, the best number of processing elements and grid levels to use, addressing the scaling of MGRIT regarded as a parallel iterative algorithm proceeding along the time dimension.
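
    To make the parallel-in-time idea concrete, the sketch below runs a two-level parareal iteration, the simplest special case of MGRIT, on the scalar test problem y' = lambda*y; it is a toy illustration under those assumptions, not the authors' PETSc module, and the fine sweeps that would run concurrently across the time intervals are executed sequentially here.

        // Hedged sketch: a two-level parareal iteration (the simplest special case
        // of MGRIT) on the scalar test ODE y' = lambda * y, y(0) = 1. It shows the
        // coarse-grid/fine-grid interplay; the real MGRIT solver operates on
        // general TS problems and runs the fine sweeps in parallel across time.
        #include <cmath>
        #include <cstdio>
        #include <vector>

        int main() {
            const double lambda = -1.0, T = 2.0;
            const int N = 16;                 // coarse time intervals (one per "process")
            const int M = 32;                 // fine steps inside each coarse interval
            const double dT = T / N, dt = dT / M;

            auto fineStep   = [&](double y) { return y / (1.0 - lambda * dt); };  // implicit Euler
            auto coarseStep = [&](double y) { return y / (1.0 - lambda * dT); };  // implicit Euler

            auto propagateFine = [&](double y) {      // F-sweep across one coarse interval
                for (int s = 0; s < M; ++s) y = fineStep(y);
                return y;
            };

            // Initial guess at the coarse points from the coarse propagator alone.
            std::vector<double> u(N + 1);
            u[0] = 1.0;
            for (int n = 0; n < N; ++n) u[n + 1] = coarseStep(u[n]);

            // Parareal / two-level MGRIT iteration: in a parallel run the fine
            // sweeps are independent per interval; only the correction is serial.
            for (int k = 0; k < 5; ++k) {
                std::vector<double> fine(N);
                for (int n = 0; n < N; ++n) fine[n] = propagateFine(u[n]);  // parallelizable
                std::vector<double> g(N + 1);
                g[0] = u[0];
                for (int n = 0; n < N; ++n)                                 // serial correction
                    g[n + 1] = coarseStep(g[n]) + fine[n] - coarseStep(u[n]);
                u = g;
            }
            std::printf("u(T) = %.6f, exact = %.6f\n", u[N], std::exp(lambda * T));
            return 0;
        }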

    Storage QoS provisioning for execution programming of data-intensive applications

    In this paper a method for execution programming of data-intensive applications is presented. The method is based on storage Quality of Service (SQoS) provisioning. SQoS provisioning relies on semantic-based storage monitoring, built on a storage resources model, together with storage performance management. Test results show the execution-time gains obtained when using the QStorMan toolkit, which implements the presented method. Taking into account the SQoS provisioning opportunity on the one hand, and ever-growing user demands on the other, we believe that execution programming of data-intensive applications can bring a new quality to application execution.

    Massively parallel lattice–Boltzmann codes on large GPU clusters

    This paper describes a massively parallel code for a state-of-the-art thermal lattice–Boltzmann method. Our code has been carefully optimized for performance on one GPU and for good scaling behavior on a large number of GPUs. Versions of this code have already been used for large-scale studies of convective turbulence. GPUs are becoming increasingly popular in HPC applications, as they are able to deliver higher performance than traditional processors. Writing efficient programs for large clusters is not an easy task, as codes must adapt to increasingly parallel architectures and the overheads of node-to-node communications must be properly handled. We describe the structure of our code, discussing several key design choices that were guided by theoretical models of performance and by experimental benchmarks. We present an extensive set of performance measurements and identify the corresponding main bottlenecks; finally, we compare the results of our GPU code with those measured on other currently available high-performance processors. Our results include a production-grade code able to deliver a sustained performance of several tens of Tflops, as well as a design and optimization methodology that can be used for the development of other high-performance applications for computational physics.
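
    A recurring design choice in such multi-GPU codes is overlapping node-to-node communication with computation. The sketch below shows that pattern with plain MPI on the host side: post non-blocking halo exchanges, update the bulk of the local lattice while messages are in flight, then update the border sites. The buffer sizes and the compute_* stubs are placeholders; the paper's actual GPU kernels and data layout are not reproduced here.

        // Hedged sketch of the halo-exchange overlap pattern typical of multi-GPU
        // lattice Boltzmann codes. Start non-blocking exchanges of the boundary
        // populations, update the interior while messages travel, then update the
        // border sites that depend on the received halos.
        #include <mpi.h>
        #include <vector>

        void compute_bulk(std::vector<double>&) { /* propagate+collide, interior sites */ }
        void compute_borders(std::vector<double>&, const std::vector<double>&,
                             const std::vector<double>&) { /* sites next to the halos */ }

        int main(int argc, char** argv) {
            MPI_Init(&argc, &argv);
            int rank, size;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);
            const int left = (rank - 1 + size) % size, right = (rank + 1) % size;

            const int HALO = 1024;                      // placeholder halo size
            std::vector<double> lattice(1 << 20, 0.0);  // placeholder local lattice
            std::vector<double> sendL(HALO), sendR(HALO), recvL(HALO), recvR(HALO);

            MPI_Request req[4];
            MPI_Irecv(recvL.data(), HALO, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &req[0]);
            MPI_Irecv(recvR.data(), HALO, MPI_DOUBLE, right, 1, MPI_COMM_WORLD, &req[1]);
            MPI_Isend(sendR.data(), HALO, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &req[2]);
            MPI_Isend(sendL.data(), HALO, MPI_DOUBLE, left,  1, MPI_COMM_WORLD, &req[3]);

            compute_bulk(lattice);                       // overlapped with communication

            MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
            compute_borders(lattice, recvL, recvR);      // needs the received halos

            MPI_Finalize();
            return 0;
        }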

    On the Koszul cohomology of canonical and Prym-canonical binary curves

    In this paper we study Koszul cohomology and the Green and Prym-Green conjectures for canonical and Prym-canonical binary curves. We prove that if property N_p holds for a canonical or a Prym-canonical binary curve of genus g, then it holds for a generic canonical or Prym-canonical binary curve of genus g+1. We also verify the Green and Prym-Green conjectures for generic canonical and Prym-canonical binary curves of low genus (6 ≤ g ≤ 15, g ≠ 8 for Prym-canonical and 3 ≤ g ≤ 12 for canonical). Comment: Final version. To appear in the Bulletin of the London Mathematical Society.
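
    For orientation only (the standard convention, not a statement from the paper), property N_p refers to the shape of the minimal graded free resolution of the embedded curve:

        Let $X \subset \mathbb{P}^n$ be projectively embedded, with homogeneous
        coordinate ring $S_X$ over the polynomial ring $S$ and minimal graded free
        resolution
        \[
          \cdots \longrightarrow F_2 \longrightarrow F_1 \longrightarrow F_0
                 \longrightarrow S_X \longrightarrow 0 .
        \]
        Property $N_0$ means $F_0 = S$ (projective normality); for $p \ge 1$,
        property $N_p$ means in addition that $F_i = \bigoplus S(-i-1)$ for
        $1 \le i \le p$, i.e. the homogeneous ideal is generated by quadrics and
        its syzygies are linear through step $p$.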

    Pseudo-Random Number Generators for Vector Processors and Multicore Processors

    Large-scale Monte Carlo applications need a good pseudo-random number generator capable of utilizing both the vector processing and multiprocessing capabilities of modern computers in order to reach maximum performance. The requirements for such a generator are discussed. New ways of avoiding overlapping subsequences by combining two generators are proposed. Some fundamental philosophical problems in proving independence of random streams are discussed. Remedies for hitherto ignored quantization errors are offered. An open-source C++ implementation is provided for a generator that meets these needs.
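
    As a minimal illustration of the combining idea (not the paper's construction or its statistical guarantees), the sketch below pairs a base generator, xorshift64*, with a per-stream Weyl sequence; giving every thread a distinct odd Weyl increment parameterizes the combined generator per stream.

        // Hedged sketch: combining a base generator (xorshift64*) with a Weyl
        // sequence whose increment differs per stream, so that threads with the
        // same seed still follow different combined sequences. Illustrative only.
        #include <cstdint>
        #include <cstdio>

        class CombinedRng {
        public:
            // `stream` selects a distinct odd Weyl increment per thread/stream.
            explicit CombinedRng(std::uint64_t seed, std::uint64_t stream)
                : x_(seed | 1), w_(0), inc_((stream << 1) | 1) {}

            std::uint64_t next() {
                // xorshift64* step (base generator).
                x_ ^= x_ >> 12;
                x_ ^= x_ << 25;
                x_ ^= x_ >> 27;
                std::uint64_t base = x_ * 0x2545F4914F6CDD1DULL;
                // Weyl sequence step (second generator), distinct increment per stream.
                w_ += inc_;
                return base + w_;   // simple combination of the two outputs
            }

            double nextDouble() {   // uniform in [0, 1) from the top 53 bits
                return (next() >> 11) * (1.0 / 9007199254740992.0);  // 2^53
            }

        private:
            std::uint64_t x_, w_, inc_;
        };

        int main() {
            CombinedRng stream0(12345, 0), stream1(12345, 1);  // same seed, different streams
            for (int i = 0; i < 3; ++i)
                std::printf("%f  %f\n", stream0.nextDouble(), stream1.nextDouble());
            return 0;
        }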

    Parallelizations of TFETI-1 coarse problem

    The FETI-based methods, used for the solution of elliptic partial differential equations, form a highly successful class of domain decomposition methods used for the parallelization of well-known finite element methods. In the FETI methods we partition the original problem into smaller problems defined on subdomains. Since the subdomains are non-overlapping, we can naturally solve the smaller problems independently in parallel. We want to increase the number of subdomains so that the smaller problems are solved faster. This, however, leads to an increase in the size of the coarse problem. Moreover, for complex problems, the number of coarse-problem solutions needed can be very high. Therefore, it is important to solve the coarse problem efficiently. This thesis deals with parallelization strategies for the TFETI-1 coarse problem.
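
    For orientation, the coarse problem referred to above can be written down in a common TFETI notation (assumed here as background; the thesis itself is not quoted):

        With a block-diagonal (singular) stiffness matrix $K$, constraint matrix $B$
        and rigid-body modes $R$, the TFETI dual problem is the quadratic program
        \[
          \min_{\lambda}\ \tfrac{1}{2}\lambda^{T} F \lambda - \lambda^{T} d
          \quad \text{subject to} \quad G\lambda = e,
        \]
        with $F = B K^{+} B^{T}$, $G = R^{T} B^{T}$, $d = B K^{+} f$, $e = R^{T} f$.
        Iterative solvers apply the projector $P = I - G^{T}(G G^{T})^{-1} G$, so
        every iteration needs a solve with the coarse matrix $G G^{T}$, whose size
        grows with the number of subdomains.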

    Sustainability performance assessment of municipal solid waste management utilising aggregated indicators approach

    There is a need for an effective and sustainable municipal solid waste (MSW) management system to be implemented in Malaysia, especially in urban areas. Indicators have often been chosen as a tool to evaluate the performance of the current MSW management system in Malaysia. From the literature reviewed, no index was found to be similar to the one proposed by this study. This study was conducted to produce a set of indicators that evaluate the MSW management system throughout the entire life cycle. The development of these indicators involved intensive literature reviews, discussion meetings with stakeholders, and a workshop with solid waste management experts. Weightages were assigned to the established indicators by using the analytical hierarchy process and were then incorporated into a performance index, known as the municipal solid waste management performance index (MSWMPI). Data collection was done in five cities: Muar, Rembau, Putrajaya, Langkawi and Pekan. As a result, a total of nine indicators under four criteria, C1 (MSW Generation and Segregation), C2 (MSW Collection and Transportation), C3 (MSW Treatment) and C4 (MSW Disposal), were finalised. The weightages for the four criteria were found to be 32.17% for C1, 19.82% for C2, 25.41% for C3, and 22.60% for C4. Among the five cities, Pekan had the highest MSWMPI, with a value of 74.85, and was rated as performing well. On the other hand, the MSW management system in Muar had the lowest MSWMPI, with a value of 51.23. Langkawi had an MSWMPI of 59.89, followed closely by Rembau (58.12), and Putrajaya had an MSWMPI of 52.43. City profiling of the respective cities was also done to identify the hotspots in the MSW management system. It was found that most cities that performed well in C1 and C2 did not perform as well in C3 and C4, and vice versa.
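
    To make the aggregation concrete: assuming a straightforward weighted sum of criterion scores (the abstract does not spell out the exact aggregation), the index combines the four criterion scores with the reported weightages; the scores in the worked line below are hypothetical, only the weights come from the abstract.

        \[
          \mathrm{MSWMPI} = \sum_{c} w_c\, s_c
            = 0.3217\, s_{C1} + 0.1982\, s_{C2} + 0.2541\, s_{C3} + 0.2260\, s_{C4},
        \]
        e.g.\ with hypothetical scores $(s_{C1}, s_{C2}, s_{C3}, s_{C4}) = (80, 70, 65, 60)$:
        \[
          0.3217 \cdot 80 + 0.1982 \cdot 70 + 0.2541 \cdot 65 + 0.2260 \cdot 60 \approx 69.7 .
        \]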

    A computationally efficient Branch-and-Bound algorithm for the permutation flow-shop scheduling problem

    In this work we propose an efficient branch-and-bound (B&B) algorithm for the permutation flow-shop problem (PFSP) with makespan objective. We present a new node decomposition scheme that combines dynamic branching and lower-bound refinement strategies in a computationally efficient way. To alleviate the computational burden of the two-machine bound used in the refinement stage, we propose an online learning-inspired mechanism to predict promising couples of bottleneck machines. The algorithm offers multiple choices for branching and bounding operators and can explore the search tree either sequentially or in parallel on multi-core CPUs. In order to empirically determine the most efficient combination of these components, a series of computational experiments with 600 benchmark instances is performed. A main insight is that the problem size, as well as interactions between branching and bounding operators, substantially modifies the trade-off between the computational requirements of a lower bound and the achieved tree-size reduction. Moreover, we demonstrate that parallel tree search is a key ingredient for the resolution of large problem instances, as strong super-linear speedups can be observed. An overall evaluation using two well-known benchmarks indicates that the proposed approach is superior to previously published B&B algorithms. For the first benchmark we report the exact resolution, within less than 20 minutes, of two instances defined by 500 jobs and 20 machines that had remained open for more than 25 years, and for the second a total of 89 improved best-known upper bounds, including proofs of optimality for 74 of them.
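
    For readers unfamiliar with the problem, the sketch below is a textbook-style B&B for the PFSP with depth-first search and a simple one-machine lower bound; it only shows the branching/bounding skeleton and uses none of the dynamic branching, two-machine bound refinement, learning mechanism or parallel tree search contributed by the paper. The instance data are hypothetical.

        // Hedged sketch: minimal branch-and-bound for the permutation flow-shop
        // problem (PFSP) with makespan objective. Depth-first search plus a simple
        // one-machine lower bound; illustrative only.
        #include <algorithm>
        #include <cstdio>
        #include <numeric>
        #include <vector>

        struct PFSP {
            int n, m;                              // n jobs, m machines
            std::vector<std::vector<int>> p;       // p[j][k]: time of job j on machine k
        };

        // Evaluate a complete permutation: classic forward recursion for the makespan.
        static int makespan(const PFSP& in, const std::vector<int>& perm) {
            std::vector<int> c(in.m, 0);           // completion time on each machine
            for (int j : perm)
                for (int k = 0; k < in.m; ++k)
                    c[k] = std::max(c[k], k ? c[k - 1] : 0) + in.p[j][k];
            return c.back();
        }

        static void bb(const PFSP& in, std::vector<int>& partial, std::vector<bool>& used,
                       const std::vector<int>& front, int& best, std::vector<int>& bestPerm) {
            if ((int)partial.size() == in.n) {     // leaf: complete schedule
                if (front.back() < best) { best = front.back(); bestPerm = partial; }
                return;
            }
            for (int j = 0; j < in.n; ++j) {
                if (used[j]) continue;
                // Append job j: update machine completion times of the partial schedule.
                std::vector<int> f = front;
                for (int k = 0; k < in.m; ++k)
                    f[k] = std::max(f[k], k ? f[k - 1] : 0) + in.p[j][k];
                // One-machine lower bound: every machine must still process all
                // remaining jobs after its current completion time.
                int lb = 0;
                for (int k = 0; k < in.m; ++k) {
                    int rem = 0;
                    for (int q = 0; q < in.n; ++q)
                        if (!used[q] && q != j) rem += in.p[q][k];
                    lb = std::max(lb, f[k] + rem);
                }
                if (lb >= best) continue;          // prune: cannot improve the incumbent
                used[j] = true; partial.push_back(j);
                bb(in, partial, used, f, best, bestPerm);
                partial.pop_back(); used[j] = false;
            }
        }

        int main() {
            // Tiny hypothetical instance: 4 jobs, 3 machines.
            PFSP in{4, 3, {{5, 4, 4}, {5, 4, 4}, {3, 2, 3}, {6, 4, 1}}};
            std::vector<int> perm(in.n);
            std::iota(perm.begin(), perm.end(), 0);
            int best = makespan(in, perm);         // initial upper bound: identity order
            std::vector<int> bestPerm = perm, partial;
            std::vector<bool> used(in.n, false);
            std::vector<int> front(in.m, 0);
            bb(in, partial, used, front, best, bestPerm);
            std::printf("best makespan = %d\n", best);
            return 0;
        }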

    Proceedings of the Third International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2016) Sofia, Bulgaria

    Proceedings of: Third International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2016). Sofia (Bulgaria), October 6-7, 2016.