3 research outputs found

Locality-Aware Task Scheduling and Data Distribution for OpenMP Programs on NUMA Systems and Manycore Processors

Publication venue: 'Hindawi Limited'
Publication date: 01/01/2015

NumaGiC: a Garbage Collector for Big Data on Big NUMA Machines

Author: Gidra Lokesh, Nguyen Nhan, Shapiro Marc, Sopena Julien, Thomas Gaël
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 14/03/2015

On contemporary cache-coherent Non-Uniform Memory Access (ccNUMA) architectures, applications with a large memory footprint suffer from the cost of the garbage collector (GC), because, as the GC scans the reference graph, it makes many remote memory accesses, saturating the interconnect between memory nodes. We address this problem with NumaGiC, a GC with a mostly-distributed design. In order to maximise memory access locality during collection, a GC thread avoids accessing a different memory node, instead notifying a remote GC thread with a message; nonetheless, NumaGiC avoids the drawbacks of a pure distributed design, which tends to decrease parallelism. We compare NumaGiC with Parallel Scavenge and NAPS on two different ccNUMA architectures running on the Hotspot Java Virtual Machine of OpenJDK 7. On Spark and Neo4j, two industry-strength analytics applications, with heap sizes ranging from 160 GB to 350 GB, and on SPECjbb2013 and SPECjbb2005, NumaGiC improves overall performance by up to 45% over NAPS (up to 94% over Parallel Scavenge), and increases the performance of the collector itself by up to 3.6× over NAPS (up to 5.4× over Parallel Scavenge).

Crossref

INRIA a CCSD electronic archive server

HAL Descartes

Chalmers Publication Library

Hal-Diderot

An interface to implement NUMA policies in the Xen hypervisor

Author: Quema Vivien, Sens Pierre, Thomas Gaël, Voron Gauthier
Publication venue: HAL CCSD
Publication date: 23/04/2017

While virtualization only introduces a small overhead on machines with few cores, this is not the case on larger ones. Most of the overhead on the latter machines is caused by the Non-Uniform Memory Access (NUMA) architecture they are using. In order to reduce this overhead, this paper shows how NUMA placement heuristics can be implemented inside Xen. With an evaluation of 29 applications on a 48-core machine, we show that the NUMA placement heuristics can multiply the performance of 9 applications by more than 2.

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Hal-Diderot