3 research outputs found

An Experimental Evaluation of Machine Learning Training on a Real Processing-in-Memory System

Author: Brocard Sylvan
Cimadomo Remy
Guo Yuxin
Gómez-Luna Juan
Legriel Julien
Mutlu Onur
Oliveira Geraldo F.
Singh Gagandeep
Publication date: 20/04/2023

Training machine learning (ML) algorithms is a computationally intensive process, which is frequently memory-bound due to repeatedly accessing large training datasets. As a result, processor-centric systems (e.g., CPU, GPU) suffer from costly data movement between memory units and processing units, which consumes large amounts of energy and execution cycles. Memory-centric computing systems, i.e., with processing-in-memory (PIM) capabilities, can alleviate this data movement bottleneck. Our goal is to understand the potential of modern general-purpose PIM architectures to accelerate ML training. To do so, we (1) implement several representative classic ML algorithms (namely, linear regression, logistic regression, decision tree, K-Means clustering) on a real-world general-purpose PIM architecture, (2) rigorously evaluate and characterize them in terms of accuracy, performance and scaling, and (3) compare to their counterpart implementations on CPU and GPU. Our evaluation on a real memory-centric computing system with more than 2500 PIM cores shows that general-purpose PIM architectures can greatly accelerate memory-bound ML workloads, when the necessary operations and datatypes are natively supported by PIM hardware. For example, our PIM implementation of decision tree is 27× faster than a state-of-the-art CPU version on an 8-core Intel Xeon, and 1.34× faster than a state-of-the-art GPU version on an NVIDIA A100. Our K-Means clustering on PIM is 2.8× and 3.2× faster than state-of-the-art CPU and GPU versions, respectively. To our knowledge, our work is the first one to evaluate ML training on a real-world PIM architecture. We conclude with key observations, takeaways, and recommendations that can inspire users of ML workloads, programmers of PIM architectures, and hardware designers & architects of future memory-centric computing systems.

arXiv.org e-Print Archive

Variant Calling Parallelization on Processor-in-Memory Architecture

Author: Cimadomo Remy
Jodin Romaric
Lavenier Dominique
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 16/12/2020

This paper introduces a new combination of software and hardware PIM (Process-in-Memory) architecture to accelerate the variant calling genomic process. PIM means bringing data-intensive calculations directly to where the data is: within the DRAM, enhanced with thousands of processing units. The energy consumption, which is in large part due to data movement, is significantly lowered at a marginal additional hardware cost. Such a design allows an unprecedented level of parallelism to process billions of short reads. Experiments on real PIM devices developed by the UPMEM company show a significant speed-up compared to a pure software implementation. The PIM solution also compares favorably to FPGA- or GPU-based acceleration, delivering from similar to twice the processing speed while, most importantly, being 5 to 8 times cheaper to deploy, with up to 6 times less power consumption.

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Rennes 1

An Experimental Evaluation of Machine Learning Training on a Real Processing-in-Memory System

Author: Brocard Sylvan
Cimadomo Remy
Guo Yuxin
Gómez-Luna Juan
Legriel Julien
Mutlu Onur
Oliveira Geraldo F.
Singh Gagandeep
Publication venue: Cornell University
Publication date: 16/07/2022

Repository for Publications and Research Data