
    Good Intentions: Adaptive Parameter Management via Intent Signaling

    Parameter management is essential for distributed training of large machine learning (ML) tasks. Some ML tasks are hard to distribute because common approaches to parameter management can be highly inefficient. Advanced parameter management approaches, such as selective replication or dynamic parameter allocation, can improve efficiency, but to do so they typically need to be integrated manually into each task's implementation, and they require expensive upfront experimentation to tune correctly. In this work, we explore whether these two problems can be avoided. We first propose a novel intent signaling mechanism that integrates naturally into existing ML stacks and provides the parameter manager with crucial information about parameter accesses. We then describe AdaPM, a fully adaptive, zero-tuning parameter manager based on this mechanism. In contrast to prior systems, this approach separates providing information (simple, done by the task) from exploiting it effectively (hard, done automatically by AdaPM). In our experimental evaluation, AdaPM matched or outperformed state-of-the-art parameter managers out of the box, suggesting that automatic parameter management is possible.
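
    A minimal sketch of this idea from the task's side is shown below; the class and method names are hypothetical illustrations of intent signaling, not the actual AdaPM API.

```python
# Toy sketch (not the AdaPM API): the task only signals which parameters it will
# access soon; deciding how to replicate or relocate them is left to the manager.

class ToyParameterManager:
    def __init__(self):
        self.store = {}  # key -> value (in reality, partitioned across nodes)

    def intent(self, keys):
        # Intent signal: a cheap hint about upcoming accesses. A real manager
        # could use it to pre-replicate or relocate the parameters in time.
        for k in keys:
            self.store.setdefault(k, 0.0)

    def pull(self, keys):
        return {k: self.store[k] for k in keys}

    def push(self, grads, lr=0.1):
        for k, g in grads.items():
            self.store[k] -= lr * g

# Usage: signal intent just before the parameters are needed.
pm = ToyParameterManager()
pm.intent(["w1", "w2"])             # simple, done by the task
params = pm.pull(["w1", "w2"])      # exploiting the hint is the manager's job
pm.push({"w1": 0.5, "w2": -0.2})
```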

    HaTS: Hardware-Assisted Transaction Scheduler

    In this paper we present HaTS, a Hardware-assisted Transaction Scheduler. HaTS improves the performance of concurrent applications by classifying the executions of their atomic blocks (or in-memory transactions) into scheduling queues according to their so-called conflict indicators. The goal is to group conflicting transactions together while letting non-conflicting transactions proceed in parallel. Two core innovations characterize HaTS. First, HaTS does not assume that precise information about incoming transactions is available for classification; it relaxes this assumption by exploiting the inherent conflict resolution provided by Hardware Transactional Memory (HTM). Second, HaTS dynamically adjusts the number of scheduling queues to capture the application's actual contention level. Performance results using the STAMP benchmark suite show up to a 2x improvement over state-of-the-art HTM-based scheduling techniques.
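
    The scheduling idea can be sketched as follows; this is a simplified illustration under assumed interfaces, not HaTS itself.

```python
# Toy sketch of conflict-indicator-based scheduling: transactions that share an
# indicator (e.g., a contended address reported by an HTM abort) serialize on the
# same queue, while unrelated transactions land in different queues.

from queue import Queue

def drain(q):
    items = []
    while not q.empty():
        items.append(q.get())
    return items

class ToyScheduler:
    def __init__(self, num_queues=4):
        self.queues = [Queue() for _ in range(num_queues)]

    def submit(self, txn, indicator):
        # Same indicator -> same queue -> serialized execution.
        self.queues[hash(indicator) % len(self.queues)].put((indicator, txn))

    def resize(self, num_queues):
        # Dynamic adaptation: rebuild with a queue count that tracks the
        # observed contention level, re-hashing pending transactions.
        pending = [item for q in self.queues for item in drain(q)]
        self.queues = [Queue() for _ in range(num_queues)]
        for indicator, txn in pending:
            self.submit(txn, indicator)
```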

    Applying AI Techniques to Program Optimization for Parallel Computers


    An Empirical Study of Speculative Concurrency in Ethereum Smart Contracts

    We use historical data to estimate the potential benefit of speculative techniques for executing Ethereum smart contracts in parallel. We replay transaction traces of sampled blocks from the Ethereum blockchain over time, using a simple speculative execution engine. In this engine, miners attempt to execute all transactions in a block in parallel, rolling back those that cause data conflicts. Aborted transactions are then executed sequentially. Validators execute the same schedule as miners. We find that our speculative technique yields estimated speed-ups starting at about 8-fold in 2016 and declining to about 2-fold by the end of 2017, where speed-up is measured using either gas costs or instruction counts. We also observe that a small set of contracts is responsible for many of the data conflicts arising from speculative concurrent execution.
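
    A rough sketch of such an engine is given below; the transaction object and its read/write-set methods are assumptions for illustration, not the paper's implementation.

```python
# Toy sketch of optimistic parallel execution of a block: conflicting
# transactions are rolled back and re-run sequentially afterwards.

def execute_block(transactions, state):
    committed, aborted = [], []
    read_so_far, written_so_far = set(), set()

    # Phase 1: attempt every transaction speculatively, aborting on
    # read/write or write/write conflicts with earlier transactions.
    for txn in transactions:
        reads, writes = txn.read_set(state), txn.write_set(state)
        conflict = (reads & written_so_far) or \
                   (writes & written_so_far) or \
                   (writes & read_so_far)
        if conflict:
            aborted.append(txn)          # rolled back, no state change
        else:
            read_so_far |= reads
            written_so_far |= writes
            txn.apply(state)
            committed.append(txn)

    # Phase 2: aborted transactions are executed one after another.
    for txn in aborted:
        txn.apply(state)
    return committed, aborted
```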

    DSM64: A Distributed Shared Memory System in User-Space

    This paper presents DSM64: a lazy release consistent software distributed shared memory (SDSM) system built entirely in user-space. The DSM64 system is capable of executing threaded applications implemented with pthreads on a cluster of networked machines without any modifications to the target application. The DSM64 system features a centralized memory manager [1] built atop Hoard [2, 3]: a fast, scalable, and memory-efficient allocator for shared-memory multiprocessors. I present an SDSM system written in C++ for Linux operating systems and discuss a straightforward approach to implementing SDSM systems in a Linux environment using tools and concepts available entirely in user-space. I show that the SDSM system presented in this paper is capable of resolving page faults over a local area network in as little as 2 milliseconds. In my analysis, I compare the performance characteristics of a matrix multiplication benchmark under various memory coherency models and demonstrate that the benchmark using a lazy release consistency (LRC) model performs orders of magnitude faster than the same application using a stricter coherency model. I show the effect of the coherency model on memory access patterns and memory contention, compare the effects of different locking strategies on execution speed and memory access patterns, and, lastly, provide a comparison of the DSM64 system to a non-networked version using a system-provided allocator.
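
    The lazy release consistency idea can be illustrated with a small sketch; the class below is a hypothetical simplification, not DSM64's implementation.

```python
# Toy sketch of lazy release consistency: writes stay local and are only
# propagated to the shared copy at lock release, then fetched at lock acquire.

class LazyReleaseNode:
    def __init__(self, shared):
        self.shared = shared       # stand-in for the home copy of shared pages
        self.local = dict(shared)  # this node's working copy
        self.dirty = set()         # pages written since the last release

    def write(self, page, value):
        self.local[page] = value   # no network traffic on the write itself
        self.dirty.add(page)

    def release(self):
        # Propagate only the pages actually modified, instead of invalidating
        # on every write as a stricter coherency model would.
        for page in self.dirty:
            self.shared[page] = self.local[page]
        self.dirty.clear()

    def acquire(self):
        self.local.update(self.shared)  # see other nodes' released updates
```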

    Quantum Monte Carlo for large chemical systems: Implementing efficient strategies for petascale platforms and beyond

    Various strategies to implement QMC simulations efficiently for large chemical systems are presented. These include: (i) an efficient algorithm to calculate the computationally expensive Slater matrices, a novel scheme based on the highly localized character of the atomic Gaussian basis functions (rather than the molecular orbitals, as is usually done); (ii) the possibility of keeping the memory footprint minimal; (iii) an important enhancement of single-core performance when efficient optimization tools are employed; and (iv) the definition of a universal, dynamic, fault-tolerant, and load-balanced computational framework adapted to all kinds of computational platforms (massively parallel machines, clusters, or distributed grids). These strategies have been implemented in the QMC=Chem code developed at Toulouse and are illustrated with numerical applications on small peptides of increasing size (158, 434, 1056, and 1731 electrons). Using 10k-80k computing cores of the Curie machine (GENCI-TGCC-CEA, France), QMC=Chem has been shown to be capable of running at the petascale level, thus demonstrating that a large part of this machine's peak performance can be achieved. Implementing large-scale QMC simulations for future exascale platforms with a comparable level of efficiency is expected to be feasible.
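
    Strategy (i) can be sketched roughly as follows; the function below is a heavily simplified illustration (uncontracted s-type Gaussians, no normalization), not QMC=Chem's actual kernel.

```python
# Toy sketch of exploiting basis-function locality when filling a Slater matrix:
# for each electron, only Gaussians centered within a cutoff radius are evaluated.

import numpy as np

def slater_matrix(electrons, centers, alphas, mo_coeffs, cutoff=6.0):
    # electrons : (n_elec, 3) electron positions
    # centers   : (n_basis, 3) Gaussian centers
    # alphas    : (n_basis,) Gaussian exponents
    # mo_coeffs : (n_basis, n_elec) molecular-orbital coefficients
    n_elec = len(electrons)
    ao = np.zeros((n_elec, len(centers)))   # atomic-orbital values, mostly zero

    for i, r in enumerate(electrons):
        d2 = np.sum((centers - r) ** 2, axis=1)
        near = d2 < cutoff ** 2             # locality of atomic Gaussians
        ao[i, near] = np.exp(-alphas[near] * d2[near])

    # Slater matrix: molecular orbitals evaluated at the electron positions.
    return ao @ mo_coeffs
```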