119,073 research outputs found

Improvements in Hardware Transactional Memory for GPU Architectures

Author: Asenjo-Plaza Rafael
Navarro Ángeles
Plata-Gonzalez Oscar Guillermo
Villegas Alejandro
Publication date: 20/07/2016

In the multi-core CPU world, transactional memory (TM) has emerged as an alternative to lock-based programming for thread synchronization. Recent research proposes the use of TM in GPU architectures, where a high number of computing threads, organized in SIMT fashion, requires an effective synchronization method. In contrast to CPUs, GPUs offer two memory spaces: global memory and local memory. The local memory space serves as a shared scratch-pad for a subset of the computing threads, and programmers use it to speed up their applications thanks to its low latency. Prior work from the authors proposed a lightweight hardware TM (HTM) support based on the local memory, modifying the SIMT execution model and adding a conflict detection mechanism. An efficient implementation of these features is key to providing an effective synchronization mechanism at the local memory level. After a quick description of the main features of our HTM design for GPU local memory, this work gathers a number of proposals aimed at improving the mechanisms with the highest impact on performance. Firstly, the SIMT execution model is modified to increase the parallelism of the application when transactions must be serialized in order to make forward progress. Secondly, the conflict detection mechanism is optimized depending on application characteristics, such as the read/write sets, the probability of conflict between transactions and the existence of read-only transactions. As these features can be present in hardware simultaneously, it is the task of the compiler and runtime to determine which ones are more important for a given application. This work includes a discussion of the analysis needed to choose the best configuration.

Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
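The role that read/write sets and read-only transactions play in conflict detection can be illustrated with a minimal sketch. It uses the generic textbook definition of a transactional conflict, not the authors' hardware mechanism; the `Transaction` class and the `conflicts` function are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    """Hypothetical record of the addresses one transaction touched."""
    read_set: set = field(default_factory=set)
    write_set: set = field(default_factory=set)

    @property
    def read_only(self) -> bool:
        return not self.write_set


def conflicts(a: Transaction, b: Transaction) -> bool:
    """Return True if a and b cannot both commit.

    Assumed rule: two transactions conflict when one writes an address
    that the other read or wrote. Two read-only transactions never
    conflict, so the set intersections can be skipped for them.
    """
    if a.read_only and b.read_only:
        return False
    return bool(
        a.write_set & (b.read_set | b.write_set)
        or b.write_set & a.read_set
    )
```

Under this rule, smaller read/write sets mean cheaper checks, and read-only transactions are resolved without comparing any addresses, which is one reason a conflict detector might be tuned to those application characteristics.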

Repositorio Institucional Universidad de Málaga

Synchronizing multivariate financial time series

Author: Audrino Francesco
Bühlmann Peter
Publication date: 20/10/2005

Prices or returns of financial assets are most often collected in local times of the trading markets. The need to synchronize multivariate time series of financial prices or returns is motivated by the fact that information continues to flow for closed markets while others are still open. We propose here a synchronization technique which takes this into account. Besides the nice interpretation of synchronization, the method potentially increases the predictive performance of any reasonable model and is more appropriate for the calculation of portfolio risk measures such as the expected shortfall. We found empirically that this was the case for the CCC-GARCH(1,1) model for a 7-dimensional time series of daily exchange rate returns. Since multivariate analysis is generally important for analyzing time-changing portfolios and for better portfolio predictions (even when portfolio weights are time-constant), synchronization is a valuable technique for a variety of problems with multivariate financial data.

RERO DOC Digital Library

A Fast and Scalable Graph Coloring Algorithm for Multi-core and Many-core Architectures

Author: D Chakrabarti
H Cougny De
MM Strout
MR Garey
MT Jones
ÜV Çatalyürek
Publication date: 18/05/2015

Irregular computations on unstructured data are an important class of problems for parallel programming. Graph coloring is often an important preprocessing step, e.g. as a way to perform dependency analysis for safe parallel execution. The total run time of a coloring algorithm adds to the overall parallel overhead of the application, whereas the number of colors used determines the amount of exposed parallelism. A fast and scalable coloring algorithm using as few colors as possible is vital for the overall parallel performance and scalability of many irregular applications that depend upon runtime dependency analysis. Catalyurek et al. have proposed a graph coloring algorithm which relies on speculative, local assignment of colors. In this paper we present an improved version which runs even more optimistically, with less thread synchronization and a reduced number of conflicts compared to Catalyurek et al.'s algorithm. We show that the new technique scales better on multi-core and many-core systems and performs up to 1.5x faster than its predecessor on graphs with high-degree vertices, while keeping the number of colors at the same near-optimal levels.

Comment: To appear in the proceedings of Euro Par 201
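The general idea of speculative coloring (assign colors optimistically, then detect and repair conflicts) can be sketched as follows. This is a sequential simulation written for illustration only. It is neither Catalyurek et al.'s algorithm nor the improved version described above; the function name `speculative_coloring`, the stale-snapshot trick used to mimic concurrent threads, and the rule of retrying the higher-numbered vertex are all assumptions.

```python
def speculative_coloring(adj):
    """Color a graph given as {vertex: set_of_neighbors} (symmetric).

    Each round mimics one parallel step: every pending vertex picks the
    smallest color not used by its neighbors in a stale snapshot, so
    adjacent pending vertices may collide. Collisions are then detected
    and the higher-numbered endpoint is queued for recoloring.
    """
    color = {}
    pending = sorted(adj)
    while pending:
        snapshot = dict(color)  # stale view shared by all simulated threads
        for v in pending:       # tentative (speculative) assignment
            used = {snapshot[u] for u in adj[v] if u in snapshot}
            c = 0
            while c in used:
                c += 1
            color[v] = c
        # Conflict detection: keep the lower endpoint, retry the higher one.
        retry = [
            v for v in pending
            if any(u < v and color.get(u) == color[v] for u in adj[v])
        ]
        for v in retry:
            del color[v]
        pending = retry
    return color
```

The loop always terminates because the smallest pending vertex in a round can never lose a conflict, so at least one vertex is finalized per round. Fewer conflicts mean fewer rounds, which is the quantity an optimistic scheme tries to keep small.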

arXiv.org e-Print Archive

Crossref

Spiral - Imperial College Digital Repository

Static local coordination avoidance for distributed objects

Author: Soethout T.M. (Tim)
Storm T. (Tijs) van der
Vinju J.J. (Jurgen)
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2019

In high-throughput distributed systems, such as large-scale banking infrastructure, synchronization between actors becomes a bottleneck in high-contention scenarios. This results in delays for users and reduces opportunities for scaling such systems. This paper proposes Static Local Coordination Avoidance, which analyzes application invariants at compile time to detect whether messages are independent, so that synchronization at run time is avoided and parallelism is increased. Analysis shows that in industry scenarios up to 60% of operations are independent. Initial performance evaluation shows that, in comparison to a standard 2-phase commit baseline, throughput is increased and latency is reduced. As a result, scalability bottlenecks in high-contention scenarios in distributed actor systems are reduced for independent messages.

Crossref

Proceedings - University of Groningen

University of Groningen

CWI's Institutional Repository

ARTS repository - University of Groningen

Dissertations of the University of Groningen

STSyn: Speeding Up Local SGD with Straggler-Tolerant Synchronization

Author: Wang Xin
Zhang Jingjing
Zhu Feng
Publication date: 29/05/2023

Synchronous local stochastic gradient descent (local SGD) suffers from some workers being idle and random delays due to slow and straggling workers, as it waits for the workers to complete the same amount of local updates. In this paper, to mitigate stragglers and improve communication efficiency, a novel local SGD strategy, named STSyn, is developed. The key point is to wait for the K fastest workers, while keeping all the workers computing continually at each synchronization round, and making full use of any effective (completed) local update of each worker regardless of stragglers. An analysis of the average wall-clock time, average number of local updates and average number of uploading workers per round is provided to gauge the performance of STSyn. The convergence of STSyn is also rigorously established even when the objective function is nonconvex. Experimental results show the superiority of the proposed STSyn against state-of-the-art schemes through utilization of the straggler-tolerant technique and additional effective local updates at each worker, and the influence of system parameters is studied. By waiting for faster workers and allowing heterogeneous synchronization with different numbers of local updates across workers, STSyn provides substantial improvements both in time and communication efficiency.

Comment: 12 pages, 10 figures, submitted for transaction publicatio
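The timing behaviour of waiting for only the K fastest workers can be illustrated with a small simulation. This sketch is not the STSyn algorithm itself and makes several simplifying assumptions that do not come from the abstract: each worker needs a constant time per local update, the round ends as soon as the K-th fastest worker has finished `U` updates, and faster workers keep computing past `U` until then. The function name `stsyn_round` and the parameter `U` are hypothetical.

```python
def stsyn_round(update_times, K, U):
    """Simulate one synchronization round under the stated assumptions.

    update_times: per-worker time for one local update (assumed constant).
    K: number of fastest workers the server waits for.
    U: local updates each of those K workers must finish.
    Returns (round_time, updates_completed_per_worker).
    """
    # Time at which each worker would finish U updates, fastest first.
    finish = sorted(t * U for t in update_times)
    round_time = finish[K - 1]  # the K-th fastest worker reaches U updates
    # Every worker keeps computing; only fully completed updates count.
    done = [round_time // t for t in update_times]
    return round_time, done
```

For example, with per-update times `[1, 2, 4, 10]`, `K = 2` and `U = 4`, the round ends at time 8 instead of time 40, and the workers contribute 8, 4, 2 and 0 completed updates respectively. Waiting for all four workers (`K = 4`) reproduces the straggler-bound behaviour of fully synchronous local SGD in this toy model.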

arXiv.org e-Print Archive