29,271 research outputs found

Gunrock: A High-Performance Graph Processing Library on the GPU

Authors: Cederman D.; Goel A.; Gonzalez J. E.; Gregor D.; Jia Y.; Low Y.; Pande P. R.; Siek J. G.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 15/01/2016

For large-scale graph analytics on the GPU, the irregularity of data access and control flow, and the complexity of programming GPUs, have been two significant challenges for developing a programmable high-performance graph library. "Gunrock", our graph-processing system designed specifically for the GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on operations on a vertex or edge frontier. Gunrock achieves a balance between performance and expressiveness by coupling high-performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. We evaluate Gunrock on five key graph primitives and show that Gunrock has on average at least an order of magnitude speedup over Boost and PowerGraph, comparable performance to the fastest GPU hardwired primitives, and better performance than any other GPU high-level graph library.

Comment: 14 pages, accepted by PPoPP'16 (removed the text repetition in the previous version v5)
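
The frontier-centric, bulk-synchronous model mentioned in this abstract can be illustrated with a minimal CPU-only sketch in Python. This is not Gunrock's API: the function name `bfs_frontier`, the adjacency-dict graph, and the "expand"/"filter" step labels are assumptions made for illustration only. Each iteration expands the current vertex frontier along its outgoing edges, then filters out vertices that were already labeled to form the next frontier.

```python
from typing import Dict, List


def bfs_frontier(graph: Dict[int, List[int]], source: int) -> Dict[int, int]:
    """Level-synchronous BFS driven by a vertex frontier (illustrative only)."""
    depth = {source: 0}   # BFS depth of every vertex reached so far
    frontier = [source]   # vertices discovered in the previous step
    level = 0
    while frontier:
        level += 1
        # Expand: collect every neighbor of every vertex in the frontier.
        candidates = [v for u in frontier for v in graph.get(u, [])]
        # Filter: keep only unlabeled vertices; they form the next frontier.
        next_frontier = []
        for v in candidates:
            if v not in depth:
                depth[v] = level
                next_frontier.append(v)
        frontier = next_frontier
    return depth
```

On a GPU the expand and filter steps would each run as a bulk parallel operation over the whole frontier; the sequential loops here only show the data flow.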

arXiv.org e-Print Archive

Crossref

eScholarship - University of California

Analysis and improvement of data-set level file distribution in Disk Pool Manager

Authors: Bhimji Wahid; Britton David; Mitchell Mark; Purdie Stuart; Skipsey Samuel; Smith David
Publication venue: 'IOP Publishing'
Publication date: 01/01/2014

Of the three most widely used implementations of the WLCG Storage Element specification, Disk Pool Manager [1, 2] (DPM) has the simplest implementation of file placement balancing (StoRM doesn't attempt this, leaving it up to the underlying filesystem, which can be very sophisticated in itself). DPM uses a round-robin algorithm (with optional filesystem weighting) for placing files across filesystems and servers. This does a reasonable job of evenly distributing files across the storage array provided to it. However, it does not offer any guarantees of the evenness of distribution of that subset of files associated with a given "dataset" (which often maps onto a "directory" in the DPM namespace (DPNS)). It is useful to consider a concept of "balance", where an optimally balanced set of files indicates that the files are distributed evenly across all of the pool nodes. The best-case performance of the round-robin algorithm is to maintain balance; it has no mechanism to improve balance.

In the past year or more, larger DPM sites have noticed load spikes on individual disk servers, and suspected that these were exacerbated by excesses of files from popular datasets on those servers. We present here a software tool which analyses file distribution for all datasets in a DPM SE, providing a measure of the poorness of file location in this context. Further, the tool provides a list of file movement actions which will improve dataset-level file distribution, and can carry out those file movements itself. We present results of such an analysis on the UKI-SCOTGRID-GLASGOW Production DPM.
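
The gap between overall evenness and per-dataset evenness described in this abstract can be shown with a small Python sketch. It is not DPM's implementation or the authors' tool: the function names, the unweighted round-robin, and the max-minus-min imbalance score are all assumptions chosen for illustration, since the abstract does not define its balance measure.

```python
from collections import Counter
from itertools import cycle
from typing import Dict, Iterable, List


def round_robin_place(files: Iterable[str], servers: List[str]) -> Dict[str, str]:
    """Assign each file, in arrival order, to the next server in a fixed cycle."""
    targets = cycle(servers)
    return {name: next(targets) for name in files}


def imbalance(placement: Dict[str, str], files: Iterable[str], servers: List[str]) -> int:
    """Hypothetical score: max minus min per-server file count (0 means even)."""
    counts = Counter(placement[name] for name in files)
    per_server = [counts.get(server, 0) for server in servers]
    return max(per_server) - min(per_server)


# Two datasets whose files arrive interleaved on a two-server pool.
servers = ["disk-a", "disk-b"]
arrivals = ["d1/f0", "d2/f0", "d1/f1", "d2/f1"]
placement = round_robin_place(arrivals, servers)

overall = imbalance(placement, arrivals, servers)
dataset_1 = imbalance(placement, ["d1/f0", "d1/f1"], servers)
```

In this run the pool as a whole is perfectly even (`overall` is 0), yet every file of dataset `d1` lands on `disk-a` (`dataset_1` is 2), which is the kind of dataset-level skew the abstract attributes to round-robin placement.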

Enlighten

CERN Document Server

Structure-Aware Dynamic Scheduler for Parallel Machine Learning

Authors: Gibson Garth A.; Ho Qirong; Kim Jin Kyu; Lee Seunghak; Xing Eric P.
Publication date: 30/12/2013

Training large machine learning (ML) models with many variables or parameters can take a long time if one employs sequential procedures, even with stochastic updates. A natural solution is to turn to distributed computing on a cluster; however, naive, unstructured parallelization of ML algorithms does not usually lead to a proportional speedup and can even result in divergence, because dependencies between model elements can attenuate the computational gains from parallelization and compromise correctness of inference. Recent efforts to address this issue have benefited from exploiting the static, a priori block structures residing in ML algorithms. In this paper, we take this path further by exploring the dynamic block structures and workloads therein present during ML program execution, which offer new opportunities for improving convergence, correctness, and load balancing in distributed ML. We propose and showcase a general-purpose scheduler, STRADS, for coordinating distributed updates in ML algorithms, which harnesses the aforementioned opportunities in a systematic way. We provide theoretical guarantees for our scheduler, and demonstrate its efficacy versus static block structures on Lasso and Matrix Factorization.

arXiv.org e-Print Archive

CiteSeerX

A Discussion on Fall Detection Issues and Its Deployment: When cloud meets battery

Authors: Cal Marín Enrique Antonio de la; González Suárez Víctor Manuel; Khojasteh S. B.; Kiadi M.; Tan Qing; Villar Flecha José Ramón
Publication date: 01/01/2018

IEEE International Conference on Cloud Computing and Big Data Analysis (3rd, 2018, Chengdu, China)

Crossref

Repositorio Institucional de la Universidad de Oviedo

Fall Detection Analysis Using a Real Fall Dataset

Authors: A Bourke; A Hakim; AM Sabatini; E Casilari; F Bianchi; F Wu; José Ramón Villar; JR Villar; M Daher; M Kangas; NV Chawla; P Kumari; PM Vergara; QT Huynh; R Igual; S Abbate; S González; S Zhang; YC Fang; YS Delahoz

International Conference on Soft Computing Models in Industrial and Environmental Applications (13th, 2018, San Sebastián)

Crossref

Repositorio Institucional de la Universidad de Oviedo