198 research outputs found
Remote-scope Promotion: Clarified, Rectified, and Verified
Modern accelerator programming frameworks, such as OpenCL, organise threads into work-groups. Remote-scope promotion (RSP) is a language extension recently proposed by AMD researchers that is designed to enable applications, for the first time, both to optimise for the common case of intra-work-group communication (using memory scopes to provide consistency only within a work-group) and to allow occasional inter-work-group communication (as required, for instance, to support the popular load-balancing idiom of work stealing). We present the first formal, axiomatic memory model of OpenCL extended with RSP. We have extended the Herd memory model simulator with support for OpenCL kernels that exploit RSP, and used it to discover bugs in several litmus tests and in a work-stealing queue that have previously been used in the study of RSP. We have also formalised the proposed GPU implementation of RSP. The formalisation process allowed us to identify bugs in the description of RSP that could result in well-synchronised programs experiencing memory inconsistencies. We present and prove sound a new implementation of RSP that incorporates bug fixes and requires less non-standard hardware than the original implementation. This work, a collaboration between academia and industry, clearly demonstrates how, when designing hardware support for a new concurrent language feature, the early application of formal tools and techniques can help to prevent errors, such as those we have found, from making it into silicon.
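The work-stealing idiom that motivates RSP can be illustrated with a minimal sketch: the owning worker pushes and pops tasks at one end of a deque, while idle workers steal from the other end. This is a lock-based plain-Python illustration of the idiom only; the GPU queues studied in the paper use scoped atomic operations, not a lock, and the class name here is hypothetical.

```python
import threading
from collections import deque

class WorkStealingDeque:
    """Lock-based sketch of the work-stealing idiom: the owner
    pushes/pops at the LIFO end, thieves steal from the FIFO end.
    (Real GPU implementations use scoped atomics instead of a lock.)"""
    def __init__(self):
        self.items = deque()
        self.lock = threading.Lock()

    def push(self, task):          # owner only
        with self.lock:
            self.items.append(task)

    def pop(self):                 # owner only: newest task first
        with self.lock:
            return self.items.pop() if self.items else None

    def steal(self):               # any thief: oldest task first
        with self.lock:
            return self.items.popleft() if self.items else None

q = WorkStealingDeque()
for task in range(3):
    q.push(task)
print(q.pop(), q.steal())  # owner takes the newest task (2), a thief the oldest (0)
```

Taking from opposite ends is what makes the idiom cheap in the common case: the owner and the thieves contend only when the deque is nearly empty.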
GPU Concurrency: Weak Behaviours and Programming Assumptions
Concurrency is pervasive and perplexing, particularly on graphics processing units (GPUs). Current specifications of languages and hardware are inconclusive; thus programmers often rely on folklore assumptions when writing software.
To remedy this state of affairs, we conducted a large empirical study of the concurrent behaviour of deployed GPUs. Armed with litmus tests (i.e. short concurrent programs), we questioned the assumptions in programming guides and vendor documentation about the guarantees provided by hardware. We developed a tool to generate thousands of litmus tests and run them under stressful workloads. We observed a litany of previously elusive weak behaviours, and exposed folklore beliefs about GPU programming---often supported by official tutorials---as false.
As a way forward, we propose a model of Nvidia GPU hardware, which correctly models every behaviour witnessed in our experiments. The model is a variant of SPARC Relaxed Memory Order (RMO), structured following the GPU concurrency hierarchy.
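A litmus test, as used in this study, is a short concurrent program whose final register values reveal which reorderings the hardware permits. A minimal sketch: enumerating all sequentially consistent (SC) interleavings of the classic message-passing test shows which outcomes SC allows, so any other outcome observed on hardware is a weak behaviour. (This is an illustration of the litmus-test idea, not the authors' test generator.)

```python
from itertools import permutations

def mp_outcomes_sc():
    """Enumerate all sequentially consistent interleavings of the
    message-passing (MP) litmus test and collect the (r0, r1) outcomes.

    Thread 0: x = 1; y = 1        (write data, then flag)
    Thread 1: r0 = y; r1 = x      (read flag, then data)
    """
    outcomes = set()
    ops = ["Wx", "Wy", "Ry", "Rx"]
    for order in permutations(range(4)):
        seq = [ops[i] for i in order]
        if seq.index("Wx") > seq.index("Wy"):  # keep T0's program order
            continue
        if seq.index("Ry") > seq.index("Rx"):  # keep T1's program order
            continue
        mem = {"x": 0, "y": 0}
        regs = {}
        for op in seq:
            if op == "Wx": mem["x"] = 1
            elif op == "Wy": mem["y"] = 1
            elif op == "Ry": regs["r0"] = mem["y"]
            elif op == "Rx": regs["r1"] = mem["x"]
        outcomes.add((regs["r0"], regs["r1"]))
    return outcomes

print(sorted(mp_outcomes_sc()))  # (1, 0) never appears: it is forbidden under SC
```

Observing the outcome (r0, r1) = (1, 0) on a real GPU, as the study's stressful workloads aim to provoke, is direct evidence of a weak memory behaviour.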
Gunrock: A High-Performance Graph Processing Library on the GPU
For large-scale graph analytics on the GPU, the irregularity of data access
and control flow, and the complexity of programming GPUs have been two
significant challenges for developing a programmable high-performance graph
library. "Gunrock", our graph-processing system designed specifically for the
GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on
operations on a vertex or edge frontier. Gunrock achieves a balance between
performance and expressiveness by coupling high performance GPU computing
primitives and optimization strategies with a high-level programming model that
allows programmers to quickly develop new graph primitives with small code size
and minimal GPU programming knowledge. We evaluate Gunrock on five key graph
primitives and show that Gunrock has on average at least an order of magnitude
speedup over Boost and PowerGraph, comparable performance to the fastest GPU
hardwired primitives, and better performance than any other GPU high-level
graph library. (14 pages; accepted at PPoPP'16.)
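The frontier-centric, bulk-synchronous abstraction described above can be sketched in plain Python with BFS, the simplest frontier algorithm: each step applies an "advance" over the current vertex frontier to produce the next one. This is an illustration of the programming model, not Gunrock's actual API (which is CUDA/C++); on a GPU each frontier would be expanded in parallel.

```python
from collections import defaultdict

def frontier_bfs(edges, source):
    """Frontier-centric BFS: each bulk-synchronous step advances the
    current vertex frontier and filters out already-visited vertices."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
    depth = {source: 0}
    frontier = [source]
    while frontier:
        next_frontier = []
        for u in frontier:               # advance: expand every frontier vertex
            for v in adj[u]:
                if v not in depth:       # filter: keep only newly visited vertices
                    depth[v] = depth[u] + 1
                    next_frontier.append(v)
        frontier = next_frontier         # bulk-synchronous step boundary
    return depth

print(frontier_bfs([(0, 1), (0, 2), (1, 3), (2, 3)], 0))
# {0: 0, 1: 1, 2: 1, 3: 2}
```

Expressing graph primitives as operations on frontiers is what lets a library like Gunrock apply GPU load-balancing and traversal optimisations behind a small, high-level interface.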
Exploratory of society
A huge flow of quantitative social, demographic and behavioral data is becoming available that traces the activities and interactions of individuals, social patterns, transportation infrastructures and travel fluxes. Together with innovative computational techniques and methods for modeling social actions in hybrid (natural and artificial) societies, this has caused a qualitative change in the ways we model socio-technical systems. For the first time, society can be studied in a comprehensive fashion that addresses social and behavioral complexity. In other words, we are in a position to envision the development of a large data and computational cyber-infrastructure defining an exploratory of society that provides quantitative anticipatory, explanatory and scenario-analysis capabilities, ranging from emerging infectious disease to conflict and crime surges. The goal of the exploratory of society is to provide the basic infrastructure embedding the framework of tools and knowledge needed for the design of forecast/anticipatory/crisis-management approaches to socio-technical systems, supporting future decision-making procedures by accelerating the scientific cycle that goes from data generation to predictions.
Introduction to Special Issue on “Disaggregating Civil War”
We introduce the contributions to this special issue on “Disaggregating Civil War.” We review the problems arising from excessive aggregation in studies of civil war, and outline how disaggregation promises to provide better insights into the causes and dynamics of civil wars, using the articles in this special issue as examples. We comment on the issue of the appropriate level of disaggregation, lessons learned from these articles, and issues for further research.
Robust artificial neural networks and outlier detection. Technical report
Large outliers break down linear and nonlinear regression models. Robust
regression methods allow one to filter out the outliers when building a model.
By replacing the traditional least squares criterion with the least trimmed
squares criterion, in which half of data is treated as potential outliers, one
can fit accurate regression models to strongly contaminated data.
High-breakdown methods have become very well established in linear regression,
but have started being applied for non-linear regression only recently. In this
work, we examine the problem of fitting artificial neural networks to
contaminated data using least trimmed squares criterion. We introduce a
penalized least trimmed squares criterion which prevents unnecessary removal of
valid data. Training of ANNs leads to a challenging non-smooth global
optimization problem. We compare the efficiency of several derivative-free
optimization methods in solving it, and show that our approach identifies the
outliers correctly when ANNs are used for nonlinear regression.
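The least trimmed squares criterion mentioned above sums only the h smallest squared residuals, so the remaining points are treated as potential outliers. A minimal sketch follows; the penalty term shown is an illustrative fixed cost per trimmed point, not necessarily the exact penalized criterion defined in the paper.

```python
def least_trimmed_squares(residuals, h):
    """Least trimmed squares (LTS) criterion: the sum of the h smallest
    squared residuals, ignoring up to len(residuals) - h points as
    potential outliers."""
    sq = sorted(r * r for r in residuals)
    return sum(sq[:h])

def penalized_lts(residuals, h, penalty):
    """Penalized variant (illustrative form): charge a fixed cost per
    trimmed point, discouraging the unnecessary removal of valid data."""
    n_trimmed = len(residuals) - h
    return least_trimmed_squares(residuals, h) + penalty * n_trimmed

res = [0.1, -0.2, 0.15, 5.0]          # one gross outlier
print(least_trimmed_squares(res, 3))  # ~0.0725: the outlier's 25.0 is trimmed away
```

With h = len(residuals), LTS reduces to ordinary least squares; lowering h buys robustness at the cost of discarding data, which is exactly the trade-off the penalty term is meant to control.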
Cooperative kernels: GPU multitasking for blocking algorithms
There is growing interest in accelerating irregular data-parallel algorithms on GPUs. These algorithms are typically blocking, so they require fair scheduling. But GPU programming models (e.g. OpenCL) do not mandate fair scheduling, and GPU schedulers are unfair in practice. Current approaches avoid this issue by exploiting scheduling quirks of today’s GPUs in a manner that does not allow the GPU to be shared with other workloads (such as graphics rendering tasks). We propose cooperative kernels, an extension to the traditional GPU programming model geared towards writing blocking algorithms. Workgroups of a cooperative kernel are fairly scheduled, and multitasking is supported via a small set of language extensions through which the kernel and scheduler cooperate. We describe a prototype implementation of a cooperative kernel framework implemented in OpenCL 2.0 and evaluate our approach by porting a set of blocking GPU applications to cooperative kernels and examining their performance under multitasking.
A Universal Model of Global Civil Unrest
Civil unrest is a powerful form of collective human dynamics, which has led
to major transitions of societies in modern history. The study of collective
human dynamics, including collective aggression, has been the focus of much
discussion in the context of modeling and identification of universal patterns
of behavior. In contrast, the possibility that civil unrest activities, across
countries and over long time periods, are governed by universal mechanisms has
not been explored. Here, we analyze records of civil unrest of 170 countries
during the period 1919-2008. We demonstrate that the distributions of the
number of unrest events per year are robustly reproduced by a nonlinear,
spatially extended dynamical model, which reflects the spread of civil disorder
between geographic regions connected through social and communication networks.
The results also expose the similarity between global social instability and
the dynamics of natural hazards and epidemics. (8 pages, 3 figures.)
Portable Inter-workgroup Barrier Synchronisation for GPUs
Despite the growing popularity of GPGPU programming, there is not yet a portable and formally-specified barrier that one can use to synchronise across workgroups. Moreover, the occupancy-bound execution model of GPUs breaks assumptions inherent in traditional software execution barriers, exposing them to deadlock. We present an occupancy discovery protocol that dynamically discovers a safe estimate of the occupancy for a given GPU and kernel, allowing for a starvation-free (and hence, deadlock-free) inter-workgroup barrier by restricting the number of workgroups according to this estimate. We implement this idea by adapting an existing, previously non-portable, GPU inter-workgroup barrier to use OpenCL 2.0 atomic operations, and prove that the barrier meets its natural specification in terms of synchronisation.
We assess the portability of our approach over eight GPUs spanning four vendors, comparing the performance of our method against alternative methods. Our key findings include: (1) the recall of our discovery protocol is nearly 100%; (2) runtime comparisons vary substantially across GPUs and applications; and (3) our method provides portable and safe inter-workgroup synchronisation across the applications we study.
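The synchronisation core of an inter-workgroup barrier can be sketched with a sense-reversing barrier over a known participant count. This plain-Python, lock-based illustration shows only the barrier itself; the paper's contribution is discovering a safe participant count (the occupancy) at runtime and implementing the barrier with OpenCL 2.0 atomics, neither of which this sketch attempts.

```python
import threading

class SenseBarrier:
    """Minimal sense-reversing barrier over n participants. The
    participant count is taken as given here; the occupancy discovery
    protocol's job is to find a count that is safe for the GPU at hand."""
    def __init__(self, n):
        self.n = n
        self.count = 0
        self.sense = False
        self.cv = threading.Condition()

    def wait(self):
        with self.cv:
            my_sense = not self.sense
            self.count += 1
            if self.count == self.n:     # last arriver flips the sense, releasing all
                self.count = 0
                self.sense = my_sense
                self.cv.notify_all()
            else:
                while self.sense != my_sense:
                    self.cv.wait()

# Usage: four "workgroups" synchronise twice at the barrier.
barrier = SenseBarrier(4)
log = []

def work(i):
    log.append(("phase1", i))
    barrier.wait()                       # no one enters phase2 until all finish phase1
    log.append(("phase2", i))
    barrier.wait()

threads = [threading.Thread(target=work, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert all(p == "phase1" for p, _ in log[:4])  # every phase1 precedes every phase2
print("barrier ok")
```

On a GPU, such a barrier deadlocks if more workgroups participate than can be co-resident, since non-resident workgroups may never be scheduled; that is exactly why restricting participation to a discovered occupancy estimate makes the barrier starvation-free.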
Polarization of coalitions in an agent-based model of political discourse
Political discourse is the verbal interaction between political actors in a policy domain. This article explains the formation of polarized advocacy or discourse coalitions in this complex phenomenon by presenting a dynamic, stochastic, and discrete agent-based model based on graph theory and local optimization. In a series of thought experiments, actors compute their utility of contributing a specific statement to the discourse by following ideological criteria, preferential attachment, agenda-setting strategies, governmental coherence, or other mechanisms. The evolving macro-level discourse is represented as a dynamic network and evaluated against arguments from the literature on the policy process. A simple combination of four theoretical mechanisms is already able to produce artificial policy debates with theoretically plausible properties. Any sufficiently realistic configuration must entail innovative and path-dependent elements as well as a blend of exogenous preferences and endogenous opinion formation mechanisms.