198 research outputs found
Remote-scope Promotion: Clarified, Rectified, and Verified
Modern accelerator programming frameworks, such as OpenCL, organise threads into work-groups. Remote-scope promotion (RSP) is a language extension recently proposed by AMD researchers that is designed to enable applications, for the first time, both to optimise for the common case of intra-work-group communication (using memory scopes to provide consistency only within a work-group) and to allow occasional inter-work-group communication (as required, for instance, to support the popular load-balancing idiom of work stealing). We present the first formal, axiomatic memory model of OpenCL extended with RSP. We have extended the Herd memory model simulator with support for OpenCL kernels that exploit RSP, and used it to discover bugs in several litmus tests and in a work-stealing queue that have previously been used in the study of RSP. We have also formalised the proposed GPU implementation of RSP. The formalisation process allowed us to identify bugs in the description of RSP that could result in well-synchronised programs experiencing memory inconsistencies. We present and prove sound a new implementation of RSP that incorporates bug fixes and requires less non-standard hardware than the original implementation. This work, a collaboration between academia and industry, clearly demonstrates how, when designing hardware support for a new concurrent language feature, the early application of formal tools and techniques can help to prevent errors, such as those we have found, from making it into silicon.
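The work-stealing idiom that motivates RSP can be illustrated with a minimal sketch: the owning worker pushes and pops tasks at one end of a deque, while idle workers steal from the other end. This is a lock-based plain-Python illustration of the idiom only; the GPU queues studied in the paper use scoped atomic operations, not a lock, and the class name here is hypothetical.

```python
import threading
from collections import deque

class WorkStealingDeque:
    """Lock-based sketch of the work-stealing idiom: the owner
    pushes/pops at the LIFO end, thieves steal from the FIFO end.
    (Real GPU implementations use scoped atomics instead of a lock.)"""
    def __init__(self):
        self.items = deque()
        self.lock = threading.Lock()

    def push(self, task):          # owner only
        with self.lock:
            self.items.append(task)

    def pop(self):                 # owner only: newest task first
        with self.lock:
            return self.items.pop() if self.items else None

    def steal(self):               # any thief: oldest task first
        with self.lock:
            return self.items.popleft() if self.items else None

q = WorkStealingDeque()
for task in range(3):
    q.push(task)
print(q.pop(), q.steal())  # owner takes the newest task (2), a thief the oldest (0)
```

Taking from opposite ends is what makes the idiom cheap in the common case: the owner and the thieves contend only when the deque is nearly empty.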
GPU Concurrency: Weak Behaviours and Programming Assumptions
Concurrency is pervasive and perplexing, particularly on graphics processing units (GPUs). Current specifications of languages and hardware are inconclusive; thus programmers often rely on folklore assumptions when writing software.
To remedy this state of affairs, we conducted a large empirical study of the concurrent behaviour of deployed GPUs. Armed with litmus tests (i.e. short concurrent programs), we questioned the assumptions in programming guides and vendor documentation about the guarantees provided by hardware. We developed a tool to generate thousands of litmus tests and run them under stressful workloads. We observed a litany of previously elusive weak behaviours, and exposed folklore beliefs about GPU programming---often supported by official tutorials---as false.
As a way forward, we propose a model of Nvidia GPU hardware, which correctly models every behaviour witnessed in our experiments. The model is a variant of SPARC Relaxed Memory Order (RMO), structured following the GPU concurrency hierarchy.
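A litmus test, as used in this study, is a short concurrent program whose final register values reveal which reorderings the hardware permits. A minimal sketch: enumerating all sequentially consistent (SC) interleavings of the classic message-passing test shows which outcomes SC allows, so any other outcome observed on hardware is a weak behaviour. (This is an illustration of the litmus-test idea, not the authors' test generator.)

```python
from itertools import permutations

def mp_outcomes_sc():
    """Enumerate all sequentially consistent interleavings of the
    message-passing (MP) litmus test and collect the (r0, r1) outcomes.

    Thread 0: x = 1; y = 1        (write data, then flag)
    Thread 1: r0 = y; r1 = x      (read flag, then data)
    """
    outcomes = set()
    ops = ["Wx", "Wy", "Ry", "Rx"]
    for order in permutations(range(4)):
        seq = [ops[i] for i in order]
        if seq.index("Wx") > seq.index("Wy"):  # keep T0's program order
            continue
        if seq.index("Ry") > seq.index("Rx"):  # keep T1's program order
            continue
        mem = {"x": 0, "y": 0}
        regs = {}
        for op in seq:
            if op == "Wx": mem["x"] = 1
            elif op == "Wy": mem["y"] = 1
            elif op == "Ry": regs["r0"] = mem["y"]
            elif op == "Rx": regs["r1"] = mem["x"]
        outcomes.add((regs["r0"], regs["r1"]))
    return outcomes

print(sorted(mp_outcomes_sc()))  # (1, 0) never appears: it is forbidden under SC
```

Observing the outcome (r0, r1) = (1, 0) on a real GPU, as the study's stressful workloads aim to provoke, is direct evidence of a weak memory behaviour.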
Gunrock: A High-Performance Graph Processing Library on the GPU
For large-scale graph analytics on the GPU, the irregularity of data access
and control flow, and the complexity of programming GPUs have been two
significant challenges for developing a programmable high-performance graph
library. "Gunrock", our graph-processing system designed specifically for the
GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on
operations on a vertex or edge frontier. Gunrock achieves a balance between
performance and expressiveness by coupling high performance GPU computing
primitives and optimization strategies with a high-level programming model that
allows programmers to quickly develop new graph primitives with small code size
and minimal GPU programming knowledge. We evaluate Gunrock on five key graph
primitives and show that Gunrock has on average at least an order of magnitude
speedup over Boost and PowerGraph, comparable performance to the fastest GPU
hardwired primitives, and better performance than any other GPU high-level
graph library. (14 pages; accepted at PPoPP'16.)
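The frontier-centric, bulk-synchronous abstraction described above can be sketched in plain Python with BFS, the simplest frontier algorithm: each step applies an "advance" over the current vertex frontier to produce the next one. This is an illustration of the programming model, not Gunrock's actual API (which is CUDA/C++); on a GPU each frontier would be expanded in parallel.

```python
from collections import defaultdict

def frontier_bfs(edges, source):
    """Frontier-centric BFS: each bulk-synchronous step advances the
    current vertex frontier and filters out already-visited vertices."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
    depth = {source: 0}
    frontier = [source]
    while frontier:
        next_frontier = []
        for u in frontier:               # advance: expand every frontier vertex
            for v in adj[u]:
                if v not in depth:       # filter: keep only newly visited vertices
                    depth[v] = depth[u] + 1
                    next_frontier.append(v)
        frontier = next_frontier         # bulk-synchronous step boundary
    return depth

print(frontier_bfs([(0, 1), (0, 2), (1, 3), (2, 3)], 0))
# {0: 0, 1: 1, 2: 1, 3: 2}
```

Expressing graph primitives as operations on frontiers is what lets a library like Gunrock apply GPU load-balancing and traversal optimisations behind a small, high-level interface.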
Exploratory of society
A huge flow of quantitative social, demographic and behavioral data is becoming available that traces the activities and interactions of individuals, social patterns, transportation infrastructures and travel fluxes. Together with innovative computational techniques and methods for modeling social actions in hybrid (natural and artificial) societies, this has caused a qualitative change in the ways we model socio-technical systems. For the first time, society can be studied in a comprehensive fashion that addresses social and behavioral complexity. In other words, we are in a position to envision the development of a large data and computational cyber-infrastructure defining an exploratory of society that provides quantitative anticipatory, explanatory and scenario-analysis capabilities, ranging from emerging infectious disease to conflict and crime surges. The goal of the exploratory of society is to provide the basic infrastructure embedding the framework of tools and knowledge needed for the design of forecast/anticipatory/crisis-management approaches to socio-technical systems, supporting future decision-making procedures by accelerating the scientific cycle that goes from data generation to predictions.
Introduction to Special Issue on “Disaggregating Civil War”
We introduce the contributions to this special issue on “Disaggregating Civil War.” We review the problems arising from excessive aggregation in studies of civil war, and outline how disaggregation promises to provide better insights into the causes and dynamics of civil wars, using the articles in this special issue as examples. We comment on the issue of the appropriate level of disaggregation, lessons learned from these articles, and issues for further research.
Robust artificial neural networks and outlier detection. Technical report
Large outliers break down linear and nonlinear regression models. Robust
regression methods allow one to filter out the outliers when building a model.
By replacing the traditional least squares criterion with the least trimmed
squares criterion, in which half of data is treated as potential outliers, one
can fit accurate regression models to strongly contaminated data.
High-breakdown methods have become very well established in linear regression,
but have started being applied for non-linear regression only recently. In this
work, we examine the problem of fitting artificial neural networks to
contaminated data using least trimmed squares criterion. We introduce a
penalized least trimmed squares criterion which prevents unnecessary removal of
valid data. Training of ANNs leads to a challenging non-smooth global
optimization problem. We compare the efficiency of several derivative-free
optimization methods in solving it, and show that our approach identifies the
outliers correctly when ANNs are used for nonlinear regression.
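The least trimmed squares criterion mentioned above sums only the h smallest squared residuals, so the remaining points are treated as potential outliers. A minimal sketch follows; the penalty term shown is an illustrative fixed cost per trimmed point, not necessarily the exact penalized criterion defined in the paper.

```python
def least_trimmed_squares(residuals, h):
    """Least trimmed squares (LTS) criterion: the sum of the h smallest
    squared residuals, ignoring up to len(residuals) - h points as
    potential outliers."""
    sq = sorted(r * r for r in residuals)
    return sum(sq[:h])

def penalized_lts(residuals, h, penalty):
    """Penalized variant (illustrative form): charge a fixed cost per
    trimmed point, discouraging the unnecessary removal of valid data."""
    n_trimmed = len(residuals) - h
    return least_trimmed_squares(residuals, h) + penalty * n_trimmed

res = [0.1, -0.2, 0.15, 5.0]          # one gross outlier
print(least_trimmed_squares(res, 3))  # ~0.0725: the outlier's 25.0 is trimmed away
```

With h = len(residuals), LTS reduces to ordinary least squares; lowering h buys robustness at the cost of discarding data, which is exactly the trade-off the penalty term is meant to control.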
Cooperative kernels: GPU multitasking for blocking algorithms
There is growing interest in accelerating irregular data-parallel algorithms on GPUs. These algorithms are typically blocking, so they require fair scheduling. But GPU programming models (e.g. OpenCL) do not mandate fair scheduling, and GPU schedulers are unfair in practice. Current approaches avoid this issue by exploiting scheduling quirks of today’s GPUs in a manner that does not allow the GPU to be shared with other workloads (such as graphics rendering tasks). We propose cooperative kernels, an extension to the traditional GPU programming model geared towards writing blocking algorithms. Workgroups of a cooperative kernel are fairly scheduled, and multitasking is supported via a small set of language extensions through which the kernel and scheduler cooperate. We describe a prototype implementation of a cooperative kernel framework implemented in OpenCL 2.0 and evaluate our approach by porting a set of blocking GPU applications to cooperative kernels and examining their performance under multitasking.
A Universal Model of Global Civil Unrest
Civil unrest is a powerful form of collective human dynamics, which has led
to major transitions of societies in modern history. The study of collective
human dynamics, including collective aggression, has been the focus of much
discussion in the context of modeling and identification of universal patterns
of behavior. In contrast, the possibility that civil unrest activities, across
countries and over long time periods, are governed by universal mechanisms has
not been explored. Here, we analyze records of civil unrest of 170 countries
during the period 1919-2008. We demonstrate that the distributions of the
number of unrest events per year are robustly reproduced by a nonlinear,
spatially extended dynamical model, which reflects the spread of civil disorder
between geographic regions connected through social and communication networks.
The results also expose the similarity between global social instability and
the dynamics of natural hazards and epidemics. (8 pages, 3 figures.)
Portable Inter-workgroup Barrier Synchronisation for GPUs
Despite the growing popularity of GPGPU programming, there is not yet a portable and formally-specified barrier that one can use to synchronise across workgroups. Moreover, the occupancy-bound execution model of GPUs breaks assumptions inherent in traditional software execution barriers, exposing them to deadlock. We present an occupancy discovery protocol that dynamically discovers a safe estimate of the occupancy for a given GPU and kernel, allowing for a starvation-free (and hence, deadlock-free) inter-workgroup barrier by restricting the number of workgroups according to this estimate. We implement this idea by adapting an existing, previously non-portable, GPU inter-workgroup barrier to use OpenCL 2.0 atomic operations, and prove that the barrier meets its natural specification in terms of synchronisation.
We assess the portability of our approach over eight GPUs spanning four vendors, comparing the performance of our method against alternative methods. Our key findings include: (1) the recall of our discovery protocol is nearly 100%; (2) runtime comparisons vary substantially across GPUs and applications; and (3) our method provides portable and safe inter-workgroup synchronisation across the applications we study.
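The synchronisation core of an inter-workgroup barrier can be sketched with a sense-reversing barrier over a known participant count. This plain-Python, lock-based illustration shows only the barrier itself; the paper's contribution is discovering a safe participant count (the occupancy) at runtime and implementing the barrier with OpenCL 2.0 atomics, neither of which this sketch attempts.

```python
import threading

class SenseBarrier:
    """Minimal sense-reversing barrier over n participants. The
    participant count is taken as given here; the occupancy discovery
    protocol's job is to find a count that is safe for the GPU at hand."""
    def __init__(self, n):
        self.n = n
        self.count = 0
        self.sense = False
        self.cv = threading.Condition()

    def wait(self):
        with self.cv:
            my_sense = not self.sense
            self.count += 1
            if self.count == self.n:     # last arriver flips the sense, releasing all
                self.count = 0
                self.sense = my_sense
                self.cv.notify_all()
            else:
                while self.sense != my_sense:
                    self.cv.wait()

# Usage: four "workgroups" synchronise twice at the barrier.
barrier = SenseBarrier(4)
log = []

def work(i):
    log.append(("phase1", i))
    barrier.wait()                       # no one enters phase2 until all finish phase1
    log.append(("phase2", i))
    barrier.wait()

threads = [threading.Thread(target=work, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert all(p == "phase1" for p, _ in log[:4])  # every phase1 precedes every phase2
print("barrier ok")
```

On a GPU, such a barrier deadlocks if more workgroups participate than can be co-resident, since non-resident workgroups may never be scheduled; that is exactly why restricting participation to a discovered occupancy estimate makes the barrier starvation-free.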
Polarization of coalitions in an agent-based model of political discourse
Political discourse is the verbal interaction between political actors in a policy domain. This article explains the formation of polarized advocacy or discourse coalitions in this complex phenomenon by presenting a dynamic, stochastic, and discrete agent-based model based on graph theory and local optimization. In a series of thought experiments, actors compute their utility of contributing a specific statement to the discourse by following ideological criteria, preferential attachment, agenda-setting strategies, governmental coherence, or other mechanisms. The evolving macro-level discourse is represented as a dynamic network and evaluated against arguments from the literature on the policy process. A simple combination of four theoretical mechanisms is already able to produce artificial policy debates with theoretically plausible properties. Any sufficiently realistic configuration must entail innovative and path-dependent elements as well as a blend of exogenous preferences and endogenous opinion formation mechanisms.