
    Remote-scope Promotion: Clarified, Rectified, and Verified

    Modern accelerator programming frameworks, such as OpenCL, organise threads into work-groups. Remote-scope promotion (RSP) is a language extension recently proposed by AMD researchers that is designed to enable applications, for the first time, both to optimise for the common case of intra-work-group communication (using memory scopes to provide consistency only within a work-group) and to allow occasional inter-work-group communication (as required, for instance, to support the popular load-balancing idiom of work stealing). We present the first formal, axiomatic memory model of OpenCL extended with RSP. We have extended the Herd memory model simulator with support for OpenCL kernels that exploit RSP, and used it to discover bugs in several litmus tests and a work-stealing queue that have been used previously in the study of RSP. We have also formalised the proposed GPU implementation of RSP. The formalisation process allowed us to identify bugs in the description of RSP that could result in well-synchronised programs experiencing memory inconsistencies. We present and prove sound a new implementation of RSP that incorporates bug fixes and requires less non-standard hardware than the original implementation. This work, a collaboration between academia and industry, clearly demonstrates how, when designing hardware support for a new concurrent language feature, the early application of formal tools and techniques can help to prevent errors, such as those we have found, from making it into silicon.
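
    As a rough illustration of what memory scopes buy the programmer, the sketch below uses CUDA's scoped atomics from libcu++ (not the paper's OpenCL/RSP code; all names here are ours): a producer releases a flag at block scope when only its own work-group needs to observe the update, and falls back to device scope for the occasional inter-work-group hand-off that RSP is designed to make cheap.

        #include <cuda/atomic>

        // Illustrative sketch (not the paper's OpenCL code): one representative
        // thread per block publishes a payload and signals it at two scopes.
        __global__ void scoped_signal(int *data, int *block_flags, int *device_counter) {
            int b = blockIdx.x;
            if (threadIdx.x == 0) {
                data[b] = b + 1;  // payload

                // Block scope: the release ordering only needs to be observed by
                // threads in this work-group, which is cheaper than device scope.
                cuda::atomic_ref<int, cuda::thread_scope_block> bf(block_flags[b]);
                bf.store(1, cuda::std::memory_order_release);

                // Device scope: visible to every work-group on the GPU, for the
                // occasional inter-work-group communication that RSP targets.
                cuda::atomic_ref<int, cuda::thread_scope_device> dc(*device_counter);
                dc.fetch_add(1, cuda::std::memory_order_release);
            }
        }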

    GPU Concurrency: Weak Behaviours and Programming Assumptions

    Concurrency is pervasive and perplexing, particularly on graphics processing units (GPUs). Current specifications of languages and hardware are inconclusive; thus programmers often rely on folklore assumptions when writing software. To remedy this state of affairs, we conducted a large empirical study of the concurrent behaviour of deployed GPUs. Armed with litmus tests (i.e. short concurrent programs), we questioned the assumptions in programming guides and vendor documentation about the guarantees provided by hardware. We developed a tool to generate thousands of litmus tests and run them under stressful workloads. We observed a litany of previously elusive weak behaviours, and exposed folklore beliefs about GPU programming (often supported by official tutorials) as false. As a way forward, we propose a model of Nvidia GPU hardware, which correctly models every behaviour witnessed in our experiments. The model is a variant of SPARC Relaxed Memory Order (RMO), structured following the GPU concurrency hierarchy.
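
    The litmus tests in question are tiny multi-thread programs. The classic message-passing (MP) shape looks roughly like the following CUDA sketch (our own illustration, not the tool's generated code), where observing r0 == 1 together with r1 == 0 is the weak behaviour being hunted for.

        // Message-passing (MP) litmus test sketched as a CUDA kernel. P0 publishes
        // data then raises a flag; P1 reads the flag then the data. The weak
        // outcome is r0 == 1 && r1 == 0: the flag was seen but the data was not.
        // Inserting __threadfence() between the two accesses on each side is the
        // usual way to forbid it.
        __global__ void mp_litmus(volatile int *x, volatile int *y, int *r0, int *r1) {
            if (blockIdx.x == 0 && threadIdx.x == 0) {          // P0
                *x = 1;       // write data
                *y = 1;       // raise flag
            } else if (blockIdx.x == 1 && threadIdx.x == 0) {   // P1
                *r0 = *y;     // read flag
                *r1 = *x;     // read data
            }
        }
        // A harness launches mp_litmus<<<2, 1>>>(x, y, r0, r1) many times under
        // memory stress and records a histogram of the observed (r0, r1) outcomes.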

    Gunrock: A High-Performance Graph Processing Library on the GPU

    For large-scale graph analytics on the GPU, the irregularity of data access and control flow and the complexity of programming GPUs have been two significant challenges for developing a programmable, high-performance graph library. "Gunrock", our graph-processing system designed specifically for the GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on operations on a vertex or edge frontier. Gunrock achieves a balance between performance and expressiveness by coupling high-performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. We evaluate Gunrock on five key graph primitives and show that Gunrock has on average at least an order of magnitude speedup over Boost and PowerGraph, comparable performance to the fastest GPU hardwired primitives, and better performance than any other GPU high-level graph library.
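
    To make the frontier-centric abstraction concrete, here is a sketch of an "advance"-style BFS step written as a plain CUDA kernel over a CSR graph. It mirrors the shape of the abstraction but is not Gunrock's actual API; all names are ours.

        // Sketch of a frontier "advance" step for BFS on a CSR graph, in the
        // spirit of a data-centric frontier abstraction: each thread expands one
        // vertex of the input frontier and appends newly discovered vertices to
        // the output frontier.
        __global__ void bfs_advance(const int *row_offsets, const int *col_indices,
                                    const int *frontier_in, int frontier_in_size,
                                    int *frontier_out, int *frontier_out_size,
                                    int *depth, int current_depth) {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i >= frontier_in_size) return;
            int v = frontier_in[i];
            for (int e = row_offsets[v]; e < row_offsets[v + 1]; ++e) {
                int u = col_indices[e];
                // Claim u exactly once: only the thread that flips its depth from
                // "unvisited" (-1) appends it to the next frontier.
                if (atomicCAS(&depth[u], -1, current_depth + 1) == -1) {
                    int pos = atomicAdd(frontier_out_size, 1);
                    frontier_out[pos] = u;
                }
            }
        }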

    Exploratory of society

    A huge flow of quantitative social, demographic and behavioral data is becoming available that traces the activities and interactions of individuals, social patterns, transportation infrastructures and travel fluxes. Together with innovative computational techniques and methods for modeling social actions in hybrid (natural and artificial) societies, this has caused a qualitative change in the ways we model socio-technical systems. For the first time, society can be studied in a comprehensive fashion that addresses social and behavioral complexity. In other words, we are in the position to envision the development of a large data and computational cyberinfrastructure defining an exploratory of society that provides quantitative anticipatory, explanatory and scenario analysis capabilities, ranging from emerging infectious diseases to conflict and crime surges. The goal of the exploratory of society is to provide the basic infrastructure embedding the framework of tools and knowledge needed for the design of forecast/anticipatory/crisis-management approaches to socio-technical systems, supporting future decision-making procedures by accelerating the scientific cycle that goes from data generation to predictions.

    Introduction to Special Issue on “Disaggregating Civil War”

    We introduce the contributions to this special issue on “Disaggregating Civil War.” We review the problems arising from excessive aggregation in studies of civil war, and outline how disaggregation promises to provide better insights into the causes and dynamics of civil wars, using the articles in this special issue as examples. We comment on the issue of the appropriate level of disaggregation, lessons learned from these articles, and issues for further research.

    Robust artificial neural networks and outlier detection. Technical report

    Large outliers break down linear and nonlinear regression models. Robust regression methods allow one to filter out the outliers when building a model. By replacing the traditional least squares criterion with the least trimmed squares criterion, in which half of the data is treated as potential outliers, one can fit accurate regression models to strongly contaminated data. High-breakdown methods have become very well established in linear regression, but have only recently started being applied to non-linear regression. In this work, we examine the problem of fitting artificial neural networks to contaminated data using the least trimmed squares criterion. We introduce a penalized least trimmed squares criterion which prevents unnecessary removal of valid data. Training of ANNs leads to a challenging non-smooth global optimization problem. We compare the efficiency of several derivative-free optimization methods in solving it, and show that our approach identifies the outliers correctly when ANNs are used for nonlinear regression.
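
    To make the criterion concrete, here is a minimal sketch of evaluating a (non-penalized) least trimmed squares loss, written with CUDA's Thrust library to match the other sketches in this listing. Here h is the number of residuals kept (roughly half of the data); the paper's penalty term, whose exact form we do not reproduce, would be added on top.

        #include <thrust/device_vector.h>
        #include <thrust/transform.h>
        #include <thrust/sort.h>
        #include <thrust/reduce.h>

        // Squared residual between an observation and a model prediction.
        struct squared_residual {
            __host__ __device__ double operator()(double y, double y_hat) const {
                double r = y - y_hat;
                return r * r;
            }
        };

        // Least trimmed squares: sum only the h smallest squared residuals, so up
        // to n - h points (the suspected outliers) do not influence the fit.
        double trimmed_squares_loss(const thrust::device_vector<double> &y,
                                    const thrust::device_vector<double> &y_hat,
                                    int h) {
            thrust::device_vector<double> sq(y.size());
            thrust::transform(y.begin(), y.end(), y_hat.begin(), sq.begin(),
                              squared_residual());
            thrust::sort(sq.begin(), sq.end());                      // ascending
            return thrust::reduce(sq.begin(), sq.begin() + h, 0.0);  // h smallest
        }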

    Cooperative kernels: GPU multitasking for blocking algorithms

    There is growing interest in accelerating irregular data-parallel algorithms on GPUs. These algorithms are typically blocking, so they require fair scheduling. But GPU programming models (e.g. OpenCL) do not mandate fair scheduling, and GPU schedulers are unfair in practice. Current approaches avoid this issue by exploiting scheduling quirks of today’s GPUs in a manner that does not allow the GPU to be shared with other workloads (such as graphics rendering tasks). We propose cooperative kernels, an extension to the traditional GPU programming model geared towards writing blocking algorithms. Workgroups of a cooperative kernel are fairly scheduled, and multitasking is supported via a small set of language extensions through which the kernel and scheduler cooperate. We describe a prototype implementation of a cooperative kernel framework implemented in OpenCL 2.0 and evaluate our approach by porting a set of blocking GPU applications to cooperative kernels and examining their performance under multitasking.
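
    The flavour of the kernel/scheduler hand-shake can be sketched as follows: a simplified illustration of cooperative yielding written in CUDA, not the paper's OpenCL extensions, with all names (yield_flag, next_item, process) being ours.

        // Simplified illustration (not the paper's API): a persistent work-group
        // loop that checks a scheduler-owned flag at each safe point and exits
        // voluntarily when asked, freeing the GPU for other workloads.
        __device__ void process(int item) {
            // application-specific work on one item
        }

        __global__ void cooperative_worker(const int *work_items, int num_items,
                                           int *next_item, volatile int *yield_flag) {
            __shared__ int item;
            while (true) {
                if (threadIdx.x == 0) {
                    // Safe point: yield if the scheduler asked, else claim more work.
                    item = (*yield_flag != 0) ? -1 : atomicAdd(next_item, 1);
                }
                __syncthreads();
                if (item < 0 || item >= num_items)
                    return;                 // yield requested or work exhausted
                process(work_items[item]);
                __syncthreads();            // keep item stable until all threads finish
            }
        }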

    A Universal Model of Global Civil Unrest

    Civil unrest is a powerful form of collective human dynamics, which has led to major transitions of societies in modern history. The study of collective human dynamics, including collective aggression, has been the focus of much discussion in the context of modeling and identification of universal patterns of behavior. In contrast, the possibility that civil unrest activities, across countries and over long time periods, are governed by universal mechanisms has not been explored. Here, we analyze records of civil unrest of 170 countries during the period 1919-2008. We demonstrate that the distributions of the number of unrest events per year are robustly reproduced by a nonlinear, spatially extended dynamical model, which reflects the spread of civil disorder between geographic regions connected through social and communication networks. The results also expose the similarity between global social instability and the dynamics of natural hazards and epidemics.

    Portable Inter-workgroup Barrier Synchronisation for GPUs

    Despite the growing popularity of GPGPU programming, there is not yet a portable and formally specified barrier that one can use to synchronise across workgroups. Moreover, the occupancy-bound execution model of GPUs breaks assumptions inherent in traditional software execution barriers, exposing them to deadlock. We present an occupancy discovery protocol that dynamically discovers a safe estimate of the occupancy for a given GPU and kernel, allowing for a starvation-free (and hence, deadlock-free) inter-workgroup barrier by restricting the number of workgroups according to this estimate. We implement this idea by adapting an existing, previously non-portable, GPU inter-workgroup barrier to use OpenCL 2.0 atomic operations, and prove that the barrier meets its natural specification in terms of synchronisation. We assess the portability of our approach over eight GPUs spanning four vendors, comparing the performance of our method against alternative methods. Our key findings include: (1) the recall of our discovery protocol is nearly 100%; (2) runtime comparisons vary substantially across GPUs and applications; and (3) our method provides portable and safe inter-workgroup synchronisation across the applications we study.
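
    The join-or-exit shape of occupancy discovery can be sketched as follows. This is a deliberately simplified CUDA illustration with hypothetical names; the paper's protocol additionally uses a mutex and specifies exactly how and when the poll is closed.

        // Simplified sketch (not the paper's code): one representative thread per
        // work-group tries to join an open poll; work-groups that arrive after the
        // poll is closed exit at once, so only a set of co-resident work-groups
        // takes part in the inter-workgroup barrier.
        __global__ void discovered_kernel(int *poll_open, int *num_participants,
                                          int *participant_of_block) {
            __shared__ int my_id;
            if (threadIdx.x == 0) {
                my_id = -1;
                if (atomicAdd(poll_open, 0) == 1) {          // atomic read: poll still open?
                    my_id = atomicAdd(num_participants, 1);  // claim a participant slot
                }
            }
            __syncthreads();
            if (my_id < 0)
                return;            // not discovered: exit rather than risk deadlock
            // Participants can now synchronise with an inter-workgroup barrier sized
            // by *num_participants once the poll has been closed.
            if (threadIdx.x == 0)
                participant_of_block[blockIdx.x] = my_id;
        }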

    Polarization of coalitions in an agent-based model of political discourse

    Political discourse is the verbal interaction between political actors in a policy domain. This article explains the formation of polarized advocacy or discourse coalitions in this complex phenomenon by presenting a dynamic, stochastic, and discrete agent-based model based on graph theory and local optimization. In a series of thought experiments, actors compute their utility of contributing a specific statement to the discourse by following ideological criteria, preferential attachment, agenda-setting strategies, governmental coherence, or other mechanisms. The evolving macro-level discourse is represented as a dynamic network and evaluated against arguments from the literature on the policy process. A simple combination of four theoretical mechanisms is already able to produce artificial policy debates with theoretically plausible properties. Any sufficiently realistic configuration must entail innovative and path-dependent elements as well as a blend of exogenous preferences and endogenous opinion formation mechanisms.