
21 research outputs found

Effective Barrier Synchronization on Intel Xeon Phi Coprocessor

Author: Lujan Mikel
Nisbet Andy
Pop Antoniu
Rodchenko Andrey
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

Crossref

The University of Manchester - Institutional Repository

On the nature of progress

Author: G. Taubenfeld
H. Attiya
J. Aspnes
J. Mellor-Crummey
L. Lamport
M. Herlihy
M.P. Herlihy
N. Lynch
S. Heller
T.L. Harris
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

15th International Conference, OPODIS 2011, Toulouse, France, December 13-16, 2011. Proceedings
We identify a simple relationship that unifies seemingly unrelated progress conditions, ranging from the deadlock-free and starvation-free properties common to lock-based systems to non-blocking conditions such as obstruction-freedom, lock-freedom, and wait-freedom. Properties can be classified along two dimensions based on the demands they make on the operating system scheduler. A gap in the classification reveals a new non-blocking progress condition, weaker than obstruction-freedom, which we call clash-freedom. The classification provides an intuitively appealing explanation of why programmers continue to devise data structures that mix both blocking and non-blocking progress conditions. It also explains why the wait-free property is a natural basis for the consensus hierarchy: a theory of shared-memory computation requires an independent progress condition, not one that makes demands of the operating system scheduler.

CiteSeerX

DSpace@MIT

Crossref

Adversarial Analyses of Window Backoff Strategies for Simple Multiple-Access Channels

Author: Bender Michael A.
Farach-Colton Martin
He Simai
Kuszmaul Bradley C.
Leiserson Charles E.
Publication date: 01/01/2004

Backoff strategies have typically been analyzed by making statistical assumptions on the distribution of problem inputs. Although these analyses have provided valuable insights into the efficacy of various backoff strategies, they leave open the question as to which backoff algorithms perform best in the worst case or on inputs, such as bursty inputs, that are not covered by the statistical models. This paper analyzes randomized backoff strategies using worst-case assumptions on the inputs. Specifically, we analyze algorithms for simple multiple-access channels, where the only feedback from each attempt to send a packet is a single bit indicating whether the transmission succeeded or the packet collided with another packet. We analyze a class of strategies, called window strategies, where each packet partitions time into a sequence (W₁, W₂, ...) of windows. Within each window, the packet makes an access attempt during a single randomly selected slot. If its transmission is unsuccessful, it waits for its slot in the next window before retrying. We use delay-sequence arguments to show that for the batch problem, in which n packets all arrive at time 0, if every window has size W = Θ(n), then with high probability, all packets successfully transmit with makespan n lg lg n ± O(n). We use this result to analyze window backoff strategies with varying window sizes. Specifically, we show that the familiar binary exponential backoff algorithm, where Wₖ = Θ(2ᵏ), has makespan Θ(n lg n), and that more generally, for any constant r > 1, the r-exponential backoff algorithm in which Wₖ = Θ(rᵏ) has makespan Θ(n lg^(lg r) n). We also show that for any constant r > 1, the r-polynomial backoff algorithm, in which Wₖ = Θ(kʳ), has makespan Θ((n/lg n)^(1+1/r)). All of these batch strategies are monotonic, in the sense that the window size monotonically increases over time. We exhibit a monotonic backoff algorithm that achieves makespan Θ(n lg lg n/lg lg lg n).
We prove that this algorithm, whose backoff is superpolynomial and subexponential, is optimal over all monotonic backoff schemes. In addition, we exhibit a simple backoff/backon algorithm, having window sizes that vary nonmonotonically according to a "sawtooth" pattern, that achieves the optimal makespan of Θ(n). We study the online setting using an adversarial queueing model. We define a (λ,T)-stream to be an input stream of packets in which at most n = λT packets arrive during any time interval of size T. In this model, to evaluate a given backoff algorithm (which does not know λ or T), we analyze the worst-case behavior of the algorithm over the class of (λ,T)-streams. Our results for the online setting focus on exponential backoff. We show that for any arrival rate λ, there exists a sufficiently large interval size T such that the throughput goes to 0 for some (λ,T)-stream. Moreover, there exists a sufficiently large constant c such that for any interval size T, if λ ≥ c lg lg n/lg n, the system is unstable in the sense that the arrival rate exceeds the throughput in the worst case. If, on the other hand, we have λ ≤ c/lg n for a sufficiently small constant c, then the system is stable. Surprisingly, the algorithms that guarantee smaller makespans in the batch setting require lower arrival rates to achieve stability than does exponential backoff, but when they are stable, they have better response times.
Singapore-MIT Alliance (SMA)
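The window strategy for the batch problem can be illustrated with a toy Monte Carlo simulation. This is a minimal sketch under stated assumptions, not the authors' analysis: the function name `window_backoff_makespan` and the `window_size(k)` callback are hypothetical, all n packets share the same window sequence because they all arrive at time 0, and a packet succeeds in a window only if no other pending packet picked the same slot. The simulation does not verify the asymptotic bounds; it only shows the mechanics.

```python
import random

def window_backoff_makespan(n, window_size, seed=0):
    """Simulate the batch problem: n packets arrive at time 0.

    window_size(k) is the (hypothetical) size of the k-th window, k = 1, 2, ...
    In each window every pending packet picks one slot uniformly at random;
    a packet succeeds iff it is alone in its slot. Returns the makespan,
    i.e. the time slot just after the last successful transmission.
    Note: a window of size 1 with two or more pending packets never
    resolves, so window sizes should grow or stay at least 2.
    """
    rng = random.Random(seed)
    pending = n
    elapsed = 0   # slots consumed by all earlier windows
    makespan = 0
    k = 1
    while pending > 0:
        w = max(1, int(window_size(k)))
        picks = {}
        for _ in range(pending):
            slot = rng.randrange(w)
            picks[slot] = picks.get(slot, 0) + 1
        # Slots chosen by exactly one packet carry a successful transmission.
        successes = [s for s, c in picks.items() if c == 1]
        if successes:
            makespan = elapsed + max(successes) + 1
        pending -= len(successes)
        elapsed += w
        k += 1
    return makespan
```

For example, `window_backoff_makespan(200, lambda k: 2 ** k)` models binary exponential backoff, while `window_backoff_makespan(200, lambda k: 400)` models a fixed window of size Θ(n). Because successful packets occupy distinct slots, the makespan is always at least n.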

DSpace@MIT

Tools for Empirical and Operational Analysis of Mobile Offloading in Loop-Based Applications

Author: Alexandru-Corneliu OLTEANU
Nicolae TAPUS
Publication venue: 'ECO-INFOSOC Research Center'
Publication date: 01/01/2013

Offloading for mobile devices is an increasingly popular research topic, matching the popularity mobile devices have in the general population. Studying mobile offloading is challenging because of device and application heterogeneity. However, we believe that focusing on a specific type of application can advance offloading for mobile devices while still keeping a wide range of applicability. In this paper we focus on loop-based applications, in which most of the functionality comes from iterating an execution loop. We model the main loop of the application as a graph consisting of a cycle and propose an operational analysis to study offloading on this model. We also propose a testbed based on a real-world application to evaluate offloading empirically. We evaluate performance using both tools and compare the analytical and empirical results.
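To make the cycle-graph model concrete, the following sketch computes the time of one loop iteration for a given local/remote placement of the loop's stages and picks the cheapest placement by brute force. The cost model (a local or remote execution time per stage, plus a transfer cost on an edge only when its two endpoints run in different places) and the names `iteration_time` and `best_placement` are illustrative assumptions, not the operational analysis from the paper.

```python
from itertools import product

def iteration_time(placement, local, remote, transfer):
    """Time for one pass around the cycle of loop stages (hypothetical model).

    placement[i] is True if stage i is offloaded. transfer[i] is the cost of
    moving state along edge i -> (i + 1) % n when the two stages run in
    different places (device vs. remote server).
    """
    n = len(placement)
    total = 0
    for i in range(n):
        total += remote[i] if placement[i] else local[i]
        if placement[i] != placement[(i + 1) % n]:
            total += transfer[i]
    return total

def best_placement(local, remote, transfer):
    """Exhaustively search all 2^n placements; fine only for small loops."""
    n = len(local)
    return min(product([False, True], repeat=n),
               key=lambda p: iteration_time(p, local, remote, transfer))
```

With `local = [1, 10, 1]`, `remote = [5, 2, 5]` and `transfer = [1, 1, 1]`, running everything locally costs 12 per iteration, while offloading only the middle stage costs 1 + 2 + 1 plus two transfers, i.e. 6, which is the minimum. Because the graph is a cycle, every offloaded run of stages pays a transfer on both of its boundary edges.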

Directory of Open Access Journals

Power-Aware Pipelining with Automatic Concurrency Control

Author: Aldinucci Marco
Danelutto Marco
Daniele De Sensi
Gabriele Mencagli
Massimo Torquati
Publication venue: 'Wiley'
Publication date: 01/01/2019

Institutional Research Information System University of Turin

Directory Based Cache Coherency Protocols for Shared Memory Multiprocessors

Author: Warner Craig
Publication venue: 'Purdue University (bepress)'
Publication date: 01/05/1990

Directory-based cache coherency protocols can be used to build large-scale, weakly ordered, shared-memory multiprocessors. The salient feature of these protocols is that they are independent of the interconnection network, which makes them more scalable than snoopy bus protocols. The major criticisms of previously defined directory protocols concern the size of the memory needed to store the directory and the amount of communication across the interconnection network required to maintain coherence. This thesis addresses these problems by changing the entry format of the global table, altering the architecture of the global table, and developing new protocols. Several alternative directory entry formats are described, including a special entry format for implementing queueing semaphores. The various entry formats are evaluated using probabilistic models of shared cache blocks and software simulation. A variable-length global table organization is presented that can reduce the size of the global table regardless of the entry format; its performance is analyzed using software simulation. A protocol that maintains a linked list of the processors that have a particular block cached is presented. Several variations of this protocol induce less interconnection network traffic than traditional protocols.
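A directory entry that keeps a linked list of sharers can be pictured with the toy model below. It is a sequential sketch, assuming a simplified protocol in which the directory stores only a head pointer, each cache stores the next pointer for the block, a read miss pushes the requester onto the head of the list, and a write walks the list to invalidate every other sharer. The class and method names are hypothetical; the thesis's actual protocol and its variations are not reproduced here.

```python
class LinkedListDirectory:
    """Toy directory entry for one memory block with a linked list of sharers."""

    def __init__(self):
        self.head = None      # directory state: most recent sharer, or None
        self.next_ptr = {}    # per-cache state: cache id -> next sharer id (or None)

    def sharers(self):
        """Walk the list from the head, as an invalidation would."""
        out, node = [], self.head
        while node is not None:
            out.append(node)
            node = self.next_ptr[node]
        return out

    def read_miss(self, cache_id):
        if cache_id in self.next_ptr:
            return  # already holds a copy
        # The directory hands the requester the old head and records it as the new head.
        self.next_ptr[cache_id] = self.head
        self.head = cache_id

    def write(self, cache_id):
        """Give cache_id exclusive ownership; return the caches that were invalidated."""
        self.read_miss(cache_id)
        invalidated = [node for node in self.sharers() if node != cache_id]
        # Only the writer remains on the list.
        self.next_ptr = {cache_id: None}
        self.head = cache_id
        return invalidated
```

For example, after read misses by caches 1, 2 and 3 the list is `[3, 2, 1]`; a write by cache 2 invalidates caches 3 and 1 and leaves `[2]`. The directory itself stores a single pointer per block regardless of how many caches share it, which is the storage advantage such schemes aim for.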

Purdue E-Pubs

Self Adjusting Contention Friendly Concurrent Binary Search Tree by Lazy Splaying

Author: Regmee Mahesh Raj
Publication venue: Digital Scholarship@UNLV
Publication date: 01/05/2014

We present a partially blocking implementation of a concurrent binary search tree data structure that is contention friendly, fast, and scales well. It uses a technique called lazy splaying to move frequently accessed items close to the root without making the root of the tree a sequential bottleneck. Most self-adjusting binary search trees are constrained to guarantee the height of the tree even in the presence of concurrency. In contrast, this methodology roughly guarantees the height of the tree only in the absence of contention, and limits contention during concurrent accesses. The main idea is to divide the update operation into two operations: an eager abstract modification with lazy splaying, which completes quickly and makes at most one local rotation of the tree on each access as a function of historical access frequencies; and a lazy structural adaptation with long/semi splaying, which implements top-down recursive splaying of the tree and may be postponed to diminish contention and rebalance the tree during periods of lower contention. In this way, frequently accessed items undergo full splaying, but only after a few accesses, and always appear near the root of the tree, whereas infrequently accessed items do not get enough pushes up the tree and stay in its bottom part. As in a sequential counting-based splay tree, the amortized time bound of each operation is O(log N), where N is the number of items in the tree.
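The "at most one local rotation per access" idea can be sketched sequentially as below. This is a hypothetical simplification, not the thesis's algorithm: each node keeps an access counter, and on a successful lookup the node is rotated once above its parent if its counter now exceeds the parent's. There is no concurrency control and no deferred long/semi splaying; the class names and the rotation rule are assumptions chosen to keep the example small.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.count = 0  # historical access frequency
        self.left = self.right = self.parent = None

class LazySplayTree:
    """Sequential sketch: at most one rotation per access, driven by access counts."""

    def __init__(self):
        self.root = None

    def insert(self, key):
        parent, cur = None, self.root
        while cur is not None:
            if key == cur.key:
                return cur
            parent, cur = cur, (cur.left if key < cur.key else cur.right)
        node = Node(key)
        node.parent = parent
        if parent is None:
            self.root = node
        elif key < parent.key:
            parent.left = node
        else:
            parent.right = node
        return node

    def _rotate_up(self, x):
        """Single rotation of x above its parent, preserving BST order."""
        p, g = x.parent, x.parent.parent
        if p.left is x:
            p.left = x.right
            if x.right is not None:
                x.right.parent = p
            x.right = p
        else:
            p.right = x.left
            if x.left is not None:
                x.left.parent = p
            x.left = p
        p.parent = x
        x.parent = g
        if g is None:
            self.root = x
        elif g.left is p:
            g.left = x
        else:
            g.right = x

    def find(self, key):
        cur = self.root
        while cur is not None and cur.key != key:
            cur = cur.left if key < cur.key else cur.right
        if cur is None:
            return False
        cur.count += 1
        # Hypothetical rule: one local rotation if this node is now "hotter" than its parent.
        if cur.parent is not None and cur.count > cur.parent.count:
            self._rotate_up(cur)
        return True

    def depth(self, key):
        d, cur = 0, self.root
        while cur.key != key:
            cur = cur.left if key < cur.key else cur.right
            d += 1
        return d
```

Inserting 1 to 5 in order yields a chain in which key 5 sits at depth 4; each of the next four lookups of 5 moves it up by exactly one level, so a frequently accessed item climbs gradually instead of jumping to the root in a single access.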

University of Nevada, Las Vegas Repository

Contention in Structured Concurrency: Provably Efficient Dynamic Non-Zero Indicators for Nested Parallelism

Author: Acar Umut,
Ben-David Naama
Rainey Mike
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 04/02/2017

International audience
Over the past two decades, many concurrent data structures have been designed and implemented. Nearly all such work analyzes concurrent data structures empirically, omitting asymptotic bounds on their efficiency, partly because of the complexity of the analysis needed and partly because of the difficulty of obtaining relevant asymptotic bounds: when the analysis takes into account important practical factors, such as contention, it is difficult or even impossible to prove desirable bounds. In this paper, we show that considering structured concurrency or relaxed concurrency models makes it possible to establish strong bounds, including bounds on contention. To this end, we first present a dynamic relaxed counter data structure that indicates the non-zero status of the counter. Our data structure extends a recently proposed data structure, called SNZI, allowing our structure to grow dynamically in response to the increasing degree of concurrency in the system. Using the dynamic SNZI data structure, we then present a concurrent data structure for series-parallel directed acyclic graphs (sp-dags), a key data structure widely used in the implementation of modern parallel programming languages. The key component of sp-dags is an in-counter data structure that is an instance of our dynamic SNZI. We analyze the efficiency of our concurrent sp-dags and in-counter data structures under the nested-parallel computing paradigm. This paradigm offers a structured model for concurrency. Under this model, we prove that our data structures require amortized O(1) shared memory steps, including contention. We present an implementation and an experimental evaluation suggesting that the sp-dags data structure is practical and performs well.
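The core idea of a hierarchical non-zero indicator can be shown with a lock-based sketch: a node forwards an arrive to its parent only on its 0 to 1 transition, and a depart only on its 1 to 0 transition, so the root sees far fewer updates than the leaves while still answering "is the count non-zero?". This is an assumption-laden simplification, not the paper's data structure: it uses a `threading.Lock` per node instead of the lock-free compare-and-swap protocol of SNZI, the tree is static rather than growing dynamically, and the class name `SNZINode` is hypothetical. Locks are always taken child-first, then parent, so the sketch cannot deadlock.

```python
import threading

class SNZINode:
    """Lock-based sketch of one node in a hierarchical non-zero indicator."""

    def __init__(self, parent=None):
        self.parent = parent
        self.count = 0
        self.lock = threading.Lock()

    def arrive(self):
        with self.lock:
            # Only the 0 -> 1 transition is propagated upward.
            if self.count == 0 and self.parent is not None:
                self.parent.arrive()
            self.count += 1

    def depart(self):
        # Callers must only depart after a matching arrive on the same node.
        with self.lock:
            self.count -= 1
            # Only the 1 -> 0 transition is propagated upward.
            if self.count == 0 and self.parent is not None:
                self.parent.depart()

    def query(self):
        """On the root: True iff arrivals exceed departures somewhere in the tree."""
        return self.count > 0
```

A typical use is a root with a few leaves, where each thread arrives and departs through "its" leaf; the root's count is bounded by the number of leaves with a non-zero surplus plus any direct arrivals at the root, which is what keeps contention at the root low.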

INRIA a CCSD electronic archive server

A Feature Taxonomy and Survey of Synchronization Primitive Implementations

Author: Glew Andy
Hwu Wen-mei
Publication venue: Center for Reliable and High-Performance Computing, Coordinated Science Laboratory, University of Illinois at Urbana-Champaign
Publication date: 01/02/1991

Coordinated Science Laboratory was formerly known as Control Systems Laboratory. NCR Corporation

Illinois Digital Environment for Access to Learning and Scholarship Repository

Creating a Concurrent In-Memory B-Tree Optimized for NUMA Systems

Author: McKenzie Marlon
Publication venue: 'University of Waterloo'
Publication date: 01/01/2015

Main memory is becoming larger. As the number of Central Processing Unit (CPU) cores in modern systems keeps increasing, with each core able to access memory, the organization of memory becomes more important. In multicore systems, there are two main architectures for organizing memory with respect to the cores: Symmetric Multi-Processor (SMP) and Non-Uniform Memory Access (NUMA). Prior work has focused on improving the performance of B-Trees in highly concurrent and distributed environments, as well as in memory, for shared-memory multiprocessors. However, little attention has been given to the performance of main-memory B-Trees on NUMA systems. This work focuses on improving the performance of B-Trees held in the main memory of NUMA systems by introducing modifications that account for their storage in the physically distributed main memory of the NUMA system. Starting from a B-Tree originally designed for high concurrency, this thesis makes the following contributions to the development of a distributed B-Tree, specifically in a NUMA environment:
• It introduces replication of the internal nodes of the tree and shows how this can improve its overall performance in a NUMA environment.
• It introduces NUMA-aware locking procedures that aim to manage contention and exploit the locality of lock requests with respect to the locations of previous client operation requests.
• It changes the granularity of locking, from locking every node (as in the original design) to locking certain levels of nodes, and shows the tradeoff between locking granularity and tree performance depending on the workload.
• It considers combinations of the different techniques, with the aim of finding the combination that performs well overall across varying read-heavy workloads and numbers of client threads.

University of Waterloo's Institutional Repository