Software lock elision for x86 machine code

Author: Roy Amitabha
Publication venue: University of Cambridge
Publication date: 12/07/2011

More than a decade after becoming a topic of intense research there is no transactional memory hardware nor any examples of software transactional memory use outside the research community. Using software transactional memory in large pieces of software needs copious source code annotations and often means that standard compilers and debuggers can no longer be used. At the same time, overheads associated with software transactional memory fail to motivate programmers to expend the needed effort to use software transactional memory. The only way around the overheads in the case of general unmanaged code is the anticipated availability of hardware support. On the other hand, architects are unwilling to devote power and area budgets in mainstream microprocessors to hardware transactional memory, pointing to transactional memory being a "niche" programming construct. A deadlock has thus ensued that is blocking transactional memory use and experimentation in the mainstream. This dissertation covers the design and construction of a software transactional memory runtime system called SLE_x86 that can potentially break this deadlock by decoupling transactional memory from programs using it. Unlike most other STM designs, the core design principle is transparency rather than performance. SLE_x86 operates at the level of x86 machine code, thereby becoming immediately applicable to binaries for the popular x86 architecture. The only requirement is that the binary synchronise using known locking constructs or calls such as those in Pthreads or OpenMP libraries. SLE_x86 provides speculative lock elision (SLE) entirely in software, executing critical sections in the binary using transactional memory. Optionally, the critical sections can also be executed without using transactions by acquiring the protecting lock. 
The dissertation carefully analyses the performance impact of the demands of the x86 memory consistency model and of the need to transparently instrument x86 machine code. It shows that both of these problems can be overcome to reach a reasonable level of performance, where transparent software transactional memory can perform better than a lock. SLE_x86 can ensure that programs are ready for transactional memory in any form, without being explicitly written for it.

Apollo (Cambridge)

Uniqueness of Optimal Mod 3 Circuits for Parity

Author: Green Frederic
Roy Amitabha
Publication venue: Dagstuhl Seminar Proceedings. 07411 - Algebraic Methods in Computational Complexity
Publication date: 01/01/2008

We prove that the quadratic polynomials modulo $3$ with the largest correlation with parity are unique up to permutation of variables and constant factors. As a consequence of our result, we completely characterize the smallest $\mathrm{MAJ} \circ \mathrm{MOD}_3 \circ \mathrm{AND}_2$ circuits that compute parity, where a $\mathrm{MAJ} \circ \mathrm{MOD}_3 \circ \mathrm{AND}_2$ circuit is one that has a majority gate as output, a middle layer of $\mathrm{MOD}_3$ gates and a bottom layer of AND gates of fan-in $2$. We also prove that the sub-optimal circuits exhibit a stepped behavior: any sub-optimal circuits of this class that compute parity must have size at least a factor of $\frac{2}{\sqrt{3}}$ times the optimal size. This verifies, for the special case of $m=3$, two conjectures made by Dueñez, Miller, Roy and Straubing (Journal of Number Theory, 2006) for general $\mathrm{MAJ} \circ \mathrm{MOD}_m \circ \mathrm{AND}_2$ circuits for any odd $m$. The correlation and circuit bounds are obtained by studying the associated exponential sums, based on some of the techniques developed by Green (JCSS, 2004). We regard this as a step towards obtaining tighter bounds both for the $m \neq 3$ quadratic case as well as for higher degrees.

Clark University

DROPS Dagstuhl Research Online Publication Server

ALLARM: Optimizing Sparse Directories for Thread-Local Data

Author: Jones Timothy
Roy Amitabha
Publication date: 21/12/2013

Large-scale cache-coherent systems often impose unnecessary overhead on data that is thread-private for the whole of its lifetime. This overhead includes resources devoted to tracking the coherence state of the data, as well as unnecessary coherence messages sent out over the interconnect. In this paper we show how the memory allocation strategy for non-uniform memory access (NUMA) systems can be exploited to remove any coherence-related traffic for thread-local data, as well as removing the need to track those cache lines in sparse directories. Our strategy is to allocate directory state only on a miss from a node in a different affinity domain from the directory. We call this ALLocAte on Remote Miss, or ALLARM. Our solution is entirely backward compatible with existing operating systems and software, and provides a means to scale cache coherence into the many-core era. On a mix of SPLASH2 and Parsec workloads, ALLARM is able to improve performance by 13% on average while reducing dynamic energy consumption by 9% in the on-chip network and 15% in the directory controller. This is achieved through a 46% reduction in the number of sparse directory entries evicted.
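
The allocation rule described in this abstract can be illustrated with a toy model. The following is only a sketch under simplifying assumptions and is not taken from the paper: each affinity domain is treated as a single node, and the names `SparseDirectory`, `home_node` and `on_miss` are hypothetical. It omits everything else a real coherence protocol would need, for instance how a line already cached locally without a directory entry is handled when a remote request arrives.

```python
class SparseDirectory:
    """Toy directory that allocates an entry only on a remote miss."""

    def __init__(self, home_node):
        self.home_node = home_node  # node (affinity domain) owning this directory
        self.entries = {}           # cache-line address -> set of sharer nodes

    def on_miss(self, line, requester):
        """Handle a miss; return True if the line is tracked afterwards."""
        if line in self.entries:
            # Line is already tracked: record the additional sharer.
            self.entries[line].add(requester)
            return True
        if requester == self.home_node:
            # Local miss on an untracked line: assume thread-local data
            # and allocate no directory state.
            return False
        # Miss from a different affinity domain: allocate an entry.
        self.entries[line] = {requester}
        return True


directory = SparseDirectory(home_node=0)
directory.on_miss(0x40, requester=0)  # local miss, nothing allocated
directory.on_miss(0x80, requester=1)  # remote miss, entry allocated
```

In this model, data touched only by its home node never occupies a directory entry, which is the effect the abstract attributes to allocating on remote misses.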

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Incomplete Quadratic Exponential Sums in Several Variables

Author: Amitabha Roy
Eduardo Dueñez
Howard Straubing
Steven J. Miller
Publication venue: Elsevier BV
Publication date: 30/12/2005

We consider incomplete exponential sums in several variables of the form $S(f,n,m) = \frac{1}{2^n} \sum_{x_1 \in \{-1,1\}} \cdots \sum_{x_n \in \{-1,1\}} x_1 \cdots x_n \, e^{2\pi i f(x)/m}$, where $m>1$ is odd and $f$ is a polynomial of degree $d$ with coefficients in $\mathbb{Z}/m\mathbb{Z}$. We investigate the conjecture, originating in a problem in computational complexity, that for each fixed $d$ and $m$ the maximum norm of $S(f,n,m)$ converges exponentially fast to $0$ as $n$ grows to infinity. The conjecture is known to hold in the case when $m=3$ and $d=2$, but existing methods for studying incomplete exponential sums appear to be insufficient to resolve the question for an arbitrary odd modulus $m$, even when $d=2$. In the present paper we develop three separate techniques for studying the problem in the case of quadratic $f$, each of which establishes a different special case of the conjecture. We show that a bound of the required sort holds for almost all quadratic polynomials, a stronger form of the conjecture holds for all quadratic polynomials with no more than 10 variables, and for arbitrarily many variables the conjecture is true for a class of quadratic polynomials having a special form.
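
For small $n$ the sum in this abstract can be evaluated by direct enumeration, which may help make the definition concrete. The sketch below is illustrative only: it assumes the modulus in the exponent is $m$, and the names `incomplete_sum` and `paired_quadratic` are hypothetical, not from the paper. The cost is $2^n$ evaluations of $f$, so it is usable only for small $n$.

```python
import cmath
from itertools import product


def incomplete_sum(f, n, m):
    """Brute-force S(f, n, m) over all x in {-1, 1}^n."""
    total = 0j
    for x in product((-1, 1), repeat=n):
        sign = 1
        for xi in x:
            sign *= xi  # the factor x_1 * ... * x_n
        # f(x) is reduced modulo m before entering the exponent.
        total += sign * cmath.exp(2j * cmath.pi * (f(x) % m) / m)
    return total / 2**n


def paired_quadratic(x):
    """Hypothetical example polynomial: x_1 x_2 + x_3 x_4 + ..."""
    return sum(x[i] * x[i + 1] for i in range(0, len(x) - 1, 2))


for n in (2, 4, 6):
    print(n, abs(incomplete_sum(paired_quadratic, n, 3)))
```

For this particular polynomial and even $n$ the sum factorizes over the pairs of variables, so with $m=3$ its norm equals $(\sqrt{3}/2)^{n/2}$, an instance of the exponential decay that the conjecture concerns.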

arXiv.org e-Print Archive

Elsevier - Publisher Connector

Crossref

Bounds on an exponential sum arising in Boolean circuit complexity

Author: Amitabha Roy
Frederic Green
Howard Straubing
Publication venue: Elsevier BV
Publication date: 01/01/2005

We study exponential sums of the form $S = 2^{-n} \sum_{x \in \{0,1\}^n} e_m(h(x))\, e_q(p(x))$, where $m, q \in \mathbb{Z}^+$ are relatively prime, $p$ is a polynomial with coefficients in $\mathbb{Z}_q$, and $h(x) = a(x_1 + \cdots + x_n)$ for some $1 \le a < m$. We prove an upper bound of the form $2^{-\Omega(n)}$ on $|S|$. This generalizes a result of J. Bourgain, who establishes this bound in the case where $q$ is odd. This bound has consequences in Boolean circuit complexity.
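
As a concrete reading of the sum in this abstract, the following sketch evaluates it by enumeration for small $n$. It assumes the standard notation $e_k(t) = e^{2\pi i t/k}$, which the abstract does not spell out, and the names `e` and `circuit_sum` are hypothetical.

```python
import cmath
from itertools import product


def e(k, t):
    """Assumed notation: e_k(t) = exp(2*pi*i*t / k)."""
    return cmath.exp(2j * cmath.pi * (t % k) / k)


def circuit_sum(n, m, q, a, p):
    """Brute-force S = 2^-n * sum over x in {0,1}^n of e_m(a*(x_1+...+x_n)) * e_q(p(x))."""
    total = 0j
    for x in product((0, 1), repeat=n):
        total += e(m, a * sum(x)) * e(q, p(x))
    return total / 2**n


# Example with m = 3, q = 2, a = 1 and the linear polynomial p(x) = x_1 + ... + x_n.
for n in (2, 4, 6):
    print(n, abs(circuit_sum(n, 3, 2, 1, lambda x: sum(x))))
```

With this linear choice of $p$ the sum splits into a product over the coordinates and $|S| = (\sqrt{3}/2)^n$, so the printed values shrink geometrically, in line with a bound of the form $2^{-\Omega(n)}$.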

Comptes Rendus Mathématique

Clark University

Numérisation de Documents Anciens Mathématiques