101 research outputs found

    Uniqueness of Optimal Mod 3 Circuits for Parity

    We prove that the quadratic polynomials modulo $3$ with the largest correlation with parity are unique up to permutation of variables and constant factors. As a consequence of our result, we completely characterize the smallest $\mathrm{MAJ} \circ \mathrm{MOD}_3 \circ \mathrm{AND}_2$ circuits that compute parity, where a $\mathrm{MAJ} \circ \mathrm{MOD}_3 \circ \mathrm{AND}_2$ circuit is one that has a majority gate as output, a middle layer of $\mathrm{MOD}_3$ gates and a bottom layer of AND gates of fan-in $2$. We also prove that the sub-optimal circuits exhibit a stepped behavior: any sub-optimal circuit of this class that computes parity must have size at least a factor of $\frac{2}{\sqrt{3}}$ times the optimal size. This verifies, for the special case of $m=3$, two conjectures made by Dueñez, Miller, Roy and Straubing (Journal of Number Theory, 2006) for general $\mathrm{MAJ} \circ \mathrm{MOD}_m \circ \mathrm{AND}_2$ circuits for any odd $m$. The correlation and circuit bounds are obtained by studying the associated exponential sums, based on some of the techniques developed by Green (JCSS, 2004). We regard this as a step towards obtaining tighter bounds both for the $m \neq 3$ quadratic case as well as for higher degrees.

    ALLARM: Optimizing Sparse Directories for Thread-Local Data

    Large-scale cache-coherent systems often impose unnecessary overhead on data that is thread-private for the whole of its lifetime. This overhead includes resources devoted to tracking the coherence state of the data, as well as unnecessary coherence messages sent out over the interconnect. In this paper we show how the memory allocation strategy for non-uniform memory access (NUMA) systems can be exploited to remove any coherence-related traffic for thread-local data, as well as removing the need to track those cache lines in sparse directories. Our strategy is to allocate directory state only on a miss from a node in a different affinity domain from the directory. We call this ALLocAte on Remote Miss, or ALLARM. Our solution is entirely backward compatible with existing operating systems and software, and provides a means to scale cache coherence into the many-core era. On a mix of SPLASH2 and Parsec workloads, ALLARM is able to improve performance by 13% on average while reducing dynamic energy consumption by 9% in the on-chip network and 15% in the directory controller. This is achieved through a 46% reduction in the number of sparse directory entries evicted.
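
    The allocation rule stated in this abstract (directory state is allocated only on a miss from a node in a different affinity domain) can be illustrated with a toy model. This is a minimal sketch, not the paper's implementation: the class `SparseDirectory`, its `on_miss` method, and the unbounded dictionary of entries are all hypothetical, and the behavior when an entry already exists (record the new sharer) is an assumption. Capacity limits, evictions, and the probing of untracked local copies are omitted.

```python
class SparseDirectory:
    """Toy directory for one affinity domain (hypothetical model)."""

    def __init__(self, home_domain):
        self.home_domain = home_domain
        self.entries = {}  # line address -> set of sharer node ids

    def on_miss(self, line, node, node_domain):
        """Handle a cache miss; return True if the line is tracked afterwards."""
        if line in self.entries:
            # Assumption: an already-tracked line keeps recording sharers.
            self.entries[line].add(node)
            return True
        if node_domain == self.home_domain:
            # Local miss: no directory state is allocated.
            return False
        # Remote miss: allocate directory state for the line.
        self.entries[line] = {node}
        return True
```

    In this model, a line touched only by nodes in the directory's own domain never occupies an entry, which is the effect the abstract attributes to thread-local data.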

    Incomplete Quadratic Exponential Sums in Several Variables

    We consider incomplete exponential sums in several variables of the form S(f,n,m) = \frac{1}{2^n} \sum_{x_1 \in \{-1,1\}} \cdots \sum_{x_n \in \{-1,1\}} x_1 \cdots x_n e^{2\pi i f(x)/m}, where m>1 is odd and f is a polynomial of degree d with coefficients in Z/mZ. We investigate the conjecture, originating in a problem in computational complexity, that for each fixed d and m the maximum norm of S(f,n,m) converges exponentially fast to 0 as n grows to infinity. The conjecture is known to hold in the case when m=3 and d=2, but existing methods for studying incomplete exponential sums appear to be insufficient to resolve the question for an arbitrary odd modulus m, even when d=2. In the present paper we develop three separate techniques for studying the problem in the case of quadratic f, each of which establishes a different special case of the conjecture. We show that a bound of the required sort holds for almost all quadratic polynomials, a stronger form of the conjecture holds for all quadratic polynomials with no more than 10 variables, and for arbitrarily many variables the conjecture is true for a class of quadratic polynomials having a special form. Comment: 31 pages (minor corrections from original draft, references to new results in the subject, publication information)
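
    For small n, the sum S(f,n,m) defined in this abstract can be evaluated by direct enumeration. The following is a minimal sketch assuming the exponent is taken modulo m; the function names `incomplete_sum` and `paired_quadratic` are illustrative and do not come from the paper, and the sample polynomial x_1 x_2 + x_3 x_4 + ... is a hypothetical choice of quadratic f.

```python
import cmath
from itertools import product


def incomplete_sum(f, n, m):
    """Brute-force S(f, n, m) over all x in {-1, 1}^n (2^n terms)."""
    total = 0j
    for x in product((-1, 1), repeat=n):
        sign = 1
        for xi in x:
            sign *= xi  # the factor x_1 * ... * x_n
        # e^{2 pi i f(x) / m}, with f(x) reduced modulo m
        total += sign * cmath.exp(2j * cmath.pi * (f(x) % m) / m)
    return total / 2**n


def paired_quadratic(x):
    """Hypothetical sample polynomial: x_1 x_2 + x_3 x_4 + ..."""
    return sum(x[i] * x[i + 1] for i in range(0, len(x) - 1, 2))


if __name__ == "__main__":
    for n in (2, 4, 6):
        print(n, abs(incomplete_sum(paired_quadratic, n, 3)))
```

    With m=3 this sample polynomial gives a norm of \sqrt{3}/2 for n=2 and 3/4 for n=4, since the sum factors over the disjoint variable pairs. Enumeration costs 2^n evaluations of f, so it is only practical for small n.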