22,174 research outputs found

    Thread-Scalable Evaluation of Multi-Jet Observables

    Get PDF
    A leading-order, leading-color parton-level event generator is developed for use on a multi-threaded GPU. Speed-up factors between 150 and 300 are obtained compared to an unoptimized CPU-based implementation of the event generator. In this first paper we study the feasibility of a GPU-based event generator with an emphasis on the constraints imposed by the hardware. Some studies of Monte Carlo convergence and accuracy are presented for PP -> 2,...,10 jet observables using of the order of 1e11 events.Comment: 16 pages, 5 figures, 3 table

    Path ORAM: An Extremely Simple Oblivious RAM Protocol

    Get PDF
    We present Path ORAM, an extremely simple Oblivious RAM protocol with a small amount of client storage. Partly due to its simplicity, Path ORAM is the most practical ORAM scheme known to date with small client storage. We formally prove that Path ORAM has a O(log N) bandwidth cost for blocks of size B = Omega(log^2 N) bits. For such block sizes, Path ORAM is asymptotically better than the best known ORAM schemes with small client storage. Due to its practicality, Path ORAM has been adopted in the design of secure processors since its proposal

    Libpsht - algorithms for efficient spherical harmonic transforms

    Full text link
    Libpsht (or "library for Performant Spherical Harmonic Transforms") is a collection of algorithms for efficient conversion between spatial-domain and spectral-domain representations of data defined on the sphere. The package supports transforms of scalars as well as spin-1 and spin-2 quantities, and can be used for a wide range of pixelisations (including HEALPix, GLESP and ECP). It will take advantage of hardware features like multiple processor cores and floating-point vector operations, if available. Even without this additional acceleration, the employed algorithms are among the most efficient (in terms of CPU time as well as memory consumption) currently being used in the astronomical community. The library is written in strictly standard-conforming C90, ensuring portability to many different hard- and software platforms, and allowing straightforward integration with codes written in various programming languages like C, C++, Fortran, Python etc. Libpsht is distributed under the terms of the GNU General Public License (GPL) version 2 and can be downloaded from http://sourceforge.net/projects/libpsht.Comment: 9 pages, 8 figures, accepted by A&

    Limited-memory BFGS Systems with Diagonal Updates

    Get PDF
    In this paper, we investigate a formula to solve systems of the form (B + {\sigma}I)x = y, where B is a limited-memory BFGS quasi-Newton matrix and {\sigma} is a positive constant. These types of systems arise naturally in large-scale optimization such as trust-region methods as well as doubly-augmented Lagrangian methods. We show that provided a simple condition holds on B_0 and \sigma, the system (B + \sigma I)x = y can be solved via a recursion formula that requies only vector inner products. This formula has complexity M^2n, where M is the number of L-BFGS updates and n >> M is the dimension of x

    Adaptive Mesh Refinement for Characteristic Grids

    Full text link
    I consider techniques for Berger-Oliger adaptive mesh refinement (AMR) when numerically solving partial differential equations with wave-like solutions, using characteristic (double-null) grids. Such AMR algorithms are naturally recursive, and the best-known past Berger-Oliger characteristic AMR algorithm, that of Pretorius & Lehner (J. Comp. Phys. 198 (2004), 10), recurses on individual "diamond" characteristic grid cells. This leads to the use of fine-grained memory management, with individual grid cells kept in 2-dimensional linked lists at each refinement level. This complicates the implementation and adds overhead in both space and time. Here I describe a Berger-Oliger characteristic AMR algorithm which instead recurses on null \emph{slices}. This algorithm is very similar to the usual Cauchy Berger-Oliger algorithm, and uses relatively coarse-grained memory management, allowing entire null slices to be stored in contiguous arrays in memory. The algorithm is very efficient in both space and time. I describe discretizations yielding both 2nd and 4th order global accuracy. My code implementing the algorithm described here is included in the electronic supplementary materials accompanying this paper, and is freely available to other researchers under the terms of the GNU general public license.Comment: 37 pages, 15 figures (40 eps figure files, 8 of them color; all are viewable ok in black-and-white), 1 mpeg movie, uses Springer-Verlag svjour3 document class, includes C++ source code. Changes from v1: revised in response to referee comments: many references added, new figure added to better explain the algorithm, other small changes, C++ code updated to latest versio
    corecore