Path ORAM: An Extremely Simple Oblivious RAM Protocol
We present Path ORAM, an extremely simple Oblivious RAM protocol with a small
amount of client storage. Partly due to its simplicity, Path ORAM is the most
practical ORAM scheme known to date with small client storage. We formally
prove that Path ORAM has an O(log N) bandwidth cost for blocks of size B =
Omega(log^2 N) bits. For such block sizes, Path ORAM is asymptotically better
than the best known ORAM schemes with small client storage. Due to its
practicality, Path ORAM has been adopted in the design of secure processors
since its proposal.
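The abstract does not spell out the protocol, so the following is only a minimal sketch of the commonly described Path ORAM access loop: remap the block to a fresh random leaf, read the whole root-to-leaf path into a client-side stash, then write blocks back as deep as their new leaf allows. The class name `PathORAM`, the heap-style tree layout, and the bucket size of 4 are assumptions made for illustration. Encryption and dummy-block padding, which a real deployment needs, are omitted.

```python
import random


class PathORAM:
    """Toy Path ORAM sketch: no encryption, no dummy padding (illustrative only)."""

    def __init__(self, num_blocks, bucket_size=4, rng=None):
        self.rng = rng or random.Random()
        self.Z = bucket_size
        # Tree height L chosen so that 2**L leaves cover num_blocks.
        self.L = max(1, (num_blocks - 1).bit_length())
        self.num_leaves = 2 ** self.L
        # Heap layout: node 1 is the root, node i has children 2i and 2i + 1.
        self.tree = [[] for _ in range(2 * self.num_leaves)]
        # Client state: position map (block -> leaf) and stash.
        self.position = [self.rng.randrange(self.num_leaves)
                         for _ in range(num_blocks)]
        self.stash = {}

    def _node_on_path(self, leaf, depth):
        """Tree node at the given depth on the path from the root to `leaf`."""
        return (self.num_leaves + leaf) >> (self.L - depth)

    def access(self, block_id, new_value=None):
        """Read a block (new_value is None) or overwrite it; returns the old value."""
        leaf = self.position[block_id]
        # Remap the block to a fresh uniformly random leaf.
        self.position[block_id] = self.rng.randrange(self.num_leaves)
        path = [self._node_on_path(leaf, d) for d in range(self.L + 1)]

        # Read every bucket on the old path into the stash.
        for node in path:
            for bid, val in self.tree[node]:
                self.stash[bid] = val
            self.tree[node] = []

        value = self.stash.get(block_id)
        if new_value is not None:
            self.stash[block_id] = new_value

        # Write back from the leaf towards the root, filling each bucket with
        # stash blocks whose assigned path also passes through that bucket.
        for depth in range(self.L, -1, -1):
            node = path[depth]
            eligible = [bid for bid in self.stash
                        if self._node_on_path(self.position[bid], depth) == node]
            for bid in eligible[:self.Z]:
                self.tree[node].append((bid, self.stash.pop(bid)))
        return value
```

Each access touches exactly one root-to-leaf path of L + 1 buckets, which is where the logarithmic bandwidth cost in the abstract comes from. Blocks that do not fit on the path simply stay in the stash until a later access.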
Thread-Scalable Evaluation of Multi-Jet Observables
A leading-order, leading-color parton-level event generator is developed for
use on a multi-threaded GPU. Speed-up factors between 150 and 300 are obtained
compared to an unoptimized CPU-based implementation of the event generator. In
this first paper we study the feasibility of a GPU-based event generator with
an emphasis on the constraints imposed by the hardware. Some studies of Monte
Carlo convergence and accuracy are presented for PP -> 2,...,10 jet observables
using of the order of 1e11 events.
Comment: 16 pages, 5 figures, 3 table
Libpsht - algorithms for efficient spherical harmonic transforms
Libpsht (or "library for Performant Spherical Harmonic Transforms") is a
collection of algorithms for efficient conversion between spatial-domain and
spectral-domain representations of data defined on the sphere. The package
supports transforms of scalars as well as spin-1 and spin-2 quantities, and can
be used for a wide range of pixelisations (including HEALPix, GLESP and ECP).
It will take advantage of hardware features like multiple processor cores and
floating-point vector operations, if available. Even without this additional
acceleration, the employed algorithms are among the most efficient (in terms of
CPU time as well as memory consumption) currently being used in the
astronomical community.
The library is written in strictly standard-conforming C90, ensuring
portability to many different hard- and software platforms, and allowing
straightforward integration with codes written in various programming languages
like C, C++, Fortran, Python etc.
Libpsht is distributed under the terms of the GNU General Public License
(GPL) version 2 and can be downloaded from
http://sourceforge.net/projects/libpsht.
Comment: 9 pages, 8 figures, accepted by A&
Limited-memory BFGS Systems with Diagonal Updates
In this paper, we investigate a formula to solve systems of the form (B +
\sigma I)x = y, where B is a limited-memory BFGS quasi-Newton matrix and
\sigma is a positive constant. These types of systems arise naturally in
large-scale optimization, for example in trust-region methods as well as
doubly-augmented Lagrangian methods. We show that provided a simple condition
holds on B_0 and \sigma, the system (B + \sigma I)x = y can be solved via a
recursion formula that requires only vector inner products. This formula has
complexity M^2n, where M is the number of L-BFGS updates and n >> M is the
dimension of x.
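The abstract does not give the recursion itself. As a rough illustration of how such a system can be solved with inner products only, here is a generic recursive Sherman-Morrison sketch, which is not necessarily the paper's exact formula. It assumes B_0 is a multiple of the identity, so that B + \sigma I can be written as c I plus a sum of signed rank-one terms alpha_j u_j u_j^T (two per L-BFGS update), and that the vectors u_j are already available. The function name `solve_shifted` and its argument layout are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def solve_shifted(c, alphas, us, z):
    """Solve (c*I + sum_j alphas[j] * u_j u_j^T) x = z.

    Uses the Sherman-Morrison formula one rank-one term at a time.
    With K terms of length n the cost is O(K^2 * n) and only inner
    products and vector additions are needed.
    """
    ps = []    # ps[j]   = C_j^{-1} u_j, where C_j holds the first j terms
    taus = []  # taus[j] = 1 + alphas[j] * u_j^T C_j^{-1} u_j

    def apply_inverse(w, k):
        # C_k^{-1} w, unrolled from the Sherman-Morrison recursion.
        x = [wi / c for wi in w]
        for j in range(k):
            coeff = alphas[j] * dot(ps[j], w) / taus[j]
            x = [xi - coeff * pj for xi, pj in zip(x, ps[j])]
        return x

    for k, u in enumerate(us):
        p = apply_inverse(u, k)
        tau = 1.0 + alphas[k] * dot(u, p)
        if abs(tau) < 1e-12:
            # An intermediate matrix is singular; the recursion breaks down.
            raise ZeroDivisionError("singular intermediate matrix")
        ps.append(p)
        taus.append(tau)

    return apply_inverse(z, len(us))
```

With M L-BFGS updates there are K = 2M rank-one terms, so the cost of this sketch scales like M^2 n, matching the order quoted in the abstract. The check on `tau` plays the role of a nonsingularity condition on the intermediate matrices.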
Adaptive Mesh Refinement for Characteristic Grids
I consider techniques for Berger-Oliger adaptive mesh refinement (AMR) when
numerically solving partial differential equations with wave-like solutions,
using characteristic (double-null) grids. Such AMR algorithms are naturally
recursive, and the best-known past Berger-Oliger characteristic AMR algorithm,
that of Pretorius & Lehner (J. Comp. Phys. 198 (2004), 10), recurses on
individual "diamond" characteristic grid cells. This leads to the use of
fine-grained memory management, with individual grid cells kept in
2-dimensional linked lists at each refinement level. This complicates the
implementation and adds overhead in both space and time.
Here I describe a Berger-Oliger characteristic AMR algorithm which instead
recurses on null slices. This algorithm is very similar to the usual
Cauchy Berger-Oliger algorithm, and uses relatively coarse-grained memory
management, allowing entire null slices to be stored in contiguous arrays in
memory. The algorithm is very efficient in both space and time.
I describe discretizations yielding both 2nd and 4th order global accuracy.
My code implementing the algorithm described here is included in the electronic
supplementary materials accompanying this paper, and is freely available to
other researchers under the terms of the GNU General Public License.
Comment: 37 pages, 15 figures (40 eps figure files, 8 of them color; all are
viewable ok in black-and-white), 1 mpeg movie, uses Springer-Verlag svjour3
document class, includes C++ source code. Changes from v1: revised in
response to referee comments: many references added, new figure added to
better explain the algorithm, other small changes, C++ code updated to latest
versio