15,168 research outputs found
Deterministic, Stash-Free Write-Only ORAM
Write-Only Oblivious RAM (WoORAM) protocols provide privacy by encrypting the
contents of data and also hiding the pattern of write operations over that
data. WoORAMs provide better privacy than plain encryption and better
performance than more general ORAM schemes (which hide both writing and reading
access patterns), and the write-oblivious setting has been applied to important
applications of cloud storage synchronization and encrypted hidden volumes. In
this paper, we introduce an entirely new technique for Write-Only ORAM, called
DetWoORAM. Unlike previous solutions, DetWoORAM uses a deterministic,
sequential writing pattern without the need for any "stashing" of blocks in
local state when writes fail. Our protocol, while conceptually simple, provides
substantial improvement over prior solutions, both asymptotically and
experimentally. In particular, under typical settings the DetWoORAM writes only
2 blocks (sequentially) to backend memory for each block written to the device,
which is optimal. We have implemented our solution using the BUSE (block device
in user-space) module and tested DetWoORAM against both an encryption only
baseline of dm-crypt and prior, randomized WoORAM solutions, measuring only a
3x-14x slowdown compared to an encryption-only baseline and around 6x-19x
speedup compared to prior work
Recommended from our members
Behavioral synthesis from VHDL using structured modeling
This dissertation describes work in behavioral synthesis involving the development of a VHDL Synthesis System VSS which accepts a VHDL behavioral input specification and performs technology independent synthesis to generate a circuit netlist of generic components. The VHDL language is used for input and output descriptions. An intermediate representation which incorporates signal typing and component attributes simplifies compilation and facilitates design optimization.A Structured Modeling methodology has been developed to suggest standard VHDL modeling practices for synthesis. Structured modeling provides recommendations for the use of available VHDL description styles so that optimal designs will be synthesized.A design composed of generic components is synthesized from the input description through a process of Graph Compilation, Graph Criticism, and Design Compilation. Experiments were performed to demonstrate the effects of different modeling styles on the quality of the design produced by VSS. Several alternative VHDL models were examined for each benchmark, illustrating the improvements in design quality achieved when Structured Modeling guidelines were followed
Optimization guide for programs compiled under IBM FORTRAN H (OPT=2)
Guidelines are given to provide the programmer with various techniques for optimizing programs when the FORTRAN IV H compiler is used with OPT=2. Subroutines and programs are described in the appendices along with a timing summary of all the examples given in the manual
Communication Complexity and Secure Function Evaluation
We suggest two new methodologies for the design of efficient secure
protocols, that differ with respect to their underlying computational models.
In one methodology we utilize the communication complexity tree (or branching
for f and transform it into a secure protocol. In other words, "any function f
that can be computed using communication complexity c can be can be computed
securely using communication complexity that is polynomial in c and a security
parameter". The second methodology uses the circuit computing f, enhanced with
look-up tables as its underlying computational model. It is possible to
simulate any RAM machine in this model with polylogarithmic blowup. Hence it is
possible to start with a computation of f on a RAM machine and transform it
into a secure protocol.
We show many applications of these new methodologies resulting in protocols
efficient either in communication or in computation. In particular, we
exemplify a protocol for the "millionaires problem", where two participants
want to compare their values but reveal no other information. Our protocol is
more efficient than previously known ones in either communication or
computation
Spherical harmonic transform with GPUs
We describe an algorithm for computing an inverse spherical harmonic
transform suitable for graphic processing units (GPU). We use CUDA and base our
implementation on a Fortran90 routine included in a publicly available parallel
package, S2HAT. We focus our attention on the two major sequential steps
involved in the transforms computation, retaining the efficient parallel
framework of the original code. We detail optimization techniques used to
enhance the performance of the CUDA-based code and contrast them with those
implemented in the Fortran90 version. We also present performance comparisons
of a single CPU plus GPU unit with the S2HAT code running on either a single or
4 processors. In particular we find that use of the latest generation of GPUs,
such as NVIDIA GF100 (Fermi), can accelerate the spherical harmonic transforms
by as much as 18 times with respect to S2HAT executed on one core, and by as
much as 5.5 with respect to S2HAT on 4 cores, with the overall performance
being limited by the Fast Fourier transforms. The work presented here has been
performed in the context of the Cosmic Microwave Background simulations and
analysis. However, we expect that the developed software will be of more
general interest and applicability
Taming Numbers and Durations in the Model Checking Integrated Planning System
The Model Checking Integrated Planning System (MIPS) is a temporal least
commitment heuristic search planner based on a flexible object-oriented
workbench architecture. Its design clearly separates explicit and symbolic
directed exploration algorithms from the set of on-line and off-line computed
estimates and associated data structures. MIPS has shown distinguished
performance in the last two international planning competitions. In the last
event the description language was extended from pure propositional planning to
include numerical state variables, action durations, and plan quality objective
functions. Plans were no longer sequences of actions but time-stamped
schedules. As a participant of the fully automated track of the competition,
MIPS has proven to be a general system; in each track and every benchmark
domain it efficiently computed plans of remarkable quality. This article
introduces and analyzes the most important algorithmic novelties that were
necessary to tackle the new layers of expressiveness in the benchmark problems
and to achieve a high level of performance. The extensions include critical
path analysis of sequentially generated plans to generate corresponding optimal
parallel plans. The linear time algorithm to compute the parallel plan bypasses
known NP hardness results for partial ordering by scheduling plans with respect
to the set of actions and the imposed precedence relations. The efficiency of
this algorithm also allows us to improve the exploration guidance: for each
encountered planning state the corresponding approximate sequential plan is
scheduled. One major strength of MIPS is its static analysis phase that grounds
and simplifies parameterized predicates, functions and operators, that infers
knowledge to minimize the state description length, and that detects domain
object symmetries. The latter aspect is analyzed in detail. MIPS has been
developed to serve as a complete and optimal state space planner, with
admissible estimates, exploration engines and branching cuts. In the
competition version, however, certain performance compromises had to be made,
including floating point arithmetic, weighted heuristic search exploration
according to an inadmissible estimate and parameterized optimization
AirIndex: Versatile Index Tuning Through Data and Storage
The end-to-end lookup latency of a hierarchical index -- such as a B-tree or
a learned index -- is determined by its structure such as the number of layers,
the kinds of branching functions appearing in each layer, the amount of data we
must fetch from layers, etc. Our primary observation is that by optimizing
those structural parameters (or designs) specifically to a target system's I/O
characteristics (e.g., latency, bandwidth), we can offer a faster lookup
compared to the ones that are not optimized. Can we develop a systematic method
for finding those optimal design parameters? Ideally, the method must have the
potential to generate almost any existing index or a novel combination of them
for the fastest possible lookup.
In this work, we present new data and an I/O-aware index builder (called
AirIndex) that can find high-speed hierarchical index designs in a principled
way. Specifically, AirIndex minimizes an objective function expressing the
end-to-end latency in terms of various designs -- the number of layers, types
of layers, and more -- for given data and a storage profile, using a
graph-based optimization method purpose-built to address the computational
challenges rising from the inter-dependencies among index layers and the
exponentially many candidate parameters in a large search space. Our empirical
studies confirm that AirIndex can find optimal index designs, build optimal
indexes within the times comparable to existing methods, and deliver up to 4.1x
faster lookup than a lightweight B-tree library (LMDB), 3.3x--46.3x faster than
state-of-the-art learned indexes (RMI/CDFShop, PGM-Index, ALEX/APEX, PLEX), and
2.0 faster than Data Calculator's suggestion on various dataset and storage
settings.Comment: 13 pages, 3 appendices, 19 figures, to appear at SIGMOD 202
A Comparative Xeon and CBE Performance Analysis
The Cell Broadband Engine is a high performance multicore processor with superb performance on certain types of problems. However, it does not perform as well running other algorithms, particularly those with heavy branching. The Intel Xeon processor is a high performance superscalar processor. It utilizes a high clock speed and deep pipelines to help it achieve superior performance. But deep pipelines can perform poorly with frequent memory accesses. This paper is a study and attempt at quantifying the types of programmatic structures that are more suitable to a particular architecture. It focuses on the issues of pipelines, memory access and branching on these two microprocessor architectures
- …