276 research outputs found
Extending the Nested Parallel Model to the Nested Dataflow Model with Provably Efficient Schedulers
The nested parallel (a.k.a. fork-join) model is widely used for writing
parallel programs. However, the two composition constructs, i.e. ""
(parallel) and "" (serial), are insufficient in expressing "partial
dependencies" or "partial parallelism" in a program. We propose a new dataflow
composition construct "" to express partial dependencies in
algorithms in a processor- and cache-oblivious way, thus extending the Nested
Parallel (NP) model to the \emph{Nested Dataflow} (ND) model. We redesign
several divide-and-conquer algorithms ranging from dense linear algebra to
dynamic-programming in the ND model and prove that they all have optimal span
while retaining optimal cache complexity. We propose the design of runtime
schedulers that map ND programs to multicore processors with multiple levels of
possibly shared caches (i.e, Parallel Memory Hierarchies) and provide
theoretical guarantees on their ability to preserve locality and load balance.
For this, we adapt space-bounded (SB) schedulers for the ND model. We show that
our algorithms have increased "parallelizability" in the ND model, and that SB
schedulers can use the extra parallelizability to achieve asymptotically
optimal bounds on cache misses and running time on a greater number of
processors than in the NP model. The running time for the algorithms in this
paper is , where is the cache complexity of task ,
is the cost of cache miss at level- cache which is of size ,
is a constant, and is the number of processors in an
-level cache hierarchy
Program fracture and recombination for efficient automatic code reuse
Abstract—We present a new code transfer technique, program fracture and recombination, for automatically replacing, delet-ing, and/or combining code from multiple applications. Benefits include automatic generation of new applications incorporating the best or most desirable functionality developed anywhere, the automatic elimination of errors and security vulnerabilities, effective software rejuvenation, the automatic elimination of obsolete or undesirable functionality, and improved performance, energy efficiency, simplicity, analyzability, and clarity. The technique may be particularly appropriate for high performance computing. The field has devoted years of effort to developing efficient (but complex) implementations of standard linear algebra operations with good numerical properties. At the same time these operations also have very simple but inefficient implementations, often with poor numerical properties. Program fracture and recombination allows developers to work with the simple implementation during development and testing, then use program fracture and recombination to automatically find and deploy the most appropriate implementation for the hardware platform at hand. The benefits include reduced implementation effort, increased code clarity, and the ability to automatically search for and find efficient implementations with good numerical properties. I
SoK: Cryptographically Protected Database Search
Protected database search systems cryptographically isolate the roles of
reading from, writing to, and administering the database. This separation
limits unnecessary administrator access and protects data in the case of system
breaches. Since protected search was introduced in 2000, the area has grown
rapidly; systems are offered by academia, start-ups, and established companies.
However, there is no best protected search system or set of techniques.
Design of such systems is a balancing act between security, functionality,
performance, and usability. This challenge is made more difficult by ongoing
database specialization, as some users will want the functionality of SQL,
NoSQL, or NewSQL databases. This database evolution will continue, and the
protected search community should be able to quickly provide functionality
consistent with newly invented databases.
At the same time, the community must accurately and clearly characterize the
tradeoffs between different approaches. To address these challenges, we provide
the following contributions:
1) An identification of the important primitive operations across database
paradigms. We find there are a small number of base operations that can be used
and combined to support a large number of database paradigms.
2) An evaluation of the current state of protected search systems in
implementing these base operations. This evaluation describes the main
approaches and tradeoffs for each base operation. Furthermore, it puts
protected search in the context of unprotected search, identifying key gaps in
functionality.
3) An analysis of attacks against protected search for different base
queries.
4) A roadmap and tools for transforming a protected search system into a
protected database, including an open-source performance evaluation platform
and initial user opinions of protected search.Comment: 20 pages, to appear to IEEE Security and Privac
Effective data parallel computing on multicore processors
The rise of chip multiprocessing or the integration of multiple general purpose processing cores on a single chip (multicores), has impacted all computing platforms including high performance, servers, desktops, mobile, and embedded processors. Programmers can no longer expect continued increases in software performance without developing parallel, memory hierarchy friendly software that can effectively exploit the chip level multiprocessing paradigm of multicores. The goal of this dissertation is to demonstrate a design process for data parallel problems that starts with a sequential algorithm and ends with a high performance implementation on a multicore platform. Our design process combines theoretical algorithm analysis with practical optimization techniques. Our target multicores are quad-core processors from Intel and the eight-SPE IBM Cell B.E. Target applications include Matrix Multiplications (MM), Finite Difference Time Domain (FDTD), LU Decomposition (LUD), and Power Flow Solver based on Gauss-Seidel (PFS-GS) algorithms. These applications are popular computation methods in science and engineering problems and are characterized by unit-stride (MM, LUD, and PFS-GS) or 2-point stencil (FDTD) memory access pattern. The main contributions of this dissertation include a cache- and space-efficient algorithm model, integrated data pre-fetching and caching strategies, and in-core optimization techniques. Our multicore efficient implementations of the above described applications outperform nai¨ve parallel implementations by at least 2x and scales well with problem size and with the number of processing cores
Network-Oblivious Algorithms
A framework is proposed for the design and analysis of network-oblivious algorithms, namely algorithms that can run unchanged, yet efficiently, on a variety of machines characterized by different degrees of parallelism and communication capabilities. The framework prescribes that a network-oblivious algorithm be specified on a parallel model of computation where the only parameter is the problem\u2019s input size, and then evaluated on a model with two parameters, capturing parallelism granularity and communication latency. It is shown that for a wide class of network-oblivious algorithms, optimality in the latter model implies optimality in the decomposable bulk synchronous parallel model, which is known to effectively describe a wide and significant class of parallel platforms. The proposed framework can be regarded as an attempt to port the notion of obliviousness, well established in the context of cache hierarchies, to the realm of parallel computation. Its effectiveness is illustrated by providing optimal network-oblivious algorithms for a number of key problems. Some limitations of the oblivious approach are also discussed
- …