6,646 research outputs found
Using Graph Properties to Speed-up GPU-based Graph Traversal: A Model-driven Approach
While it is well-known and acknowledged that the performance of graph
algorithms is heavily dependent on the input data, there has been surprisingly
little research to quantify and predict the impact the graph structure has on
performance. Parallel graph algorithms, running on many-core systems such as
GPUs, are no exception: most research has focused on how to efficiently
implement and tune different graph operations on a specific GPU. However, the
performance impact of the input graph has only been taken into account
indirectly as a result of the graphs used to benchmark the system.
In this work, we present a case study investigating how to use the properties
of the input graph to improve the performance of the breadth-first search (BFS)
graph traversal. To do so, we first study the performance variation of 15
different BFS implementations across 248 graphs. Using this performance data,
we show that significant speed-up can be achieved by combining the best
implementation for each level of the traversal. To make use of this
data-dependent optimization, we must correctly predict the relative performance
of algorithms per graph level, and enable dynamic switching to the optimal
algorithm for each level at runtime.
We use the collected performance data to train a binary decision tree, to
enable high-accuracy predictions and fast switching. We demonstrate empirically
that our decision tree is both fast enough to allow dynamic switching between
implementations, without noticeable overhead, and accurate enough in its
prediction to enable significant BFS speedup. We conclude that our model-driven
approach (1) enables BFS to outperform state of the art GPU algorithms, and (2)
can be adapted for other BFS variants, other algorithms, or more specific
datasets
When private set intersection meets big data : an efficient and scalable protocol
Large scale data processing brings new challenges to the design of privacy-preserving protocols: how to meet the increasing requirements of speed and throughput of modern applications, and how to scale up smoothly when data being protected is big. Efficiency and scalability become critical criteria for privacy preserving protocols in the age of Big Data. In this paper, we present a new Private Set Intersection (PSI) protocol that is extremely efficient and highly scalable compared with existing protocols. The protocol is based on a novel approach that we call oblivious Bloom intersection. It has linear complexity and relies mostly on efficient symmetric key operations. It has high scalability due to the fact that most operations can be parallelized easily. The protocol has two versions: a basic protocol and an enhanced protocol, the security of the two variants is analyzed and proved in the semi-honest model and the malicious model respectively. A prototype of the basic protocol has been built. We report the result of performance evaluation and compare it against the two previously fastest PSI protocols. Our protocol is orders of magnitude faster than these two protocols. To compute the intersection of two million-element sets, our protocol needs only 41 seconds (80-bit security) and 339 seconds (256-bit security) on moderate hardware in parallel mode
Conclave: secure multi-party computation on big data (extended TR)
Secure Multi-Party Computation (MPC) allows mutually distrusting parties to
run joint computations without revealing private data. Current MPC algorithms
scale poorly with data size, which makes MPC on "big data" prohibitively slow
and inhibits its practical use.
Many relational analytics queries can maintain MPC's end-to-end security
guarantee without using cryptographic MPC techniques for all operations.
Conclave is a query compiler that accelerates such queries by transforming them
into a combination of data-parallel, local cleartext processing and small MPC
steps. When parties trust others with specific subsets of the data, Conclave
applies new hybrid MPC-cleartext protocols to run additional steps outside of
MPC and improve scalability further.
Our Conclave prototype generates code for cleartext processing in Python and
Spark, and for secure MPC using the Sharemind and Obliv-C frameworks. Conclave
scales to data sets between three and six orders of magnitude larger than
state-of-the-art MPC frameworks support on their own. Thanks to its hybrid
protocols, Conclave also substantially outperforms SMCQL, the most similar
existing system.Comment: Extended technical report for EuroSys 2019 pape
Impliance: A Next Generation Information Management Appliance
ably successful in building a large market and adapting to the changes of the
last three decades, its impact on the broader market of information management
is surprisingly limited. If we were to design an information management system
from scratch, based upon today's requirements and hardware capabilities, would
it look anything like today's database systems?" In this paper, we introduce
Impliance, a next-generation information management system consisting of
hardware and software components integrated to form an easy-to-administer
appliance that can store, retrieve, and analyze all types of structured,
semi-structured, and unstructured information. We first summarize the trends
that will shape information management for the foreseeable future. Those trends
imply three major requirements for Impliance: (1) to be able to store, manage,
and uniformly query all data, not just structured records; (2) to be able to
scale out as the volume of this data grows; and (3) to be simple and robust in
operation. We then describe four key ideas that are uniquely combined in
Impliance to address these requirements, namely the ideas of: (a) integrating
software and off-the-shelf hardware into a generic information appliance; (b)
automatically discovering, organizing, and managing all data - unstructured as
well as structured - in a uniform way; (c) achieving scale-out by exploiting
simple, massive parallel processing, and (d) virtualizing compute and storage
resources to unify, simplify, and streamline the management of Impliance.
Impliance is an ambitious, long-term effort to define simpler, more robust, and
more scalable information systems for tomorrow's enterprises.Comment: This article is published under a Creative Commons License Agreement
(http://creativecommons.org/licenses/by/2.5/.) You may copy, distribute,
display, and perform the work, make derivative works and make commercial use
of the work, but, you must attribute the work to the author and CIDR 2007.
3rd Biennial Conference on Innovative Data Systems Research (CIDR) January
710, 2007, Asilomar, California, US
- ā¦