On the Evaluation of RDF Distribution Algorithms Implemented over Apache Spark
Querying very large RDF data sets in an efficient manner requires a
sophisticated distribution strategy. Several innovative solutions have recently
been proposed for optimizing data distribution with predefined query workloads.
This paper presents an in-depth analysis and experimental comparison of five
representative and complementary distribution approaches. To obtain fair
experimental results, we use Apache Spark as a common parallel computing
framework, rewriting the concerned algorithms with the Spark API. Spark
provides guarantees in terms of fault tolerance, high availability and
scalability which are essential in such systems. Our different implementations
aim to highlight the fundamental implementation-independent characteristics of
each approach in terms of data preparation, load balancing, data replication
and, to some extent, query answering cost and performance. The presented
measures are obtained by testing each system on one synthetic and one
real-world data set over query workloads with differing characteristics and
different partitioning constraints.
Comment: 16 pages, 3 figures
Simple proof of confidentiality for private quantum channels in noisy environments
Complete security proofs for quantum communication protocols can be
notoriously involved, which complicates their verification and obscures the
key physical insights the security ultimately relies on. In such cases, for the
majority of the community, the utility of such proofs may be restricted. Here
we provide a simple proof of confidentiality for parallel quantum channels
established via entanglement distillation based on hashing, in the presence of
noise, and a malicious eavesdropper who is restricted only by the laws of
quantum mechanics. The direct contribution lies in improving the linear
confidentiality levels of recurrence-type entanglement distillation protocols
to exponential levels for hashing protocols. The proof directly exploits the
security relevant physical properties: measurement-based quantum computation
with resource states and the separation of Bell-pairs from an eavesdropper. The
proof also holds for situations where Eve has full control over the input
states, and obtains all information about the operations and noise applied by
the parties. The resulting state after hashing is private, i.e., disentangled
from the eavesdropper. Moreover, the noise regimes for entanglement
distillation and confidentiality do not coincide: Confidentiality can be
guaranteed even in situations where entanglement distillation fails. We extend
our results to multiparty situations which are of special interest for secure
quantum networks.
Comment: 5 + 11 pages, 0 + 4 figures, A. Pirker and M. Zwerger contributed
equally to this work, replaced with accepted version
FLASH: Randomized Algorithms Accelerated over CPU-GPU for Ultra-High Dimensional Similarity Search
We present FLASH (Fast LSH Algorithm for
Similarity search accelerated with HPC), a similarity search
system for ultra-high dimensional datasets on a single machine, that does not
require similarity computations and is tailored for high-performance computing
platforms. By leveraging an LSH-style randomized indexing procedure and
combining it with several principled techniques, such as reservoir sampling,
recent advances in one-pass minwise hashing, and count based estimations, we
reduce the computational and parallelization costs of similarity search, while
retaining sound theoretical guarantees.
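
The combination named above (minwise hashing for indexing, reservoir sampling to cap bucket size, and count-based ranking in place of explicit similarity computations) can be illustrated with a minimal single-threaded sketch. This is not the FLASH implementation: the class name `ReservoirLSH`, its parameters, and the use of plain linear hash functions as a stand-in for one-pass minwise hashing are all assumptions made for illustration, and items are assumed to be sets of non-negative integer feature indices.

```python
import random
from collections import Counter

P = 2**61 - 1  # large prime modulus for the linear hash functions (illustrative choice)


class ReservoirLSH:
    """Hypothetical sketch: minhash tables whose buckets are fixed-size reservoirs."""

    def __init__(self, num_tables=32, reservoir_size=4, seed=0):
        self.rng = random.Random(seed)
        # One (a, b) pair per table; h(f) = (a*f + b) mod P approximates a random permutation.
        self.hashers = [(self.rng.randrange(1, P), self.rng.randrange(P))
                        for _ in range(num_tables)]
        self.reservoir_size = reservoir_size
        # Each table maps a minhash value to [items_seen, sampled_ids].
        self.tables = [dict() for _ in range(num_tables)]

    def _signature(self, features):
        # Minwise hash per table: the smallest hashed feature index.
        return [min((a * f + b) % P for f in features) for a, b in self.hashers]

    def add(self, item_id, features):
        for table, key in zip(self.tables, self._signature(features)):
            bucket = table.setdefault(key, [0, []])
            bucket[0] += 1
            ids = bucket[1]
            if len(ids) < self.reservoir_size:
                ids.append(item_id)
            else:
                # Reservoir sampling: keep each seen item with equal probability.
                j = self.rng.randrange(bucket[0])
                if j < self.reservoir_size:
                    ids[j] = item_id

    def query(self, features, k):
        # Count-based estimation: rank candidates by how many tables they share
        # with the query, without computing any similarity explicitly.
        counts = Counter()
        for table, key in zip(self.tables, self._signature(features)):
            bucket = table.get(key)
            if bucket:
                counts.update(bucket[1])
        return [item_id for item_id, _ in counts.most_common(k)]


index = ReservoirLSH()
index.add("a", {1, 2, 3, 4, 5})
index.add("b", {1, 2, 3, 4, 6})
index.add("c", {100, 200, 300})
print(index.query({1, 2, 3, 4, 5}, k=2))  # "a" ranks first; "c" shares no features and cannot appear
```

Capping every bucket at `reservoir_size` entries bounds both memory and per-query work regardless of how skewed the data is, which is the property that makes this style of index easy to parallelize.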
We evaluate FLASH on several real, high-dimensional datasets from different
domains, including text, malicious URL, click-through prediction, social
networks, etc. Our experiments shed new light on the difficulties associated
with datasets having several million dimensions. Current state-of-the-art
implementations either fail on the presented scale or are orders of magnitude
slower than FLASH. FLASH is capable of computing an approximate k-NN graph,
from scratch, over the full webspam dataset (1.3 billion nonzeros) in less than
10 seconds. Computing a full k-NN graph in less than 10 seconds on the webspam
dataset, using brute-force, will require at least 20 teraflops. We
provide CPU and GPU implementations of FLASH for replicability of our results
- …