On the Evaluation of RDF Distribution Algorithms Implemented over Apache Spark

Authors: Amann Bernd, Baazizi Mohamed-Amine, Curé Olivier, Naacke Hubert
Publication date: 08/07/2015

Querying very large RDF data sets efficiently requires a sophisticated distribution strategy. Several innovative solutions have recently been proposed for optimizing data distribution with predefined query workloads. This paper presents an in-depth analysis and experimental comparison of five representative and complementary distribution approaches. To obtain fair experimental results, we use Apache Spark as a common parallel computing framework and rewrite the algorithms concerned using the Spark API. Spark provides guarantees of fault tolerance, high availability and scalability, which are essential in such systems. Our implementations aim to highlight the fundamental, implementation-independent characteristics of each approach in terms of data preparation, load balancing, data replication and, to some extent, query answering cost and performance. The reported measures are obtained by testing each system on one synthetic and one real-world data set, over query workloads with differing characteristics and under different partitioning constraints. Comment: 16 pages, 3 figures
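This listing does not name the five approaches, so the following is only an illustration of the kind of placement decision they make. It is a minimal sketch, assuming the simplest baseline of hash partitioning triples on their subject, together with the two quantities the abstract highlights: load balance and data replication. It uses plain Python rather than the Spark API, the function names are hypothetical, and it is not a reimplementation of any of the evaluated systems.

```python
import zlib
from collections import defaultdict


def partition_by_subject(triples, num_partitions):
    """Assign each (subject, predicate, object) triple to a partition by hashing its subject.

    zlib.crc32 is used instead of the built-in hash() because str hashing is
    randomized per process, which would make placement differ between runs.
    """
    parts = defaultdict(list)
    for s, p, o in triples:
        pid = zlib.crc32(s.encode("utf-8")) % num_partitions
        parts[pid].append((s, p, o))
    return parts


def load_imbalance(parts, num_partitions):
    """Largest partition size divided by mean partition size (1.0 = perfectly balanced)."""
    sizes = [len(parts.get(i, [])) for i in range(num_partitions)]
    mean = sum(sizes) / num_partitions
    return max(sizes) / mean if mean else 0.0


def replication_factor(parts, total_triples):
    """Stored triples divided by input triples (1.0 = no replication)."""
    stored = sum(len(v) for v in parts.values())
    return stored / total_triples if total_triples else 0.0
```

Hash partitioning on the subject never replicates data and keeps all triples of a subject together, so subject-star queries can be answered locally; queries that join on other positions generally require data exchange between partitions. In Spark, the same placement would typically be expressed by keying an RDD on the subject and calling `partitionBy`.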

arXiv.org e-Print Archive

HAL Descartes

Hal-Diderot

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM

Simple proof of confidentiality for private quantum channels in noisy environments

Authors: Briegel H. J., Dunjko V., Dür W., Pirker A., Zwerger M.
Publication venue: IOP Publishing
Publication date: 25/03/2019

Complete security proofs for quantum communication protocols can be notoriously involved, which complicates their verification and obscures the key physical insights on which security ultimately relies. In such cases, the utility of these proofs may be limited for much of the community. Here we provide a simple proof of confidentiality for parallel quantum channels established via entanglement distillation based on hashing, in the presence of noise and of a malicious eavesdropper who is restricted only by the laws of quantum mechanics. The direct contribution lies in improving the linear confidentiality levels of recurrence-type entanglement distillation protocols to exponential levels for hashing protocols. The proof directly exploits the security-relevant physical properties: measurement-based quantum computation with resource states and the separation of Bell pairs from an eavesdropper. The proof also holds for situations where Eve has full control over the input states and obtains complete information about the operations and noise applied by the parties. The resulting state after hashing is private, i.e., disentangled from the eavesdropper. Moreover, the noise regimes for entanglement distillation and confidentiality do not coincide: confidentiality can be guaranteed even in situations where entanglement distillation fails. We extend our results to multiparty situations, which are of special interest for secure quantum networks. Comment: 5 + 11 pages, 0 + 4 figures, A. Pirker and M. Zwerger contributed equally to this work, replaced with accepted version
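The abstract builds on hashing-based entanglement distillation without restating what the hashing protocol achieves. For context, the standard result is that, on Bell-diagonal states ρ, hashing distills near-perfect Bell pairs at an asymptotic yield of 1 − S(ρ) per input pair, where S is the von Neumann entropy. The sketch below evaluates that yield for Werner states of fidelity F. It illustrates the distillation side only and does not implement the paper's confidentiality argument; the function names are hypothetical.

```python
import math


def bell_diagonal_entropy(coeffs):
    """Von Neumann entropy (bits) of a Bell-diagonal state.

    For a Bell-diagonal state this equals the Shannon entropy of its four
    weights in the Bell basis.
    """
    if abs(sum(coeffs) - 1.0) > 1e-9 or any(c < 0 for c in coeffs):
        raise ValueError("coefficients must form a probability distribution")
    return -sum(c * math.log2(c) for c in coeffs if c > 0)


def hashing_yield(coeffs):
    """Asymptotic Bell pairs per input pair from hashing: 1 - S(rho).

    A non-positive value means the hashing protocol distills nothing.
    """
    return 1.0 - bell_diagonal_entropy(coeffs)


def werner_coeffs(fidelity):
    """Bell-diagonal weights (F, (1-F)/3, (1-F)/3, (1-F)/3) of a Werner state."""
    rest = (1.0 - fidelity) / 3.0
    return (fidelity, rest, rest, rest)
```

For noisy inputs, the yield becomes non-positive once F is low enough. This is the regime in which hashing-based distillation fails, which the abstract contrasts with the regime in which confidentiality can still be guaranteed.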

arXiv.org e-Print Archive

KOPS - The Institutional Repository of the University of Konstanz

MPG.PuRe

FLASH: Randomized Algorithms Accelerated over CPU-GPU for Ultra-High Dimensional Similarity Search

Authors: Andoni A., Broder A. Z., Li P., Lv Q., Shrivastava A., Shrivastava A., Weber R.
Publication date: 03/07/2018

We present FLASH (Fast LSH Algorithm for Similarity search accelerated with HPC), a similarity search system for ultra-high-dimensional datasets on a single machine that does not require similarity computations and is tailored for high-performance computing platforms. By leveraging an LSH-style randomized indexing procedure and combining it with several principled techniques, such as reservoir sampling, recent advances in one-pass minwise hashing, and count-based estimation, we reduce the computational and parallelization costs of similarity search while retaining sound theoretical guarantees. We evaluate FLASH on several real, high-dimensional datasets from different domains, including text, malicious URLs, click-through prediction and social networks. Our experiments shed new light on the difficulties associated with datasets having several million dimensions. Current state-of-the-art implementations either fail at the presented scale or are orders of magnitude slower than FLASH. FLASH can compute an approximate k-NN graph, from scratch, over the full webspam dataset (1.3 billion nonzeros) in less than 10 seconds. Computing a full k-NN graph on the webspam dataset in less than 10 seconds using brute force ($n^2 D$) would require at least 20 teraflops. We provide CPU and GPU implementations of FLASH for replicability of our results.
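The abstract names FLASH's ingredients (LSH-style indexing, minwise hashing, reservoir sampling, count-based estimation) without showing how they fit together. The following is a simplified, single-threaded sketch of that general combination, not FLASH's actual design. It assumes data points are sets of integer feature ids. Each minhash uses an ordinary universal hash (a·x + b) mod p instead of the one-pass hashing the paper refers to, each bucket is capped with reservoir sampling, and candidates are ranked by how many tables they collide in. The class and parameter names are hypothetical, and there is no CPU/GPU parallelism.

```python
import random
from collections import defaultdict

PRIME = (1 << 61) - 1  # Mersenne prime modulus for the universal hash family


class MinHashLSH:
    """Hypothetical LSH index with reservoir-sampled buckets and collision-count ranking."""

    def __init__(self, num_tables=8, hashes_per_table=2, reservoir_size=4, seed=0):
        self.rng = random.Random(seed)
        self.L, self.K, self.R = num_tables, hashes_per_table, reservoir_size
        # One (a, b) pair per minhash function h(x) = (a*x + b) mod PRIME.
        self.params = [
            [(self.rng.randrange(1, PRIME), self.rng.randrange(PRIME)) for _ in range(hashes_per_table)]
            for _ in range(num_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.seen = [defaultdict(int) for _ in range(num_tables)]  # items offered to each bucket

    def _key(self, t, features):
        # Concatenate K minhash values into a single bucket key for table t.
        if not features:
            raise ValueError("feature set must be non-empty")
        return tuple(min((a * x + b) % PRIME for x in features) for a, b in self.params[t])

    def insert(self, item_id, features):
        for t in range(self.L):
            key = self._key(t, features)
            bucket = self.tables[t][key]
            self.seen[t][key] += 1
            n = self.seen[t][key]
            if len(bucket) < self.R:
                bucket.append(item_id)
            else:
                # Reservoir sampling (Algorithm R): keep the new item with probability R/n.
                j = self.rng.randrange(n)
                if j < self.R:
                    bucket[j] = item_id

    def query(self, features, k=5):
        # Count the tables in which each stored item shares the query's bucket;
        # more collisions suggest higher Jaccard similarity.
        counts = defaultdict(int)
        for t in range(self.L):
            for item_id in self.tables[t].get(self._key(t, features), []):
                counts[item_id] += 1
        return sorted(counts, key=lambda i: (-counts[i], i))[:k]
```

For ideal random permutations, a single minhash collides with probability equal to the Jaccard similarity J of the two sets, so a table keyed on K concatenated minhashes collides with probability about J^K. The collision count divided by L is therefore a rough estimate of that quantity, obtained without computing any pairwise similarity. Reservoir sampling caps memory per bucket regardless of how skewed the data is.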

arXiv.org e-Print Archive

Crossref