
    On the Evaluation of RDF Distribution Algorithms Implemented over Apache Spark

    Querying very large RDF data sets efficiently requires a sophisticated distribution strategy. Several innovative solutions have recently been proposed for optimizing data distribution with predefined query workloads. This paper presents an in-depth analysis and experimental comparison of five representative and complementary distribution approaches. To obtain fair experimental results, we use Apache Spark as a common parallel computing framework and rewrite each of the algorithms with the Spark API. Spark provides fault tolerance, high availability and scalability guarantees, which are essential in such systems. Our implementations aim to highlight the fundamental, implementation-independent characteristics of each approach in terms of data preparation, load balancing, data replication and, to some extent, query answering cost and performance. The reported measures are obtained by testing each system on one synthetic and one real-world data set, over query workloads with differing characteristics and under different partitioning constraints. Comment: 16 pages, 3 figures
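
    To make the measured quantities (load balancing and data replication) concrete, here is a minimal sketch of the simplest possible distribution baseline: hash-partitioning triples by subject. It is not claimed to be one of the five approaches the paper evaluates, and it is plain Python rather than Spark. The function names `subject_hash_partition`, `load_imbalance` and `replication_factor` and the example triples are hypothetical.

```python
from collections import defaultdict
from zlib import crc32


def subject_hash_partition(triples, num_partitions):
    """Place each (subject, predicate, object) triple in the partition given by
    hashing its subject, so all triples sharing a subject are co-located."""
    partitions = defaultdict(list)
    for s, p, o in triples:
        # crc32 is deterministic across processes, unlike Python's salted str hash()
        pid = crc32(s.encode("utf-8")) % num_partitions
        partitions[pid].append((s, p, o))
    return partitions


def load_imbalance(partitions, num_partitions):
    """Largest partition size divided by the mean size (1.0 = perfectly balanced)."""
    sizes = [len(partitions.get(i, [])) for i in range(num_partitions)]
    mean = sum(sizes) / num_partitions
    return max(sizes) / mean if mean else 1.0


def replication_factor(partitions, num_input_triples):
    """Stored triples per input triple; pure subject hashing never replicates (1.0)."""
    stored = sum(len(part) for part in partitions.values())
    return stored / num_input_triples


if __name__ == "__main__":
    # Hypothetical toy data set
    triples = [
        ("ex:alice", "foaf:knows", "ex:bob"),
        ("ex:alice", "foaf:name", '"Alice"'),
        ("ex:bob", "foaf:knows", "ex:carol"),
        ("ex:bob", "foaf:name", '"Bob"'),
        ("ex:carol", "foaf:name", '"Carol"'),
    ]
    parts = subject_hash_partition(triples, num_partitions=2)
    print("imbalance:", load_imbalance(parts, 2))
    print("replication:", replication_factor(parts, len(triples)))
```

    With this baseline, subject-star queries can be answered inside one partition. A path such as `?x foaf:knows ?y . ?y foaf:name ?n` generally crosses partitions, and that cost is what workload-aware strategies trade against replication and balance. In PySpark, the same baseline would key an RDD by subject and call `RDD.partitionBy`.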

    Simple proof of confidentiality for private quantum channels in noisy environments

    Complete security proofs for quantum communication protocols can be notoriously involved, which complicates their verification and obscures the key physical insights on which security ultimately relies. In such cases, the utility of these proofs may be limited for much of the community. Here we provide a simple proof of confidentiality for parallel quantum channels established via entanglement distillation based on hashing, in the presence of noise and of a malicious eavesdropper who is restricted only by the laws of quantum mechanics. The direct contribution lies in improving the linear confidentiality levels of recurrence-type entanglement distillation protocols to exponential levels for hashing protocols. The proof directly exploits the security-relevant physical properties: measurement-based quantum computation with resource states, and the separation of Bell pairs from an eavesdropper. The proof also holds when Eve has full control over the input states and obtains all information about the operations and noise applied by the parties. The resulting state after hashing is private, i.e., disentangled from the eavesdropper. Moreover, the noise regimes for entanglement distillation and for confidentiality do not coincide: confidentiality can be guaranteed even in situations where entanglement distillation fails. We extend our results to multiparty situations, which are of special interest for secure quantum networks. Comment: 5 + 11 pages, 0 + 4 figures, A. Pirker and M. Zwerger contributed equally to this work, replaced with accepted version
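
    For orientation on the distillation side of the claim, the following sketch computes the standard asymptotic yield of the hashing protocol on a Bell-diagonal state, 1 - S(rho). It uses Werner states as the assumed noise model. It only shows where hashing stops producing Bell pairs. It does not compute the paper's confidentiality bound, which the abstract says remains valid in part of the regime where distillation fails. All function names are hypothetical.

```python
import math


def werner_coefficients(fidelity):
    """Bell-diagonal weights of a Werner state: F on the target Bell pair,
    (1 - F)/3 on each of the other three Bell states."""
    rest = (1.0 - fidelity) / 3.0
    return [fidelity, rest, rest, rest]


def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def hashing_yield(fidelity):
    """Asymptotic hashing yield 1 - S(rho); for a Bell-diagonal rho, the von Neumann
    entropy S(rho) equals the Shannon entropy of its Bell-basis weights."""
    return 1.0 - entropy_bits(werner_coefficients(fidelity))


def hashing_threshold(lo=0.5, hi=1.0, iters=60):
    """Bisect for the fidelity where the yield crosses zero
    (the yield increases monotonically with F on [1/4, 1])."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if hashing_yield(mid) > 0:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    for f in (0.75, 0.85, 0.95, 1.0):
        print(f"F = {f:.2f}: hashing yield = {hashing_yield(f):+.4f}")
    print(f"yield becomes positive above F ~ {hashing_threshold():.4f}")
```

    The crossing lies near F ≈ 0.811 for Werner states. Below it, hashing alone yields no Bell pairs.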

    FLASH: Randomized Algorithms Accelerated over CPU-GPU for Ultra-High Dimensional Similarity Search

    We present FLASH (Fast LSH Algorithm for Similarity search accelerated with HPC), a similarity search system for ultra-high dimensional datasets on a single machine that does not require similarity computations and is tailored for high-performance computing platforms. By leveraging an LSH-style randomized indexing procedure and combining it with several principled techniques, such as reservoir sampling, recent advances in one-pass minwise hashing, and count-based estimations, we reduce the computational and parallelization costs of similarity search while retaining sound theoretical guarantees. We evaluate FLASH on several real, high-dimensional datasets from different domains, including text, malicious URLs, click-through prediction and social networks. Our experiments shed new light on the difficulties associated with datasets having several million dimensions. Current state-of-the-art implementations either fail at the presented scale or are orders of magnitude slower than FLASH. FLASH can compute an approximate k-NN graph from scratch over the full webspam dataset (1.3 billion nonzeros) in less than 10 seconds. Computing a full k-NN graph in less than 10 seconds on the webspam dataset using brute force ($n^2 D$) would require at least 20 teraflops. We provide CPU and GPU implementations of FLASH for replicability of our results.