1,808 research outputs found

    Static dependency analysis of recursive structures for parallelisation

    Get PDF

    Frequent itemset mining on multiprocessor systems

    Get PDF
    Frequent itemset mining is an important building block in many data mining applications like market basket analysis, recommendation, web mining, fraud detection, and gene expression analysis. In many of them, the datasets being mined can easily grow to hundreds of gigabytes or even terabytes of data. Hence, efficient algorithms are required to process such large amounts of data. In recent years, many frequent-itemset mining algorithms have been proposed which, however, (1) often have high memory requirements and (2) do not exploit the large degrees of parallelism provided by modern multiprocessor systems. The high memory requirements arise mainly from inefficient data structures that have only been shown to be sufficient for small datasets. For large datasets, however, the use of these data structures forces the algorithms to go out-of-core, i.e., they have to access secondary memory, which leads to serious performance degradation. Exploiting the available parallelism is further required to mine large datasets because the serial performance of processors has almost stopped increasing. Algorithms should therefore exploit the large number of available threads as well as other kinds of parallelism (e.g., vector instruction sets) besides thread-level parallelism. In this work, we tackle the high memory requirements of frequent itemset mining in two ways: we (1) compress the datasets being mined, because they must be kept in main memory during several mining invocations, and (2) improve existing mining algorithms with memory-efficient data structures. For compressing the datasets, we employ efficient encodings that show good compression performance on a wide variety of realistic datasets, i.e., the size of the datasets is reduced by up to 6.4x. The encodings can further be applied directly while loading the dataset from disk or network. Since encoding and decoding are repeatedly required for loading and mining the datasets, we reduce their cost by providing parallel encodings that achieve high throughput for both tasks. For a memory-efficient representation of the mining algorithms' intermediate data, we propose compact data structures and even employ explicit compression. Both methods together reduce the size of the intermediate data by up to 25x. The smaller memory requirements avoid or delay expensive out-of-core computation when large datasets are mined. For coping with the high parallelism provided by current multiprocessor systems, we identify the performance hot spots and scalability issues of existing frequent-itemset mining algorithms. The hot spots, which form basic building blocks of these algorithms, cover (1) counting the frequency of fixed-length strings, (2) building prefix trees, (3) compressing integer values, and (4) intersecting lists of sorted integer values or bitmaps. For all of them, we discuss how to exploit the available parallelism and provide scalable solutions. Furthermore, almost all components of the mining algorithms must be parallelized to keep the sequential fraction of the algorithms as small as possible. We integrate the parallelized building blocks and components into three well-known mining algorithms and further analyze the impact of certain existing optimizations. Even single-threaded, our algorithms are often up to an order of magnitude faster than existing highly optimized algorithms, and they further scale almost linearly on a large 32-core multiprocessor system. Although our optimizations are intended for frequent-itemset mining algorithms, they can be applied with only minor changes to algorithms used for mining other types of itemsets.
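    One of the hot spots named in the abstract, intersecting sorted lists of transaction IDs, can be illustrated with a short sketch. The C++ snippet below is only an illustration of that building block under naming of our own; it is not the thesis's parallel or vectorised implementation.

        // Illustrative sketch of building block (4): intersecting two sorted
        // lists of transaction IDs, as used by Eclat-style itemset miners.
        // Function and variable names are ours, not taken from the thesis.
        #include <algorithm>
        #include <cstddef>
        #include <cstdint>
        #include <vector>

        std::vector<uint32_t> intersect_tidlists(const std::vector<uint32_t>& a,
                                                 const std::vector<uint32_t>& b) {
            std::vector<uint32_t> out;
            out.reserve(std::min(a.size(), b.size()));
            std::size_t i = 0, j = 0;
            while (i < a.size() && j < b.size()) {
                if (a[i] < b[j])      ++i;
                else if (b[j] < a[i]) ++j;
                else { out.push_back(a[i]); ++i; ++j; }  // transaction contains both itemsets
            }
            return out;
        }

    The support of the combined itemset is then simply the size of the result; a parallel miner would run many such intersections concurrently across candidate itemsets.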

    CATS: linearizability and partition tolerance in scalable and self-organizing key-value stores

    Get PDF
    Distributed key-value stores provide scalable, fault-tolerant, and self-organizing storage services, but fall short of guaranteeing linearizable consistency in partially synchronous, lossy, partitionable, and dynamic networks, when data is distributed and replicated automatically by the principle of consistent hashing. This paper introduces consistent quorums as a solution for achieving atomic consistency. We present the design and implementation of CATS, a distributed key-value store which uses consistent quorums to guarantee linearizability and partition tolerance in such adverse and dynamic network conditions. CATS is scalable, elastic, and self-organizing, which are key properties for modern cloud storage middleware. Our system shows that consistency can be achieved with practical performance and modest throughput overhead (5%) for read-intensive workloads.
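    As a rough, hedged sketch of the consistent-quorum idea described above (simplified by us; this is not the CATS implementation, and all types and field names are assumptions), a read could be accepted only when a majority of replica replies agree on the replication-group view:

        // Simplified sketch: accept a read only if a majority of replies come
        // from replicas that share the same view of the replication group.
        #include <cstddef>
        #include <cstdint>
        #include <map>
        #include <optional>
        #include <string>
        #include <vector>

        struct ReadReply {
            uint64_t view_id;   // group configuration the replica believes in
            uint64_t version;   // version of the stored value
            std::string value;
        };

        std::optional<std::string> consistent_quorum_read(
                const std::vector<ReadReply>& replies, std::size_t replica_count) {
            const std::size_t majority = replica_count / 2 + 1;
            std::map<uint64_t, std::vector<const ReadReply*>> by_view;
            for (const auto& r : replies) by_view[r.view_id].push_back(&r);

            for (const auto& entry : by_view) {
                const auto& group = entry.second;
                if (group.size() < majority) continue;    // no quorum in this view
                const ReadReply* newest = group.front();
                for (const ReadReply* r : group)
                    if (r->version > newest->version) newest = r;
                return newest->value;                     // freshest value in the quorum
            }
            return std::nullopt;  // no consistent quorum: caller must retry
        }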

    CONFLLVM: A Compiler for Enforcing Data Confidentiality in Low-Level Code

    Full text link
    We present an instrumenting compiler for enforcing data confidentiality in low-level applications (e.g. those written in C) in the presence of an active adversary. In our approach, the programmer marks secret data by writing lightweight annotations on top-level definitions in the source code. The compiler then uses a static flow analysis coupled with efficient runtime instrumentation, a custom memory layout, and custom control-flow integrity checks to prevent data leaks even in the presence of low-level attacks. We have implemented our scheme as part of the LLVM compiler. We evaluate it on the SPEC micro-benchmarks for performance, and on larger, real-world applications (including OpenLDAP, which is around 300 KLoC) for the programmer overhead required to restructure the application when protecting sensitive data such as passwords. We find that the performance overheads introduced by our instrumentation are moderate (12% on average on SPEC), and the programmer effort to port OpenLDAP is only about 160 LoC.
    Comment: Technical report for CONFLLVM: A Compiler for Enforcing Data Confidentiality in Low-Level Code, appearing at EuroSys 201
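    As a hedged illustration of the kind of source-level marking described above (the SECRET macro and the annotation string are placeholders of ours, not CONFLLVM's actual syntax), a secret top-level definition might be tagged with a Clang annotate attribute:

        // Placeholder illustration only: marking a top-level definition as secret.
        // The attribute string "secret" and the SECRET macro are assumptions;
        // CONFLLVM's real annotation syntax may differ.
        #define SECRET __attribute__((annotate("secret")))

        SECRET static char admin_password[64];          // confidential data
        static char server_banner[] = "public banner";  // ordinary (public) data

        // The instrumenting compiler's static flow analysis would then track
        // uses of admin_password and insert runtime checks to prevent leaks.
        int check_password(const char* attempt);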

    DISPATCH: A Numerical Simulation Framework for the Exa-scale Era. I. Fundamentals

    Full text link
    We introduce a high-performance simulation framework that permits the semi-independent, task-based solution of sets of partial differential equations, typically manifesting as updates to a collection of 'patches' in space-time. A hybrid MPI/OpenMP execution model is adopted, where work tasks are controlled by a rank-local 'dispatcher' which selects, from a set of tasks generally much larger than the number of physical cores (or hardware threads), tasks that are ready for updating. The definition of a task can vary: for example, some solve the equations of ideal magnetohydrodynamics (MHD), others non-ideal MHD, radiative transfer, or particle motion, and yet others apply particle-in-cell (PIC) methods. Tasks do not have to be grid-based, and tasks that are may use either Cartesian or orthogonal curvilinear meshes. Patches may be stationary or moving. Mesh refinement can be static or dynamic. A feature of decisive importance for the overall performance of the framework is that time steps are determined and applied locally; this allows potentially large reductions in the total number of updates required in cases where the signal speed varies greatly across the computational domain, and therefore a corresponding reduction in computing time. Another feature is a load-balancing algorithm that operates 'locally' and aims to simultaneously minimise load and communication imbalance. The framework generally relies on already existing solvers, whose performance is augmented when run under the framework due to more efficient cache usage, vectorisation, local time-stepping, plus near-linear and, in principle, unlimited OpenMP and MPI scaling.
    Comment: 17 pages, 8 figures. Accepted by MNRA
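    The local time-stepping idea, which the abstract identifies as decisive for performance, can be sketched roughly as follows; the task structure and the readiness rule are simplified assumptions of ours, not the actual DISPATCH scheduler.

        // Toy sketch of local time stepping: each patch/task carries its own
        // time and step, and the dispatcher advances a lagging task once its
        // neighbours have caught up enough to provide boundary data.
        #include <cstddef>
        #include <vector>

        struct Task {
            double t  = 0.0;                 // local simulation time of this patch
            double dt = 0.0;                 // local step, set by the patch's own signal speed
            std::vector<std::size_t> nbors;  // indices of neighbouring patches
        };

        // Perform one scheduling decision; returns false when no task is ready.
        bool dispatch_one(std::vector<Task>& tasks, double t_end) {
            std::size_t pick = tasks.size();
            for (std::size_t i = 0; i < tasks.size(); ++i) {
                if (tasks[i].t >= t_end) continue;              // this patch is done
                bool ready = true;
                for (std::size_t n : tasks[i].nbors)
                    if (tasks[n].t < tasks[i].t) { ready = false; break; }
                if (ready && (pick == tasks.size() || tasks[i].t < tasks[pick].t))
                    pick = i;                                   // most-lagging ready task
            }
            if (pick == tasks.size()) return false;
            tasks[pick].t += tasks[pick].dt;                    // update the patch over its own dt
            return true;
        }

    Patches in quiet regions with large allowed steps are thus updated far less often than patches where the signal speed is high, which is the source of the reduction in total updates mentioned above.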

    An Instrumenting Compiler for Enforcing Confidentiality in Low-Level Code

    No full text
    We present an instrumenting compiler for enforcing data confidentiality in low-level applications (e.g. those written in C) in the presence of an active adversary. In our approach, the programmer marks secret data by writing lightweight annotations on top-level definitions in the source code. The compiler then uses a static flow analysis coupled with efficient runtime instrumentation, a custom memory layout, and custom control-flow integrity checks to prevent data leaks even in the presence of low-level attacks. We have implemented our scheme as part of the LLVM compiler. We evaluate it on the SPEC micro-benchmarks for performance, and on larger, real-world applications (including OpenLDAP, which is around 300 KLoC) for the programmer overhead required to restructure the application when protecting sensitive data such as passwords. We find that the performance overheads introduced by our instrumentation are moderate (12% on average on SPEC), and the programmer effort to port OpenLDAP is only about 160 LoC.