1,808 research outputs found
Frequent itemset mining on multiprocessor systems
Frequent itemset mining is an important building block in many data mining applications like market basket analysis, recommendation, web mining, fraud detection, and gene expression analysis. In many of them, the datasets being mined can easily grow up to hundreds of gigabytes or even terabytes of data. Hence, efficient algorithms are required to process such large amounts of data. In recent years, many frequent-itemset mining algorithms have been proposed, which however (1) often have high memory requirements and (2) do not exploit the large degrees of parallelism provided by modern multiprocessor systems. The high memory requirements arise mainly from inefficient data structures that have only been shown to be sufficient for small datasets. For large datasets, however, the use of these data structures forces the algorithms to go out-of-core, i.e., they have to access secondary memory, which leads to serious performance degradation. Exploiting the available parallelism is further required to mine large datasets because the serial performance of processors has almost stopped increasing. Algorithms should therefore exploit the large number of available threads as well as other kinds of parallelism (e.g., vector instruction sets) besides thread-level parallelism.
In this work, we tackle the high memory requirements of frequent itemset mining in two ways: we (1) compress the datasets being mined, because they must be kept in main memory across several mining invocations, and (2) improve existing mining algorithms with memory-efficient data structures. For compressing the datasets, we employ efficient encodings that show good compression performance on a wide variety of realistic datasets, reducing dataset sizes by up to 6.4x. The encodings can further be applied directly while loading the dataset from disk or network. Since encoding and decoding are repeatedly required for loading and mining the datasets, we reduce these costs by providing parallel encodings that achieve high throughput for both tasks. For a memory-efficient representation of the mining algorithms' intermediate data, we propose compact data structures and even employ explicit compression. Together, both methods reduce the size of the intermediate data by up to 25x. The smaller memory requirements avoid or delay expensive out-of-core computation when large datasets are mined.
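The abstract leaves the concrete encodings unspecified; as a rough, hedged sketch of the kind of technique involved, the following implements variable-byte (varint) coding of integer item IDs, a common dataset-compression building block. The choice of varint and all names here are illustrative assumptions, not the paper's actual scheme.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Variable-byte encoding: 7 payload bits per byte plus a continuation flag,
// so frequent small item IDs occupy a single byte each.
void encode_varint(uint32_t value, std::vector<uint8_t>& out) {
    while (value >= 0x80) {
        out.push_back(static_cast<uint8_t>(value) | 0x80);  // more bytes follow
        value >>= 7;
    }
    out.push_back(static_cast<uint8_t>(value));  // final byte, flag clear
}

// Decodes one value starting at `pos` and advances `pos` past it.
uint32_t decode_varint(const std::vector<uint8_t>& in, std::size_t& pos) {
    uint32_t value = 0;
    int shift = 0;
    uint8_t byte;
    do {
        byte = in[pos++];
        value |= static_cast<uint32_t>(byte & 0x7F) << shift;
        shift += 7;
    } while (byte & 0x80);
    return value;
}
```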
For coping with the high degree of parallelism provided by current multiprocessor systems, we identify the performance hot spots and scalability issues of existing frequent-itemset mining algorithms. The hot spots, which form basic building blocks of these algorithms, cover (1) counting the frequency of fixed-length strings, (2) building prefix trees, (3) compressing integer values, and (4) intersecting lists of sorted integer values or bitmaps. For all of them, we discuss how to exploit the available parallelism and provide scalable solutions. Furthermore, almost all components of the mining algorithms must be parallelized to keep the sequential fraction of the algorithms as small as possible. We integrate the parallelized building blocks and components into three well-known mining algorithms and further analyze the impact of certain existing optimizations. Even single-threaded, our algorithms are often up to an order of magnitude faster than existing highly optimized algorithms, and they further scale almost linearly on a large 32-core multiprocessor system. Although our optimizations are intended for frequent-itemset mining algorithms, they can be applied with only minor changes to algorithms used for mining other types of itemsets.
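Among the listed building blocks, (4) is easy to make concrete: below is a minimal, hedged sketch of intersecting two sorted tid-lists with the classic two-pointer scan. The paper's versions are parallelized and vectorized; this scalar form only illustrates the underlying operation.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Two-pointer intersection of sorted transaction-ID lists, the core of
// tid-list based itemset mining; SIMD variants replace the comparisons
// with vector instructions but keep the same logic.
std::vector<uint32_t> intersect(const std::vector<uint32_t>& a,
                                const std::vector<uint32_t>& b) {
    std::vector<uint32_t> out;
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i] < b[j]) {
            ++i;
        } else if (a[i] > b[j]) {
            ++j;
        } else {
            out.push_back(a[i]);  // transaction contains both itemsets
            ++i;
            ++j;
        }
    }
    return out;
}
```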
CATS: linearizability and partition tolerance in scalable and self-organizing key-value stores
Distributed key-value stores provide scalable, fault-tolerant, and self-organizing storage services, but fall short of guaranteeing linearizable consistency in partially synchronous, lossy, partitionable, and dynamic networks, when data is distributed and replicated automatically by the principle of consistent hashing. This paper introduces consistent quorums as a solution for achieving atomic consistency. We present the design and implementation of CATS, a distributed key-value store which uses consistent quorums to guarantee linearizability and partition tolerance in such adverse and dynamic network conditions. CATS is scalable, elastic, and self-organizing, key properties for modern cloud storage middleware. Our system shows that consistency can be achieved with practical performance and modest throughput overhead (5%) for read-intensive workloads.
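The abstract names consistent quorums but not their mechanics; the sketch below is a hedged reconstruction of the core idea, where a quorum only counts if a majority of replies agree on the replication-group view. All types, fields, and the retry policy are illustrative assumptions, not the CATS implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Illustrative reply from one replica: the value, its version, and the
// replication-group view the replica believed it was in when answering.
struct Reply {
    std::string value;
    uint64_t    version;
    uint64_t    view_id;  // group configuration identifier
};

// A quorum is "consistent" only if a majority of replies agree on the same
// view; replies from stale or diverging views are discarded, and the caller
// retries until a consistent quorum is obtained.
std::optional<Reply> consistent_quorum_read(const std::vector<Reply>& replies,
                                            std::size_t replication_degree) {
    const std::size_t majority = replication_degree / 2 + 1;
    std::map<uint64_t, std::vector<const Reply*>> by_view;
    for (const Reply& r : replies) by_view[r.view_id].push_back(&r);

    for (const auto& entry : by_view) {
        const auto& group = entry.second;
        if (group.size() < majority) continue;  // no consistent quorum here
        const Reply* newest = group.front();
        for (const Reply* r : group)            // return the highest version
            if (r->version > newest->version) newest = r;
        return *newest;
    }
    return std::nullopt;                        // caller must retry
}
```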
CONFLLVM: A Compiler for Enforcing Data Confidentiality in Low-Level Code
We present an instrumenting compiler for enforcing data confidentiality in low-level applications (e.g. those written in C) in the presence of an active adversary. In our approach, the programmer marks secret data by writing lightweight annotations on top-level definitions in the source code. The compiler then uses a static flow analysis coupled with efficient runtime instrumentation, a custom memory layout, and custom control-flow integrity checks to prevent data leaks even in the presence of low-level attacks. We have implemented our scheme as part of the LLVM compiler. We evaluate it on the SPEC micro-benchmarks for performance, and on larger, real-world applications (including OpenLDAP, which is around 300 KLoC) for the programmer overhead required to restructure the application when protecting sensitive data such as passwords. We find that the performance overheads introduced by our instrumentation are moderate (average 12% on SPEC), and the programmer effort to port OpenLDAP is only about 160 LoC.
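The abstract describes lightweight annotations on top-level definitions but does not show their syntax; below is a hedged, hypothetical illustration using Clang's generic annotate attribute. The attribute spelling and the flagged example are assumptions, not the tool's actual interface.

```cpp
// Hypothetical marking of secret data on a top-level definition; the
// "secret" annotation spelling is illustrative, not CONFLLVM's syntax.
#define SECRET __attribute__((annotate("secret")))

SECRET char stored_password[64];  // values flowing from here are secret
char log_buffer[256];             // ordinary public memory

// A static flow analysis of this kind would reject copying secret bytes
// into a public sink, e.g.:
//   memcpy(log_buffer, stored_password, sizeof(stored_password));  // leak
```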
DISPATCH: A Numerical Simulation Framework for the Exa-scale Era. I. Fundamentals
We introduce a high-performance simulation framework that permits the semi-independent, task-based solution of sets of partial differential equations, typically manifesting as updates to a collection of `patches' in space-time. A hybrid MPI/OpenMP execution model is adopted, where work tasks are controlled by a rank-local `dispatcher' which selects, from a set of tasks generally much larger than the number of physical cores (or hardware threads), tasks that are ready for updating. The definition of a task can vary, for example, with some solving the equations of ideal magnetohydrodynamics (MHD), others non-ideal MHD, radiative transfer, or particle motion, and yet others applying particle-in-cell (PIC) methods. Tasks do not have to be grid-based, while tasks that are may use either Cartesian or orthogonal curvilinear meshes. Patches may be stationary or moving. Mesh refinement can be static or dynamic. A feature of decisive importance for the overall performance of the framework is that time steps are determined and applied locally; this allows potentially large reductions in the total number of updates required in cases where the signal speed varies greatly across the computational domain, and therefore a corresponding reduction in computing time. Another feature is a load balancing algorithm that operates `locally' and aims to simultaneously minimise load and communication imbalance. The framework generally relies on already existing solvers, whose performance is augmented when run under the framework due to more efficient cache usage, vectorisation, local time-stepping, plus near-linear and, in principle, unlimited OpenMP and MPI scaling.
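The dispatcher and local time-stepping lend themselves to a small sketch; the one below is a hedged illustration of the selection rule (update the ready task that is furthest behind in time), with all structures and the readiness condition being simplifying assumptions rather than the DISPATCH code.

```cpp
#include <vector>

// Illustrative patch-update task with its own local time and time step.
struct Task {
    double time;                     // how far this patch has advanced
    double dt;                       // locally determined step size
    std::vector<const Task*> nbors;  // neighbouring patches

    // Ready when no neighbour lags behind this task's current time, so
    // guard-zone data can be obtained by interpolation in time.
    bool ready() const {
        for (const Task* n : nbors)
            if (n->time < time) return false;
        return true;
    }
};

// Pick the ready task that is furthest behind, so patches with small
// local time steps are updated more often and never stall the rest.
Task* select_next(std::vector<Task>& tasks) {
    Task* best = nullptr;
    for (Task& t : tasks)
        if (t.ready() && (best == nullptr || t.time < best->time))
            best = &t;
    return best;  // nullptr: nothing is ready right now
}
```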