Efficient classification using parallel and scalable compressed model and Its application on intrusion detection
To achieve efficient classification in intrusion detection, this paper proposes
a compressed model that combines horizontal compression with vertical
compression. OneR is used for horizontal compression (attribute reduction), and
affinity propagation serves as vertical compression, selecting a small set of
representative exemplars from the large training data. To compress large
volumes of training data scalably, a MapReduce-based parallelization is
implemented and evaluated for each step of the model compression process, after
which common but efficient classification methods can be applied directly. An
experimental study on two publicly available intrusion-detection datasets,
KDD99 and CMDC2012, demonstrates that classification using the proposed
compressed model speeds up the detection procedure by up to 184 times, at the
cost of an average accuracy loss of less than 1%.
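The two compression directions described above can be sketched on a toy dataset. This is a minimal, dependency-free illustration: the OneR-style single-attribute scoring matches the horizontal step in spirit, while a greedy farthest-point pick stands in for affinity propagation's exemplar selection (the real method is message-passing clustering; all function names here are illustrative, not the paper's).

```python
from collections import Counter, defaultdict

def oner_score(rows, labels, attr):
    """Accuracy of the one-rule classifier that predicts the majority
    label for each value of a single attribute (OneR-style scoring)."""
    by_value = defaultdict(list)
    for row, label in zip(rows, labels):
        by_value[row[attr]].append(label)
    correct = sum(Counter(ls).most_common(1)[0][1] for ls in by_value.values())
    return correct / len(rows)

def horizontal_compress(rows, labels, keep):
    """Attribute reduction: keep only the `keep` best-scoring attributes."""
    attrs = sorted(range(len(rows[0])),
                   key=lambda a: oner_score(rows, labels, a),
                   reverse=True)[:keep]
    return [[row[a] for a in attrs] for row in rows], attrs

def vertical_compress(rows, k):
    """Select k representative exemplars. A greedy farthest-point sweep
    stands in for affinity propagation to keep the sketch self-contained."""
    def dist(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q))
    exemplars = [0]
    while len(exemplars) < k:
        exemplars.append(max(range(len(rows)),
                             key=lambda i: min(dist(rows[i], rows[e])
                                               for e in exemplars)))
    return [rows[i] for i in exemplars]
```

A downstream classifier would then train on the reduced attributes and the exemplar subset instead of the full training data.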
Load-Balancing for Parallel Delaunay Triangulations
Computing the Delaunay triangulation (DT) of a given point set in
$\mathbb{R}^d$ is one of the fundamental operations in computational geometry.
Recently, Funke and Sanders (2017) presented a divide-and-conquer DT algorithm
that merges two partial triangulations by re-triangulating a small subset of
their vertices - the border vertices - and combining the three triangulations
efficiently via parallel hash table lookups. The input point division should
therefore yield roughly equal-sized partitions for good load-balancing and also
result in a small number of border vertices for fast merging. In this paper, we
present a novel divide-step based on partitioning the triangulation of a small
sample of the input points. In experiments on synthetic and real-world data
sets, we achieve nearly perfectly balanced partitions and small border
triangulations. This almost cuts running time in half compared to
non-data-sensitive division schemes on inputs exhibiting an exploitable
underlying structure.

Comment: Short version submitted to EuroPar 201
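The divide-step idea, partitioning a small sample and assigning all input points accordingly, can be illustrated with a simplified 2D sketch. This is not the paper's algorithm (which partitions the sample's triangulation); here a sorted split of the sample provides the groups, and every point goes to the nearest group centroid. All names are illustrative.

```python
import random

def sample_based_partition(points, k, sample_size=32, seed=0):
    """Data-sensitive divide-step sketch: partition a small random sample
    into k groups, then assign every input point to the group with the
    nearest sample centroid. On clustered inputs this tends to balance
    partitions better than non-data-sensitive, axis-aligned cuts."""
    rng = random.Random(seed)
    sample = rng.sample(points, min(sample_size, len(points)))
    # Crude stand-in for partitioning the sample's triangulation:
    # sort the sample by x and cut it into k equal runs, then use the
    # centroid of each run as that partition's representative.
    sample.sort()
    runs = [sample[i * len(sample) // k:(i + 1) * len(sample) // k]
            for i in range(k)]
    centers = [(sum(x for x, _ in r) / len(r), sum(y for _, y in r) / len(r))
               for r in runs]
    parts = [[] for _ in range(k)]
    for p in points:
        nearest = min(range(k),
                      key=lambda i: (p[0] - centers[i][0]) ** 2 +
                                    (p[1] - centers[i][1]) ** 2)
        parts[nearest].append(p)
    return parts
```

In the full algorithm, each partition would be triangulated independently and the partial triangulations merged along their border vertices.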
Large Scale Parallel Computations in R through Elemental
Even though in recent years the scale of statistical analysis problems has
increased tremendously, many statistical software tools are still limited to
single-node computations. However, statistical analyses are largely based on
dense linear algebra operations, which have been deeply studied, optimized and
parallelized in the high-performance-computing community. To make
high-performance distributed computations available for statistical analysis,
and thus enable large scale statistical computations, we introduce RElem, an
open source package that integrates the distributed dense linear algebra
library Elemental into R. While on the one hand, RElem provides direct wrappers
of Elemental's routines, on the other hand, it overloads various operators and
functions to provide an entirely native R experience for distributed
computations. We showcase how simple it is to port existing R programs to RElem
and demonstrate that RElem indeed scales beyond the single-node limitation of R
with the full performance of Elemental and without overhead.

Comment: 16 pages, 5 figures
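The operator-overloading approach that gives RElem its native R feel can be illustrated in spirit with a small Python sketch: a matrix wrapper whose arithmetic operators forward to a pluggable backend, here a plain local one. The class and backend names are hypothetical and are not RElem's or Elemental's API.

```python
class LocalBackend:
    """Stand-in for a distributed linear-algebra backend such as Elemental."""
    def add(self, a, b):
        return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

    def matmul(self, a, b):
        return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
                 for j in range(len(b[0]))] for i in range(len(a))]

class DistMatrix:
    """Matrix wrapper that overloads operators so user code looks native
    while every operation is routed through the backend (RElem does the
    analogous thing in R, routing overloaded operators to Elemental)."""
    def __init__(self, data, backend=None):
        self.data = data
        self.backend = backend or LocalBackend()

    def __add__(self, other):
        return DistMatrix(self.backend.add(self.data, other.data), self.backend)

    def __matmul__(self, other):
        return DistMatrix(self.backend.matmul(self.data, other.data), self.backend)
```

With this design, swapping `LocalBackend` for a distributed implementation changes where the work runs without changing user-facing expressions like `a + b` or `a @ b`.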