A Survey of Parallel Sequential Pattern Mining
With the growing popularity of shared resources, large volumes of complex data of different types are collected automatically. Traditional data mining algorithms face serious challenges on such data, including high memory cost, low processing speed, and insufficient disk space. As a fundamental data mining task, sequential pattern mining (SPM) is used in a wide variety of real-life applications. It is, however, more complex and challenging than other pattern mining tasks such as frequent itemset mining and association rule mining, and it suffers from the same challenges when handling large-scale data. To address these problems, mining sequential patterns in parallel or distributed computing environments has emerged as an important research topic with many applications. In this paper, we provide an in-depth survey of the current status of parallel sequential pattern mining (PSPM), including a detailed categorization of traditional serial SPM approaches and of state-of-the-art parallel SPM. We review the related work on parallel sequential pattern mining in detail, covering partition-based, Apriori-based, pattern-growth-based, and hybrid algorithms for PSPM, and describe the characteristics, advantages, and disadvantages of each parallel approach. Advanced topics for PSPM, including parallel quantitative/weighted/utility sequential pattern mining, PSPM on uncertain and stream data, and hardware acceleration for PSPM, are further reviewed in detail. We also review some well-known open-source software for PSPM. Finally, we summarize challenges and opportunities for PSPM in the big data era.
Comment: Accepted by ACM Trans. on Knowl. Discov. Data, 33 pages
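As a rough illustration of the basic primitive that serial and parallel SPM algorithms alike build on, the following Python sketch counts the support of a candidate sequential pattern in a sequence database. The data layout (lists of itemsets) and the function names are illustrative assumptions, not taken from any surveyed algorithm.

```python
# Minimal sketch of the support-counting primitive behind sequential pattern
# mining: check whether a candidate pattern (an ordered list of itemsets)
# occurs as a subsequence of each database sequence. Names are illustrative.

def contains(sequence, pattern):
    """True if `pattern` occurs, in order, within `sequence`."""
    i = 0
    for itemset in sequence:
        if i < len(pattern) and pattern[i].issubset(itemset):
            i += 1
    return i == len(pattern)

def support(database, pattern):
    """Number of database sequences that contain the pattern."""
    return sum(contains(seq, pattern) for seq in database)

db = [
    [{"a"}, {"a", "b"}, {"c"}],
    [{"a"}, {"c"}],
    [{"b"}, {"c"}],
]
print(support(db, [{"a"}, {"c"}]))  # 2: the first two sequences contain <{a}{c}>
```

Parallel SPM approaches typically distribute either the database sequences or the candidate patterns across workers and aggregate these local support counts.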
Accelerating recurrent neural network training using sequence bucketing and multi-GPU data parallelization
An efficient algorithm for recurrent neural network training is presented. The approach increases training speed for tasks where the length of the input sequence may vary significantly. It is based on optimal bucketing of batches by input sequence length and on data parallelization across multiple graphics processing units. The baseline training performance without sequence bucketing is compared with the proposed solution for different numbers of buckets. An example is given for the online handwriting recognition task using an LSTM recurrent neural network. The evaluation is performed in terms of wall-clock time, number of epochs, and validation loss value.
Comment: 4 pages, 5 figures. 2016 IEEE First International Conference on Data Stream Mining & Processing (DSMP), Lviv, 2016
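The following Python sketch shows the general idea of length-based bucketing: sequences of similar length are batched together so little padding is wasted. The number of buckets, batch size, and padding value are illustrative assumptions, not the settings used in the paper.

```python
# Sketch of length-based sequence bucketing: group variable-length sequences
# into buckets of similar length so each minibatch needs minimal padding.
import random

def make_bucketed_batches(sequences, num_buckets=4, batch_size=32):
    # Sort by length and split into contiguous buckets of similar length.
    ordered = sorted(sequences, key=len)
    bucket_size = (len(ordered) + num_buckets - 1) // num_buckets
    buckets = [ordered[i:i + bucket_size] for i in range(0, len(ordered), bucket_size)]

    batches = []
    for bucket in buckets:
        random.shuffle(bucket)                      # shuffle within each bucket
        for i in range(0, len(bucket), batch_size):
            batch = bucket[i:i + batch_size]
            max_len = max(len(s) for s in batch)    # pad only to the bucket's max
            batches.append([s + [0] * (max_len - len(s)) for s in batch])
    random.shuffle(batches)                         # randomize batch order per epoch
    return batches
```

Each padded batch can then be split across GPUs for data-parallel training, which is how the two techniques in the title combine.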
Technical Report: Accelerating Dynamic Graph Analytics on GPUs
As graph analytics often involves compute-intensive operations, GPUs have
been extensively used to accelerate the processing. However, in many
applications such as social networks, cyber security, and fraud detection,
their representative graphs evolve frequently and one has to perform a rebuild
of the graph structure on GPUs to incorporate the updates. Hence, rebuilding
the graphs becomes the bottleneck of processing high-speed graph streams. In
this paper, we propose a GPU-based dynamic graph storage scheme to support
existing graph algorithms easily. Furthermore, we propose parallel update
algorithms to support efficient stream updates so that the maintained graph is
immediately available for high-speed analytic processing on GPUs. Our extensive
experiments with three streaming applications on large-scale real and synthetic
datasets demonstrate the superior performance of our proposed approach.
Comment: 34 pages, 18 figures
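As a rough host-side sketch of the interface such a scheme exposes, the Python snippet below applies edge insertions and deletions in batches while keeping the graph queryable between batches. The actual GPU-resident storage layout in the paper is not modeled here; class and method names are illustrative.

```python
# Host-side sketch of batched graph stream updates: the graph remains
# available for analytics immediately after each batch is applied.
from collections import defaultdict

class DynamicGraph:
    def __init__(self):
        self.adj = defaultdict(set)   # adjacency sets keyed by source vertex

    def apply_batch(self, insertions, deletions):
        """Apply one batch of edge updates without rebuilding the structure."""
        for u, v in insertions:
            self.adj[u].add(v)
        for u, v in deletions:
            self.adj[u].discard(v)

    def neighbors(self, u):
        return self.adj[u]

g = DynamicGraph()
g.apply_batch(insertions=[(1, 2), (2, 3)], deletions=[])
g.apply_batch(insertions=[(3, 1)], deletions=[(1, 2)])
print(g.neighbors(2))  # {3}
```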
Computational Intelligence Challenges and Applications on Large-Scale Astronomical Time Series Databases
Time-domain astronomy (TDA) is facing a paradigm shift caused by the
exponential growth of the sample size, data complexity and data generation
rates of new astronomical sky surveys. For example, the Large Synoptic Survey
Telescope (LSST), which will begin operations in northern Chile in 2022, will
generate a nearly 150 Petabyte imaging dataset of the southern hemisphere sky.
The LSST will stream data at rates of 2 Terabytes per hour, effectively
capturing an unprecedented movie of the sky. The LSST is expected not only to
improve our understanding of time-varying astrophysical objects, but also to
reveal a plethora of as yet unknown faint and fast-varying phenomena. To cope with this paradigm shift toward data-driven astronomy, the fields of astroinformatics and astrostatistics have recently emerged. The new data-oriented paradigms
for astronomy combine statistics, data mining, knowledge discovery, machine
learning and computational intelligence, in order to provide the automated and
robust methods needed for the rapid detection and classification of known
astrophysical objects as well as the unsupervised characterization of novel
phenomena. In this article we present an overview of machine learning and
computational intelligence applications to TDA. Future big data challenges and
new lines of research in TDA, focusing on the LSST, are identified and
discussed from the viewpoint of computational intelligence/machine learning.
Interdisciplinary collaboration will be required to cope with the challenges
posed by the deluge of astronomical data coming from the LSST.
Fine-Grained Land Use Classification at the City Scale Using Ground-Level Images
We perform fine-grained land use mapping at the city scale using ground-level
images. Mapping land use is considerably more difficult than mapping land cover
and is generally not possible using overhead imagery as it requires close-up
views and seeing inside buildings. We postulate that the growing collections of
georeferenced, ground-level images suggest an alternate approach to this
geographic knowledge discovery problem. We develop a general framework that
uses Flickr images to map 45 different land-use classes for the City of San
Francisco. Individual images are classified using a novel convolutional neural
network containing two streams, one for recognizing objects and another for
recognizing scenes. This network is trained in an end-to-end manner directly on
the labeled training images. We propose several strategies to overcome the
noisiness of our user-generated data including search-based training set
augmentation and online adaptive training. We derive a ground truth map of San
Francisco in order to evaluate our method. We demonstrate the effectiveness of
our approach through geo-visualization and quantitative analysis. Our framework
achieves over 29% recall at the individual land parcel level, which represents a strong baseline for the challenging 45-way land use classification problem, especially given the noisiness of the image data.
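The sketch below shows a generic two-stream classifier in the spirit of the abstract: one stream for object features, one for scene features, fused for a 45-way prediction. It assumes PyTorch is available, and the layer sizes and fusion scheme are illustrative assumptions rather than the paper's architecture.

```python
# Minimal two-stream classifier sketch: object and scene streams are fused
# before a 45-way land-use prediction. Illustrative only.
import torch
import torch.nn as nn

class TwoStreamClassifier(nn.Module):
    def __init__(self, num_classes=45, feat_dim=256):
        super().__init__()
        def stream():
            return nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(32, feat_dim), nn.ReLU(),
            )
        self.object_stream = stream()   # would be pretrained for object recognition
        self.scene_stream = stream()    # would be pretrained for scene recognition
        self.classifier = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, image):
        obj = self.object_stream(image)
        scn = self.scene_stream(image)
        return self.classifier(torch.cat([obj, scn], dim=1))

logits = TwoStreamClassifier()(torch.randn(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 45])
```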
FLASH: Randomized Algorithms Accelerated over CPU-GPU for Ultra-High Dimensional Similarity Search
We present FLASH (\textbf{F}ast \textbf{L}SH \textbf{A}lgorithm for
\textbf{S}imilarity search accelerated with \textbf{H}PC), a similarity search
system for ultra-high dimensional datasets on a single machine, that does not
require similarity computations and is tailored for high-performance computing
platforms. By leveraging an LSH-style randomized indexing procedure and combining it with several principled techniques, such as reservoir sampling, recent advances in one-pass minwise hashing, and count-based estimations, we
reduce the computational and parallelization costs of similarity search, while
retaining sound theoretical guarantees.
We evaluate FLASH on several real, high-dimensional datasets from different
domains, including text, malicious URL, click-through prediction, social
networks, etc. Our experiments shed new light on the difficulties associated
with datasets having several million dimensions. Current state-of-the-art
implementations either fail on the presented scale or are orders of magnitude
slower than FLASH. FLASH is capable of computing an approximate k-NN graph,
from scratch, over the full webspam dataset (1.3 billion nonzeros) in less than
10 seconds. Computing a full k-NN graph in less than 10 seconds on the webspam dataset using brute force would require at least 20 teraflops. We provide CPU and GPU implementations of FLASH for replicability of our results.
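As a rough Python illustration of the count-based LSH idea mentioned in the abstract, the sketch below indexes sparse feature sets with several minwise-hash tables and ranks candidates by how often they collide with the query. The hash family, table layout, and ranking step are simplified assumptions and do not reproduce FLASH's reservoir sampling or one-pass minwise hashing.

```python
# Count-based LSH sketch: collision counts across minwise-hash tables serve
# as similarity estimates, so no explicit distance computations are needed.
import random
from collections import defaultdict

NUM_TABLES = 16
PRIME = 2_147_483_647
random.seed(0)
HASHES = [(random.randint(1, PRIME - 1), random.randint(0, PRIME - 1))
          for _ in range(NUM_TABLES)]

def minhash(features, a, b):
    return min((a * f + b) % PRIME for f in features)

def build_index(dataset):
    tables = [defaultdict(list) for _ in range(NUM_TABLES)]
    for idx, features in enumerate(dataset):
        for t, (a, b) in enumerate(HASHES):
            tables[t][minhash(features, a, b)].append(idx)
    return tables

def query(tables, features, k=5):
    counts = defaultdict(int)
    for t, (a, b) in enumerate(HASHES):
        for idx in tables[t][minhash(features, a, b)]:
            counts[idx] += 1          # collision count ~ Jaccard similarity
    return sorted(counts, key=counts.get, reverse=True)[:k]

data = [{1, 2, 3, 4}, {2, 3, 4, 5}, {10, 11, 12}]
tables = build_index(data)
print(query(tables, {1, 2, 3, 5}))   # the similar sets (indices 0 and 1) rank first
```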
GPU Accelerated Similarity Self-Join for Multi-Dimensional Data
The self-join finds all objects in a dataset that are within a search
distance, epsilon, of each other; therefore, the self-join is a building block
of many algorithms. We advance a GPU-accelerated self-join algorithm targeted at high-dimensional data. The massive parallelism afforded by the GPU and its high aggregate memory bandwidth make the architecture well-suited for
data-intensive workloads. We leverage a grid-based, GPU-tailored index to
perform range queries. We propose the following optimizations: (i) a trade-off
between candidate set filtering and index search overhead by exploiting
properties of the index; (ii) reordering the data based on variance in each
dimension to improve the filtering power of the index; and (iii) a pruning
method for reducing the number of expensive distance calculations. Across most
scenarios on real-world and synthetic datasets, our algorithm outperforms the
parallel state-of-the-art approach. Exascale systems are converging on
heterogeneous distributed-memory architectures. We show that an entity
partitioning method can be utilized to achieve a balanced workload, and thus
good scalability for multi-GPU or distributed-memory self-joins.
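The CPU-side Python sketch below shows the basic grid-index idea behind an epsilon self-join: points are binned into cells of side epsilon, and each point is compared only against points in neighboring cells. The paper's GPU-specific filtering, reordering, and pruning optimizations are not reproduced here.

```python
# Grid-based epsilon self-join sketch: only neighboring cells are searched,
# so most distance computations are avoided.
import itertools
import math
from collections import defaultdict

def self_join(points, eps):
    dims = len(points[0])
    grid = defaultdict(list)
    for i, p in enumerate(points):
        grid[tuple(int(math.floor(c / eps)) for c in p)].append(i)

    pairs = []
    offsets = list(itertools.product((-1, 0, 1), repeat=dims))
    for cell, members in grid.items():
        for i in members:
            for off in offsets:
                neighbor = tuple(c + o for c, o in zip(cell, off))
                for j in grid.get(neighbor, []):
                    if i < j and math.dist(points[i], points[j]) <= eps:
                        pairs.append((i, j))
    return pairs

print(self_join([(0.0, 0.0), (0.5, 0.0), (3.0, 3.0)], eps=1.0))  # [(0, 1)]
```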
Graphics Processing Units and High-Dimensional Optimization
This paper discusses the potential of graphics processing units (GPUs) in
high-dimensional optimization problems. A single GPU card with hundreds of
arithmetic cores can be inserted in a personal computer and dramatically
accelerates many statistical algorithms. To exploit these devices fully,
optimization algorithms should reduce to multiple parallel tasks, each
accessing a limited amount of data. These criteria favor EM and MM algorithms
that separate parameters and data. To a lesser extent, block relaxation and coordinate descent and ascent also qualify. We demonstrate the utility of GPUs
in nonnegative matrix factorization, PET image reconstruction, and
multidimensional scaling. Speedups of 100-fold can easily be attained. Over the next decade, GPUs will fundamentally alter the landscape of computational statistics. It is time for more statisticians to get on board.
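MM algorithms suit GPUs because each parameter is updated by an elementwise formula that touches little data. The NumPy (CPU-only) sketch below shows the classic multiplicative updates for nonnegative matrix factorization, one of the examples cited in the abstract; the rank, iteration count, and random seeds are illustrative choices.

```python
# Multiplicative (MM) updates for nonnegative matrix factorization: every
# entry of W and H is updated independently, which maps naturally onto
# parallel GPU threads. Illustrative CPU version only.
import numpy as np

def nmf(X, rank, iters=200, eps=1e-9):
    rng = np.random.default_rng(0)
    W = rng.random((X.shape[0], rank))
    H = rng.random((rank, X.shape[1]))
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)   # multiplicative update for H
        W *= (X @ H.T) / (W @ H @ H.T + eps)   # multiplicative update for W
    return W, H

X = np.abs(np.random.default_rng(1).random((20, 15)))
W, H = nmf(X, rank=5)
print(np.linalg.norm(X - W @ H) / np.linalg.norm(X))  # relative reconstruction error
```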
A Data as a Service (DaaS) Model for GPU-based Data Analytics
Cloud-based services in which resources are provisioned for consumers are increasingly the norm, especially for big data, spatiotemporal data mining, and application services that must meet a user's agreed Quality of Service (QoS) rules or Service Level Agreement (SLA). Given the pervasive nature of data centers and cloud systems, there is a need for real-time analytics of these systems that accounts for cost, utility, and energy. This work presents an overlay model of a GPU system for Data as a Service (DaaS) that provides real-time analysis of network data and of customer, investor, and user data from data centers or cloud systems. Using a modeled layer to define a learning protocol and system, we provide a custom, profitable system for DaaS on GPUs. The GPU-enabled pre-processing and initial operations of the clustering model analysis are promising, as shown in the results. We examine the model on real-world data sets that emulate big data and spatiotemporal data mining services. We also report clustering results of our model, using self-organizing feature maps (SOFM, or SOM) neural networks to produce a clustering distribution for the DaaS model. The experimental results thus far show a promising model that could enhance SLA- and/or QoS-based DaaS.
Comment: Accepted, 23 December 2017, by the IEEE IFIP NTMS Workshop on Big Data and Emerging Trends (WBD-ET 2018); it was later withdrawn because of funding issues. An extended/enhanced version will be published at a future date in a related journal.
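For reference, the NumPy sketch below shows a plain self-organizing map training loop of the kind the abstract refers to. The grid size, learning rate, and neighborhood decay are illustrative choices, not values from the paper.

```python
# Minimal SOM training sketch: for each sample, find the best-matching unit
# and pull it and its grid neighbors toward the sample.
import numpy as np

def train_som(data, grid=(8, 8), iters=1000, lr=0.5, sigma=2.0):
    rng = np.random.default_rng(0)
    weights = rng.random((grid[0], grid[1], data.shape[1]))
    coords = np.stack(np.meshgrid(np.arange(grid[0]), np.arange(grid[1]),
                                  indexing="ij"), axis=-1)
    for t in range(iters):
        x = data[rng.integers(len(data))]
        # Best-matching unit: the node whose weight vector is closest to x.
        dists = np.linalg.norm(weights - x, axis=-1)
        bmu = np.unravel_index(np.argmin(dists), dists.shape)
        # Decay learning rate and neighborhood width over time.
        frac = 1.0 - t / iters
        influence = np.exp(-np.sum((coords - np.array(bmu)) ** 2, axis=-1)
                           / (2 * (sigma * frac + 1e-3) ** 2))
        weights += (lr * frac) * influence[..., None] * (x - weights)
    return weights

codebook = train_som(np.random.default_rng(1).random((500, 4)))
print(codebook.shape)  # (8, 8, 4)
```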
GraphVite: A High-Performance CPU-GPU Hybrid System for Node Embedding
Learning continuous representations of nodes has recently attracted growing interest in both academia and industry, due to their simplicity and effectiveness in a variety of applications. Most existing node embedding
algorithms and systems are capable of processing networks with hundreds of
thousands or a few millions of nodes. However, how to scale them to networks
that have tens of millions or even hundreds of millions of nodes remains a
challenging problem. In this paper, we propose GraphVite, a high-performance
CPU-GPU hybrid system for training node embeddings, by co-optimizing the
algorithm and the system. On the CPU end, augmented edge samples are generated in parallel by online random walks on the network and serve as the
training data. On the GPU end, a novel parallel negative sampling is proposed
to leverage multiple GPUs to train node embeddings simultaneously, without much
data transfer and synchronization. Moreover, an efficient collaboration
strategy is proposed to further reduce the synchronization cost between CPUs
and GPUs. Experiments on multiple real-world networks show that GraphVite is highly efficient. It takes only about one minute to embed a network with 1 million nodes and 5 million edges on a single machine with 4 GPUs, and around 20 hours for a network with 66 million nodes and 1.8 billion edges. Compared to the current fastest system, GraphVite is about 50 times faster without any sacrifice in performance.
Comment: accepted at WWW 2019
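The NumPy sketch below illustrates the kind of training step such a system parallelizes on GPUs: skip-gram with negative sampling over edge samples produced by CPU random walks. The uniform negative sampler, embedding dimension, and learning rate are illustrative assumptions, not GraphVite's actual settings.

```python
# Skip-gram with negative sampling on one edge sample: raise the score of the
# observed pair and lower it for a few uniformly sampled negative nodes.
import numpy as np

rng = np.random.default_rng(0)
num_nodes, dim, lr, num_neg = 1000, 64, 0.025, 5
emb = rng.normal(scale=0.1, size=(num_nodes, dim))   # vertex embeddings
ctx = np.zeros((num_nodes, dim))                      # context embeddings

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_edge(u, v):
    """One SGD step on edge (u, v) with uniformly sampled negatives."""
    targets = [(v, 1.0)] + [(int(rng.integers(num_nodes)), 0.0) for _ in range(num_neg)]
    grad_u = np.zeros(dim)
    for node, label in targets:
        g = (sigmoid(emb[u] @ ctx[node]) - label) * lr
        grad_u += g * ctx[node]
        ctx[node] -= g * emb[u]
    emb[u] -= grad_u

train_edge(3, 17)  # e.g. one augmented edge sample produced by a CPU random walk
```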