4,461 research outputs found
Using Graph Properties to Speed-up GPU-based Graph Traversal: A Model-driven Approach
While it is well-known and acknowledged that the performance of graph
algorithms is heavily dependent on the input data, there has been surprisingly
little research to quantify and predict the impact the graph structure has on
performance. Parallel graph algorithms, running on many-core systems such as
GPUs, are no exception: most research has focused on how to efficiently
implement and tune different graph operations on a specific GPU. However, the
performance impact of the input graph has only been taken into account
indirectly as a result of the graphs used to benchmark the system.
In this work, we present a case study investigating how to use the properties
of the input graph to improve the performance of the breadth-first search (BFS)
graph traversal. To do so, we first study the performance variation of 15
different BFS implementations across 248 graphs. Using this performance data,
we show that significant speed-up can be achieved by combining the best
implementation for each level of the traversal. To make use of this
data-dependent optimization, we must correctly predict the relative performance
of algorithms per graph level, and enable dynamic switching to the optimal
algorithm for each level at runtime.
We use the collected performance data to train a binary decision tree, to
enable high-accuracy predictions and fast switching. We demonstrate empirically
that our decision tree is both fast enough to allow dynamic switching between
implementations, without noticeable overhead, and accurate enough in its
prediction to enable significant BFS speedup. We conclude that our model-driven
approach (1) enables BFS to outperform state of the art GPU algorithms, and (2)
can be adapted for other BFS variants, other algorithms, or more specific
datasets
HYPA: Efficient Detection of Path Anomalies in Time Series Data on Networks
The unsupervised detection of anomalies in time series data has important
applications in user behavioral modeling, fraud detection, and cybersecurity.
Anomaly detection has, in fact, been extensively studied in categorical
sequences. However, we often have access to time series data that represent
paths through networks. Examples include transaction sequences in financial
networks, click streams of users in networks of cross-referenced documents, or
travel itineraries in transportation networks. To reliably detect anomalies, we
must account for the fact that such data contain a large number of independent
observations of paths constrained by a graph topology. Moreover, the
heterogeneity of real systems rules out frequency-based anomaly detection
techniques, which do not account for highly skewed edge and degree statistics.
To address this problem, we introduce HYPA, a novel framework for the
unsupervised detection of anomalies in large corpora of variable-length
temporal paths in a graph. HYPA provides an efficient analytical method to
detect paths with anomalous frequencies that result from nodes being traversed
in unexpected chronological order.Comment: 11 pages with 8 figures and supplementary material. To appear at SIAM
Data Mining (SDM 2020
CAPre: Code-Analysis based Prefetching for Persistent object stores
Data prefetching aims to improve access times to data storage systems by predicting data records that are likely to be accessed by subsequent requests and retrieving them into a memory cache before they are needed. In the case of Persistent Object Stores, previous approaches to prefetching have been based on predictions made through analysis of the store’s schema, which generates rigid predictions, or monitoring access patterns to the store while applications are executed, which introduces memory and/or computation overhead.
In this paper, we present CAPre, a novel prefetching system for Persistent Object Stores based on static code analysis of object-oriented applications. CAPre generates the predictions at compile-time and does not introduce any overhead to the application execution. Moreover, CAPre is able to predict large amounts of objects that will be accessed in the near future, thus enabling the object store to perform parallel prefetching if the objects are distributed, in a much more aggressive way than in schema-based prediction algorithms. We integrate CAPre into a distributed Persistent Object Store and run a series of experiments that show that it can reduce the execution time of applications from 9% to over 50%, depending on the nature of the application and its persistent data model.This work has been supported by the European Union’s Horizon 2020 research and innovation program under the BigStorage European Training Network (ETN) (grant H2020-MSCA-ITN-2014- 642963), the Spanish Ministry of Science and Innovation (contract TIN2015-65316) and the Generalitat de Catalunya, Spain (contract 2014-SGR-1051).Peer ReviewedPostprint (author's final draft
Towards Next Generation Business Process Model Repositories – A Technical Perspective on Loading and Processing of Process Models
Business process management repositories manage large collections of process models ranging in the thousands. Additionally, they provide management functions like e.g. mining, querying, merging and variants management for process models. However, most current business process management repositories are built on top of relation database management systems (RDBMS) although this leads to performance issues. These issues result from the relational algebra, the mismatch between relational tables and object oriented programming (impedance mismatch) as well as new technological developments in the last 30 years as e.g. more and cheap disk and memory space, clusters and clouds. The goal of this paper is to present current paradigms to overcome the performance problems inherent in RDBMS. Therefore, we have to fuse research about data modeling along database technologies as well as algorithm design and parallelization for the technology paradigms occurring nowadays. Based on these research streams we have shown how the performance of business process management repositories could be improved in terms of loading performance of processes (from e.g. a disk) and the computation of management techniques resulting in even faster application of such a technique. Exemplarily, applications of the compiled paradigms are presented to show their applicability
- …