289 research outputs found
Data-Oblivious Graph Algorithms in Outsourced External Memory
Motivated by privacy preservation for outsourced data, data-oblivious
external memory is a computational framework where a client performs
computations on data stored at a semi-trusted server in a way that does not
reveal her data to the server. This approach facilitates collaboration and
reliability over traditional frameworks, and it provides privacy protection,
even though the server has full access to the data and he can monitor how it is
accessed by the client. The challenge is that even if data is encrypted, the
server can learn information based on the client data access pattern; hence,
access patterns must also be obfuscated. We investigate privacy-preserving
algorithms for outsourced external memory that are based on the use of
data-oblivious algorithms, that is, algorithms where each possible sequence of
data accesses is independent of the data values. We give new efficient
data-oblivious algorithms in the outsourced external memory model for a number
of fundamental graph problems. Our results include new data-oblivious
external-memory methods for constructing minimum spanning trees, performing
various traversals on rooted trees, answering least common ancestor queries on
trees, computing biconnected components, and forming open ear decompositions.
None of our algorithms make use of constant-time random oracles.Comment: 20 page
Relaxed Schedulers Can Efficiently Parallelize Iterative Algorithms
There has been significant progress in understanding the parallelism inherent
to iterative sequential algorithms: for many classic algorithms, the depth of
the dependence structure is now well understood, and scheduling techniques have
been developed to exploit this shallow dependence structure for efficient
parallel implementations. A related, applied research strand has studied
methods by which certain iterative task-based algorithms can be efficiently
parallelized via relaxed concurrent priority schedulers. These allow for high
concurrency when inserting and removing tasks, at the cost of executing
superfluous work due to the relaxed semantics of the scheduler.
In this work, we take a step towards unifying these two research directions,
by showing that there exists a family of relaxed priority schedulers that can
efficiently and deterministically execute classic iterative algorithms such as
greedy maximal independent set (MIS) and matching. Our primary result shows
that, given a randomized scheduler with an expected relaxation factor of in
terms of the maximum allowed priority inversions on a task, and any graph on
vertices, the scheduler is able to execute greedy MIS with only an additive
factor of poly() expected additional iterations compared to an exact (but
not scalable) scheduler. This counter-intuitive result demonstrates that the
overhead of relaxation when computing MIS is not dependent on the input size or
structure of the input graph. Experimental results show that this overhead can
be clearly offset by the gain in performance due to the highly scalable
scheduler. In sum, we present an efficient method to deterministically
parallelize iterative sequential algorithms, with provable runtime guarantees
in terms of the number of executed tasks to completion.Comment: PODC 2018, pages 377-386 in proceeding
Parallel Metric Tree Embedding based on an Algebraic View on Moore-Bellman-Ford
A \emph{metric tree embedding} of expected \emph{stretch~}
maps a weighted -node graph to a weighted tree with such that, for all ,
and
. Such embeddings are highly useful for designing
fast approximation algorithms, as many hard problems are easy to solve on tree
instances. However, to date the best parallel -depth algorithm that achieves an asymptotically optimal expected stretch of
requires
work and a metric as input.
In this paper, we show how to achieve the same guarantees using
depth and
work, where and is an arbitrarily small constant.
Moreover, one may further reduce the work to at the expense of increasing the expected stretch to
.
Our main tool in deriving these parallel algorithms is an algebraic
characterization of a generalization of the classic Moore-Bellman-Ford
algorithm. We consider this framework, which subsumes a variety of previous
"Moore-Bellman-Ford-like" algorithms, to be of independent interest and discuss
it in depth. In our tree embedding algorithm, we leverage it for providing
efficient query access to an approximate metric that allows sampling the tree
using depth and work.
We illustrate the generality and versatility of our techniques by various
examples and a number of additional results
Reordering Rows for Better Compression: Beyond the Lexicographic Order
Sorting database tables before compressing them improves the compression
rate. Can we do better than the lexicographical order? For minimizing the
number of runs in a run-length encoding compression scheme, the best approaches
to row-ordering are derived from traveling salesman heuristics, although there
is a significant trade-off between running time and compression. A new
heuristic, Multiple Lists, which is a variant on Nearest Neighbor that trades
off compression for a major running-time speedup, is a good option for very
large tables. However, for some compression schemes, it is more important to
generate long runs rather than few runs. For this case, another novel
heuristic, Vortex, is promising. We find that we can improve run-length
encoding up to a factor of 3 whereas we can improve prefix coding by up to 80%:
these gains are on top of the gains due to lexicographically sorting the table.
We prove that the new row reordering is optimal (within 10%) at minimizing the
runs of identical values within columns, in a few cases.Comment: to appear in ACM TOD
Engineering Massively Parallel MST Algorithms
We develop and extensively evaluate highly scalable distributed-memory
algorithms for computing minimum spanning trees (MSTs). At the heart of our
solutions is a scalable variant of Boruvka's algorithm. For partitioned graphs
with many local edges, we improve this with an effective form of contracting
local parts of the graph during a preprocessing step. We also adapt the
filtering concept of the best practical sequential algorithm to develop a
massively parallel Filter-Boruvka algorithm that is very useful for graphs with
poor locality and high average degree. Our experiments indicate that our
algorithms scale well up to at least 65 536 cores and are up to 800 times
faster than previous distributed MST algorithms.Comment: 12 pages, 6 figure
Near Optimal Parallel Algorithms for Dynamic DFS in Undirected Graphs
Depth first search (DFS) tree is a fundamental data structure for solving
graph problems. The classical algorithm [SiComp74] for building a DFS tree
requires time for a given graph having vertices and edges.
Recently, Baswana et al. [SODA16] presented a simple algorithm for updating DFS
tree of an undirected graph after an edge/vertex update in time.
However, their algorithm is strictly sequential. We present an algorithm
achieving similar bounds, that can be adopted easily to the parallel
environment.
In the parallel model, a DFS tree can be computed from scratch using
processors in expected time [SiComp90] on an EREW PRAM, whereas
the best deterministic algorithm takes time
[SiComp90,JAlg93] on a CRCW PRAM. Our algorithm can be used to develop optimal
(upto polylog n factors deterministic algorithms for maintaining fully dynamic
DFS and fault tolerant DFS, of an undirected graph.
1- Parallel Fully Dynamic DFS:
Given an arbitrary online sequence of vertex/edge updates, we can maintain a
DFS tree of an undirected graph in time per update using
processors on an EREW PRAM.
2- Parallel Fault tolerant DFS:
An undirected graph can be preprocessed to build a data structure of size
O(m) such that for a set of updates (where is constant) in the graph,
the updated DFS tree can be computed in time using
processors on an EREW PRAM.
Moreover, our fully dynamic DFS algorithm provides, in a seamless manner,
nearly optimal (upto polylog n factors) algorithms for maintaining a DFS tree
in semi-streaming model and a restricted distributed model. These are the first
parallel, semi-streaming and distributed algorithms for maintaining a DFS tree
in the dynamic setting.Comment: Accepted to appear in SPAA'17, 32 Pages, 5 Figure
Motion planning for geometric models in data visualization
Interaktivní geometrické modely pro simulaci přírodních jevů (LH11006)Pokročilé grafické a počítačové systémy (SGS-2016-013)A finding of path is an important task in many research areas and it is
a common problem solved in a wide range of applications. New problems of
finding path appear and complex problems persist, such as a real-time plan-
ning of paths for huge crowds in dynamic environments, where the properties
according to which the cost of a path is evaluated as well as the topology
of paths may change. The task of finding a path can be divided into path
planning and motion planning, which implicitly respects the collision with
surroundings in the environment.
Within the first group this thesis focuses on path planning on graphs for
crowds. The main idea is to group members of the crowd by their common
initial and target positions and then plan the path for one representative
member of each group. These representative members can be navigated by
classic approaches and the rest of the group will follow them. If the crowd can
be divided into a few groups this way, the proposed approach will save a huge
amount of computational and memory demands in dynamic environments.
In the second area, motion planning, we are dealing with another problem.
The task is to navigate the ligand through the protein or into the protein,
which turns out to be a challenging problem because it needs to be solved in
3D with the collision detection
Graph set data mining
Graphs are among the most versatile abstract data types in computer science. With the variety comes great adoption in various application fields, such as chemistry, biology, social analysis, logistics, and computer science itself. With the growing capacities of digital storage, the collection of large amounts of data has become the norm in many application fields. Data mining, i.e., the automated extraction of non-trivial patterns from data, is a key step to extract knowledge from these datasets and generate value. This thesis is dedicated to concurrent scalable data mining algorithms beyond traditional notions of efficiency for large-scale datasets of small labeled graphs; more precisely, structural clustering and representative subgraph pattern mining. It is motivated by, but not limited to, the need to analyze molecular libraries of ever-increasing size in the drug discovery process. Structural clustering makes use of graph theoretical concepts, such as (common) subgraph isomorphisms and frequent subgraphs, to model cluster commonalities directly in the application domain. It is considered computationally demanding for non-restricted graph classes and with very few exceptions prior algorithms are only suitable for very small datasets. This thesis discusses the first truly scalable structural clustering algorithm StruClus with linear worst-case complexity. At the same time, StruClus embraces the inherent values of structural clustering algorithms, i.e., interpretable, consistent, and high-quality results. A novel two-fold sampling strategy with stochastic error bounds for frequent subgraph mining is presented. It enables fast extraction of cluster commonalities in the form of common subgraph representative sets. StruClus is the first structural clustering algorithm with a directed selection of structural cluster-representative patterns regarding homogeneity and separation aspects in the high-dimensional subgraph pattern space. Furthermore, a novel concept of cluster homogeneity balancing using dynamically-sized representatives is discussed. The second part of this thesis discusses the representative subgraph pattern mining problem in more general terms. A novel objective function maximizes the number of represented graphs for a cardinality-constrained representative set. It is shown that the problem is a special case of the maximum coverage problem and is NP-hard. Based on the greedy approximation of Nemhauser, Wolsey, and Fisher for submodular set function maximization a novel sampling approach is presented. It mines candidate sets that contain an optimal greedy solution with a probabilistic maximum error. This leads to a constant-time algorithm to generate the candidate sets given a fixed-size sample of the dataset. In combination with a cheap single-pass streaming evaluation of the candidate sets, this enables scalability to datasets with billions of molecules on a single machine. Ultimately, the sampling approach leads to the first distributed subgraph pattern mining algorithm that distributes the pattern space and the dataset graphs at the same time
- …