Search CORE

289 research outputs found

Data-Oblivious Graph Algorithms in Outsourced External Memory

Author: B Schieber
C Gentry
I Damgård
M Mareš
MT Goodrich
O Goldreich
RE Tarjan
U Vishkin
Y Maon
Publication date: 01/09/2014

Motivated by privacy preservation for outsourced data, data-oblivious external memory is a computational framework where a client performs computations on data stored at a semi-trusted server in a way that does not reveal her data to the server. This approach facilitates collaboration and reliability over traditional frameworks, and it provides privacy protection even though the server has full access to the data and can monitor how it is accessed by the client. The challenge is that even if data is encrypted, the server can learn information from the client's data access pattern; hence, access patterns must also be obfuscated. We investigate privacy-preserving algorithms for outsourced external memory that are based on the use of data-oblivious algorithms, that is, algorithms where each possible sequence of data accesses is independent of the data values. We give new efficient data-oblivious algorithms in the outsourced external memory model for a number of fundamental graph problems. Our results include new data-oblivious external-memory methods for constructing minimum spanning trees, performing various traversals on rooted trees, answering least common ancestor queries on trees, computing biconnected components, and forming open ear decompositions. None of our algorithms make use of constant-time random oracles. Comment: 20 pages

arXiv.org e-Print Archive

Crossref
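
The abstract defines data-oblivious algorithms as those whose sequence of data accesses is independent of the data values. A minimal Python sketch of that property follows. It is not one of the paper's graph algorithms: it uses a bitonic sorting network, a standard oblivious primitive, and models what the server observes as the list of cell pairs touched. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def compare_exchange(a: List[int], i: int, j: int, trace: List[Tuple[int, int]]) -> None:
    """Read cells i and j, write the min to i and the max to j.

    Both cells are always read and rewritten, whether or not the values were
    out of order, so the server sees the same access (i, j) either way.
    """
    trace.append((i, j))
    lo, hi = min(a[i], a[j]), max(a[i], a[j])
    a[i], a[j] = lo, hi


def oblivious_sort(values: Sequence[int]) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Bitonic sorting network; returns the sorted list and the access trace."""
    a = list(values)
    n = len(a)
    if n & (n - 1):
        raise ValueError("this sketch assumes a power-of-two length")
    trace: List[Tuple[int, int]] = []
    k = 2
    while k <= n:              # size of the bitonic sequences being merged
        j = k // 2
        while j > 0:           # distance between compared cells
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    # The direction depends only on the index i, never on the data.
                    if (i & k) == 0:
                        compare_exchange(a, i, partner, trace)   # ascending block
                    else:
                        compare_exchange(a, partner, i, trace)   # descending block
            j //= 2
        k *= 2
    return a, trace
```

Because the trace depends only on the input length, two different inputs of the same size produce identical traces; the comparisons themselves happen on the client side.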

Relaxed Schedulers Can Efficiently Parallelize Iterative Algorithms

Author: Alistarh Dan
Brown Trevor
Kopinsky Justin
Nadiradze Giorgi
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2018

There has been significant progress in understanding the parallelism inherent to iterative sequential algorithms: for many classic algorithms, the depth of the dependence structure is now well understood, and scheduling techniques have been developed to exploit this shallow dependence structure for efficient parallel implementations. A related, applied research strand has studied methods by which certain iterative task-based algorithms can be efficiently parallelized via relaxed concurrent priority schedulers. These allow for high concurrency when inserting and removing tasks, at the cost of executing superfluous work due to the relaxed semantics of the scheduler. In this work, we take a step towards unifying these two research directions by showing that there exists a family of relaxed priority schedulers that can efficiently and deterministically execute classic iterative algorithms such as greedy maximal independent set (MIS) and matching. Our primary result shows that, given a randomized scheduler with an expected relaxation factor of $k$ in terms of the maximum allowed priority inversions on a task, and any graph on $n$ vertices, the scheduler is able to execute greedy MIS with only an additive factor of poly($k$) expected additional iterations compared to an exact (but not scalable) scheduler. This counter-intuitive result demonstrates that the overhead of relaxation when computing MIS is not dependent on the input size or structure of the input graph. Experimental results show that this overhead can be clearly offset by the gain in performance due to the highly scalable scheduler. In sum, we present an efficient method to deterministically parallelize iterative sequential algorithms, with provable runtime guarantees in terms of the number of executed tasks to completion. Comment: PODC 2018, pages 377-386 in proceedings

arXiv.org e-Print Archive

Crossref

IST Austria: PubRep (Institute of Science and Technology)
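
The claim that relaxation changes only the number of iterations, not the output, can be illustrated with a small simulation. This sketch assumes a simplified relaxed scheduler that returns a uniformly random task among the k highest-priority pending ones; the paper's schedulers and analysis are more general, and all names here are hypothetical. A vertex is decided only once all of its higher-priority neighbors are decided, so the result always equals sequential greedy MIS; any pick that is not yet ready counts as a wasted iteration.

```python
import random
from typing import Dict, List, Set, Tuple


def greedy_mis_exact(adj: Dict[int, Set[int]], order: List[int]) -> Set[int]:
    """Sequential greedy MIS: scan vertices in priority order (exact scheduler)."""
    in_mis: Set[int] = set()
    for v in order:
        if not any(u in in_mis for u in adj[v]):
            in_mis.add(v)
    return in_mis


def greedy_mis_relaxed(adj: Dict[int, Set[int]], order: List[int], k: int,
                       rng: random.Random) -> Tuple[Set[int], int]:
    """Greedy MIS driven by a simulated k-relaxed scheduler.

    Returns the independent set and the number of scheduler iterations,
    including wasted ones where the chosen vertex was not yet ready.
    """
    rank = {v: r for r, v in enumerate(order)}
    pending = list(order)             # pending tasks, kept in priority order
    in_mis: Set[int] = set()
    resolved: Set[int] = set()
    iterations = 0
    while pending:
        iterations += 1
        idx = rng.randrange(min(k, len(pending)))   # any of the top-k tasks
        v = pending[idx]
        earlier = [u for u in adj[v] if rank[u] < rank[v]]
        if all(u in resolved for u in earlier):
            # Same decision the sequential scan would make for v.
            if not any(u in in_mis for u in earlier):
                in_mis.add(v)
            resolved.add(v)
            pending.pop(idx)
        # Otherwise the iteration is wasted and v stays pending.
    return in_mis, iterations
```

With k = 1 the simulated scheduler is exact and the loop runs exactly n iterations; larger k can only add wasted iterations, never change the resulting set.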

Parallel Metric Tree Embedding based on an Algebraic View on Moore-Bellman-Ford

Author: Cormen T. H.
Mendel M.
Mohri M.
Moore E. F.
Peleg D.
Publication date: 01/01/2015

A metric tree embedding of expected stretch $\alpha \geq 1$ maps a weighted $n$-node graph $G = (V, E, \omega)$ to a weighted tree $T = (V_T, E_T, \omega_T)$ with $V \subseteq V_T$ such that, for all $v, w \in V$, $\operatorname{dist}(v, w, G) \leq \operatorname{dist}(v, w, T)$ and $\operatorname{E}[\operatorname{dist}(v, w, T)] \leq \alpha \operatorname{dist}(v, w, G)$. Such embeddings are highly useful for designing fast approximation algorithms, as many hard problems are easy to solve on tree instances. However, to date the best parallel $(\operatorname{polylog} n)$-depth algorithm that achieves an asymptotically optimal expected stretch of $\alpha \in \operatorname{O}(\log n)$ requires $\operatorname{\Omega}(n^2)$ work and a metric as input. In this paper, we show how to achieve the same guarantees using $\operatorname{polylog} n$ depth and $\operatorname{\tilde{O}}(m^{1+\epsilon})$ work, where $m = |E|$ and $\epsilon > 0$ is an arbitrarily small constant. Moreover, one may further reduce the work to $\operatorname{\tilde{O}}(m + n^{1+\epsilon})$ at the expense of increasing the expected stretch to $\operatorname{O}(\epsilon^{-1} \log n)$. Our main tool in deriving these parallel algorithms is an algebraic characterization of a generalization of the classic Moore-Bellman-Ford algorithm. We consider this framework, which subsumes a variety of previous "Moore-Bellman-Ford-like" algorithms, to be of independent interest and discuss it in depth. In our tree embedding algorithm, we leverage it for providing efficient query access to an approximate metric that allows sampling the tree using $\operatorname{polylog} n$ depth and $\operatorname{\tilde{O}}(m)$ work. We illustrate the generality and versatility of our techniques by various examples and a number of additional results.

arXiv.org e-Print Archive

Crossref

MPG.PuRe
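
The algebraic view of Moore-Bellman-Ford can be sketched as one iteration scheme parameterized by a semiring. The Python below is a minimal illustration under that assumption; the paper's framework is more general than this, and `mbf_iterate` and its parameters are hypothetical names. After h rounds, entry v aggregates over all paths of at most h hops from the source: with (min, +) this gives h-hop distances, and with (max, min) it gives widest-path (bottleneck) values.

```python
from typing import Callable, List, Tuple, TypeVar

S = TypeVar("S")


def mbf_iterate(n: int, edges: List[Tuple[int, int, S]],
                add: Callable[[S, S], S], mul: Callable[[S, S], S],
                zero: S, one: S, source: int, h: int) -> List[S]:
    """Run h rounds of x_v <- x_v (+) sum over edges {u, v} of x_u (*) w(u, v).

    `add` and `mul` are the semiring operations, `zero` the additive identity
    and `one` the multiplicative identity. The graph is undirected.
    """
    x = [zero] * n
    x[source] = one
    for _ in range(h):
        nxt = list(x)                    # keeps the x_v term of each update
        for u, v, w in edges:
            nxt[v] = add(nxt[v], mul(x[u], w))   # read only the previous round
            nxt[u] = add(nxt[u], mul(x[v], w))
        x = nxt
    return x


INF = float("inf")
EDGES = [(0, 1, 1), (1, 2, 1), (0, 2, 5), (2, 3, 1)]
# (min, +): 3-hop distances from node 0.
print(mbf_iterate(4, EDGES, min, lambda a, b: a + b, INF, 0, 0, 3))   # [0, 1, 2, 3]
# (max, min): widest-path (bottleneck) values from node 0.
print(mbf_iterate(4, EDGES, max, min, 0, INF, 0, 3))                  # [inf, 1, 5, 1]
```

Only the two semiring operations and their identities change between the two runs; the iteration itself is shared, which is the kind of reuse the abstract's framework is after.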

Reordering Rows for Better Compression: Beyond the Lexicographic Order

Author: Gutarra Eduardo
Kaser Owen
Lemire Daniel
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/07/2012

Sorting database tables before compressing them improves the compression rate. Can we do better than the lexicographical order? For minimizing the number of runs in a run-length encoding compression scheme, the best approaches to row ordering are derived from traveling salesman heuristics, although there is a significant trade-off between running time and compression. A new heuristic, Multiple Lists, which is a variant of Nearest Neighbor that trades off compression for a major running-time speedup, is a good option for very large tables. However, for some compression schemes, it is more important to generate long runs rather than few runs. For this case, another novel heuristic, Vortex, is promising. We find that we can improve run-length encoding by up to a factor of 3 and prefix coding by up to 80%: these gains are on top of the gains due to lexicographically sorting the table. We prove that, in a few cases, the new row reordering is optimal (within 10%) at minimizing the runs of identical values within columns. Comment: to appear in ACM TOD

arXiv.org e-Print Archive

R-libre

Crossref
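
To make the objective concrete, the sketch below counts column runs, the quantity run-length encoding pays for, and compares the lexicographic order with a plain Nearest Neighbor ordering under Hamming distance. This is the baseline heuristic the abstract mentions, not the paper's Multiple Lists or Vortex heuristics; the function names are hypothetical, and the quadratic running time of the greedy loop illustrates the time/compression trade-off the abstract describes.

```python
from typing import List, Sequence, Tuple

Row = Tuple[int, ...]


def count_runs(rows: Sequence[Row]) -> int:
    """Total number of runs of identical values, summed over all columns."""
    if not rows:
        return 0
    total = 0
    for c in range(len(rows[0])):
        changes = sum(1 for i in range(1, len(rows)) if rows[i][c] != rows[i - 1][c])
        total += 1 + changes
    return total


def hamming(a: Row, b: Row) -> int:
    """Number of columns in which two rows differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_neighbor_order(rows: Sequence[Row]) -> List[Row]:
    """Greedy TSP-style ordering: always append the closest remaining row.

    Starts from the lexicographically smallest row; O(n^2) row comparisons.
    """
    remaining = sorted(rows)
    if not remaining:
        return []
    order = [remaining.pop(0)]
    while remaining:
        current = order[-1]
        best = min(range(len(remaining)), key=lambda i: hamming(current, remaining[i]))
        order.append(remaining.pop(best))
    return order


table = [(0, 0, 1), (1, 1, 0), (0, 0, 1), (1, 1, 0)]
print(count_runs(table))                          # 12
print(count_runs(sorted(table)))                  # 6
print(count_runs(nearest_neighbor_order(table)))  # 6
```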

Engineering Massively Parallel MST Algorithms

Author: Sanders Peter
Schimek Matthias
Publication date: 08/12/2023
Field of study

We develop and extensively evaluate highly scalable distributed-memory algorithms for computing minimum spanning trees (MSTs). At the heart of our solutions is a scalable variant of Boruvka's algorithm. For partitioned graphs with many local edges, we improve this with an effective form of contracting local parts of the graph during a preprocessing step. We also adapt the filtering concept of the best practical sequential algorithm to develop a massively parallel Filter-Boruvka algorithm that is very useful for graphs with poor locality and high average degree. Our experiments indicate that our algorithms scale well up to at least 65 536 cores and are up to 800 times faster than previous distributed MST algorithms. Comment: 12 pages, 6 figures

arXiv.org e-Print Archive
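
For reference, the core of the approach is Boruvka's algorithm: in each round every component picks its lightest outgoing edge, and all picked edges are contracted at once, so the number of components at least halves per round. The Python below is a minimal sequential sketch with a union-find structure, not the authors' distributed implementation or their Filter-Boruvka variant; the function name is hypothetical. Ties are broken by edge index so that equal weights cannot create cycles.

```python
from typing import List, Optional, Tuple

Edge = Tuple[int, int, float]


def boruvka_mst(n: int, edges: List[Edge]) -> List[Edge]:
    """Return the edges of a minimum spanning forest of an undirected graph."""
    parent = list(range(n))

    def find(x: int) -> int:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst: List[Edge] = []
    components = n
    while components > 1:
        # cheapest[r]: index of the lightest edge leaving the component rooted at r.
        cheapest: List[Optional[int]] = [None] * n
        for i, (u, v, w) in enumerate(edges):
            ru, rv = find(u), find(v)
            if ru == rv:
                continue
            for r in (ru, rv):
                c = cheapest[r]
                if c is None or (w, i) < (edges[c][2], c):
                    cheapest[r] = i
        merged = False
        for r in range(n):
            i = cheapest[r]
            if i is None:
                continue
            u, v, _ = edges[i]
            ru, rv = find(u), find(v)
            if ru != rv:            # the edge may already have been used this round
                parent[ru] = rv
                mst.append(edges[i])
                components -= 1
                merged = True
        if not merged:              # disconnected graph: stop with a spanning forest
            break
    return mst
```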

Near Optimal Parallel Algorithms for Dynamic DFS in Undirected Graphs

Author: Khan Shahbaz
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 10/05/2017

The depth-first search (DFS) tree is a fundamental data structure for solving graph problems. The classical algorithm [SiComp74] for building a DFS tree requires $O(m+n)$ time for a given graph $G$ having $n$ vertices and $m$ edges. Recently, Baswana et al. [SODA16] presented a simple algorithm for updating the DFS tree of an undirected graph after an edge/vertex update in $\tilde{O}(n)$ time. However, their algorithm is strictly sequential. We present an algorithm achieving similar bounds that can be adapted easily to the parallel environment. In the parallel model, a DFS tree can be computed from scratch using $m$ processors in expected $\tilde{O}(1)$ time [SiComp90] on an EREW PRAM, whereas the best deterministic algorithm takes $\tilde{O}(\sqrt{n})$ time [SiComp90,JAlg93] on a CRCW PRAM. Our algorithm can be used to develop optimal (up to polylog $n$ factors) deterministic algorithms for maintaining fully dynamic DFS and fault-tolerant DFS of an undirected graph. (1) Parallel fully dynamic DFS: given an arbitrary online sequence of vertex/edge updates, we can maintain a DFS tree of an undirected graph in $\tilde{O}(1)$ time per update using $m$ processors on an EREW PRAM. (2) Parallel fault-tolerant DFS: an undirected graph can be preprocessed to build a data structure of size $O(m)$ such that, for a set of $k$ updates (where $k$ is constant) in the graph, the updated DFS tree can be computed in $\tilde{O}(1)$ time using $n$ processors on an EREW PRAM. Moreover, our fully dynamic DFS algorithm provides, in a seamless manner, nearly optimal (up to polylog $n$ factors) algorithms for maintaining a DFS tree in the semi-streaming model and a restricted distributed model. These are the first parallel, semi-streaming and distributed algorithms for maintaining a DFS tree in the dynamic setting. Comment: Accepted to appear in SPAA'17, 32 pages, 5 figures

arXiv.org e-Print Archive

Crossref
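
A DFS tree of an undirected graph is characterized by the property that every non-tree edge joins an ancestor and a descendant; this is the invariant a dynamic algorithm has to restore after each update. The sketch below builds a DFS tree iteratively in $O(m+n)$ time and checks that invariant. Rebuilding from scratch after every update is the naive baseline that the update algorithms above improve on; the function names are hypothetical.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Graph = Dict[int, List[int]]


def dfs_tree(adj: Graph, root: int) -> Dict[int, Optional[int]]:
    """Iterative DFS; returns the parent of every vertex reachable from root."""
    parent: Dict[int, Optional[int]] = {root: None}
    stack: List[Tuple[int, Iterator[int]]] = [(root, iter(adj[root]))]
    while stack:
        v, neighbors = stack[-1]
        for u in neighbors:                 # resumes where v's scan stopped
            if u not in parent:             # tree edge: descend into u
                parent[u] = v
                stack.append((u, iter(adj[u])))
                break
        else:                               # all neighbors scanned: backtrack
            stack.pop()
    return parent


def is_ancestor(parent: Dict[int, Optional[int]], a: int, b: int) -> bool:
    """True if a lies on the tree path from b up to the root (a == b counts)."""
    node: Optional[int] = b
    while node is not None:
        if node == a:
            return True
        node = parent[node]
    return False


def is_dfs_tree(adj: Graph, parent: Dict[int, Optional[int]]) -> bool:
    """Check that every edge joins an ancestor and a descendant.

    Quadratic in the worst case; intended only for small tests.
    """
    return all(is_ancestor(parent, v, u) or is_ancestor(parent, u, v)
               for v in adj for u in adj[v] if v in parent and u in parent)
```

A BFS tree of the same graph generally fails this check, which is why DFS trees need their own update algorithms.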

Motion planning for geometric models in data visualization

Author: Szkandera Jakub
Publication venue: Západočeská univerzita v Plzni
Publication date: 01/01/2019

Interactive geometric models for the simulation of natural phenomena (LH11006); Advanced graphics and computer systems (SGS-2016-013). Path finding is an important task in many research areas and a common problem in a wide range of applications. New path-finding problems keep appearing while complex ones persist, such as real-time path planning for huge crowds in dynamic environments, where both the properties by which the cost of a path is evaluated and the topology of the paths may change. Path finding can be divided into path planning and motion planning, where the latter implicitly accounts for collisions with the surrounding environment. Within the first area, this thesis focuses on path planning on graphs for crowds. The main idea is to group members of the crowd by their common initial and target positions and then plan the path for one representative member of each group. These representatives can be navigated by classic approaches, and the rest of each group follows them. If the crowd can be divided into a few groups in this way, the proposed approach saves a large amount of computation and memory in dynamic environments. In the second area, motion planning, we address a different problem: navigating a ligand through a protein or into it, which turns out to be challenging because it must be solved in 3D with collision detection.

DSpace at University of West Bohemia
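
The grouping idea for crowds can be sketched as follows: agents sharing a start and target node form one group, a single shortest-path search is run for the group's representative, and the other members reuse that path. This minimal Python sketch assumes a static weighted directed graph and uses Dijkstra's algorithm as the "classic approach"; it ignores the dynamic environment and the actual following behavior, and all names are hypothetical.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Graph = Dict[str, List[Tuple[str, float]]]


def dijkstra_path(graph: Graph, src: str, dst: str) -> Optional[List[str]]:
    """Shortest path from src to dst, or None if dst is unreachable."""
    dist: Dict[str, float] = {src: 0}
    prev: Dict[str, str] = {}
    heap = [(0, src)]
    while heap:
        d, v = heapq.heappop(heap)
        if v == dst:
            break
        if d > dist.get(v, float("inf")):   # stale heap entry
            continue
        for u, w in graph.get(v, ()):
            nd = d + w
            if nd < dist.get(u, float("inf")):
                dist[u] = nd
                prev[u] = v
                heapq.heappush(heap, (nd, u))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def plan_crowd(graph: Graph, agents: Dict[int, Tuple[str, str]]):
    """Group agents by (start, goal); run one search per group, not per agent."""
    groups: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    for agent_id, key in agents.items():
        groups[key].append(agent_id)
    plans: Dict[int, Optional[List[str]]] = {}
    for (start, goal), members in groups.items():
        path = dijkstra_path(graph, start, goal)   # the representative's plan
        for agent_id in members:
            plans[agent_id] = path                 # followers reuse it
    return plans, len(groups)                      # len(groups) = searches run
```

The saving is the ratio of agents to groups: a crowd that collapses into a few groups needs only a few searches.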

Graph set data mining

Author: Schäfer Till
Publication date: 01/01/2023

Graphs are among the most versatile abstract data types in computer science. This versatility has led to their wide adoption in application fields such as chemistry, biology, social analysis, logistics, and computer science itself. With the growing capacities of digital storage, the collection of large amounts of data has become the norm in many application fields. Data mining, i.e., the automated extraction of non-trivial patterns from data, is a key step to extract knowledge from these datasets and generate value. This thesis is dedicated to concurrent scalable data mining algorithms beyond traditional notions of efficiency for large-scale datasets of small labeled graphs; more precisely, structural clustering and representative subgraph pattern mining. It is motivated by, but not limited to, the need to analyze molecular libraries of ever-increasing size in the drug discovery process. Structural clustering makes use of graph theoretical concepts, such as (common) subgraph isomorphisms and frequent subgraphs, to model cluster commonalities directly in the application domain. It is considered computationally demanding for non-restricted graph classes, and, with very few exceptions, prior algorithms are only suitable for very small datasets. This thesis discusses the first truly scalable structural clustering algorithm, StruClus, with linear worst-case complexity. At the same time, StruClus embraces the inherent values of structural clustering algorithms, i.e., interpretable, consistent, and high-quality results. A novel two-fold sampling strategy with stochastic error bounds for frequent subgraph mining is presented. It enables fast extraction of cluster commonalities in the form of common subgraph representative sets. StruClus is the first structural clustering algorithm with a directed selection of structural cluster-representative patterns regarding homogeneity and separation aspects in the high-dimensional subgraph pattern space.
Furthermore, a novel concept of cluster homogeneity balancing using dynamically-sized representatives is discussed. The second part of this thesis discusses the representative subgraph pattern mining problem in more general terms. A novel objective function maximizes the number of represented graphs for a cardinality-constrained representative set. It is shown that the problem is a special case of the maximum coverage problem and is NP-hard. Based on the greedy approximation of Nemhauser, Wolsey, and Fisher for submodular set function maximization, a novel sampling approach is presented. It mines candidate sets that contain an optimal greedy solution with a probabilistic maximum error. This leads to a constant-time algorithm to generate the candidate sets given a fixed-size sample of the dataset. In combination with a cheap single-pass streaming evaluation of the candidate sets, this enables scalability to datasets with billions of molecules on a single machine. Ultimately, the sampling approach leads to the first distributed subgraph pattern mining algorithm that distributes the pattern space and the dataset graphs at the same time.

Eldorado - Ressourcen aus und für Lehre, Studium und Forschung
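
The second part builds on the greedy algorithm of Nemhauser, Wolsey, and Fisher, which for maximum coverage picks, k times, the candidate that covers the most not-yet-covered graphs and is known to achieve at least a $(1 - 1/e)$ fraction of the optimum. The sketch below shows only that greedy step on explicit sets, assuming each candidate pattern is already mapped to the set of dataset graphs it represents; it omits the thesis's sampling and streaming evaluation, and the names are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple


def greedy_max_coverage(candidates: Dict[str, Set[int]],
                        k: int) -> Tuple[List[str], Set[int]]:
    """Pick up to k candidates, each time the one with the largest marginal coverage."""
    chosen: List[str] = []
    covered: Set[int] = set()
    for _ in range(k):
        best: Optional[str] = None
        best_gain = 0
        for name, graphs in candidates.items():
            if name in chosen:
                continue
            gain = len(graphs - covered)      # graphs this pattern would newly cover
            if gain > best_gain:              # strict: ties keep the earlier candidate
                best, best_gain = name, gain
        if best is None:                      # nothing left adds coverage
            break
        chosen.append(best)
        covered |= candidates[best]
    return chosen, covered
```

For example, with candidates `{"p1": {1, 2, 3}, "p2": {3, 4}, "p3": {4, 5, 6, 7}}` and k = 2, the greedy picks `p3` first (4 new graphs) and then `p1` (3 new graphs), covering all seven.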