Simple Distributed Weighted Matchings
Wattenhofer et al. [WW04] derive a complicated distributed algorithm to compute a
weighted matching of an arbitrary weighted graph that is at most a factor 5
away from the maximum weighted matching of that graph. We show that a variant
of the obvious sequential greedy algorithm [Pre99], which computes a weighted
matching at most a factor 2 away from the maximum, is easily distributed. This
yields the best distributed approximation algorithm known for this problem so
far.
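The sequential greedy algorithm referred to here is simple enough to sketch directly (a minimal illustration with invented names, not the distributed variant from the paper): repeatedly take the heaviest remaining edge and discard all edges sharing an endpoint with it. The resulting matching has weight at least half the maximum.

```python
def greedy_weighted_matching(edges):
    """Greedy 1/2-approximation for maximum weighted matching.

    edges: list of (u, v, weight) tuples of an undirected graph.
    Returns the chosen matching as a list of (u, v, weight).
    """
    matched = set()
    matching = []
    # Consider edges in order of decreasing weight.
    for u, v, w in sorted(edges, key=lambda e: e[2], reverse=True):
        if u not in matched and v not in matched:
            matching.append((u, v, w))
            matched.update((u, v))
    return matching

# Example: a path a-b-c-d with weights 3, 4, 3.  Greedy picks (b, c, 4)
# and nothing else fits, total weight 4, while the optimum
# (a, b) + (c, d) has weight 6: within the factor-2 guarantee.
edges = [("a", "b", 3), ("b", "c", 4), ("c", "d", 3)]
print(greedy_weighted_matching(edges))
```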
Sublinear Estimation of Weighted Matchings in Dynamic Data Streams
This paper presents an algorithm for estimating the weight of a maximum
weighted matching by augmenting any estimation routine for the size of an
unweighted matching. The algorithm is implementable in any streaming model,
including dynamic graph streams. We also give the first constant-factor
estimation for the maximum matching size in a dynamic graph stream for planar
graphs (or any graph with bounded arboricity) using space which also
extends to weighted matching. Using previous results by Kapralov, Khanna, and
Sudan (2014) we obtain a approximation for general graphs
using space in random order streams, respectively. In
addition, we give a space lower bound of for any
randomized algorithm estimating the size of a maximum matching up to a
factor for adversarial streams.
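For intuition about the unweighted estimation routines such an augmentation builds on, the classic one-pass baseline in insertion-only streams is greedy maximal matching, whose size is within a factor 2 of the maximum matching size. This sketch is only illustrative and is not the paper's algorithm, which must also handle edge deletions:

```python
def greedy_stream_matching_size(edge_stream):
    """One-pass greedy maximal matching over an insertion-only edge stream.

    An edge is kept iff both endpoints are still unmatched.  Any maximal
    matching has at least half the size of a maximum matching, so the
    returned count is a 2-approximation of the maximum matching size.
    """
    matched = set()
    size = 0
    for u, v in edge_stream:
        if u not in matched and v not in matched:
            matched.update((u, v))
            size += 1
    return size

# A 4-cycle: the maximum matching has size 2; greedy finds it here.
print(greedy_stream_matching_size([(1, 2), (2, 3), (3, 4), (4, 1)]))
```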
Cross-layer Congestion Control, Routing and Scheduling Design in Ad Hoc Wireless Networks
This paper considers the jointly optimal design of cross-layer congestion control, routing, and scheduling for ad hoc
wireless networks. We first formulate the rate constraint and scheduling constraint using multicommodity flow variables, and formulate resource allocation in networks with fixed wireless channels (or single-rate wireless devices that can mask channel variations) as a utility maximization problem with these constraints.
By dual decomposition, the resource allocation problem
naturally decomposes into three subproblems: congestion control,
routing and scheduling that interact through congestion price.
The global convergence property of this algorithm is proved. We
next extend the dual algorithm to handle networks with time-varying
channels and adaptive multi-rate devices. The stability
of the resulting system is established, and its performance is
characterized with respect to an ideal reference system which
has the best feasible rate region at link layer.
We then generalize the aforementioned results to a general
model of queueing network served by a set of interdependent
parallel servers with time-varying service capabilities, which
models many design problems in communication networks. We
show that for a general convex optimization problem where a
subset of variables lie in a polytope and the rest in a convex set,
the dual-based algorithm remains stable and optimal when the
constraint set is modulated by an irreducible finite-state Markov
chain. This paper thus presents a step toward a systematic way
to carry out cross-layer design in the framework of “layering as
optimization decomposition” for time-varying channel models.
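The congestion-price coupling described above can be illustrated on a toy utility maximization instance (a sketch under simplifying assumptions not taken from the paper: one shared link of fixed capacity, fixed routing, logarithmic utilities; the function name num_dual is invented): each source chooses its rate as the best response to the current congestion price, and the link adjusts its price by a subgradient step on the dual.

```python
def num_dual(capacity=1.0, n_flows=2, step=0.05, iters=3000):
    """Dual subgradient method for: maximize sum_i log(x_i)
    subject to sum_i x_i <= capacity (one shared link).

    Congestion control: each source solves max log(x_i) - price * x_i,
    giving the best response x_i = 1 / price.
    Link: the price follows the dual subgradient, rising when the link
    is overloaded and falling when capacity is unused.
    """
    price = 1.0
    rates = []
    for _ in range(iters):
        rates = [1.0 / price] * n_flows   # sources' best responses
        excess = sum(rates) - capacity    # dual subgradient at this price
        price = max(1e-6, price + step * excess)
    return price, rates

price, rates = num_dual()
# Converges to the optimum of the toy problem: price 2, each rate 0.5.
print(round(price, 2), [round(r, 2) for r in rates])
```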
Parallel Approximation Algorithms for Maximum Weighted Matching in General Graphs
The problem of computing a matching of maximum weight in a given edge-weighted graph is not known to be P-hard or in RNC. This paper presents four parallel approximation algorithms for this problem. The first is an RNC-approximation scheme, i.e., an RNC algorithm that computes a matching of weight at least 1 − ε times the maximum for any given constant ε > 0. The second one is an NC approximation algorithm achieving an approximation ratio of 1/(2 + ε) for any fixed ε > 0. The third and fourth algorithms only need to know the total order of the weights, so they are useful when the edge weights require a large amount of memory to represent. The third one is an NC approximation algorithm that finds a matching of weight at least 2/(3Δ + 2) times the maximum, where Δ is the maximum degree of the graph. The fourth one is an RNC algorithm that finds a matching of weight at least 1/(2Δ + 4) times the maximum on average, and runs in O(log Δ) time, not depending on the size of the graph.
On algorithms for large-scale graph and clustering problems
This thesis deals with algorithmic methods of modern data analysis. Two overarching topics are treated: streaming algorithms with compression properties and approximation algorithms for clustering. Streaming algorithms process a data set sequentially and aim to determine properties of the data set (approximately) without storing the entire data set. Clustering is the partitioning of a data set into distinct groups.
The first problem considered is matching in graphs. Here the data set consists of a sequence of edge insertions and deletions. The task is to determine the size of the so-called maximum matching as accurately as possible. An algorithm is presented that, under the assumption that the matching has size at most k, determines the exact size and requires k² units of memory. This algorithm can further be used to compute a constant-factor approximation of the matching size in planar graphs. In addition, lower bounds on the required memory are established, and a reduction from weighted to unweighted matching is given.
Next, streaming algorithms for similarity search are considered, where the task is, given n sets, to find the pairs with high similarity in near-linear time. Here the Jaccard index |A ∩ B|/|A ∪ B| serves as the similarity measure for two sets A and B. The thesis describes a data structure that accomplishes this for the first time in dynamic data streams with low memory consumption. It uses random numbers with only 2-wise independence, which allows a very efficient implementation.
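The Jaccard index lends itself to estimation by min-wise hashing. The sketch below is an illustrative static version with invented names; the thesis treats the much harder dynamic-stream setting, and 2-wise independence is used here only in the spirit of that construction, since tight min-hash guarantees formally require stronger independence. Every element is hashed with pairwise-independent linear functions, and the per-function minima of the two sets are compared:

```python
import random

def minhash_signature(items, hashes):
    """Minimum hash value of the set under each 2-wise independent
    linear hash h(x) = (a*x + b) mod p."""
    p = (1 << 61) - 1  # a Mersenne prime
    return [min((a * x + b) % p for x in items) for a, b in hashes]

def estimate_jaccard(sig_a, sig_b):
    """Fraction of coordinates on which the signatures agree; each
    coordinate matches with probability ~ |A ∩ B| / |A ∪ B|."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)

random.seed(0)
p = (1 << 61) - 1
hashes = [(random.randrange(1, p), random.randrange(p)) for _ in range(512)]
A = set(range(0, 80))    # |A ∩ B| = 40, |A ∪ B| = 120,
B = set(range(40, 120))  # so the true Jaccard index is 1/3
est = estimate_jaccard(minhash_signature(A, hashes), minhash_signature(B, hashes))
print(round(est, 2))
```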
The third problem lies at the interface between the two topics of this thesis and concerns the k-center clustering problem in data streams with a sliding window. The task is to find k centers such that the maximum distance of any point to its nearest center is minimized. The results are a 6-approximation algorithm for arbitrary k and an optimal 4-approximation algorithm for k = 2. The techniques developed also apply to the diameter problem, for which they yield an optimal algorithm.
Clustering problems with respect to the Jaccard distance are then analyzed. Again a collection N of subsets of a ground set U is given, and the task is to find a set C that minimizes max over X in N of 1 − |X ∩ C|/|X ∪ C|. It is shown that, while an exact solution of the problem is NP-hard, there is nevertheless a PTAS.
Finally, the widely used local search heuristic for k-median and k-means clustering is examined. Although these problems are in general hard to solve exactly, or even approximately, they are considered relatively well-behaved in practice, which suggests that the hardness results rest on pathological inputs. Owing to this discrepancy, there have been past attempts to characterize practically relevant data sets. For three of the most important characterizations, the behavior of a local search heuristic is analyzed, with the result that in these cases the local search heuristic finds optimal or nearly optimal clusterings.
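The local search heuristic in question can be sketched as a single-swap procedure for k-median (a minimal one-dimensional illustration with invented helper names, not the thesis's analysis): swap one center for one non-center whenever this lowers the cost, and stop at a local optimum.

```python
def kmedian_cost(points, centers):
    """Sum over all points of the distance to the nearest center."""
    return sum(min(abs(p - c) for c in centers) for p in points)

def local_search_kmedian(points, k):
    """Single-swap local search: start with the first k points as
    centers and swap center/non-center pairs while the cost improves."""
    centers = list(points[:k])
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for p in points:
                if p in centers:
                    continue
                candidate = centers[:i] + [p] + centers[i + 1:]
                if kmedian_cost(points, candidate) < kmedian_cost(points, centers):
                    centers = candidate
                    improved = True
    return sorted(centers)

# Two well-separated groups: local search places one center in each.
pts = [0, 1, 2, 10, 11, 12]
print(local_search_kmedian(pts, 2))
```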
Parallel and External High Quality Graph Partitioning
Partitioning graphs into k blocks of roughly equal size such that few edges run between the blocks is a key tool for processing and analyzing large complex real-world networks. The graph partitioning problem has many practical applications in parallel and distributed computing, data storage, image processing, VLSI physical design, and more. Furthermore, the size, variety, and structural complexity of real-world networks have recently grown dramatically. Therefore, there is a demand for efficient graph partitioning algorithms that fully utilize the computational power and memory capacity of modern machines.
A popular and successful heuristic for computing high-quality partitions of large networks in reasonable time is the multi-level approach, which contracts the graph while preserving its structure and then partitions it using a sophisticated graph partitioning algorithm. Specifically, the multi-level graph partitioning approach consists of three main phases: coarsening, initial partitioning, and uncoarsening. During the coarsening phase, the graph is recursively contracted, preserving its structure and properties, until it is small enough for its initial partition to be computed during the initial partitioning phase. Afterwards, during the uncoarsening phase, the partition of the contracted graph is projected onto the original graph and refined using, for example, local search.
Most of the research on heuristic graph partitioning focuses on sequential algorithms or on parallel algorithms in the distributed-memory model. Unfortunately, previous approaches to graph partitioning are not able to process large networks and rarely take into account several aspects of modern machines. Specifically, the number of cores per chip grows each year, while the price of RAM falls more slowly than real-world graphs grow. Since HDDs and SSDs are 50–400 times cheaper than RAM, external memory makes it possible to process large real-world graphs at a reasonable price. Therefore, in order to better utilize contemporary machines, we develop efficient algorithms for the shared-memory and external-memory models.
First, we present an approach to shared-memory parallel multi-level graph partitioning that guarantees balanced solutions, shows high speed-ups for a variety of large graphs, and yields very good quality independently of the number of cores used. Important ingredients include parallel label propagation for both coarsening and uncoarsening, parallel initial partitioning, a simple yet effective approach to parallel localized local search, and fast locality-preserving hash tables that effectively utilize caches. The main idea of parallel localized local search is that each processor refines only a small area around a random vertex, reducing interactions between processors. For example, on 79 cores, our algorithm partitions a graph with more than 3 billion edges into 16 blocks, cutting 4.5% fewer edges than the closest competitor while being more than two times faster; another competitor is not able to partition this graph at all.
We then present an approach to external-memory graph partitioning that is able to partition large graphs that do not fit into RAM. Specifically, we consider the semi-external and the external memory model. In both models, a data structure of size proportional to the number of edges does not fit into RAM. The difference is that the former model assumes that a data structure of size proportional to the number of vertices fits into RAM, whereas the latter assumes the opposite. We address the graph partitioning problem in both models by adapting the size-constrained label propagation technique for the semi-external model and by developing a size-constrained clustering algorithm based on graph coloring for the external memory model. Our semi-external size-constrained label propagation algorithm (or external-memory clustering algorithm) can be used to compute graph clusterings and is a prerequisite for the (semi-)external graph partitioning algorithm. These algorithms are then used in both the coarsening and the uncoarsening phase of a multi-level algorithm to compute graph partitions. Our (semi-)external algorithm is able to partition and cluster huge complex networks with billions of edges on cheap commodity machines. Experiments demonstrate that the semi-external graph partitioning algorithm is scalable and can compute high-quality partitions in time comparable to the running time of an efficient internal-memory implementation. A parallelization of the algorithm in the semi-external model further reduces running times.
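The size-constrained label propagation rule itself is easy to state (an in-memory illustration with invented names; the thesis's contribution lies in running such clustering (semi-)externally and inside the multi-level framework): every vertex repeatedly adopts the label that is most common among its neighbors, provided the receiving cluster stays within the size bound.

```python
from collections import defaultdict

def size_constrained_label_propagation(adj, max_size, rounds=10):
    """adj: dict vertex -> list of neighbors (undirected graph).
    Every vertex starts in its own singleton cluster and repeatedly
    moves to the neighboring cluster with the strongest connection,
    provided the target cluster keeps at most max_size vertices."""
    label = {v: v for v in adj}
    size = {v: 1 for v in adj}
    for _ in range(rounds):
        moved = False
        for v in adj:
            counts = defaultdict(int)
            for u in adj[v]:
                counts[label[u]] += 1
            # Best neighboring cluster that still has room for v.
            best = max(
                (l for l in counts if l == label[v] or size[l] < max_size),
                key=lambda l: counts[l],
                default=label[v],
            )
            if best != label[v]:
                size[label[v]] -= 1
                size[best] += 1
                label[v] = best
                moved = True
        if not moved:
            break
    return label

# Two triangles joined by one edge; with max_size=3 each triangle
# ends up as its own cluster.
adj = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
print(size_constrained_label_propagation(adj, max_size=3))
```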
Additionally, we develop a speed-up technique for hypergraph partitioning algorithms. Hypergraphs are an extension of graphs that allow a single edge to connect more than two vertices; they therefore describe models and processes more accurately and additionally allow more possibilities for improvement. Most multi-level hypergraph partitioning algorithms perform computations on vertices and their sets of neighbors. Since these computations can be super-linear, they have a significant impact on the overall running time on large hypergraphs. Therefore, to reduce the size of the hyperedges, we develop a pin sparsifier based on the min-hash technique that clusters vertices with similar neighborhoods. Vertices that belong to the same cluster are then substituted by a single vertex connected to their neighbors, thereby reducing the size of the hypergraph. Our algorithm sparsifies a hypergraph such that the result can be partitioned significantly faster without loss in quality (or with insignificant loss). On average, KaHyPar with the sparsifier partitions about 1.5 times faster while preserving solution quality when hyperedges are large.
All aforementioned frameworks are publicly available.