Algorithmic Techniques for Processing Data Streams
We survey some algorithmic techniques for processing data streams. After covering the basic methods of sampling and sketching, we present more advanced procedures that build on those basic ones. In particular, we examine algorithmic schemes for similarity mining, the concept of group testing, and techniques for clustering and summarizing data streams.
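As an illustration of the sampling methods mentioned above, the following is a minimal sketch of reservoir sampling, a standard one-pass technique that keeps a uniform random sample of fixed size from a stream of unknown length. The abstract does not say which sampling schemes the survey covers, so the choice of technique, the function name `reservoir_sample`, and its parameters are assumptions made for this example.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Return k items chosen uniformly at random from a stream (hypothetical helper).

    Only the k-item reservoir is kept in memory, and the stream is read once.
    """
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir
```

If the stream holds fewer than k items, the function simply returns all of them in arrival order.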
Integer Point Sets Minimizing Average Pairwise L1-Distance: What is the Optimal Shape of a Town?
An n-town, for a natural number n, is a group of n buildings, each occupying
a distinct position on a 2-dimensional integer grid. If we measure the distance
between two buildings along the axis-parallel street grid, then an n-town has
optimal shape if the sum of all pairwise Manhattan distances is minimized. This
problem has been studied for cities, i.e., the limiting case of very large n.
For cities, it is known that the optimal shape can be described by a
differential equation for which no closed-form solution is known. We show that optimal
n-towns can be computed in O(n^7.5) time. This is also practically useful, as
it allows us to compute optimal solutions up to n=80.
Comment: 26 pages, 6 figures, to appear in Computational Geometry: Theory and
Applications
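The objective described in this abstract, the sum of all pairwise Manhattan distances of an n-town, can be evaluated directly for a given placement. The sketch below only computes that cost by brute force over all pairs; it is not the O(n^7.5) optimization algorithm of the paper, and the name `town_cost` is an assumption made for this example.

```python
from itertools import combinations
from typing import Iterable, Tuple

Point = Tuple[int, int]


def town_cost(buildings: Iterable[Point]) -> int:
    """Sum of pairwise Manhattan (L1) distances of a set of grid points."""
    pts = list(buildings)
    if len(set(pts)) != len(pts):
        raise ValueError("buildings must occupy distinct grid positions")
    return sum(abs(x1 - x2) + abs(y1 - y2)
               for (x1, y1), (x2, y2) in combinations(pts, 2))


# Two candidate 4-towns: a 2x2 block and a straight row of four buildings.
square = [(0, 0), (0, 1), (1, 0), (1, 1)]
row = [(0, 0), (1, 0), (2, 0), (3, 0)]
print(town_cost(square), town_cost(row))  # prints "8 10"
```

For these two placements the compact block has the lower cost, which matches the intuition that good towns are roughly round rather than elongated.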
Algorithms for streaming graphs
An algorithm solving a graph problem is usually expected to have fast random access to the input graph G and a working memory that is able to store G completely. These powerful assumptions are called into question by massive graphs that exceed common working memories and can only be stored on external memory such as hard disks or magnetic tapes, where random access is very time-consuming. To tackle massive graphs stored on external memory, Muthukrishnan proposed the semi-streaming model in 2003. It permits a working memory of restricted size and forbids random access to the input graph. Instead, the input is assumed to be a stream of the edges of G in arbitrary order. In this thesis we develop algorithms in the semi-streaming model for several graph problems. For testing graph connectivity and bipartiteness and for computing a minimum spanning tree, we present algorithms that achieve asymptotically optimal running times. For the problem of finding a maximum weighted matching, which is known to be intractable in the semi-streaming model, we present the best known approximation algorithm. Finally, we show that both the minimum and the maximum cut problem in a graph are intractable in the semi-streaming model, and we give randomized semi-streaming algorithms that approximate the respective solutions.
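To make the model concrete, here is a minimal sketch of a one-pass connectivity test over an edge stream. It uses a textbook union-find structure and stores only O(n) words, never the edge set itself. This is an illustration of the semi-streaming setting, not the thesis's own algorithm; the names `DisjointSets` and `is_connected_stream` are assumptions, and vertices are assumed to be numbered 0 to n-1 with n at least 1.

```python
from typing import Iterable, Tuple


class DisjointSets:
    """Union-find with union by size and path halving (textbook version)."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))
        self.size = [1] * n
        self.components = n

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> bool:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False  # edge closes a cycle, nothing to merge
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        self.components -= 1
        return True


def is_connected_stream(n: int, edge_stream: Iterable[Tuple[int, int]]) -> bool:
    """Read each edge once, in arbitrary order, and report connectivity."""
    sets = DisjointSets(n)
    for u, v in edge_stream:
        sets.union(u, v)
    return sets.components == 1
```

The edge stream can be any iterable, for example a generator reading edges from a file, so the graph never has to reside in memory as a whole.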
k-connectivity in the semi-streaming model
We present the first semi-streaming algorithms to determine k-connectivity of an undirected graph with k being any constant. The semi-streaming model for graph algorithms was introduced by Muthukrishnan in 2003 and turns out to be useful when dealing with massive graphs streamed in from an external storage device. Our two semi-streaming algorithms each compute a sparse subgraph of an input graph G and can use this subgraph in a postprocessing step to decide k-connectivity of G. To this end, the first algorithm reads the input stream only once and uses time O(k^2 n) to process each input edge. The second algorithm reads the input k + 1 times and needs time O(k + α(n)) per input edge. Using its constructed subgraph, the second algorithm can also generate all l-separators of the input graph for all l < k.
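The idea of keeping a sparse subgraph that still answers a connectivity question can be sketched with a simpler, related technique: a one-pass certificate for k-edge-connectivity built from k greedily grown spanning forests. This is explicitly not the algorithm of the paper, which concerns separators and is more involved. The function name `sparse_certificate` and the assumption that vertices are numbered 0 to n-1 are made for this example.

```python
from typing import Iterable, List, Tuple

Edge = Tuple[int, int]


def _find(parent: List[int], x: int) -> int:
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def sparse_certificate(n: int, k: int, edge_stream: Iterable[Edge]) -> List[Edge]:
    """Keep at most k * (n - 1) edges in one pass over the stream.

    Each edge goes into the first of k forests in which it does not close
    a cycle; edges that close a cycle in all k forests are discarded.
    The kept subgraph is k-edge-connected exactly when the input graph is.
    """
    forests = [list(range(n)) for _ in range(k)]  # one union-find per forest
    kept: List[Edge] = []
    for u, v in edge_stream:
        for parent in forests:
            ru, rv = _find(parent, u), _find(parent, v)
            if ru != rv:
                parent[ru] = rv  # merge the two trees in this forest
                kept.append((u, v))
                break
    return kept
```

A postprocessing step can then run any exact k-edge-connectivity test on the kept edges, which fit in memory because their number is bounded by k(n - 1).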