12 research outputs found
Integer Point Sets Minimizing Average Pairwise L1-Distance: What is the Optimal Shape of a Town?
An n-town, for a natural number n, is a group of n buildings, each occupying
a distinct position on a 2-dimensional integer grid. If we measure the distance
between two buildings along the axis-parallel street grid, then an n-town has
optimal shape if the sum of all pairwise Manhattan distances is minimized. This
problem has been studied for cities, i.e., the limiting case of very large n.
For cities, it is known that the optimal shape can be described by a
differential equation, for which no closed-form is known. We show that optimal
n-towns can be computed in O(n^7.5) time. This is also practically useful, as
it allows us to compute optimal solutions up to n=80.Comment: 26 pages, 6 figures, to appear in Computational Geometry: Theory and
Application
MapReduce and Streaming Algorithms for Diversity Maximization in Metric Spaces of Bounded Doubling Dimension
Given a dataset of points in a metric space and an integer , a diversity
maximization problem requires determining a subset of points maximizing
some diversity objective measure, e.g., the minimum or the average distance
between two points in the subset. Diversity maximization is computationally
hard, hence only approximate solutions can be hoped for. Although its
applications are mainly in massive data analysis, most of the past research on
diversity maximization focused on the sequential setting. In this work we
present space and pass/round-efficient diversity maximization algorithms for
the Streaming and MapReduce models and analyze their approximation guarantees
for the relevant class of metric spaces of bounded doubling dimension. Like
other approaches in the literature, our algorithms rely on the determination of
high-quality core-sets, i.e., (much) smaller subsets of the input which contain
good approximations to the optimal solution for the whole input. For a variety
of diversity objective functions, our algorithms attain an
-approximation ratio, for any constant , where
is the best approximation ratio achieved by a polynomial-time,
linear-space sequential algorithm for the same diversity objective. This
improves substantially over the approximation ratios attainable in Streaming
and MapReduce by state-of-the-art algorithms for general metric spaces. We
provide extensive experimental evidence of the effectiveness of our algorithms
on both real world and synthetic datasets, scaling up to over a billion points.Comment: Extended version of
http://www.vldb.org/pvldb/vol10/p469-ceccarello.pdf, PVLDB Volume 10, No. 5,
January 201
Recommended from our members
Algorithmic support for commodity-based parallel computing systems.
The Computational Plant or Cplant is a commodity-based distributed-memory supercomputer under development at Sandia National Laboratories. Distributed-memory supercomputers run many parallel programs simultaneously. Users submit their programs to a job queue. When a job is scheduled to run, it is assigned to a set of available processors. Job runtime depends not only on the number of processors but also on the particular set of processors assigned to it. Jobs should be allocated to localized clusters of processors to minimize communication costs and to avoid bandwidth contention caused by overlapping jobs. This report introduces new allocation strategies and performance metrics based on space-filling curves and one dimensional allocation strategies. These algorithms are general and simple. Preliminary simulations and Cplant experiments indicate that both space-filling curves and one-dimensional packing improve processor locality compared to the sorted free list strategy previously used on Cplant. These new allocation strategies are implemented in Release 2.0 of the Cplant System Software that was phased into the Cplant systems at Sandia by May 2002. Experimental results then demonstrated that the average number of communication hops between the processors allocated to a job strongly correlates with the job's completion time. This report also gives processor-allocation algorithms for minimizing the average number of communication hops between the assigned processors for grid architectures. The associated clustering problem is as follows: Given n points in {Re}d, find k points that minimize their average pairwise L{sub 1} distance. Exact and approximate algorithms are given for these optimization problems. One of these algorithms has been implemented on Cplant and will be included in Cplant System Software, Version 2.1, to be released. In more preliminary work, we suggest improvements to the scheduler separate from the allocator
Approximation algorithms for geometric dispersion
The most basic form of the max-sum dispersion problem (MSD) is as follows: given n points in R^q and an integer k, select a set of k points such that the sum of the pairwise distances within the set is maximal. This is a prominent diversity problem, with wide applications in web search and information retrieval, where one needs to find a small and diverse representative subset of a large dataset. The problem has recently received a great deal of attention in the computational geometry and operations research communities; and since it is NP-hard, research has focused on efficient heuristics and approximation algorithms. Several classes of distance functions have been considered in the literature. Many of the most common distances used in applications are induced by a norm in a real vector space. The focus of this thesis is on MSD over these geometric instances. We provide for it simple and fast polynomial-time approximation schemes (PTASs), as well as improved constant-factor approximation algorithms. We pay special attention to the class of negative-type distances, a class that includes Euclidean and Manhattan distances, among many others. In order to exploit the properties of this class, we apply several techniques and results from the theory of isometric embeddings. We explore the following variations of the MSD problem: matroid and matroid-intersection constraints, knapsack constraints, and the mixed-objective problem that maximizes a combination of the sum of pairwise distances with a submodular monotone function. In addition to approximation algorithms, we present a core-set for geometric instances of low dimension, and we discuss the efficient implementation of some of our algorithms for massive datasets, using the streaming and distributed models of computation
Maximum Dispersion and Geometric Maximum Weight Cliques
We consider geometric instances of the problem of finding a set of k vertices in a complete graph with nonnegative edge weights. In particular, we present algorithmic results for the case where vertices are represented by points in d-dimensional space, and edge weights correspond to rectilinear distances. This problem can be considered as a facility location problem, where the objective is to "disperse" a number of facilities, i.e., select a given number of locations from a discrete set of candidates, such that the average distance between selected locations is maximized. Problems of this type have been considered before, with the best result being an approximation algorithm with performance ratio 2. For the case where k is fixed, we establish a linear-time algorithm that finds an optimal solution. For the case where k is part of the input, we present a polynomial-time approximation scheme
Algorithmen fĂĽr Packprobleme
Packing problems belong to the most frequently studied problems in combinatorial optimization. Mainly, the task is to pack a set of small objects into a large container. These kinds of problems, though easy to state, are usually hard to solve. An additional challenge arises, if the set of objects is not completely known beforehand, meaning that an object has to be packed before the next one becomes available. These problems are called online problems. If the set of objects is completely known, they are called offline problems.
In this work, we study two online and one offline packing problem. We present algorithms that either compute an optimal or a provably good solution:
Maintaining Arrays of Contiguous Objects. The problem of maintaining a set of contiguous objects (blocks) inside an array is closely related to storage allocation. Blocks are inserted into the array, stay there for some (unknown) duration, and are then removed from the array. After inserting a block, the next block becomes available. Blocks can be moved inside the array to create free space for further insertions. Our goals are to minimize the time until the last block is removed from the array (the makespan) and the costs for the block moves. We present inapproximability results, an algorithm that achieves an optimal makespan, an algorithm that uses only O(1) block moves per insertion and deletion, and provide computational experiments.
Online Square Packing. In the classical online strip packing problem, one has to find a non-overlapping placement for a set of objects (squares in our setting) inside a semi-infinite
strip, minimizing the height of the occupied area. We study this problem under two additional constraints: Each square has to be packed on top of another square or on the bottom of the strip. Moreover, there has to be a collision-free path from the top of the strip to the square's final position. We present two algorithms that achieve asymptotic competitive factors of 3.5 and 2.6154, respectively.
Point Sets with Minimum Average Distance. A grid point is a point in the plane with integer coordinates. We present an algorithm that selects a set of grid points (town) such that
the average L1 distance between all pairs of points is minimized. Moreover, we consider the problem of choosing point sets (cities) inside a given square such that-again-the interior
distances are minimized. We present a 5.3827-approximation algorithm for this problem.Packprobleme gehören zu den am häufigsten untersuchten Problemen in der kombinatorischen Optimierung. Grundsätzlich besteht die Aufgabe darin, eine Menge von kleinen Objekten in einen größeren Container zu packen. Probleme dieser Art können meistens nur mit hohem Aufwand gelöst werden. Zusätzliche Schwierigkeiten treten auf, wenn die Menge der zu packenden Objekte zu Beginn nicht vollständig bekannt ist, d.h. dass das nächste Objekt erst verfügbar wird, wenn das vorherige gepackt ist. Solche Probleme werden online Probleme genannt. Wenn alle Objekte bekannt sind, spricht man von einem offline Problem.
In dieser Arbeit stellen wir zwei online Packprobleme und ein offline Packproblem vor und entwickeln Algorithmen, die die Probleme entweder optimal oder aber mit einer beweisbaren Güte lösen:
Verwaltung von kontinuierlichen Objekten. Das Problem eine Menge von kontinuierlichen Objekten (Blöcke) in einem Array möglichst gut zu verwalten, ist eng verwandt mit Problemen der Speicherverwaltung. Blöcke werden in einen kontinuierlichen Bereich des Arrays eingefügt und nach einer (unbekannten) Dauer wieder entfernt. Dabei ist immer nur der nächste einzufügende Block bekannt. Um Freiraum für weitere Blöcke zu schaffen, dürfen Blöcke innerhalb des Arrays verschoben werden. Ziel ist es, die Zeit bis der letzte Block entfernt wird (Makespan) und die Kosten für die Verschiebe-Operationen zu minimieren. Wir geben eine komplexitätstheoretische Einordnung dieses Problems, stellen einen Algorithmus vor, der einen optimalen Makespan bestimmt, einen der O(1) Verschiebe-Operationen benötigt und evaluieren verschiedene Algorithmen experimentell.
Online-Strip-Packing. Im klassischen Online-Strip-Packing-Problem wird eine Menge von Objekten (hier: Quadrate) in einen Streifen (unendlicher Höhe) platziert, so dass die Höhe der benutzten Fläche möglichst gering ist. Wir betrachten einen Spezialfall, bei dem zwei zusätzliche Bedingungen gelten: Quadrate müssen auf anderen Quadraten oder auf dem Boden des Streifens platziert werden und die endgültige Position muss auf einem kollisionfreien Weg erreichbar sein. Es werden zwei Algorithmen mit Güten von 3,5 bzw. 2,6154 vorgestellt.
Punktmengen mit minimalem Durchschnittsabstand. Ein Gitterpunkt ist ein Punkt in der Ebene mit ganzzahligen Koordinaten. Wir stellen einen Algorithmus vor, der eine Anzahl von Punkten aus der Menge aller Gitterpunkte auswählt, so dass deren durchschnittlicher L1-Abstand minimal ist. Außerdem betrachten wir das Problem, mehrere Punktmengen mit minimalem Durchschnittsabstand innerhalb eines gegebenen Quadrates auszuwählen. Wir stellen einen 5,3827-Approximationsalgorithmus für dieses Problem vor