
13 research outputs found

MapReduce and Streaming Algorithms for Diversity Maximization in Metric Spaces of Bounded Doubling Dimension

Author: Ceccarello Matteo
Pietracaprina Andrea
Pucci Geppino
Upfal Eli
Publication date: 01/01/2017

Given a dataset of points in a metric space and an integer $k$, a diversity maximization problem requires determining a subset of $k$ points maximizing some diversity objective measure, e.g., the minimum or the average distance between two points in the subset. Diversity maximization is computationally hard, hence only approximate solutions can be hoped for. Although its applications are mainly in massive data analysis, most of the past research on diversity maximization focused on the sequential setting. In this work we present space and pass/round-efficient diversity maximization algorithms for the Streaming and MapReduce models and analyze their approximation guarantees for the relevant class of metric spaces of bounded doubling dimension. Like other approaches in the literature, our algorithms rely on the determination of high-quality core-sets, i.e., (much) smaller subsets of the input which contain good approximations to the optimal solution for the whole input. For a variety of diversity objective functions, our algorithms attain an $(\alpha+\epsilon)$-approximation ratio, for any constant $\epsilon>0$, where $\alpha$ is the best approximation ratio achieved by a polynomial-time, linear-space sequential algorithm for the same diversity objective. This improves substantially over the approximation ratios attainable in Streaming and MapReduce by state-of-the-art algorithms for general metric spaces. We provide extensive experimental evidence of the effectiveness of our algorithms on both real world and synthetic datasets, scaling up to over a billion points. Comment: Extended version of http://www.vldb.org/pvldb/vol10/p469-ceccarello.pdf, PVLDB Volume 10, No. 5, January 201
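As an illustration of the two objectives named in the abstract above (minimum and average pairwise distance), the following is a minimal Python sketch. It is not the paper's Streaming or MapReduce algorithm: it assumes Euclidean points and uses exhaustive search, which is only feasible for tiny inputs. The function names and the sample points are hypothetical.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def min_pairwise_distance(points):
    """Minimum distance between any two points of the subset."""
    return min(dist(p, q) for p, q in combinations(points, 2))


def avg_pairwise_distance(points):
    """Average distance over all unordered pairs of the subset."""
    pairs = list(combinations(points, 2))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


def brute_force_diverse_subset(points, k, objective):
    """Try every k-subset and keep the best one (exponential time)."""
    return max(combinations(points, k), key=objective)


# Hypothetical toy data: a cluster near the origin and a cluster near (5, 5).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
best_min = brute_force_diverse_subset(points, 3, min_pairwise_distance)
best_avg = brute_force_diverse_subset(points, 3, avg_pairwise_distance)
```

The two objectives can prefer different subsets, which is one reason approximation guarantees are usually stated per objective.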

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Università di Padova

The planar multiple obnoxious facilities location problem: A Voronoi based heuristic

Author: Batta
Berman
Chandra
Church
Colmenar
Drezner
Drezner
Erkut
Gill
Golin
Hanselman
Hansen
Karkazis
Kuby
Law
Le Digabel
Locatelli
Maranas
Na
Nurmela
Ohya
Okabe
Pawel Kalczynski
Said Salhi
Salhi
Shamos
Sugihara
Suzuki
Szabo
Tellier
Voronoï
Welch
Wolfram
Wächter
Zvi Drezner
Publication venue: Elsevier BV
Publication date: 23/08/2018

Consider the situation where a given number of facilities are to be located in a convex polygon with the objective of maximizing the minimum distance between facilities and a given set of communities, with the important additional condition that the facilities have to be farther than a certain distance from one another. This continuous multiple obnoxious facility location problem, which has two variants, is very complex to solve using commercial nonlinear optimizers. We propose a mathematical formulation and a heuristic approach based on Voronoi diagrams and an optimally solved binary linear program. As there are no nonlinear optimization solvers that guarantee optimality, we compare our results with a popular multi-start approach using interior point, genetic algorithm (GA), and sparse nonlinear optimizer (SNOPT) solvers in Matlab. These are state-of-the-art solvers for dealing with constrained nonlinear problems. Each instance is solved using 100 randomly generated starting solutions and the overall best is then selected. It was found that the proposed heuristic results are much better and were obtained in a fraction of the computer time required by the other methods. The multiple obnoxious location problem is a perfect example where all-purpose nonlinear non-convex solvers perform poorly and hence the best way forward is to design and analyze heuristics that have the power and the flexibility to deal with such a high level of complexity.

Crossref

Kent Academic Repository

The Remote-Clique Problem Revisited

Author: Birnbaum Benjamin E.
Publication venue: Washington University Open Scholarship
Publication date: 01/01/2006

Given a positive integer k and a complete graph with non-negative edge weights that satisfy the triangle inequality, the remote-clique problem is to find a subset of k vertices having a maximum-weight induced subgraph. A greedy algorithm for the problem has been shown to have an approximation ratio of 4, but this analysis was not shown to be tight. In this thesis, we present an algorithm called d-Greedy Augment that generalizes this greedy algorithm (they are equivalent when d = 1). We use the technique of factor-revealing linear programs to prove that d-Greedy Augment, which has a running time of O(kdn^d), achieves an approximation ratio of (2k - 2)/(k + d - 2). Thus, when d = 1, d-Greedy Augment achieves an approximation ratio of 2 and runs in time O(kn), making it the fastest known 2-approximation for the remote-clique problem. Beyond proving this worst-case result, we also examine the behavior of d-Greedy Augment in practice. First, we provide some theoretical results regarding the expected case performance of d-Greedy Augment on random graphs, and second, we describe data from some experiments that test the performance of d-Greedy Augment and related heuristics.
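The abstract does not spell out the greedy rule, so the following Python sketch rests on an assumption: for d = 1, the algorithm starts from an arbitrary vertex and repeatedly adds the vertex with the largest total edge weight to the vertices already chosen. Keeping a running total per vertex makes each step O(n), matching the O(kn) running time quoted above. The function names and the toy instance are hypothetical, and the sketch does not verify the stated approximation ratio.

```python
def greedy_augment(weights, k):
    """Pick k vertices greedily from a symmetric n x n weight matrix.

    Assumed rule (d = 1): start from an arbitrary vertex, then repeatedly
    add the vertex with the largest total weight to the chosen set.
    """
    n = len(weights)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of vertices")
    start = 0  # arbitrary starting vertex (an assumption of this sketch)
    chosen = [start]
    in_set = [False] * n
    in_set[start] = True
    # gain[v] = total weight between v and the vertices chosen so far
    gain = list(weights[start])
    while len(chosen) < k:
        best = max((v for v in range(n) if not in_set[v]),
                   key=gain.__getitem__)
        chosen.append(best)
        in_set[best] = True
        for v in range(n):
            gain[v] += weights[best][v]
    return chosen


def induced_weight(weights, subset):
    """Total weight of the subgraph induced by the given vertices."""
    return sum(weights[u][v]
               for i, u in enumerate(subset)
               for v in subset[i + 1:])


# Hypothetical instance: points on a line, weight = absolute difference,
# which satisfies the triangle inequality.
positions = [0, 1, 2, 10]
weights = [[abs(a - b) for b in positions] for a in positions]
selected = greedy_augment(weights, 3)
```

A d > 1 variant would, under the same assumption, examine groups of d candidate vertices per step, which is consistent with the n^d factor in the running time quoted in the abstract.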

Washington University St. Louis: Open Scholarship

Provable randomized rounding for minimum-similarity diversification

Author: Gionis Aristides
Mahadevan Ananth
Matakos Antonis
Ordozgoiti Bruno
Publication date: 01/01/2022

When searching for information in a data collection, we are often interested not only in finding relevant items, but also in assembling a diverse set, so as to explore different concepts that are present in the data. This problem has been researched extensively. However, finding a set of items with minimal pairwise similarities can be computationally challenging, and most existing works striving for quality guarantees assume that item relatedness is measured by a distance function. Given the widespread use of similarity functions in many domains, we believe this to be an important gap in the literature. In this paper we study the problem of finding a diverse set of items, when item relatedness is measured by a similarity function. We formulate the diversification task using a flexible, broadly applicable minimization objective, consisting of the sum of pairwise similarities of the selected items and a relevance penalty term. To find good solutions we adopt a randomized rounding strategy, which is challenging to analyze because of the cardinality constraint present in our formulation. Even though this obstacle can be overcome using dependent rounding, we show that it is possible to obtain provably good solutions using an independent approach, which is faster, simpler to implement and completely parallelizable. Our analysis relies on a novel bound for the ratio of Poisson-Binomial densities, which is of independent interest and has potential implications for other combinatorial-optimization problems. We leverage this result to design an efficient randomized algorithm that provides a lower-order additive approximation guarantee. We validate our method using several benchmark datasets, and show that it consistently outperforms the greedy approaches that are commonly used in the literature. Peer reviewed

PubMed Central

Aaltodoc Publication Archive

Helsingin yliopiston digitaalinen arkisto

Diverse sampling of streaming data

Author: Turmukhametova Aizana
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2013

Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2013. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 49-51). This thesis addresses the problem of diverse sampling as a dispersion problem and proposes solutions that are optimized for large streaming data. Finding the optimal solution to the dispersion problem is NP-hard. Therefore, existing and proposed solutions are approximation algorithms. This work evaluates the performance of different algorithms in practice and compares them to the theoretical guarantees. By Aizana Turmukhametova. M. Eng.

DSpace@MIT

Approximation algorithms for geometric dispersion

Author: Cevallos Manzano Alfonso Bolívar
Publication venue: Lausanne, EPFL
Publication date: 09/11/2016

The most basic form of the max-sum dispersion problem (MSD) is as follows: given n points in R^q and an integer k, select a set of k points such that the sum of the pairwise distances within the set is maximal. This is a prominent diversity problem, with wide applications in web search and information retrieval, where one needs to find a small and diverse representative subset of a large dataset. The problem has recently received a great deal of attention in the computational geometry and operations research communities; and since it is NP-hard, research has focused on efficient heuristics and approximation algorithms. Several classes of distance functions have been considered in the literature. Many of the most common distances used in applications are induced by a norm in a real vector space. The focus of this thesis is on MSD over these geometric instances. We provide for it simple and fast polynomial-time approximation schemes (PTASs), as well as improved constant-factor approximation algorithms. We pay special attention to the class of negative-type distances, a class that includes Euclidean and Manhattan distances, among many others. In order to exploit the properties of this class, we apply several techniques and results from the theory of isometric embeddings. We explore the following variations of the MSD problem: matroid and matroid-intersection constraints, knapsack constraints, and the mixed-objective problem that maximizes a combination of the sum of pairwise distances with a submodular monotone function. In addition to approximation algorithms, we present a core-set for geometric instances of low dimension, and we discuss the efficient implementation of some of our algorithms for massive datasets, using the streaming and distributed models of computation
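The basic MSD problem described in the abstract above can be written compactly as follows. The symbols $P$ (the input point set), $S$ (the selected subset), and $d$ (the distance function) are notation introduced here for illustration and are not taken from the thesis.

```latex
\[
  \max_{\substack{S \subseteq P \\ |S| = k}} \;
  \sum_{\{x, y\} \subseteq S} d(x, y),
  \qquad P \subset \mathbb{R}^q,\ |P| = n .
\]
```

Each unordered pair of selected points contributes once to the sum, so a feasible solution is scored over $\binom{k}{2}$ distances.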

Infoscience - École polytechnique fédérale de Lausanne

29th International Symposium on Algorithms and Computation: ISAAC 2018, December 16-19, 2018, Jiaoxi, Yilan, Taiwan

Author: ISAAC <29. 2018, Jiaoxi, Yilan>
Publication venue: Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing
Publication date: 01/12/2018

Digitale Bibliothek Thüringen