
On Correcting Inputs: Inverse Optimization for Online Structured Prediction

Author: Daumé III Hal
Khuller Samir
Purohit Manish
Sanders Gregory
Publication date: 01/01/2015

Algorithm designers typically assume that the input data is correct, and then proceed to find "optimal" or "sub-optimal" solutions using this input data. However, this assumption of correct data does not always hold in practice, especially in the context of online learning systems where the objective is to learn appropriate feature weights given some training samples. Such scenarios necessitate the study of inverse optimization problems, where one is given an input instance as well as a desired output and the task is to adjust the input data so that the given output is indeed optimal. Motivated by learning structured prediction models, in this paper we consider inverse optimization with a margin, i.e., we require the given output to be better than all other feasible outputs by a desired margin. We consider such inverse optimization problems for maximum weight matroid basis, matroid intersection, perfect matchings, minimum cost maximum flows, and shortest paths, and derive the first known results for such problems with a non-zero margin. The effectiveness of these algorithmic approaches to online learning for structured prediction is also discussed. Comment: Conference version to appear in FSTTCS, 201
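To make the margin condition concrete for the shortest-path case, a desired s-t path satisfies a margin m when every other simple s-t path costs at least m more. The sketch below checks this by brute force; the adjacency-dict graph format and the function names are assumptions made for illustration, not the paper's. It only verifies the condition (it does not perform the inverse step of adjusting weights) and enumerates all simple paths, so it is meant for tiny graphs only.

```python
def simple_paths(adj, s, t, path=None):
    """Yield every simple s-t path in a directed graph given as {u: {v: weight}}."""
    path = [s] if path is None else path
    if s == t:
        yield list(path)  # yield a copy, since `path` is mutated during the search
        return
    for v in adj.get(s, {}):
        if v not in path:  # keep the path simple
            path.append(v)
            yield from simple_paths(adj, v, t, path)
            path.pop()


def path_cost(adj, path):
    """Sum of edge weights along a path given as a list of vertices."""
    return sum(adj[u][v] for u, v in zip(path, path[1:]))


def satisfies_margin(adj, desired, margin):
    """True if `desired` beats every other simple path between its endpoints by >= margin."""
    s, t = desired[0], desired[-1]
    target = path_cost(adj, desired)
    return all(
        path_cost(adj, p) >= target + margin
        for p in simple_paths(adj, s, t)
        if p != desired
    )


adj = {"a": {"b": 1, "c": 2}, "b": {"d": 1}, "c": {"d": 2}}
print(satisfies_margin(adj, ["a", "b", "d"], 2))  # True: the only rival path costs 4 = 2 + 2
print(satisfies_margin(adj, ["a", "b", "d"], 3))  # False: 4 < 2 + 3
```

An inverse-optimization algorithm in this setting would change the edge weights, as little as possible under some norm, until a check like this one passes.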

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Low-Degree Spanning Trees of Small Weight

Author: Balaji Raghavachari
Garey Michael R.
Neal Young
Samir Khuller
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/01/1996

The degree-d spanning tree problem asks for a minimum-weight spanning tree in which the degree of each vertex is at most d. When d=2 the problem is TSP, and in this case, the well-known Christofides algorithm provides a 1.5-approximation algorithm (assuming the edge weights satisfy the triangle inequality). In 1984, Christos Papadimitriou and Umesh Vazirani posed the challenge of finding an algorithm with performance guarantee less than 2 for Euclidean graphs (points in R^n) and d > 2. This paper gives the first answer to that challenge, presenting an algorithm to compute a degree-3 spanning tree of cost at most 5/3 times the MST. For points in the plane, the ratio improves to 3/2 and the algorithm can also find a degree-4 spanning tree of cost at most 5/4 times the MST. Comment: conference version in Symposium on Theory of Computing (1994)
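The guarantees above are stated relative to the weight of the minimum spanning tree (MST). As an illustrative baseline only (not the paper's degree-3 algorithm), a short Prim's-algorithm sketch can compute the Euclidean MST weight and its maximum vertex degree; when that degree exceeds d, a more expensive tree is needed to meet the degree bound. The function names and the list-of-tuples point format are assumptions made here.

```python
import math


def euclidean_mst(points):
    """Prim's algorithm in O(n^2) on the complete Euclidean graph over `points`.
    Returns (total_weight, list_of_edges)."""
    n = len(points)
    if n == 0:
        return 0.0, []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known connection of each vertex to the tree
    parent = [-1] * n
    best[0] = 0.0
    total, edges = 0.0, []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return total, edges


def max_degree(n, edges):
    """Largest vertex degree in an edge list over vertices 0..n-1."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return max(deg, default=0)


# A "plus" shape: the MST is a star whose centre has degree 4.
pts = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
weight, edges = euclidean_mst(pts)
print(weight, max_degree(len(pts), edges))  # 4.0 4
```

Here any degree-3 spanning tree must cost more than 4.0, which is the kind of gap the 5/3 and 3/2 ratios bound.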

arXiv.org e-Print Archive

CiteSeerX

Crossref

eScholarship - University of California

Dartmouth Digital Commons (Dartmouth College)

Approximability of Connected Factors

Author: A.E. Baburin
B. Escoffier
C.H. Papadimitriou
F. Cheah
H. Kaplan
J. Cheriyan
M.L. Fisher
S. Guha
S. Khuller
W.T. Tutte
Y.H. Chan
Publication date: 09/10/2013

Finding a d-regular spanning subgraph (or d-factor) of a graph is easy by Tutte's reduction to the matching problem. By the same reduction, it is easy to find a minimal or maximal d-factor of a graph. However, if we require that the d-factor is connected, these problems become NP-hard: finding a minimal connected 2-factor is just the traveling salesman problem (TSP). Given a complete graph with edge weights that satisfy the triangle inequality, we consider the problem of finding a minimal connected d-factor. We give a 3-approximation for all d and improve this to an (r+1)-approximation for even d, where r is the approximation ratio of the TSP. This yields a 2.5-approximation for even d. The same algorithm yields an (r+1)-approximation for the directed version of the problem, where r is the approximation ratio of the asymmetric TSP. We also show that none of these minimization problems can be approximated better than the corresponding TSP. Finally, for the decision problem of deciding whether a given graph contains a connected d-factor, we extend known hardness results. Comment: To appear in the proceedings of WAOA 201
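A connected d-factor is a spanning subgraph in which every vertex has degree exactly d and which is connected; for d = 2 this is a Hamiltonian cycle, which is why the minimal version contains TSP. The hypothetical helper below only verifies these two conditions for a given edge list; the approximation algorithms themselves are not reproduced here.

```python
from collections import defaultdict


def is_connected_d_factor(vertices, edges, d):
    """Check that `edges` (pairs of vertices, no self-loops assumed) gives every
    vertex degree exactly d and connects all of `vertices`."""
    vertices = set(vertices)
    degree = defaultdict(int)
    adj = defaultdict(list)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        adj[u].append(v)
        adj[v].append(u)
    if any(degree[v] != d for v in vertices):
        return False
    if not vertices:
        return True
    # Depth-first search from an arbitrary vertex to test connectivity.
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == vertices


hexagon = [(i, (i + 1) % 6) for i in range(6)]
two_triangles = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]
print(is_connected_d_factor(range(6), hexagon, 2))        # True
print(is_connected_d_factor(range(6), two_triangles, 2))  # False: a 2-factor, but disconnected
```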

arXiv.org e-Print Archive

Crossref

University of Twente Research Information

Revisiting Connected Dominating Sets: An Optimal Local Algorithm?

Author: Khuller Samir
Yang Sheng
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2016)
Publication date: 01/01/2016

In this paper we consider the classical Connected Dominating Set (CDS) problem. Twenty years ago, Guha and Khuller developed two algorithms for this problem: a centralized greedy approach with an approximation guarantee of H(D)+2, and a local greedy approach with an approximation guarantee of 2(H(D)+1), where H is the harmonic function and D is the maximum degree in the graph. A local greedy algorithm uses significantly less information about the graph, and can be useful in a variety of contexts. However, a fundamental question remained: can we get a local greedy algorithm with the same performance guarantee as the global greedy algorithm, without the penalty of the multiplicative factor of "2" in the approximation factor? In this paper, we answer that question in the affirmative.
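Two notions in this abstract can be made concrete: what it means for a vertex set to be a connected dominating set, and the harmonic number H(D) that appears in both guarantees. The sketch below is a hypothetical checker (adjacency-dict input assumed), not either of the greedy algorithms; it uses exact fractions so the two bounds can be compared without rounding.

```python
from fractions import Fraction


def harmonic(n):
    """H(n) = 1 + 1/2 + ... + 1/n as an exact fraction."""
    return sum((Fraction(1, i) for i in range(1, n + 1)), Fraction(0))


def is_connected_dominating_set(adj, S):
    """adj maps each vertex to an iterable of neighbours (undirected graph assumed)."""
    S = set(S)
    if not S:
        return False
    # Domination: every vertex outside S has a neighbour in S.
    if any(v not in S and not any(w in S for w in adj[v]) for v in adj):
        return False
    # Connectivity: S induces a connected subgraph.
    start = next(iter(S))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w in S and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == S


path = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
print(is_connected_dominating_set(path, {2, 3, 4}))  # True
print(is_connected_dominating_set(path, {2, 4}))     # False: dominating but not connected
D = 4
print(harmonic(D) + 2, 2 * (harmonic(D) + 1))        # 49/12 37/6
```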

Dagstuhl Research Online Publication Server

Designing Multi-Commodity Flow Trees

Author: Balaji Raghavachari
Garey
Gomory
Gusfield
Leighton
Neal Young
Samir Khuller
Seymour
Tragoudas
Publication venue: 'Elsevier BV'
Publication date: 01/01/1994

The traditional multi-commodity flow problem assumes a given flow network in which multiple commodities are to be maximally routed in response to given demands. This paper considers the multi-commodity flow network-design problem: given a set of multi-commodity flow demands, find a network subject to certain constraints such that the commodities can be maximally routed. This paper focuses on the case when the network is required to be a tree. The main result is an approximation algorithm for the case when the tree is required to be of constant degree. The algorithm reduces the problem to the minimum-weight balanced-separator problem; the performance guarantee of the algorithm is within a factor of 4 of the performance guarantee of the balanced-separator procedure. If Leighton and Rao's balanced-separator procedure is used, the performance guarantee is O(log n). This improves the O(log^2 n) approximation factor that is trivial to obtain by a direct application of the balanced-separator method. Comment: Conference version in WADS'9
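One reason the tree restriction simplifies matters is that routing becomes forced: in a tree every pair of nodes is joined by a unique path, so once the tree is fixed, the load on each edge is the total demand of the pairs whose path crosses it. A minimal sketch of that computation follows; the edge-list and demand-dictionary formats are assumptions for illustration, and the paper's separator-based construction of the tree is not reproduced.

```python
from collections import defaultdict


def tree_edge_loads(tree_edges, demands):
    """Route each demand {(s, t): amount} along its unique tree path and return
    the total amount carried by each edge, keyed by frozenset({u, v}).
    Assumes the edges form a connected tree containing every s and t."""
    adj = defaultdict(list)
    for u, v in tree_edges:
        adj[u].append(v)
        adj[v].append(u)
    loads = defaultdict(int)
    for (s, t), amount in demands.items():
        # DFS from s recording parents; in a tree this fixes the unique s-t path.
        parent = {s: None}
        stack = [s]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in parent:
                    parent[w] = u
                    stack.append(w)
        # Walk back from t to s, charging every edge on the path.
        v = t
        while parent[v] is not None:
            loads[frozenset((v, parent[v]))] += amount
            v = parent[v]
    return dict(loads)


loads = tree_edge_loads([("a", "b"), ("b", "c")], {("a", "c"): 3, ("a", "b"): 1})
print(loads[frozenset(("a", "b"))], loads[frozenset(("b", "c"))])  # 4 3
```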

arXiv.org e-Print Archive

CiteSeerX

Crossref

eScholarship - University of California

On the Cost of Essentially Fair Clusterings

Author: Bercea Ioana O.
Khuller Samir
Kumar Aounon
Schmidt Daniel R.
Schmidt Melanie
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019)
Publication date: 26/11/2018

Clustering is a fundamental tool in data mining. It partitions points into groups (clusters) and may be used to make decisions for each point based on its group. However, this process may harm protected (minority) classes if the clustering algorithm does not adequately represent them in desirable clusters, especially if the data is already biased. At NIPS 2017, Chierichetti et al. proposed a model for fair clustering requiring the representation in each cluster to (approximately) preserve the global fraction of each protected class. Restricting to two protected classes, they developed both a 4-approximation for the fair k-center problem and an O(t)-approximation for the fair k-median problem, where t is a parameter for the fairness model. For multiple protected classes, the best known result is a 14-approximation for fair k-center. We extend and improve the known results. Firstly, we give a 5-approximation for the fair k-center problem with multiple protected classes. Secondly, we propose a relaxed fairness notion under which we can give bicriteria constant-factor approximations for all of the classical clustering objectives k-center, k-supplier, k-median, k-means and facility location. The latter approximations are achieved by a framework that takes an arbitrary existing unfair (integral) solution and a fair (fractional) LP solution and combines them into an essentially fair clustering with a weakly supervised rounding scheme. In this way, a fair clustering can be established belatedly, in a situation where the centers are already fixed.
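The fairness requirement described above, where each cluster approximately preserves the global fraction of every protected class, can be checked directly for a given assignment. The sketch uses a simple additive tolerance `delta`, which is an assumption made here for illustration; the fairness models of Chierichetti et al. and of this paper may be parameterized differently.

```python
from collections import Counter, defaultdict


def fairness_violations(assignment, color, delta):
    """assignment: point -> cluster id; color: point -> protected class.
    Return the (cluster, class) pairs whose share inside the cluster differs from
    the class's global share by more than the (hypothetical) tolerance `delta`."""
    n = len(assignment)
    global_share = {c: k / n for c, k in Counter(color[p] for p in assignment).items()}
    members = defaultdict(list)
    for p, cluster in assignment.items():
        members[cluster].append(p)
    violations = []
    for cluster, pts in members.items():
        counts = Counter(color[p] for p in pts)  # missing classes count as 0
        for c, share in global_share.items():
            if abs(counts[c] / len(pts) - share) > delta:
                violations.append((cluster, c))
    return sorted(violations)


color = {1: "red", 2: "red", 3: "blue", 4: "blue"}
print(fairness_violations({1: 0, 3: 0, 2: 1, 4: 1}, color, 0.1))       # []
print(len(fairness_violations({1: 0, 2: 0, 3: 1, 4: 1}, color, 0.1)))  # 4
```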

arXiv.org e-Print Archive

Kölner UniversitätsPublikationsServer

Dagstuhl Research Online Publication Server

Scheduling Distributed Clusters of Parallel Machines: Primal-Dual and LP-based Approximation Algorithms

Author: Chao Megan
Khuller Samir
Murray Riley
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 24th Annual European Symposium on Algorithms (ESA 2016)
Publication date: 01/01/2016

The Map-Reduce computing framework rose to prominence with datasets of such size that dozens of machines on a single cluster were needed for individual jobs. As datasets approach the exabyte scale, a single job may need distributed processing not only on multiple machines, but on multiple clusters. We consider a scheduling problem to minimize weighted average completion time of n jobs on m distributed clusters of parallel machines. In keeping with the scale of the problems motivating this work, we assume that (1) each job is divided into m "subjobs" and (2) distinct subjobs of a given job may be processed concurrently. When each cluster is a single machine, this is the NP-hard concurrent open shop problem. A clear limitation of such a model is that a serial processing assumption sidesteps the issue of how different tasks of a given subjob might be processed in parallel. Our algorithms explicitly model clusters as pools of resources and effectively overcome this issue. Under a variety of parameter settings, we develop two constant factor approximation algorithms for this problem. The first algorithm uses an LP relaxation tailored to this problem from prior work. This LP-based algorithm provides strong performance guarantees. Our second algorithm exploits a surprisingly simple mapping to the special case of one machine per cluster. This mapping-based algorithm is combinatorial and extremely fast. These are the first constant factor approximations for this problem.
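In the special case where each cluster is a single machine (concurrent open shop), a job finishes when its last subjob finishes. A minimal sketch of the objective for a fixed job order, processed in the same order on every machine, is shown here; it computes the weighted sum of completion times, which has the same minimizer as the weighted average. The dictionary input format is an assumption, and the sketch only evaluates a schedule rather than implementing either of the paper's algorithms.

```python
def weighted_completion_time(order, proc, weights):
    """Concurrent open shop objective for a permutation schedule.

    proc[j][i] is the processing time of job j's subjob on machine i; every machine
    processes subjobs back to back in `order`. Job j completes when its last
    nonzero subjob completes, and the result is sum_j weights[j] * C_j.
    """
    m = len(next(iter(proc.values())))
    load = [0] * m  # time at which each machine becomes free
    total = 0
    for j in order:
        for i in range(m):
            load[i] += proc[j][i]
        completion = max((load[i] for i in range(m) if proc[j][i] > 0), default=0)
        total += weights[j] * completion
    return total


proc = {"A": [2, 0], "B": [1, 3]}
weights = {"A": 1, "B": 1}
print(weighted_completion_time(["A", "B"], proc, weights))  # 5  (C_A = 2, C_B = 3)
print(weighted_completion_time(["B", "A"], proc, weights))  # 6  (C_B = 3, C_A = 3)
```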

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Matroid and Knapsack Center Problems

Author: A. Schrijver
D. Hochbaum
D.Z. Chen
F. Grandoni
G.N. Frederickson
J. Chuzhoy
J. Edmonds
J. Li
M. Charikar
M. Hajiaghayi
R. Cole
R. Matthew McCutchen
S. Chechik
S. Khuller
T. Gonzalez
Publication date: 01/01/2013

In the classic k-center problem, we are given a metric graph, and the objective is to open k nodes as centers such that the maximum distance from any vertex to its closest center is minimized. In this paper, we consider two important generalizations of k-center, the matroid center problem and the knapsack center problem. Both problems are motivated by recent content distribution network applications. Our contributions can be summarized as follows: 1. We consider the matroid center problem in which the centers are required to form an independent set of a given matroid. We show this problem is NP-hard even on a line. We present a 3-approximation algorithm for the problem on general metrics. We also consider the outlier version of the problem where a given number of vertices can be excluded as the outliers from the solution. We present a 7-approximation for the outlier version. 2. We consider the (multi-)knapsack center problem in which the centers are required to satisfy one (or more) knapsack constraint(s). It is known that the knapsack center problem with a single knapsack constraint admits a 3-approximation. However, when there are at least two knapsack constraints, we show this problem is not approximable at all. To complement the hardness result, we present a polynomial time algorithm that gives a 3-approximate solution such that one knapsack constraint is satisfied and the others may be violated by at most a factor of 1+ε. We also obtain a 3-approximation for the outlier version that may violate the knapsack constraint by 1+ε. Comment: A preliminary version of this paper is accepted to IPCO 201
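As a baseline for the unconstrained k-center problem that both generalizations extend, the classic farthest-first traversal due to Gonzalez is a well-known 2-approximation. It is sketched here for points in Euclidean space; it ignores matroid and knapsack constraints entirely and is not the paper's algorithm. The point format and function name are assumptions made for illustration.

```python
import math


def gonzalez_k_center(points, k):
    """Farthest-first traversal for metric k-center (a 2-approximation).
    Assumes `points` is non-empty and k >= 1. Returns (center_indices, radius)."""
    centers = [0]  # start from an arbitrary point
    dist = [math.dist(points[0], p) for p in points]  # distance to nearest chosen center
    while len(centers) < min(k, len(points)):
        far = max(range(len(points)), key=dist.__getitem__)  # farthest uncovered point
        centers.append(far)
        dist = [min(d, math.dist(points[far], p)) for d, p in zip(dist, points)]
    return centers, max(dist)


pts = [(0.0,), (1.0,), (10.0,), (11.0,)]
print(gonzalez_k_center(pts, 2))  # ([0, 3], 1.0)
```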

arXiv.org e-Print Archive

CiteSeerX

Crossref