Guruswami-Sinop Rounding without Higher Level Lasserre
Guruswami and Sinop give an O(1/delta) approximation guarantee for the non-uniform Sparsest Cut problem by solving O(r) levels of the Lasserre semidefinite programming hierarchy, provided that the generalized eigenvalues of the Laplacians of the cost and demand graphs satisfy a certain spectral condition, namely, that the (r+1)-th generalized eigenvalue is at least OPT/(1-delta). Their key idea is a rounding technique that first maps a vector-valued solution to [0,1] using appropriately scaled projections onto Lasserre vectors. In this paper, we show that similar projections and analysis can be obtained using only l_2^2 triangle inequality constraints. This yields an O(r/delta^2) approximation guarantee for the non-uniform Sparsest Cut problem by adding only l_2^2 triangle inequality constraints to the usual semidefinite program, provided that the same spectral condition holds, i.e., the (r+1)-th generalized eigenvalue is at least OPT/(1-delta).
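In the abstract's notation, the spectral condition and the two resulting guarantees can be summarized as follows (writing $\lambda_{r+1}$ for the $(r+1)$-th generalized eigenvalue of the cost and demand Laplacians):

```latex
\[
\lambda_{r+1}(L_{\mathrm{cost}}, L_{\mathrm{demand}}) \;\ge\; \frac{\mathrm{OPT}}{1-\delta}
\quad\Longrightarrow\quad
\begin{cases}
O(1/\delta)\text{-approximation} & \text{via the } O(r)\text{-level Lasserre SDP,}\\
O(r/\delta^{2})\text{-approximation} & \text{via the basic SDP plus } \ell_2^2 \text{ triangle inequalities.}
\end{cases}
\]
```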
Improved Outlier Robust Seeding for k-means
$k$-means is a popular clustering objective, although it is inherently
non-robust and sensitive to outliers. Its popular seeding or initialization,
called $k$-means++, uses $D^{2}$ sampling and comes with a provable $O(\log k)$
approximation guarantee \cite{AV2007}. However, in the presence of adversarial
noise or outliers, $D^{2}$ sampling is more likely to pick centers from distant
outliers instead of inlier clusters, and therefore its approximation guarantee
\textit{w.r.t.}\ the $k$-means solution on the inliers does not hold.
Assuming that the outliers constitute a constant fraction of the given data,
we propose a simple variant of the $D^{2}$ sampling distribution that makes it
robust to the outliers. Our algorithm runs in time linear in the number of
points $n$ and the dimension $d$, outputs $O(k)$ clusters, discards marginally
more points than the optimal number of outliers, and comes with a provable
approximation guarantee. Our algorithm can also be modified to output exactly
$k$ clusters instead of $O(k)$ clusters, while keeping its running time linear
in $n$ and $d$. This is an improvement over previous results for robust
$k$-means based on LP relaxation and rounding \cite{Charikar},
\cite{KrishnaswamyLS18} and \textit{robust $k$-means++} \cite{DeshpandeKP20}.
Our empirical results show the advantage of our algorithm over
$k$-means++~\cite{AV2007}, uniform random seeding, greedy sampling for
$k$-means~\cite{tkmeanspp}, and robust $k$-means++~\cite{DeshpandeKP20}, on
standard real-world and synthetic data sets used in previous work. Our
proposal is easily amenable to the scalable, faster, parallel implementations
of $k$-means++ \cite{Bahmani,BachemL017} and is of independent interest for
coreset constructions in the presence of outliers
\cite{feldman2007ptas,langberg2010universal,feldman2011unified}.
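As a rough illustration of the kind of seeding change involved (not the paper's exact distribution): plain $k$-means++ picks each new center with probability proportional to $D(x)^2$, the squared distance to the nearest chosen center, which lets a single far-away outlier absorb almost all of the probability mass. The sketch below caps each point's weight at a threshold, a hypothetical stand-in for the paper's robustified distribution; `robust_seeding` and `threshold` are illustrative names, not the authors' API.

```python
import random

def d2(point, centers):
    """Squared distance from `point` to its nearest chosen center."""
    return min(sum((p - c) ** 2 for p, c in zip(point, ctr)) for ctr in centers)

def robust_seeding(data, k, threshold, rng=random.Random(0)):
    """k-means++-style seeding with capped D^2 weights.

    Capping each weight at `threshold` (a hypothetical robustification,
    not the paper's exact distribution) prevents distant outliers from
    receiving almost all of the sampling probability.
    """
    centers = [rng.choice(data)]
    while len(centers) < k:
        weights = [min(d2(x, centers), threshold) for x in data]
        total = sum(weights)
        if total == 0:  # all points coincide with chosen centers
            centers.append(rng.choice(data))
            continue
        r = rng.uniform(0, total)  # sample proportionally to capped weights
        acc = 0.0
        for x, w in zip(data, weights):
            acc += w
            if acc >= r:
                centers.append(x)
                break
    return centers
```

With a small threshold, an extreme outlier such as (1000, 1000) contributes no more weight than any moderately distant inlier, removing the failure mode described above.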
Mining Query Plans for Finding Candidate Queries and Sub-Queries for Materialized Views in BI Systems Without Cube Generation
Materialized views are important for optimizing Business Intelligence (BI) systems that are designed without data cubes. Selecting candidate queries for materialized views from a large number of queries is a challenging task. Most past work identifies frequent queries in the historical workload and creates materialized views from them, either by manually analyzing the workload or by applying approximate string matching to the query text. Moreover, most existing methods suggest only complete queries and ignore query components such as sub-queries when creating materialized views. This paper presents a novel method to determine on which queries and query components materialized views should be created in order to optimize aggregate and join queries, by mining a database of query execution plans represented as binary trees. The proposed algorithm optimizes significantly more queries because it selects the queries to be optimized using their execution-plan trees rather than the query text used by traditional methods. To select the right set of queries, the paper proposes an efficient, specialized frequent-tree-component mining algorithm with novel heuristics to prune the search space. The mined frequent components determine the candidate queries on which materialized views are created. Experiments on standard, real, and synthetic data sets, together with the theoretical analysis, show that the proposed method optimizes a large number of queries with fewer materialized views and achieves a significant performance improvement over traditional methods.
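The core idea of mining frequent components from execution-plan trees can be sketched as follows; this is a simplified illustration, not the paper's algorithm (in particular, it has none of the pruning heuristics). Plans are binary trees `(operator, left, right)`, every subtree is serialized to a canonical string, and any subtree occurring in at least `min_support` distinct plans becomes a materialized-view candidate.

```python
from collections import Counter

def collect_subtrees(node, out):
    """Serialize `node` canonically and record every subtree rooted in it."""
    if node is None:
        return "-"
    op, left, right = node
    s = f"({op} {collect_subtrees(left, out)} {collect_subtrees(right, out)})"
    out.add(s)  # a set: count each distinct component once per plan
    return s

def frequent_components(plans, min_support):
    """Subtrees appearing in at least `min_support` distinct plans."""
    counts = Counter()
    for plan in plans:
        seen = set()
        collect_subtrees(plan, seen)
        counts.update(seen)
    return {s for s, c in counts.items() if c >= min_support}
```

For example, two plans that aggregate and sort the same join both contain the subtree `(JOIN (SCAN A - -) (SCAN B - -))`, so that join is reported as a shared candidate even though the full query texts differ, which is exactly what text-matching approaches miss.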
The Importance of Modeling Data Missingness in Algorithmic Fairness: A Causal Perspective
Training datasets for machine learning often have some form of missingness.
For example, to learn a model for deciding whom to give a loan, the available
training data includes individuals who were given a loan in the past, but not
those who were not. This missingness, if ignored, nullifies any fairness
guarantee of the training procedure when the model is deployed. Using causal
graphs, we characterize the missingness mechanisms in different real-world
scenarios. We show conditions under which various distributions, used in
popular fairness algorithms, can or cannot be recovered from the training
data. Our theoretical results imply that many of these algorithms cannot
guarantee fairness in practice. Modeling missingness also helps to identify
correct design principles for fair algorithms. For example, in multi-stage
settings where decisions are made in multiple screening rounds, we use our
framework to derive the minimal distributions required to design a fair
algorithm. Our proposed algorithm decentralizes the decision-making process and
still achieves similar performance to the optimal algorithm that requires
centralization and non-recoverable distributions.

Comment: To appear in the Proceedings of AAAI 202
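The loan example can be made concrete with a tiny simulation (all numbers and the policy rule are hypothetical): repayment outcomes are observed only for applicants whom a past policy approved, so any quantity estimated from the observed rows, including the per-group statistics that fairness constraints rely on, is computed against the wrong distribution.

```python
# Hypothetical population of applicants: (score, group, repaid).
# Outcomes are observed only when a past policy approved the applicant
# ("selective labels"), so the training set is a biased sample.
population = [
    (0.9, "A", 1), (0.8, "A", 1), (0.4, "A", 0), (0.3, "A", 0),
    (0.9, "B", 1), (0.7, "B", 1), (0.4, "B", 1), (0.2, "B", 0),
]

def past_policy(score, group):
    # The past policy approved group A more readily -- the source of bias.
    return score >= (0.5 if group == "A" else 0.8)

observed = [(s, g, y) for s, g, y in population if past_policy(s, g)]

def repay_rate(rows, group):
    ys = [y for _, g, y in rows if g == group]
    return sum(ys) / len(ys)

# Per-group repayment rate: full population vs. what training data shows.
full_B = repay_rate(population, "B")  # 3 of 4 repaid -> 0.75
obs_B = repay_rate(observed, "B")     # only score >= 0.8 observed -> 1.0
```

The observed repayment rate for group B is inflated because only its strongest applicants were ever approved; a model or fairness constraint fit to `observed` alone never sees the rejected applicants whose outcomes are missing.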
Embedding Approximately Low-Dimensional l_2^2 Metrics into l_1
Goemans showed that any n points x_1, ..., x_n in d dimensions satisfying l_2^2 triangle inequalities can be embedded into l_1 with worst-case distortion at most sqrt{d}. We consider an extension of this theorem to the case when the points are approximately low-dimensional, as opposed to exactly low-dimensional, and prove the following analogous theorem, albeit with average distortion guarantees: there exists an l_2^2-to-l_1 embedding with average distortion at most the stable rank, sr(M), of the matrix M consisting of the columns {x_i - x_j}_{i<j}. Average distortion embeddings suffice for applications such as the SPARSEST CUT problem. Our embedding gives an approximation algorithm for the SPARSEST CUT problem on low threshold-rank graphs, where earlier work was inspired by the Lasserre SDP hierarchy, and improves on a previous result of the first and third author [Deshpande and Venkat, in Proc. 17th APPROX, 2014]. Our ideas give a new perspective on the l_2^2 metric, an alternate proof of Goemans' theorem, and a simpler proof for average distortion sqrt{d}.
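For reference, the stable rank appearing in the bound is the standard quantity below; since it never exceeds the rank of $M$, and the columns $x_i - x_j$ live in $\mathbb{R}^d$, it is at most $d$, so the average-distortion bound degrades gracefully as the points become only approximately low-dimensional:

```latex
\[
\mathrm{sr}(M) \;=\; \frac{\|M\|_F^{2}}{\|M\|_{2}^{2}} \;\le\; \mathrm{rank}(M) \;\le\; d,
\qquad
M = \bigl[\, x_i - x_j \,\bigr]_{i<j}.
\]
```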