
    A Note on Perturbation Results for Learning Empirical Operators

    A large number of learning algorithms, for example spectral clustering, kernel Principal Components Analysis and many manifold methods, are based on estimating eigenvalues and eigenfunctions of operators defined by a similarity function or a kernel, given empirical data. For the analysis of such algorithms, it is therefore important to be able to assess the quality of these approximations. The contribution of our paper is two-fold: 1. We use a technique based on a concentration inequality for Hilbert spaces to provide new, much simplified proofs for a number of results in spectral approximation. 2. Using these methods we provide several new results for estimating spectral properties of the graph Laplacian operator, extending and strengthening results from [26].
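
    The setting described above can be illustrated with a minimal sketch (not taken from the paper): the eigenvalues of the integral operator defined by a kernel are approximated by the eigenvalues of the scaled empirical Gram matrix. The function name and the Gaussian kernel are illustrative assumptions.

```python
import numpy as np

def empirical_operator_eigenvalues(X, kernel, top=5):
    """Top eigenvalues of the empirical operator, approximated by the Gram matrix K/n.
    Illustrative sketch only; names and choices here are not from the paper."""
    n = len(X)
    K = np.array([[kernel(x, y) for y in X] for x in X])
    # Eigenvalues of K/n serve as empirical estimates of the operator's spectrum.
    vals = np.linalg.eigvalsh(K / n)
    return np.sort(vals)[::-1][:top]

# Example: Gaussian kernel, data drawn from a one-dimensional Gaussian.
rng = np.random.default_rng(0)
X = rng.normal(size=200)
gauss = lambda x, y: np.exp(-(x - y) ** 2 / 2.0)
print(empirical_operator_eigenvalues(X, gauss))
```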

    An optimal bifactor approximation algorithm for the metric uncapacitated facility location problem

    We obtain a 1.5-approximation algorithm for the metric uncapacitated facility location problem (UFL), which improves on the previously best known 1.52-approximation algorithm by Mahdian, Ye and Zhang. Note that the approximability lower bound by Guha and Khuller is 1.463. An algorithm is a $(\lambda_f,\lambda_c)$-approximation algorithm if the solution it produces has total cost at most $\lambda_f \cdot F^* + \lambda_c \cdot C^*$, where $F^*$ and $C^*$ are the facility and the connection cost of an optimal solution. Our new algorithm, which is a modification of the $(1+2/e)$-approximation algorithm of Chudak and Shmoys, is a (1.6774, 1.3738)-approximation algorithm for the UFL problem and is the first one that touches the approximability limit curve $(\gamma_f, 1+2e^{-\gamma_f})$ established by Jain, Mahdian and Saberi. As a consequence, we obtain the first optimal approximation algorithm for instances dominated by connection costs. When combined with a (1.11, 1.7764)-approximation algorithm proposed by Jain et al., and later analyzed by Mahdian et al., we obtain an overall approximation guarantee of 1.5 for the metric UFL problem. We also describe how to use our algorithm to improve the approximation ratio for the 3-level version of UFL.
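
    A quick arithmetic sketch of how these factors fit together, using only the numbers quoted in the abstract: the (1.6774, 1.3738) algorithm essentially touches the $(\gamma_f, 1+2e^{-\gamma_f})$ curve, and a suitable randomized mix of the two bifactor algorithms balances both factors at about 1.5. The mixing calculation below is a standard back-of-the-envelope check, not the paper's actual analysis.

```python
import math

# Bifactor guarantee: cost(ALG) <= lam_f * F_star + lam_c * C_star.
A = (1.6774, 1.3738)   # this paper's algorithm
B = (1.11, 1.7764)     # the Jain et al. algorithm, as analyzed by Mahdian et al.

# The (gamma_f, 1 + 2*exp(-gamma_f)) approximability curve of Jain, Mahdian and Saberi:
print(1 + 2 * math.exp(-A[0]))   # ~1.3738, so A essentially touches the curve

# Running A with probability a and B otherwise balances the facility and
# connection factors at a common value of roughly 1.5.
a = (1.5 - B[0]) / (A[0] - B[0])
lam_f = a * A[0] + (1 - a) * B[0]
lam_c = a * A[1] + (1 - a) * B[1]
print(round(lam_f, 4), round(lam_c, 4))   # both ~1.5
```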

    New Frameworks for Offline and Streaming Coreset Constructions

    A coreset for a set of points is a small subset of weighted points that approximately preserves important properties of the original set. Specifically, if $P$ is a set of points, $Q$ is a set of queries, and $f:P\times Q\to\mathbb{R}$ is a cost function, then a set $S\subseteq P$ with weights $w:P\to[0,\infty)$ is an $\epsilon$-coreset for some parameter $\epsilon>0$ if $\sum_{s\in S}w(s)f(s,q)$ is a $(1+\epsilon)$ multiplicative approximation to $\sum_{p\in P}f(p,q)$ for all $q\in Q$. Coresets are used to solve fundamental problems in machine learning under various big data models of computation. Many of the coresets suggested in the recent decade used, or could have used, a general framework for constructing coresets whose size depends quadratically on what is known as the total sensitivity $t$. In this paper we improve this bound from $O(t^2)$ to $O(t\log t)$. Thus our results imply more space-efficient solutions to a number of problems, including projective clustering, $k$-line clustering, and subspace approximation. Moreover, we generalize the notion of sensitivity sampling to sup-sampling, which supports non-multiplicative approximations, negative cost functions and more. The main technical result is a generic reduction to the sample complexity of learning a class of functions with bounded VC dimension. We show that obtaining a $(\nu,\alpha)$-sample for this class of functions with appropriate parameters $\nu$ and $\alpha$ suffices to achieve space-efficient $\epsilon$-coresets. Our result implies more efficient coreset constructions for a number of interesting problems in machine learning; we show applications to $k$-median/$k$-means, $k$-line clustering, $j$-subspace approximation, and the integer $(j,k)$-projective clustering problem.
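
    A minimal sketch of the $\epsilon$-coreset guarantee defined above, using plain importance sampling with a crude sensitivity-style proxy (an assumption for illustration; this is not the paper's $O(t\log t)$ construction). Each sampled point gets weight $1/(m\cdot\mathrm{prob})$, which makes the weighted sum an unbiased estimate of the full cost for every query.

```python
import numpy as np

rng = np.random.default_rng(1)
P = rng.normal(size=(10_000, 2))                      # point set P
cost = lambda p, q: np.linalg.norm(p - q, axis=-1)    # f(p, q): distance to a candidate center

# Importance sampling with probabilities proportional to a crude sensitivity proxy
# (distance from the mean plus a constant); weights keep the estimate unbiased.
proxy = np.linalg.norm(P - P.mean(axis=0), axis=1) + 1.0
prob = proxy / proxy.sum()
m = 500
idx = rng.choice(len(P), size=m, p=prob)
S, w = P[idx], 1.0 / (m * prob[idx])

for q in rng.normal(size=(3, 2)):                     # a few queries q
    exact = cost(P, q).sum()
    approx = (w * cost(S, q)).sum()
    print(round(abs(approx - exact) / exact, 4))      # small relative error
```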

    On Variants of k-means Clustering

    Clustering problems often arise in fields such as data mining and machine learning, where a collection of objects must be grouped into similar groups with respect to a similarity (or dissimilarity) measure. Among clustering problems, $k$-means clustering in particular has received much attention from researchers. Despite the fact that $k$-means is a very well studied problem, its status in the plane is still open. In particular, it is unknown whether it admits a PTAS in the plane. The best known approximation bound achievable in polynomial time is $9+\epsilon$. In this paper, we consider the following variant of $k$-means. Given a set $C$ of points in $\mathbb{R}^d$ and a real $f > 0$, find a finite set $F$ of points in $\mathbb{R}^d$ that minimizes the quantity $f\cdot|F|+\sum_{p\in C} \min_{q \in F} \|p-q\|^2$. For any fixed dimension $d$, we design a local search PTAS for this problem. We also give a "bi-criterion" local search algorithm for $k$-means which uses $(1+\epsilon)k$ centers and yields a solution whose cost is at most $(1+\epsilon)$ times the cost of an optimal $k$-means solution. The algorithm runs in polynomial time for any fixed dimension. The contribution of this paper is two-fold. On the one hand, we are able to handle the squares of distances in an elegant manner, which yields a near-optimal approximation bound. This leads us towards a better understanding of the $k$-means problem. On the other hand, our analysis of local search might also be useful for other geometric problems. This is important considering that very little is known about the local search method for geometric approximation.
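
    A toy sketch of the objective $f\cdot|F|+\sum_{p\in C}\min_{q\in F}\|p-q\|^2$ and a naive randomized add/drop/swap local search over candidate centers drawn from the input. This is illustrative only, under our own assumptions; it is not the paper's PTAS or its analysis.

```python
import numpy as np

def obj(C, F, f):
    """Objective f*|F| + sum_p min_{q in F} ||p - q||^2 from the abstract."""
    d2 = ((C[:, None, :] - F[None, :, :]) ** 2).sum(-1)
    return f * len(F) + d2.min(axis=1).sum()

def local_search(C, f, rng, iters=200):
    """Toy add/drop/swap local search over candidate centers drawn from C."""
    F = [C[rng.integers(len(C))]]
    best = obj(C, np.array(F), f)
    for _ in range(iters):
        G = list(F)
        move = rng.integers(3)
        if move == 0:                      # open a random candidate facility
            G.append(C[rng.integers(len(C))])
        elif move == 1 and len(G) > 1:     # close a random open facility
            G.pop(rng.integers(len(G)))
        else:                              # swap one facility
            G[rng.integers(len(G))] = C[rng.integers(len(C))]
        val = obj(C, np.array(G), f)
        if val < best:
            F, best = G, val
    return np.array(F), best

rng = np.random.default_rng(0)
C = np.vstack([rng.normal(loc=c, size=(100, 2)) for c in (0.0, 5.0, 10.0)])
F, val = local_search(C, f=20.0, rng=rng)
print(len(F), round(val, 1))
```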

    Theoretical Interpretations and Applications of Radial Basis Function Networks

    Medical applications have usually used Radial Basis Function Networks (RBFNs) simply as Artificial Neural Networks. However, RBFNs are Knowledge-Based Networks that can be interpreted in several ways: as Artificial Neural Networks, Regularization Networks, Support Vector Machines, Wavelet Networks, Fuzzy Controllers, Kernel Estimators, or Instance-Based Learners. A survey of these interpretations and of their corresponding learning algorithms is provided, as well as a brief survey of dynamic learning algorithms. The interpretations of RBFNs can suggest applications that are particularly interesting in medical domains.
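
    As a minimal illustration of the model shared by these interpretations, the sketch below fits a Gaussian RBF expansion by ridge regression (the "regularization network" reading). All choices here (centers, width, regularization) are our own illustrative assumptions, not a construction from the survey.

```python
import numpy as np

def rbf_design(X, centers, gamma=1.0):
    """Gaussian radial basis features phi_j(x) = exp(-gamma * (x - c_j)^2)."""
    return np.exp(-gamma * (X[:, None] - centers[None, :]) ** 2)

# Fit the output weights by ridge regression on a toy 1-D regression task.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=200)
y = np.sin(X) + 0.1 * rng.normal(size=200)
centers = np.linspace(-3, 3, 15)
Phi = rbf_design(X, centers)
w = np.linalg.solve(Phi.T @ Phi + 1e-3 * np.eye(len(centers)), Phi.T @ y)
print(np.abs(Phi @ w - y).mean())   # small training error on the toy data
```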

    Fault Tolerant Clustering Revisited

    In discrete $k$-center and $k$-median clustering, we are given a set of points $P$ in a metric space $M$, and the task is to output a set $C \subseteq P$, $|C| = k$, such that the cost of clustering $P$ using $C$ is as small as possible. For $k$-center, the cost is the furthest a point has to travel to its nearest center, whereas for $k$-median, the cost is the sum of all point-to-nearest-center distances. In the fault-tolerant versions of these problems, we are given an additional parameter $1 \leq \ell \leq k$, such that when computing the cost of clustering, points are assigned to their $\ell$-th nearest neighbor in $C$ instead of their nearest neighbor. We provide constant-factor approximation algorithms for these problems that are both conceptually simple and highly practical from an implementation standpoint.
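
    A short sketch of the fault-tolerant cost defined above: each point is charged its distance to its $\ell$-th nearest center, and $\ell=1$ recovers the ordinary objectives. The function name and the random center choice are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def fault_tolerant_costs(P, C, ell):
    """Charge each point its distance to the ell-th nearest center in C;
    returns the k-center (max) and k-median (sum) objectives."""
    d = np.linalg.norm(P[:, None, :] - C[None, :, :], axis=-1)   # |P| x |C| distances
    d_ell = np.sort(d, axis=1)[:, ell - 1]                       # ell-th nearest center
    return d_ell.max(), d_ell.sum()

rng = np.random.default_rng(0)
P = rng.uniform(size=(1000, 2))
C = P[rng.choice(len(P), size=5, replace=False)]                 # some candidate centers
print(fault_tolerant_costs(P, C, ell=2))
```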

    Streaming Coreset Constructions for M-Estimators

    We introduce a new method of maintaining a $(k,\epsilon)$-coreset for clustering M-estimators over insertion-only streams. Let $(P,w)$ be a weighted set (where $w : P \to [0,\infty)$ is the weight function) of points in a $\rho$-metric space (meaning a set $X$ equipped with a positive-semidefinite symmetric function $D$ such that $D(x,z) \le \rho(D(x,y) + D(y,z))$ for all $x,y,z \in X$). For any set of points $C$, we define $\mathrm{COST}(P,w,C) = \sum_{p \in P} w(p) \min_{c \in C} D(p,c)$. A $(k,\epsilon)$-coreset for $(P,w)$ is a weighted set $(Q,v)$ such that for every set $C$ of $k$ points, $(1-\epsilon)\mathrm{COST}(P,w,C) \le \mathrm{COST}(Q,v,C) \le (1+\epsilon)\mathrm{COST}(P,w,C)$. Essentially, the coreset $(Q,v)$ can be used in place of $(P,w)$ for all operations concerning the COST function. Coresets, as a method of data reduction, are used to solve fundamental problems in machine learning of streaming and distributed data. M-estimators are functions $D(x,y)$ that can be written as $\psi(d(x,y))$ where $(X, d)$ is a true metric (i.e. $1$-metric) space. Special cases of M-estimators include the well-known $k$-median ($\psi(x) = x$) and $k$-means ($\psi(x) = x^2$) functions. Our technique takes an existing offline construction for an M-estimator coreset and converts it into the streaming setting, where $n$ data points arrive sequentially. To our knowledge, this is the first streaming construction for any M-estimator that does not rely on the merge-and-reduce tree. For example, our coreset for streaming metric $k$-means uses $O(\epsilon^{-2} k \log k \log n)$ points of storage. The previous state-of-the-art required storing at least $O(\epsilon^{-2} k \log k \log^{4} n)$ points.
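
    A minimal sketch of the COST function above with an M-estimator $D = \psi \circ d$ (here the Huber function, one common choice we assume for illustration), plus a toy check of the $(k,\epsilon)$-coreset inequality on a uniform subsample. This is illustrative only and is not the paper's streaming construction.

```python
import numpy as np

def psi_huber(x, delta=1.0):
    """Huber M-estimator: quadratic near zero, linear in the tail."""
    return np.where(x <= delta, 0.5 * x * x, delta * (x - 0.5 * delta))

def cost(P, w, C, d=lambda a, b: np.abs(a - b), psi=psi_huber):
    """COST(P, w, C) = sum_p w(p) * min_{c in C} psi(d(p, c)), i.e. D = psi o d."""
    D = psi(d(P[:, None], C[None, :]))          # |P| x |C| dissimilarities
    return (w * D.min(axis=1)).sum()

rng = np.random.default_rng(0)
P = rng.normal(size=5000)
w = np.ones_like(P)

# Toy check of the coreset inequality on a uniform subsample with rescaled weights.
idx = rng.choice(len(P), size=400, replace=False)
Q, v = P[idx], np.full(400, len(P) / 400)
for C in (np.array([-1.0, 1.0]), rng.normal(size=3)):
    full, core = cost(P, w, C), cost(Q, v, C)
    print(round(abs(core - full) / full, 4))    # small relative error
```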