A Note on Perturbation Results for Learning Empirical Operators
A large number of learning algorithms, for example spectral clustering, kernel Principal Component Analysis and many manifold methods, are based on estimating the eigenvalues and eigenfunctions of operators defined by a similarity function or a kernel, given empirical data. For the analysis of such algorithms it is therefore important to be able to assess the quality of these approximations. The contribution of our paper is two-fold: 1. We use a technique based on a concentration inequality for Hilbert spaces to provide new, much simplified proofs for a number of results in spectral approximation. 2. Using these methods we provide several new results for estimating the spectral properties of the graph Laplacian operator, extending and strengthening results from [26].
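As a concrete instance of the setting, the sketch below (a minimal illustration, assuming a Gaussian kernel and i.i.d. samples; `sigma` and the sample sizes are arbitrary choices) estimates the spectrum of a kernel integral operator by the eigenvalues of the scaled empirical Gram matrix, which is exactly the approximation whose quality such perturbation bounds control.

```python
import numpy as np

def empirical_kernel_eigenvalues(X, sigma=1.0):
    """Eigenvalues of the empirical kernel operator (1/n) K, where K is
    the Gaussian Gram matrix of the sample X."""
    n = X.shape[0]
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq / (2 * sigma ** 2))
    return np.sort(np.linalg.eigvalsh(K / n))[::-1]

rng = np.random.default_rng(0)
small = empirical_kernel_eigenvalues(rng.normal(size=(100, 2)))
large = empirical_kernel_eigenvalues(rng.normal(size=(1000, 2)))
# The leading empirical eigenvalues stabilize as n grows; the rate of
# this convergence is what the perturbation bounds quantify.
print(small[:5])
print(large[:5])
```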
An optimal bifactor approximation algorithm for the metric uncapacitated facility location problem
We obtain a 1.5-approximation algorithm for the metric uncapacitated facility location problem (UFL), which improves on the previously best known 1.52-approximation algorithm by Mahdian, Ye and Zhang. Note that the approximability lower bound of Guha and Khuller is 1.463. An algorithm is a {\em $(\lambda_f, \lambda_c)$-approximation algorithm} if the solution it produces has total cost at most $\lambda_f \cdot F^* + \lambda_c \cdot C^*$, where $F^*$ and $C^*$ are the facility and the connection cost of an optimal solution. Our new algorithm, which is a modification of the $(1+2/e)$-approximation algorithm of Chudak and Shmoys, is a (1.6774, 1.3738)-approximation algorithm for the UFL problem and is the first one that touches the approximability limit curve $(\gamma_f, 1+2e^{-\gamma_f})$ established by Jain, Mahdian and Saberi. As a consequence, we obtain the first optimal approximation algorithm for instances dominated by connection costs. When combined with a (1.11, 1.7764)-approximation algorithm proposed by Jain et al., and later analyzed by Mahdian et al., we obtain an overall approximation guarantee of 1.5 for the metric UFL problem. We also describe how to use our algorithm to improve the approximation ratio for the 3-level version of UFL.
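A quick numerical sanity check of the final combination (a sketch, not the paper's analysis, which randomizes between the two algorithms): normalize the optimum to $F^* + C^* = 1$ with facility share $t$, and take the better of the two bifactor guarantees at each split.

```python
import numpy as np

# Bifactor guarantees (lambda_f, lambda_c) quoted in the abstract.
ALGS = [(1.6774, 1.3738), (1.11, 1.7764)]

def best_of_two_bound(t):
    """Cost bound for the better of the two algorithms when the optimal
    facility cost is t and the optimal connection cost is 1 - t."""
    return min(lf * t + lc * (1 - t) for lf, lc in ALGS)

ts = np.linspace(0.0, 1.0, 100001)
# One bound increases in t, the other decreases, so their min peaks at
# the crossing point; the maximum is ~1.4998, i.e. at most 1.5.
print(max(best_of_two_bound(t) for t in ts))
```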
New Frameworks for Offline and Streaming Coreset Constructions
A coreset for a set of points is a small subset of weighted points that approximately preserves important properties of the original set. Specifically, if $P$ is a set of points, $Q$ is a set of queries, and $f : P \times Q \to \mathbb{R}$ is a cost function, then a set $S \subseteq P$ with weights $w : S \to [0,\infty)$ is an $\epsilon$-coreset for some parameter $\epsilon > 0$ if $\sum_{s \in S} w(s) f(s,q)$ is a $(1 \pm \epsilon)$ multiplicative approximation to $\sum_{p \in P} f(p,q)$ for all $q \in Q$. Coresets are used to solve fundamental problems in machine learning under various big data models of computation. Many of the coresets suggested in the recent decade used, or could have used, a general framework for constructing coresets whose size depends quadratically on what is known as the total sensitivity $t$.
In this paper we improve this bound from $O(t^2)$ to $O(t \log t)$. Thus our results imply more space efficient solutions to a number of problems, including projective clustering, $k$-line clustering, and subspace approximation. Moreover, we generalize the notion of sensitivity sampling to sup-sampling, which supports non-multiplicative approximations, negative cost functions and more. The main technical result is a generic reduction to the sample complexity of learning a class of functions with bounded VC dimension. We show that obtaining a $(\nu, \alpha)$-sample for this class of functions with appropriate parameters $\nu$ and $\alpha$ suffices to achieve space efficient $\epsilon$-coresets.
Our result implies more efficient coreset constructions for a number of interesting problems in machine learning; we show applications to $k$-median/$k$-means, $k$-line clustering, $j$-subspace approximation, and the integer $(j,k)$-projective clustering problem.
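For intuition about sensitivity sampling itself, the framework's starting point, here is a minimal sketch for the toy case $k = 1$ of $k$-means; the $1/n$ plus distance-ratio sensitivity bound is the standard one for a single center, and the sizes, seed, and query center are arbitrary assumptions.

```python
import numpy as np

def sensitivity_sample(P, m, rng):
    """Plain sensitivity sampling for 1-means: draw m points with
    probability proportional to a sensitivity upper bound s(p), and
    reweight by t / (m * s(p)) so costs are preserved in expectation."""
    n = len(P)
    mu = P.mean(axis=0)
    d2 = np.sum((P - mu) ** 2, axis=1)
    s = 1.0 / n + d2 / d2.sum()      # standard 1-means sensitivity bound
    t = s.sum()                      # total sensitivity (= 2 here)
    idx = rng.choice(n, size=m, p=s / t)
    return P[idx], t / (m * s[idx])

def cost(P, w, c):
    # Weighted sum of squared distances to a single center c.
    return np.sum(w * np.sum((P - c) ** 2, axis=1))

rng = np.random.default_rng(1)
P = rng.normal(size=(5000, 2))
Q, w = sensitivity_sample(P, 200, rng)
c = rng.normal(size=2)               # an arbitrary query center
print(cost(P, np.ones(len(P)), c), cost(Q, w, c))  # should be close
```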
On Variants of k-means Clustering
\textit{Clustering problems} often arise in fields like data mining and machine learning when grouping a collection of objects into similar groups with respect to a similarity (or dissimilarity) measure. Among clustering problems, \textit{$k$-means} clustering in particular has received much attention from researchers. Despite the fact that $k$-means is a very well studied problem, its status in the plane is still open. In particular, it is unknown whether it admits a PTAS in the plane. The best approximation bound known to be achievable in polynomial time is $9+\epsilon$.
In this paper, we consider the following variant of $k$-means. Given a set $P$ of $n$ points in $\mathbb{R}^d$ and a real $f > 0$, find a finite set $C$ of points in $\mathbb{R}^d$ that minimizes the quantity $f \cdot |C| + \sum_{p \in P} \min_{c \in C} \|p-c\|^2$. For any fixed dimension $d$, we design a local search PTAS for this problem. We also give a "bi-criterion" local search algorithm for $k$-means which uses $(1+\epsilon)k$ centers and yields a solution whose cost is at most $(1+\epsilon)$ times the cost of an optimal $k$-means solution. The algorithm runs in polynomial time for any fixed dimension.
The contribution of this paper is twofold. On the one hand, we are able to handle the square of distances in an elegant manner, which yields a near-optimal approximation bound. This leads us towards a better understanding of the $k$-means problem. On the other hand, our analysis of local search might also be useful for other geometric problems. This is important considering that very little is known about the local search method for geometric approximation.
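To make the local search idea concrete, here is an illustrative single-swap heuristic for $k$-means (a simplified sketch; the paper's PTAS uses larger swaps and a much more careful analysis, and the $1 - \epsilon/k$ improvement threshold is the usual device for forcing polynomial termination).

```python
import numpy as np

def kmeans_cost(P, C):
    # Sum over points of squared distance to the nearest center.
    d2 = np.sum((P[:, None, :] - C[None, :, :]) ** 2, axis=-1)
    return d2.min(axis=1).sum()

def single_swap_local_search(P, k, eps=0.1, seed=0):
    """Swap one center for one input point whenever the swap improves
    the cost by a (1 - eps/k) factor; stops at a local optimum."""
    rng = np.random.default_rng(seed)
    C = P[rng.choice(len(P), size=k, replace=False)].copy()
    improved = True
    while improved:
        improved = False
        for i in range(k):
            base = kmeans_cost(P, C)
            for p in P:
                old = C[i].copy()
                C[i] = p
                new = kmeans_cost(P, C)
                if new < (1 - eps / k) * base:
                    base, improved = new, True
                else:
                    C[i] = old
    return C

rng = np.random.default_rng(1)
P = np.vstack([rng.normal(m, 0.2, size=(50, 2)) for m in ((0, 0), (3, 3), (0, 3))])
print(kmeans_cost(P, single_swap_local_search(P, k=3)))
```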
Theoretical Interpretations and Applications of Radial Basis Function Networks
Medical applications have usually used Radial Basis Function Networks (RBFNs) simply as Artificial Neural Networks. However, RBFNs are knowledge-based networks that can be interpreted in several ways: as Artificial Neural Networks, Regularization Networks, Support Vector Machines, Wavelet Networks, Fuzzy Controllers, Kernel Estimators, or Instance-Based Learners. A survey of these interpretations and of their corresponding learning algorithms is provided, as well as a brief survey of dynamic learning algorithms. RBFNs' interpretations can suggest applications that are particularly interesting in medical domains.
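As an illustration of the first interpretation on that list, the sketch below trains an RBFN in the classic two-stage ANN style: Gaussian hidden units at fixed centers, then a linear output layer fit by ridge regression (the ridge term is what ties this reading to Regularization Networks). The centers, `sigma`, and the toy sine-regression data are arbitrary assumptions.

```python
import numpy as np

def rbf_features(X, centers, sigma=1.0):
    # Gaussian hidden-unit activations for each input point.
    sq = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2 * sigma ** 2))

def fit_rbfn(X, y, centers, sigma=1.0, reg=1e-6):
    """Linear output weights by ridge regression on the RBF features."""
    Phi = rbf_features(X, centers, sigma)
    return np.linalg.solve(Phi.T @ Phi + reg * np.eye(Phi.shape[1]), Phi.T @ y)

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
centers = X[rng.choice(200, size=20, replace=False)]  # centers from data
W = fit_rbfn(X, y, centers)
print(rbf_features(X[:5], centers) @ W)  # predictions near sin(x)
```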
Fault Tolerant Clustering Revisited
In discrete $k$-center and $k$-median clustering, we are given a set of points $P$ in a metric space $M$, and the task is to output a set $C \subseteq P$, $|C| = k$, such that the cost of clustering $P$ using $C$ is as small as possible. For $k$-center, the cost is the furthest a point has to travel to its nearest center, whereas for $k$-median, the cost is the sum of all point-to-nearest-center distances. In the fault-tolerant versions of these problems, we are given an additional parameter $1 \leq \ell \leq k$, such that when computing the cost of clustering, points are assigned to their $\ell$-th nearest neighbor in $C$, instead of their nearest neighbor. We provide constant factor approximation algorithms for these problems that are both conceptually simple and highly practical from an implementation standpoint.
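A minimal sketch of the fault-tolerant objective for Euclidean points (the definition above is for general metric spaces; the sample data and choice of centers here are assumptions for illustration):

```python
import numpy as np

def fault_tolerant_costs(P, C, ell):
    """Cost of clustering P with centers C when each point is served by
    its ell-th nearest center (1 <= ell <= len(C)); returns the
    k-center cost (max) and the k-median cost (sum)."""
    d = np.sqrt(np.sum((P[:, None, :] - C[None, :, :]) ** 2, axis=-1))
    d.sort(axis=1)            # row i: distances from point i, ascending
    ell_th = d[:, ell - 1]    # distance to the ell-th nearest center
    return ell_th.max(), ell_th.sum()

rng = np.random.default_rng(0)
P = rng.normal(size=(100, 2))
C = P[:5]                     # 5 arbitrary centers drawn from P
print(fault_tolerant_costs(P, C, ell=2))
```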
Streaming Coreset Constructions for M-Estimators
We introduce a new method of maintaining a (k,epsilon)-coreset for clustering M-estimators over insertion-only streams. Let (P,w) be a weighted set (where w : P -> [0,infty) is the weight function) of points in a rho-metric space (meaning a set X equipped with a positive-semidefinite symmetric function D such that D(x,z) <= rho(D(x,y) + D(y,z)) for all x,y,z in X). For any set of points C, we define COST(P,w,C) = sum_{p in P} w(p) min_{c in C} D(p,c). A (k,epsilon)-coreset for (P,w) is a weighted set (Q,v) such that for every set C of k points, (1-epsilon)COST(P,w,C) <= COST(Q,v,C) <= (1+epsilon)COST(P,w,C). Essentially, the coreset (Q,v) can be used in place of (P,w) for all operations concerning the COST function. Coresets, as a method of data reduction, are used to solve fundamental problems in machine learning on streaming and distributed data.
M-estimators are functions D(x,y) that can be written as psi(d(x,y)), where (X, d) is a true metric (i.e. 1-metric) space. Special cases of M-estimators include the well-known k-median (psi(x) = x) and k-means (psi(x) = x^2) functions. Our technique takes an existing offline construction for an M-estimator coreset and converts it into the streaming setting, where n data points arrive sequentially. To our knowledge, this is the first streaming construction for any M-estimator that does not rely on the merge-and-reduce tree. For example, our coreset for streaming metric k-means uses O(epsilon^{-2} k log k log n) points of storage. The previous state-of-the-art required storing at least O(epsilon^{-2} k log k log^{4} n) points.
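A minimal sketch of the COST function for a concrete M-estimator, the Huber loss, one of the standard choices of psi (the Euclidean base metric, `delta`, and the uniform-sample comparison are assumptions for illustration; a reweighted uniform subsample is a stand-in here, not the paper's coreset construction):

```python
import numpy as np

def huber_psi(x, delta=1.0):
    # A classic M-estimator: quadratic near zero, linear in the tails.
    return np.where(x <= delta, 0.5 * x ** 2, delta * (x - 0.5 * delta))

def cost(P, w, C, psi=huber_psi):
    """COST(P, w, C) = sum_p w(p) * min_c psi(d(p, c)), d Euclidean."""
    d = np.sqrt(np.sum((P[:, None, :] - C[None, :, :]) ** 2, axis=-1))
    return np.sum(w * psi(d).min(axis=1))

rng = np.random.default_rng(0)
P = rng.normal(size=(2000, 2))
idx = rng.choice(2000, size=200, replace=False)   # uniform stand-in
Q, v = P[idx], np.full(200, 2000 / 200)           # reweighted sample
C = rng.normal(size=(3, 2))                       # arbitrary k = 3 query
print(cost(P, np.ones(2000), C), cost(Q, v, C))   # should be close
```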