On the Sample Complexity of Subspace Learning
A large number of algorithms in machine learning, from principal component
analysis (PCA) and its non-linear (kernel) extensions to more recent spectral
embedding and support estimation methods, rely on estimating a linear subspace
from samples. In this paper we introduce a general formulation of this problem
and derive novel learning error estimates. Our results rely on natural
assumptions on the spectral properties of the covariance operator associated with
the data distribution, and hold for a wide class of metrics between
subspaces. As special cases, we discuss sharp error estimates for the
reconstruction properties of PCA and spectral support estimation. Key to our
analysis is an operator-theoretic approach that has broad applicability to
spectral learning methods.
Comment: Extended version of conference paper
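The core setting of this abstract, estimating a linear subspace from samples via the top eigenvectors of the empirical covariance and comparing it to the true subspace under a metric between subspaces, can be illustrated with a minimal NumPy sketch. The data, dimensions, and noise level below are hypothetical choices for illustration, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n samples near a k-dimensional subspace of R^d (assumed setup).
d, k, n = 5, 2, 500
basis = np.linalg.qr(rng.standard_normal((d, k)))[0]  # true orthonormal basis
X = rng.standard_normal((n, k)) @ basis.T + 0.05 * rng.standard_normal((n, d))

# PCA-style estimate: top-k eigenvectors of the empirical covariance.
cov = X.T @ X / n
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
est = eigvecs[:, -k:]                   # estimated subspace basis

# One common metric between subspaces: the operator-norm distance
# between the orthogonal projections (the largest principal sin-angle).
P_true = basis @ basis.T
P_est = est @ est.T
dist = np.linalg.norm(P_true - P_est, 2)
print(dist)  # shrinks as n grows relative to the noise
```

The projection distance used here is one member of the "wide class of metrics between subspaces" the abstract refers to; the paper's error estimates bound how such distances decay with the sample size.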
Learning Manifolds with K-Means and K-Flats
We study the problem of estimating a manifold from random samples. In particular, we consider piecewise constant and piecewise linear estimators induced by k-means and k-flats, and analyze their performance. We extend previous results for k-means in two directions. First, we provide new results for k-means reconstruction on manifolds; second, we prove reconstruction bounds for higher-order approximation (k-flats), for which no results were previously available. While the results for k-means are novel, some of the technical tools are well-established in the literature. In the case of k-flats, both the results and the mathematical tools are new.