Sequential Learning of Principal Curves: Summarizing Data Streams on the Fly
When confronted with massive data streams, summarizing data with dimension
reduction methods such as PCA raises theoretical and algorithmic pitfalls.
Principal curves act as a nonlinear generalization of PCA and the present paper
proposes a novel algorithm to automatically and sequentially learn principal
curves from data streams. We show that our procedure is supported by regret
bounds with optimal sublinear remainder terms. A greedy local search
implementation (called slpc, for Sequential Learning Principal Curves)
that incorporates both sleeping experts and multi-armed bandit ingredients is
presented, along with its regret computation and performance on synthetic and
real-life data.
On principal curves with a length constraint
Principal curves are defined as parametric curves passing through the "middle" of a probability distribution in R^d. In addition to the original definition based on self-consistency, several points of view have been considered, among which is a least-squares-type constrained minimization problem. In this paper, we are interested in the theoretical properties satisfied by a constrained principal curve associated with a probability distribution with a second-order moment. We study open and closed principal curves f: [0,1] -> R^d with length at most L and show in particular that they have finite curvature whenever the probability distribution is not supported on the range of a curve with length L. From the first-order condition, which expresses that a curve is a critical point for the criterion, we derive an equation involving the curve, its curvature, and a random variable playing the role of the curve parameter. This equation allows us to show that a constrained principal curve in dimension 2 has no multiple points.
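As a rough illustration of the least-squares-type constrained problem mentioned in this abstract, the sketch below evaluates an empirical version of the criterion for a polygonal curve: the mean squared distance from sample points to the curve, together with a check that the curve length is at most L. It is a minimal sketch under assumptions and not the paper's method: restricting to polygonal curves, replacing the distribution by a finite sample, and the function names (point_segment_sq_dist, curve_length, empirical_criterion, is_feasible) are all choices made here for illustration.

```python
import math

def point_segment_sq_dist(x, a, b):
    """Squared Euclidean distance from point x to the segment [a, b]."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    ax = [xi - ai for ai, xi in zip(a, x)]
    denom = sum(c * c for c in ab)
    if denom == 0:
        t = 0.0  # degenerate segment: a and b coincide
    else:
        # Projection parameter of x onto the line through a and b, clamped to [0, 1].
        t = max(0.0, min(1.0, sum(p * q for p, q in zip(ax, ab)) / denom))
    proj = [ai + t * c for ai, c in zip(a, ab)]
    return sum((xi - pi) ** 2 for xi, pi in zip(x, proj))

def curve_length(vertices):
    """Length of the polygonal curve through the given vertices."""
    return sum(math.dist(p, q) for p, q in zip(vertices, vertices[1:]))

def empirical_criterion(points, vertices):
    """Mean squared distance from the sample points to the polygonal curve."""
    segments = list(zip(vertices, vertices[1:]))
    total = sum(min(point_segment_sq_dist(x, a, b) for a, b in segments)
                for x in points)
    return total / len(points)

def is_feasible(vertices, max_length):
    """Length constraint: the curve must have length at most max_length (L)."""
    return curve_length(vertices) <= max_length

# Hypothetical toy data: three points around a horizontal segment.
points = [(0.0, 1.0), (1.0, -1.0), (2.0, 1.0)]
curve = [(0.0, 0.0), (2.0, 0.0)]
print(empirical_criterion(points, curve), curve_length(curve))
```

A constrained principal curve, in this simplified reading, would be a feasible curve that minimizes empirical_criterion; the sketch only evaluates candidates and does not perform the minimization.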