To avoid the curse of dimensionality, frequently encountered in Big Data
analysis, linear and nonlinear dimension reduction techniques have been
developed extensively in recent years. These techniques (sometimes referred to
as manifold learning) assume that the scattered input data lies on a lower
dimensional manifold, so the high dimensionality problem can be overcome by
learning the lower dimensional behavior. However,
in real life applications, data is often very noisy. In this work, we propose a
method to approximate $\mathcal{M}$, a $d$-dimensional $C^{m+1}$ smooth
submanifold of $\mathbb{R}^n$ ($d \ll n$), based upon noisy scattered data
points (i.e., a data cloud). We assume that the data points are located "near"
the lower dimensional manifold and suggest a nonlinear moving least-squares
projection onto an approximating $d$-dimensional manifold. Under some mild
assumptions, the resulting approximant is shown to be infinitely smooth and of
high approximation order (i.e., $O(h^{m+1})$, where $h$ is the fill distance
and $m$ is the degree of the local polynomial approximation). The method
presented here assumes no analytic knowledge of the approximated manifold, and
the approximation algorithm is linear in the large dimension $n$. Furthermore,
the approximating manifold can serve as a framework to perform operations
directly on the high dimensional data in a computationally efficient manner.
This way, the preparatory step of dimension reduction, which introduces
distortions into the data, can be avoided altogether.
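
To make the idea of a moving least-squares projection concrete, the following is a minimal Python sketch written under simplifying assumptions; it is not the algorithm of this work. In particular, the local coordinate system is obtained here by a weighted PCA around the query point, which is a stand-in chosen for brevity, and the function names (`poly_features`, `mls_project`), the Gaussian weight, and the bandwidth parameter `h` are all hypothetical choices made for illustration.

```python
from itertools import combinations_with_replacement

import numpy as np


def poly_features(x, m):
    """Monomials of total degree <= m in the columns of x, where x has shape (N, d)."""
    cols = [
        np.prod(x[:, list(idx)], axis=1)  # empty idx gives the constant term 1
        for deg in range(m + 1)
        for idx in combinations_with_replacement(range(x.shape[1]), deg)
    ]
    return np.stack(cols, axis=1)


def mls_project(r, points, d, m=2, h=0.3):
    """Simplified MLS-style projection of r (shape (n,)) using samples points (N, n).

    Assumption: weighted PCA replaces the search for a local coordinate system.
    """
    # Gaussian weights centred at the query point (illustrative choice).
    w = np.exp(-np.sum((points - r) ** 2, axis=1) / h**2)
    sw = np.sqrt(w)

    # Step 1 (simplified): local origin q and a d-dimensional basis from weighted PCA.
    q = (w[:, None] * points).sum(axis=0) / w.sum()
    centred = points - q
    _, _, vt = np.linalg.svd(sw[:, None] * centred, full_matrices=False)
    basis = vt[:d]  # shape (d, n), orthonormal rows

    # Step 2: weighted polynomial regression from local coordinates to R^n.
    x = centred @ basis.T  # local coordinates, shape (N, d)
    design = poly_features(x, m)
    coef, *_ = np.linalg.lstsq(sw[:, None] * design, sw[:, None] * points, rcond=None)

    # Evaluate the fitted polynomial at the local coordinates of r.
    x_r = ((r - q) @ basis.T)[None, :]
    return (poly_features(x_r, m) @ coef)[0]
```

Calling `mls_project(r, points, d=1)` on noisy samples of a circle embedded in a higher dimensional space returns a point of the same dimension as `r` that should lie closer to the circle than `r` does. In this sketch the ambient dimension $n$ enters only through the thin SVD and the right-hand side of the least-squares solve, so for a fixed number of samples smaller than $n$ the cost grows linearly in $n$.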