We revisit the problem of finding principal components of multivariate
datasets that lie on a nonlinear Riemannian manifold embedded in a
higher-dimensional space. Our aim is to extend the geometric interpretation of
PCA while capturing non-geodesic forms of variation in the
data. We introduce the concept of a principal sub-manifold: a manifold passing
through the center of the data that, at any point on the manifold, moves in
the direction of the highest curvature in the space spanned by the eigenvectors of
the local tangent space PCA. Compared to recent work on the case where the
sub-manifold is of dimension one (Panaretos, Pham and Yao 2014), which is
essentially a curve lying on the manifold that attempts to capture
one-dimensional variation, the current setting is much more general. The
principal sub-manifold
is therefore an extension of the principal flow that can capture the
higher-dimensional variation in the data. We show that the principal sub-manifold
yields the usual principal components in Euclidean space. By means of examples,
we illustrate how to find, use and interpret a principal sub-manifold, and we
extend its use to shape analysis.
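
As a rough illustration of the local tangent space PCA mentioned above, the
following Python sketch computes a kernel-weighted covariance of log-mapped
data at a point p on the unit sphere S^2 and returns its eigenvectors. The
function names sphere_log and local_tangent_pca, the Gaussian kernel, the
bandwidth h and the synthetic data are all assumptions made for illustration;
they are not taken from the paper. The sketch covers only the local
eigen-decomposition at a single point, not the construction of the
sub-manifold itself.

```python
import numpy as np

def sphere_log(p, x):
    """Log map on the unit sphere: tangent vectors at p pointing toward rows of x."""
    cos_t = np.clip(x @ p, -1.0, 1.0)
    theta = np.arccos(cos_t)                  # geodesic distances from p
    v = x - cos_t[:, None] * p                # parts of x orthogonal to p
    norm = np.linalg.norm(v, axis=1)
    scale = np.where(norm > 1e-12, theta / np.maximum(norm, 1e-12), 0.0)
    return v * scale[:, None]

def local_tangent_pca(p, data, h=0.5):
    """Eigen-decomposition of a kernel-weighted covariance in the tangent space at p.

    The Gaussian kernel and bandwidth h are illustrative assumptions.
    """
    logs = sphere_log(p, data)
    dist = np.linalg.norm(logs, axis=1)
    w = np.exp(-0.5 * (dist / h) ** 2)        # weights favour points near p
    cov = (logs * w[:, None]).T @ logs / w.sum()
    vals, vecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    order = np.argsort(vals)[::-1]            # reorder to descending
    return vals[order], vecs[:, order]

# Synthetic data: points spread along the equator with small vertical noise.
rng = np.random.default_rng(0)
t = np.linspace(-0.6, 0.6, 200)
pts = np.stack([np.cos(t), np.sin(t), 0.05 * rng.standard_normal(200)], axis=1)
pts /= np.linalg.norm(pts, axis=1, keepdims=True)

p = np.array([1.0, 0.0, 0.0])
vals, vecs = local_tangent_pca(p, pts)
leading = vecs[:, 0]                          # dominant local direction at p
```

In this toy setting the leading eigenvector lies in the tangent plane at p and
points along the equator, the direction in which the data vary most near p. In
flat Euclidean space the log map reduces to x - p, so with uniform weights the
same computation at the sample mean is ordinary PCA.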