Non-linear principal component analysis (approximation by a second-order Taylor series)

Abstract

Linear Principal Component Analysis (LPCA) has been applied in multivariate analysis because of its many optimality properties. However, when used to locate singularities in a set of data, LPCA can locate only linear singularities. If the problem being considered tends to produce variables with non-linear relationships, as in non-linear regression, LPCA is necessarily of limited utility in identifying singularities. Non-linear generalizations of PCA have been suggested in the literature. Essentially, these involve augmenting the data with higher-order terms, in particular square and cross-product terms, and running an LPCA on the augmented data set. The problem with this approach is that the fundamental property of parsimony is violated, because the number of principal components is greater than the number of original variates. Further, the dimensionality of the augmented data set increases quadratically with the number of original variates, which greatly increases the computational load for practical-sized problems. A new method is proposed in this thesis. It involves writing the general non-linear model as a Taylor series and truncating after the second-order terms. The data are centered about their means, and the square and cross-product values are computed. The covariance matrix for the augmented problem is then computed. The non-linear singularities (or near singularities) are obtained by running a canonical regression using the original and added sets as partitions. The advantage is that no more principal components can be extracted than there are variates in the original data set, so the principle of parsimony is upheld. The results of two computational experiments are discussed.
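
The steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: it assumes that "canonical regression" can be read as a canonical correlation analysis between the centered original variates and the added square and cross-product terms, and the function name `nonlinear_canonical_correlations` is hypothetical. Under that assumption, a canonical correlation close to 1 flags a near singularity that involves second-order terms.

```python
import numpy as np
from itertools import combinations_with_replacement


def nonlinear_canonical_correlations(X):
    """Canonical correlations between the centered variates and their
    second-order terms (hypothetical helper, illustrative only)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape

    # Center the data about their means.
    Xc = X - X.mean(axis=0)

    # Added set: squares and cross products of the centered variates.
    pairs = list(combinations_with_replacement(range(p), 2))
    Z = np.column_stack([Xc[:, i] * Xc[:, j] for i, j in pairs])

    # Covariance matrix of the augmented data, partitioned into
    # original (x) and added (z) blocks.
    S = np.cov(np.hstack([Xc, Z]), rowvar=False)
    Sxx, Sxz, Szz = S[:p, :p], S[:p, p:], S[p:, p:]

    # Squared canonical correlations are the eigenvalues of
    # Sxx^{-1} Sxz Szz^{-1} Szx, a p x p matrix, so at most p
    # components are extracted.
    M = np.linalg.solve(Sxx, Sxz) @ np.linalg.solve(Szz, Sxz.T)
    eig = np.linalg.eigvals(M).real
    rho = np.sqrt(np.clip(eig, 0.0, 1.0))
    return np.sort(rho)[::-1]


# Synthetic example (made-up data): x2 is almost exactly a quadratic
# function of x1, so one canonical correlation should be close to 1.
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = x1**2 + 0.01 * rng.normal(size=500)
rho = nonlinear_canonical_correlations(np.column_stack([x1, x2]))
print(rho)
```

Because the eigenproblem is posed on a matrix whose size equals the number of original variates, the sketch returns at most as many components as there are original variates, which matches the parsimony argument in the abstract.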
