The goal of dimensionality reduction is to compute a reduced representation of our data. The benefits of such a reduction include visualization of data, compact storage of data, and the possible extraction of systematic structure. In general, given a p-dimensional vector (x1, x2, ..., xp), we wish to represent it in q dimensions as (x̃1, x̃2, ..., x̃q) with p > q. In this lecture we assume only real-valued vectors.

2. PRINCIPAL COMPONENT ANALYSIS (PCA)

The main idea of PCA is to project our data onto a lower-dimensional manifold. For example, if p = 2 and our data "seem" linear (q = 1), then we wish to project the data points onto a "suitable" line (see Figure 1). This projection is not without cost, since our data do not really live on a line. In PCA our free parameter is the selection of q. There are at least three ways to think about our lower-dimensional subspace:

(1) We can maximize the variance of the projection along R^q [1]. In the previous example, selecting a horizontal line results in the projected data points being "squashed".
(2) We can minimize the reconstruction error, i.e. the distance between the original data and the projected data [2]. [Note: this is not the same as regression, where we minimize the RSS.]
(3) We can view PCA via the MLE of a parameter in a latent variable model [3].

3. THE MULTIVARIATE GAUSSIAN DISTRIBUTION

The probability density function of a Gaussian random vector X ∈ R^p is

    p(x | µ, Σ) = (1 / ((2π)^{p/2} |Σ|^{1/2})) exp{ −(1/2) (x − µ)^T Σ^{−1} (x − µ) }.
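The density above can be evaluated directly from its formula. The following is a minimal sketch in NumPy (not part of the original notes); the function name `gaussian_pdf` and the toy inputs are illustrative choices. `np.linalg.solve` is used in place of an explicit Σ^{−1} for numerical stability.

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Evaluate the N(mu, Sigma) density at x, for x, mu in R^p."""
    p = len(mu)
    diff = x - mu
    # Normalizing constant: (2*pi)^(p/2) * |Sigma|^(1/2)
    norm_const = (2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(Sigma))
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu), via a linear solve
    quad = diff @ np.linalg.solve(Sigma, diff)
    return np.exp(-0.5 * quad) / norm_const

# Sanity check against the 1-D standard normal at x = 0,
# whose density is 1/sqrt(2*pi) ≈ 0.3989.
val = gaussian_pdf(np.array([0.0]), np.array([0.0]), np.eye(1))
```

For p = 1 with µ = 0 and Σ = 1 this reduces to the familiar univariate standard normal density, which gives a quick correctness check.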
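The q = 1 projection described in Section 2 can be sketched numerically. This small NumPy example (an illustration, not part of the original notes) builds toy 2-D data near a line, finds the first principal direction via the SVD of the centered data, and checks that the variance of the projection (view (1)) is at least that of the naive horizontal projection; the same direction also minimizes the reconstruction error of view (2).

```python
import numpy as np

# Toy 2-D data that "seem" linear: points near the line y = 2x (illustrative).
rng = np.random.default_rng(0)
x = rng.normal(size=50)
X = np.column_stack([x, 2 * x + 0.1 * rng.normal(size=50)])  # shape (50, 2)

# Center the data; the top right-singular vector of the centered matrix is
# the direction of maximum variance and of minimum reconstruction error.
mu = X.mean(axis=0)
Xc = X - mu
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
v1 = Vt[0]                       # first principal direction, unit length

Z = Xc @ v1                      # q = 1 projected coordinates
X_hat = mu + np.outer(Z, v1)     # rank-1 reconstruction back in R^2

# Projecting onto a horizontal line (the first coordinate axis) "squashes"
# the data: its variance cannot exceed the variance along v1.
var_pc1 = Z.var()
var_x_axis = Xc[:, 0].var()
print(var_pc1 >= var_x_axis)     # True
```

The equivalence of views (1) and (2) is visible here: the SVD direction that maximizes projected variance is exactly the one whose rank-1 reconstruction `X_hat` is closest to `X` in squared error.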

Year: 2011
