

By Alex Lorbert and Gungor Polatkan


The goal of dimensionality reduction is to compute a reduced representation of our data. The benefits of such a reduction include visualization of data, storage of data, and the possible extraction of systematic structure. In general, given a p-dimensional vector (x1, x2, ..., xp), we wish to find a way to represent this vector with q dimensions as (x̃1, x̃2, ..., x̃q), with p > q. In this lecture we assume only real-valued vectors.

2. PRINCIPAL COMPONENT ANALYSIS (PCA)

The main idea of PCA is to project our data onto a lower-dimensional manifold. For example, if p = 2 and our data "seem" linear (q = 1), then we wish to project the data points onto a "suitable" line (see Figure 1). This projection is not without cost, since our data do not really live on a line. In PCA our free parameter is the selection of q. There are at least three ways to think about the lower-dimensional subspace:

(1) We can maximize the variance of the projection along ℝ^q [1]. In the previous example, a selection of a horizontal line results in the projected data points being "squashed".
(2) We can minimize the reconstruction error, i.e. the distance between the original data and the projected data [2]. [Note: this is not the same as regression, where we minimize the RSS.]
(3) We can view PCA via an MLE of a parameter in a latent variable model [3].

3. THE MULTIVARIATE GAUSSIAN DISTRIBUTION

The probability density function of a Gaussian random vector X ∈ ℝ^p is

p(x | µ, Σ) = (1 / ((2π)^(p/2) |Σ|^(1/2))) exp( −(1/2) (x − µ)ᵀ Σ⁻¹ (x − µ) )
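The two subspace views above (maximize projected variance; minimize reconstruction error) coincide when the principal directions are taken from the SVD of the centered data. The following is a minimal numpy sketch, not the lecture's own code; the function name `pca_project` and the synthetic near-linear data are illustrative assumptions.

```python
import numpy as np

def pca_project(X, q):
    """Project the rows of X (n x p) onto the top-q principal directions.

    Returns the q-dimensional codes Z, the reconstruction X_hat in R^p,
    and the p x q basis W. (Illustrative sketch, not the lecture's code.)
    """
    mu = X.mean(axis=0)
    Xc = X - mu                                   # center the data
    # Rows of Vt are the right singular vectors = principal directions
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:q].T                                  # p x q orthonormal basis
    Z = Xc @ W                                    # reduced representation
    X_hat = Z @ W.T + mu                          # projection back into R^p
    return Z, X_hat, W

# Hypothetical data that "seem" 1-dimensional: points near a line in R^2
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1)) @ np.array([[2.0, 1.0]]) \
    + 0.1 * rng.normal(size=(200, 2))
Z, X_hat, W = pca_project(X, q=1)
```

Because PCA chooses the best rank-q subspace, the squared reconstruction error using W is never worse than projecting onto an arbitrary direction such as the horizontal axis, matching view (1)/(2) above.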
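The density in Section 3 can be evaluated directly; a sketch follows, assuming a helper name `gaussian_pdf` not taken from the notes. Using `np.linalg.solve` for Σ⁻¹(x − µ) avoids forming the explicit inverse.

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Evaluate the multivariate Gaussian density p(x | mu, Sigma).

    Implements (2*pi)^(-p/2) |Sigma|^(-1/2) exp(-0.5 (x-mu)^T Sigma^{-1} (x-mu)).
    """
    p = len(mu)
    diff = x - mu
    norm = (2.0 * np.pi) ** (p / 2.0) * np.sqrt(np.linalg.det(Sigma))
    quad = diff @ np.linalg.solve(Sigma, diff)    # (x-mu)^T Sigma^{-1} (x-mu)
    return np.exp(-0.5 * quad) / norm
```

At x = µ with Σ = I the exponent vanishes, so the density reduces to (2π)^(−p/2), which is a quick sanity check on the normalizing constant.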

Year: 2011
OAI identifier: oai:CiteSeerX.psu:
Provided by: CiteSeerX
