Independent components in text

Abstract

In this communication we analyze the feasibility of independent component analysis (ICA) for dimensionality reduction and representation of word histograms. The analysis is carried out in a likelihood framework that allows estimation of the loadings (source signals), the mixing matrix, and the noise level. For noisy signals, the estimated sources are non-linear functionals of the observed signals, in contrast to the linear noise-free case. We also discuss the generalizability of the estimated models and show that an empirical test error estimate may be used to optimize model dimensionality, in particular to select the number of sources. When applied to word histograms, ICA is shown to produce representations that are better aligned with the group structure in the text data than those of latent semantic analysis (LSA).
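
The contrast between the noisy and noise-free cases can be made concrete with a generic noisy linear mixing model. This is only a sketch under assumptions: the symbols below (observation x_t, mixing matrix A, source s_t, noise variance \sigma^2, source prior p(s)) are illustrative notation and are not taken from the abstract, and the Gaussian noise form and the MAP estimator are assumptions rather than the paper's stated method.

```latex
% Assumed generative model: observed histogram x_t is a linear mixture of sources plus noise
\mathbf{x}_t = \mathbf{A}\,\mathbf{s}_t + \boldsymbol{\varepsilon}_t,
\qquad \boldsymbol{\varepsilon}_t \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I})

% Noise-free case (\sigma^2 \to 0, A square and invertible): the source estimate is linear in x_t
\hat{\mathbf{s}}_t = \mathbf{A}^{-1}\,\mathbf{x}_t

% Noisy case: a MAP estimate balances reconstruction error against the source prior
\hat{\mathbf{s}}_t = \arg\max_{\mathbf{s}}
\left[ -\frac{1}{2\sigma^2}\,\lVert \mathbf{x}_t - \mathbf{A}\,\mathbf{s} \rVert^2 + \log p(\mathbf{s}) \right]
```

Under these assumptions, a non-Gaussian prior p(s) makes the maximizer in the last line a non-linear function of x_t, which is one way to read the abstract's remark about non-linear source estimates in the presence of noise.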

Last time updated on 22/10/2014

This paper was published in CiteSeerX.
