Comparison of Latent Semantic Analysis and Probabilistic Latent Semantic Analysis for Documents Clustering

Kitowski, Jacek; Kuta, Marcin

Authors: Jacek Kitowski, Marcin Kuta
Publication date: 4 February 2015
Publisher: Institute of Informatics, Slovak Academy of Sciences

Abstract

In this paper we compare the usefulness of statistical dimensionality reduction techniques for improving the clustering of documents in Polish. We start with partitional and agglomerative algorithms applied to the Vector Space Model. We then investigate two transformations: Latent Semantic Analysis and Probabilistic Latent Semantic Analysis. The results show an advantage of Latent Semantic Analysis over the probabilistic model. We also analyse the time and memory consumption of these transformations and present runtime details for an IBM BladeCenter HS21 machine.

Available Versions

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

oai:ojs.cai.ui.sav.sk:article/...

Last time updated on 15/12/2019