

A sequence based dynamic SOM model for text clustering

Authors: Alahakoon Damminda; Gunasinghe Upuli; Matharage Sumith
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2012

Text clustering can be considered a four-step process consisting of feature extraction, text representation, document clustering and cluster interpretation. Most text clustering models treat text as an unordered collection of words. However, the semantics of text would be better captured if word sequences were taken into account. In this paper we propose a sequence-based text clustering model in which a novel sequence-based component is introduced at each of the four steps of the text clustering process. Experiments conducted on the Reuters dataset and the Sydney Morning Herald (SMH) news archives demonstrate the advantage of the proposed sequence-based model over clustering based on single words and over n-gram-based models, in terms of capturing context with semantics, accuracy and speed.
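The abstract contrasts treating text as an unordered collection of words with taking word sequences into account. The sketch below is only an illustration of that contrast, not the authors' sequence-based SOM model. It uses plain word bigrams as a simple stand-in for "word sequences", and the helper names `bag_of_words` and `word_bigrams` are hypothetical. Two sentences with opposite meanings have identical bags of words, while their bigram counts differ.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Unordered word counts: word order is discarded (hypothetical helper)."""
    return Counter(text.lower().split())


def word_bigrams(text: str) -> Counter:
    """Counts of adjacent word pairs, so local word order is kept.

    Bigrams are an assumed stand-in for word sequences; the paper's own
    sequence-based components are not reproduced here.
    """
    words = text.lower().split()
    return Counter(zip(words, words[1:]))


a = "dog bites man"
b = "man bites dog"

print(bag_of_words(a) == bag_of_words(b))  # True: same unordered words
print(word_bigrams(a) == word_bigrams(b))  # False: different word sequences
```

A clustering step fed only the bag-of-words vectors would place these two sentences at distance zero, which is the kind of lost context that motivates sequence-aware features.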

Deakin Research Online