
Feature Extraction and Clustering of Croatian News Sources

By Boris Debić

Abstract

This paper presents the design of a system for feature extraction and classification of news articles from Croatian news sources. It gives an overview of the supervised and unsupervised machine learning techniques most widely used for text classification and clustering. The paper discusses a number of issues particular to text classification of news source material, from its collection and organization to problems in evaluating method correctness and categorization efficiency on Croatian news documents. Uses of these techniques are discussed, and their quantitative evaluation over a newly developed testing news corpus is proposed.

Index Terms: classification, clustering, Croatian news articles, machine learning, supervised learning, unsupervised learning

Year: 2010
OAI identifier: oai:CiteSeerX.psu:10.1.1.180.215
Provided by: CiteSeerX