Abstract—This paper presents the design of a system for feature extraction and classification of news articles from Croatian news sources. It gives an overview of the supervised and unsupervised machine learning techniques most widely used for text classification and clustering. The paper discusses a number of issues particular to classifying news source material, from its collection and organization to problems in evaluating method correctness and categorization efficiency on Croatian news documents. Uses of these techniques are discussed, and their quantitative evaluation over a newly developed testing news corpus is proposed.

Index Terms—classification, clustering, Croatian news articles, machine learning, supervised learning, unsupervised learning