Unsupervised and supervised term weighting methods for character n-gram based author categorization

Abstract

Naiboğlu, H. Selahattin (Dogus Author); Kaptıkaçtı, Oğuz (Dogus Author); Sardal, E. Cemre (Dogus Author); Güran, Aysun (Dogus Author); Uysal, Mitat (Dogus Author)

Conference full title: Joint International Symposium on "The Social Impacts of Developments in Information, Manufacturing and Service Systems": 44th International Conference on Computers and Industrial Engineering (CIE 2014) and 9th International Symposium on Intelligent Manufacturing and Service Systems (IMSS 2014); Adile Sultan Palace, Istanbul, Turkey; 14 to 16 October 2014.

Author categorization is the problem of identifying the author of an anonymous article. The goal of this work is to identify the authors of articles using different character n-gram based representations of documents. Character n-gram models are a relatively simple idea, yet they turn out to be quite effective in many applications. The key question in n-gram based methods is how to represent the documents. In this study, several widely used unsupervised and supervised n-gram weighting methods are investigated on a Turkish data corpus in combination with different classification algorithms. In addition, the character n-gram based features are compared with a set of stylistic markers, and the evaluation results are reported in detail.

Computer and Industrial Engineering, Gaziantep University, Istanbul Commercial University, Journal of Intelligent Manufacturing Systems, Sakarya University, Department of Industrial Engineering
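To make the distinction between unsupervised and supervised n-gram weighting concrete, here is a minimal Python sketch. The abstract does not say which n-gram length or which weighting schemes were used, so the following choices are assumptions for illustration only: character trigrams, tf-idf as an example unsupervised scheme, and tf-rf (relevance frequency, computed one-vs-rest for a single author) as an example supervised scheme. The function names char_ngrams, tf_idf and tf_rf are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams (lowercased, whitespace kept).

    Assumption: n=3 is an illustrative default, not the paper's setting.
    """
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def tf_idf(docs: list[Counter]) -> list[dict[str, float]]:
    """Unsupervised weighting (example scheme): tf * ln(N / df).

    df counts how many documents contain each n-gram; author labels are not used.
    """
    n_docs = len(docs)
    df = Counter()
    for d in docs:
        df.update(d.keys())
    return [{g: tf * math.log(n_docs / df[g]) for g, tf in d.items()} for d in docs]


def tf_rf(docs: list[Counter], labels: list[str], author: str) -> list[dict[str, float]]:
    """Supervised weighting (example scheme): tf * log2(2 + a / max(1, c)).

    a = number of documents by `author` that contain the n-gram,
    c = number of documents by other authors that contain it.
    Assumption: one-vs-rest for a single target author.
    """
    a, c = Counter(), Counter()
    for d, y in zip(docs, labels):
        (a if y == author else c).update(d.keys())
    return [
        {g: tf * math.log2(2 + a[g] / max(1, c[g])) for g, tf in d.items()}
        for d in docs
    ]


if __name__ == "__main__":
    texts = ["bir varmış bir yokmuş", "evvel zaman içinde", "bir zamanlar"]
    labels = ["A", "B", "A"]
    docs = [char_ngrams(t, n=3) for t in texts]
    print(tf_idf(docs)[0]["bir"])             # 2 * ln(3/2)
    print(tf_rf(docs, labels, "A")[0]["bir"])  # 2 * log2(2 + 2/1)
```

In this toy example the trigram "bir" occurs twice in the first text and appears in two of the three documents, so its tf-idf weight is 2 * ln(3/2), about 0.81. Both documents containing it belong to author A, so its tf-rf weight for A is 2 * log2(2 + 2/1) = 4. The supervised weight rewards n-grams concentrated in one author's texts, whereas idf only penalizes n-grams that are common across the whole corpus.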
