2 research outputs found

    NMF based dimension reduction methods for Turkish text clustering

    Conference: IEEE International Symposium on INnovations in Intelligent SysTems and Applications (INISTA) -- Location: BULGARIA -- Date: JUN 19-21, 2013

    In this work, we analyze the effects of NMF based dimension reduction methods on the clustering of Turkish documents using the k-means clustering algorithm. All experiments are conducted on two datasets that we call Milliyet4c1k and 1150haber. The NMF based dimension reduction methods serve two purposes: to reduce the original vector space by transformation, and to reduce size and dimension by summarizing the original documents. Experimental results show that the NMF transformation yields better clustering results on both datasets. Running k-means on the summarized documents produces results almost identical to running k-means on the original documents. Although using summaries instead of full documents does not improve clustering quality, we show that it significantly reduces the size of the processed data and the execution time of the k-means clustering algorithm.

    IEEE; Bulgarian Sci Acad; Bulgarian Acad Sci, Inst Informat & Commun Technologies; IEEE Bulgarian Sectio
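
The first use of NMF described in the abstract above (transforming the document vector space before running k-means) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy term-count matrix, the function names, the Lee-Seung multiplicative update rule, and the farthest-point k-means initialisation. The record does not say which NMF algorithm or k-means settings the authors used, and the summarization variant is not shown.

```python
import random

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    """Multiply two dense matrices stored as lists of rows."""
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def nmf(v, rank, iters=500, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (docs x terms) as W @ H.

    Uses multiplicative updates (an assumed choice, not taken from the paper).
    W has one row per document in the reduced `rank`-dimensional space.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h

def dist2(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))

def kmeans(points, k, iters=50):
    """Lloyd's k-means with a deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical term counts: 6 documents x 6 terms, two obvious topics.
term_counts = [
    [3, 2, 1, 0, 0, 0],
    [2, 3, 2, 0, 0, 0],
    [1, 2, 3, 0, 0, 0],
    [0, 0, 0, 3, 2, 1],
    [0, 0, 0, 2, 3, 2],
    [0, 0, 0, 1, 2, 3],
]

doc_topics, topic_terms = nmf(term_counts, rank=2)  # 6 x 2 reduced representation
labels = kmeans(doc_topics, k=2)                    # cluster in the reduced space
print(labels)
```

Clustering the rows of `doc_topics` instead of `term_counts` is the point of the sketch: k-means operates on 2 dimensions rather than 6, which is the kind of reduction the abstract refers to.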

    Unsupervised and supervised term weighting methods for character n-gram based author categorization

    Naiboğlu, H. Selahattin (Dogus Author) -- Kaptıkaçtı, Oğuz (Dogus Author) -- Sardal, E. Cemre (Dogus Author) -- Güran, Aysun (Dogus Author) -- Uysal, Mitat (Dogus Author)

    Conference full title: Joint International Symposium on "The Social Impacts of Developments in Information, Manufacturing and Service Systems" 44th International Conference on Computers and Industrial Engineering, CIE 2014 and 9th International Symposium on Intelligent Manufacturing and Service Systems, IMSS 2014; Adile Sultan Palace Istanbul; Turkey; 14 October 2014 through 16 October 2014

    Author categorization considers the problem of identifying the author of an anonymous article. The goal of this work is to identify authors of articles by using different character n-gram based representations of documents. The use of character n-gram models is a relatively simple idea, but it turns out to be quite effective in many applications. The most important point in n-gram based methods is how to represent the documents. In this study, several widely used unsupervised and supervised n-gram weighting methods are investigated on a Turkish data corpus in combination with different classification algorithms. Apart from this, the character n-gram based features are compared with some stylistic markers and the evaluation results are shared in detail.

    Computer and Industrial Engineering, Gaziantep University, Istanbul Commercial University, Journal of Intelligent Manufacturing Systems, Sakarya University, Department of Industrial Engineering
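
To make the idea of a weighted character n-gram representation concrete, here is a minimal sketch of one unsupervised scheme (tf-idf over character trigrams) paired with a nearest-neighbour classifier using cosine similarity. The training sentences, author labels, function names, the trigram length and the `log(N/df) + 1` idf variant are all assumptions for illustration. The record does not list which weighting methods or classifiers the study evaluated, and supervised weighting is not shown.

```python
import math
from collections import Counter

def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a lower-cased text."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def fit_tfidf(texts, n=3):
    """Build tf-idf weighted n-gram vectors (dicts) and the idf table."""
    counts = [Counter(char_ngrams(t, n)) for t in texts]
    doc_freq = Counter(gram for c in counts for gram in c)
    idf = {g: math.log(len(texts) / df) + 1.0 for g, df in doc_freq.items()}
    vectors = [{g: tf * idf[g] for g, tf in c.items()} for c in counts]
    return vectors, idf

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(g, 0.0) for g, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def predict_author(text, train_vectors, train_authors, idf, n=3):
    """Assign the author of the most similar training document."""
    counts = Counter(char_ngrams(text, n))
    # n-grams never seen in training have no idf and are ignored.
    vec = {g: tf * idf[g] for g, tf in counts.items() if g in idf}
    best = max(range(len(train_vectors)), key=lambda i: cosine(vec, train_vectors[i]))
    return train_authors[best]

# Hypothetical toy corpus: two documents per invented author label.
train_texts = [
    "the sea was calm and the sky was grey",
    "the sea was wild and the wind was cold",
    "markets rallied as investors bought shares",
    "investors sold shares as markets dropped",
]
train_authors = ["A", "A", "B", "B"]

train_vectors, idf = fit_tfidf(train_texts)
guess_1 = predict_author("the sky was cold and the sea was grey",
                         train_vectors, train_authors, idf)
guess_2 = predict_author("shares dropped as investors left markets",
                         train_vectors, train_authors, idf)
print(guess_1, guess_2)
```

A supervised weighting scheme would replace the idf factor with a statistic computed from the author labels, leaving the rest of the pipeline unchanged.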