Going Deeper than Supervised Discretisation in Processing of Stylometric Features

Abstract

Rough set theory is employed in cases where data are incomplete and inconsistent and an approximation of concepts is needed. The classical approach works for discrete data and allows only nominal classification. To induce the best rules, access to all available information is advantageous, but this access can be endangered if discretisation is a necessary step in the data preparation stage. Discretisation, even when it takes the class labels of instances into account, causes some information loss. The research methodology illustrated in this paper is dedicated to extended transformations of continuous input features into categorical ones, with the goal of enhancing the performance of rule-based classifiers constructed with rough set data mining. The experiments were carried out in the stylometry domain, with its key task of authorship attribution. The obtained results indicate that supporting supervised discretisation with elements of unsupervised transformations can lead to enhanced predictions, which shows the merits of the proposed research framework.
