Authorship Attribution Using Word Network Features
In this paper, we explore a set of novel features for authorship attribution
of documents. These features are derived from a word network representation of
natural language text. As previous studies have noted, natural language
tends to show complex network structure at the word level, with low degrees of
separation and a scale-free (power-law) degree distribution. There has also been
work on authorship attribution that incorporates ideas from complex networks.
The goal of our paper is to explore properties of these complex networks that
are suitable as features for machine-learning-based authorship attribution of
documents. We performed experiments on three different datasets, and obtained
promising results.
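The word-network representation can be illustrated with a minimal sketch. It assumes, hypothetically, that each word type is a node and that two words are linked when they occur next to each other. The abstract does not specify the paper's construction or feature set, so the features computed here (node and edge counts, mean degree, mean local clustering coefficient, share of degree-1 nodes) and the function names `word_network` and `network_features` are illustrative choices, not the authors' method.

```python
import re
from collections import defaultdict


def word_network(text: str) -> dict[str, set[str]]:
    """Build an undirected word network: nodes are word types, and an edge
    joins two words that occur next to each other (assumed window of 1)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    graph: defaultdict[str, set[str]] = defaultdict(set)
    for a, b in zip(tokens, tokens[1:]):
        if a != b:  # ignore self-loops from repeated words
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def network_features(graph: dict[str, set[str]]) -> list[float]:
    """Hypothetical feature vector: [nodes, edges, mean degree,
    mean local clustering coefficient, fraction of degree-1 nodes]."""
    n = len(graph)
    if n == 0:
        return [0.0] * 5
    degrees = [len(nbrs) for nbrs in graph.values()]
    edges = sum(degrees) / 2  # each undirected edge is counted twice
    clustering = []
    for nbrs in graph.values():
        k = len(nbrs)
        if k < 2:
            clustering.append(0.0)
            continue
        # count links among this node's neighbours (each pair once)
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in graph[u])
        clustering.append(2 * links / (k * (k - 1)))
    return [
        float(n),
        edges,
        sum(degrees) / n,
        sum(clustering) / n,
        sum(1 for d in degrees if d == 1) / n,
    ]


if __name__ == "__main__":
    g = word_network("the cat sat on the mat")
    print(network_features(g))  # [5.0, 5.0, 2.0, 0.0, 0.2]
```

Each document becomes a fixed-length numeric vector, which is the form a standard machine-learning classifier expects; degree-distribution statistics such as the share of low-degree nodes are one simple way to reflect the scale-free structure mentioned above.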
Domain Independent Authorship Attribution without Domain Adaptation
Automatic authorship attribution is, by its nature, far more useful when it is domain (i.e., topic and/or genre) independent, because many real-world problems that require authorship attribution lack readily available in-domain training data. However, most previous work based on machine learning techniques has focused only on in-domain text. In this paper, we present a comprehensive evaluation of various stylometric techniques for cross-domain authorship attribution. From experiments on the Project Gutenberg book archive, we find that extremely simple stopword-based techniques are surprisingly robust to domain change, essentially eliminating the need for domain adaptation when a large amount of data is available.
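A stopword-based approach can be sketched as follows, under stated assumptions: each document is represented by the relative frequencies of a small, hypothetical stopword list, and an unseen document is attributed to the author whose average (centroid) profile is most similar by cosine similarity. The stopword list, the nearest-centroid classifier, and the names `STOPWORDS`, `stopword_profile`, `train`, and `attribute` are illustrative choices; the abstract does not say which stopword list or learner the paper used.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list for illustration only; the paper's list is not given.
STOPWORDS = [
    "the", "and", "of", "to", "a", "in", "that", "it", "was", "he",
    "she", "but", "with", "as", "for", "on", "not", "by", "at", "which",
]


def stopword_profile(text: str) -> list[float]:
    """Relative frequency of each stopword among all tokens in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty text
    return [counts[w] / total for w in STOPWORDS]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def train(docs_by_author: dict[str, list[str]]) -> dict[str, list[float]]:
    """Average (centroid) stopword profile for each author."""
    centroids = {}
    for author, docs in docs_by_author.items():
        profiles = [stopword_profile(d) for d in docs]
        centroids[author] = [sum(col) / len(profiles) for col in zip(*profiles)]
    return centroids


def attribute(text: str, centroids: dict[str, list[float]]) -> str:
    """Return the author whose centroid is most cosine-similar to the text."""
    profile = stopword_profile(text)
    return max(centroids, key=lambda author: cosine(profile, centroids[author]))


if __name__ == "__main__":
    model = train({
        "A": ["the house of the king of the realm"],
        "B": ["and then but and yet but and"],
    })
    print(attribute("the ship of the sea", model))  # A
```

Because the features ignore content words, a training text about a house and a test text about a ship can still be matched on function-word usage alone, which is the intuition behind the robustness to domain change described in the abstract.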