Gender in Shakespeare: Automatic Stylistics Gender Character Classification Using Syntactic, Lexical and Lemma Features

Abstract

For a variety of text types, methods for automatically determining the gender of a document's author can now reliably achieve accuracy of at least 70-80%. Our aim here is to extend this research to determining the gender of literary characters, based on how an author's word use differs between characters of different genders. We describe results showing how Shakespeare used language differently for his male and female characters, and we study the top discriminating features for characters of each gender. We used Sequential Minimal Optimization (SMO) to classify character gender from a range of lexical and syntactic features, in order to analyze the language Shakespeare used to gender his characters. Our methods achieve classification accuracy as high as 82% for character gender. We further observe several interesting patterns among the most distinguishing features, including that some constellations of features correspond well to previously reported features that distinguish male from female authors.
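To make the classification step concrete, here is a minimal, self-contained sketch. It trains a linear-kernel support vector machine with a simplified form of SMO and then ranks features by the magnitude of their learned weights. Everything in it is an assumption for illustration. The abstract does not say which SMO implementation, kernel, or feature-ranking method was used. The helper names (`count_features`, `simplified_smo`, `linear_weights`) are hypothetical, and the "texts" are placeholder tokens, not Shakespeare's lines. Lexical features are reduced to raw token counts. Syntactic and lemma features would need a tagger and a lemmatizer, so they are omitted here.

```python
"""Toy sketch (hypothetical): linear SVM trained with a simplified SMO, features ranked by weight."""
from typing import List, Sequence, Tuple


def count_features(text: str, vocab: Sequence[str]) -> List[float]:
    """Lexical features: raw token counts over a fixed (hypothetical) vocabulary."""
    tokens = text.lower().split()
    return [float(tokens.count(word)) for word in vocab]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Linear kernel K(u, v) = u . v."""
    return sum(a * b for a, b in zip(u, v))


def simplified_smo(X: List[List[float]], y: List[int], C: float = 1.0,
                   tol: float = 1e-3, max_passes: int = 5) -> Tuple[List[float], float]:
    """Train a linear-kernel SVM dual with a simplified SMO; returns (alpha, b).

    Unlike Platt's full SMO, the second index j is chosen by a plain
    deterministic sweep rather than by a heuristic such as maximising |E_i - E_j|.
    """
    m = len(X)
    alpha = [0.0] * m
    b = 0.0

    def f(x: Sequence[float]) -> float:
        # Decision value: sum_k alpha_k * y_k * K(x_k, x) + b
        return sum(alpha[k] * y[k] * dot(X[k], x) for k in range(m)) + b

    passes = 0
    while passes < max_passes:
        changed = 0
        for i in range(m):
            e_i = f(X[i]) - y[i]
            # Only optimise alpha_i if it violates the KKT conditions (within tol).
            if not ((y[i] * e_i < -tol and alpha[i] < C) or (y[i] * e_i > tol and alpha[i] > 0)):
                continue
            for j in (k for k in range(m) if k != i):
                e_j = f(X[j]) - y[j]
                a_i_old, a_j_old = alpha[i], alpha[j]
                # Box bounds for alpha_j that keep both alphas inside [0, C].
                if y[i] != y[j]:
                    lo, hi = max(0.0, a_j_old - a_i_old), min(C, C + a_j_old - a_i_old)
                else:
                    lo, hi = max(0.0, a_i_old + a_j_old - C), min(C, a_i_old + a_j_old)
                k_ii, k_jj, k_ij = dot(X[i], X[i]), dot(X[j], X[j]), dot(X[i], X[j])
                eta = 2.0 * k_ij - k_ii - k_jj
                if lo == hi or eta >= 0:
                    continue
                # Optimal unconstrained step for alpha_j, clipped to [lo, hi].
                alpha[j] = min(hi, max(lo, a_j_old - y[j] * (e_i - e_j) / eta))
                if abs(alpha[j] - a_j_old) < 1e-5:
                    alpha[j] = a_j_old  # negligible step: undo it and try the next j
                    continue
                # Adjust alpha_i so that sum_k alpha_k * y_k stays unchanged.
                alpha[i] = a_i_old + y[i] * y[j] * (a_j_old - alpha[j])
                d_i, d_j = alpha[i] - a_i_old, alpha[j] - a_j_old
                b1 = b - e_i - y[i] * d_i * k_ii - y[j] * d_j * k_ij
                b2 = b - e_j - y[i] * d_i * k_ij - y[j] * d_j * k_jj
                b = b1 if 0 < alpha[i] < C else b2 if 0 < alpha[j] < C else (b1 + b2) / 2.0
                changed += 1
                break
        passes = passes + 1 if changed == 0 else 0
    return alpha, b


def linear_weights(X: List[List[float]], y: List[int], alpha: List[float]) -> List[float]:
    """Primal weights w = sum_k alpha_k * y_k * x_k (valid for the linear kernel only)."""
    return [sum(alpha[k] * y[k] * X[k][d] for k in range(len(X))) for d in range(len(X[0]))]


# Hypothetical toy data: placeholder tokens, not real Shakespeare text.
VOCAB = ["alpha", "beta"]
LABELS = {"female": 1, "male": -1}  # arbitrary class encoding
toy = [("alpha alpha", "female"), ("beta beta", "male"),
       ("alpha alpha alpha", "female"), ("beta beta beta", "male")]

X = [count_features(text, VOCAB) for text, _ in toy]
y = [LABELS[gender] for _, gender in toy]
alpha, b = simplified_smo(X, y)
w = linear_weights(X, y, alpha)
preds = [1 if dot(w, x) + b >= 0 else -1 for x in X]
# Rank features by |weight|; the sign shows which class a feature pushes toward.
ranked = sorted(zip(VOCAB, w), key=lambda p: abs(p[1]), reverse=True)
print(preds, ranked)
```

The deterministic sweep over `j` keeps this toy run reproducible. Platt's original algorithm chooses the second index heuristically to converge faster on realistic data sizes. With a linear kernel, the primal weight vector can be read off directly from the dual solution, and that is what makes a per-feature "most discriminating" ranking possible.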
