Gender in Shakespeare: Automatic Stylistics Gender Character Classification Using Syntactic, Lexical and Lemma Features
For a variety of text types, methods for automatically determining the gender of a document’s author can now reliably achieve accuracy of at least 70-80%. Our aim here is to extend this research by determining the gender of literary characters from differences in the words the author gives to characters of different genders. We describe results showing how Shakespeare used language differently for his male and female characters, and we examine the top discriminating features for characters of each gender. We used Sequential Minimal Optimization (SMO) to classify character gender, based on various lexical and syntactic features, in order to analyze the language Shakespeare used for gendering characters. Our methods achieve classification accuracy as high as 82%. We further observe several interesting patterns in the most distinguishing features, including the fact that some constellations of features match well with previously reported features that distinguish male from female authors.
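The pipeline this abstract describes (lexical features per character, then a trained classifier) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' setup: the vocabulary `VOCAB`, the toy lines in `TOY`, and their labels are all invented for the example, and a plain perceptron stands in for the SMO-trained classifier named in the abstract.

```python
from collections import Counter

# Hypothetical feature vocabulary; the abstract does not list the actual features.
VOCAB = ["i", "you", "the", "of", "my", "not"]

def features(text):
    """Relative frequency of each VOCAB word in the text, plus a constant bias term."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = max(len(tokens), 1)
    return [counts[w] / total for w in VOCAB] + [1.0]

def train_perceptron(samples, epochs=100):
    """Train a linear classifier on (text, label) pairs with labels +1 / -1.

    A perceptron is used here as a simple stand-in; it is NOT SMO.
    """
    weights = [0.0] * (len(VOCAB) + 1)
    for _ in range(epochs):
        errors = 0
        for text, label in samples:
            x = features(text)
            score = sum(w * xi for w, xi in zip(weights, x))
            if label * score <= 0:  # misclassified: move weights toward the label
                weights = [w + label * xi for w, xi in zip(weights, x)]
                errors += 1
        if errors == 0:  # every training sample is classified correctly
            break
    return weights

def predict(weights, text):
    """Return +1 or -1 for a piece of text."""
    score = sum(w * xi for w, xi in zip(weights, features(text)))
    return 1 if score > 0 else -1

# Invented toy lines with arbitrary class labels (+1 / -1); not taken from any play.
TOY = [
    ("i see you and i hear you", 1),
    ("you know i do not", 1),
    ("the king of the north", -1),
    ("the crown of my house", -1),
]

weights = train_perceptron(TOY)
```

After training, the largest positive and negative entries of `weights` indicate which vocabulary items pull hardest toward each class, which is the same kind of inspection of discriminating features the abstract mentions.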
Stylistic text classification using functional lexical features
Most text analysis and retrieval work to date has focused on determining the topic of a text, what it is about. However, a text also contains much useful information in its style, or how it is written. This includes information about its author, its purpose, feelings it is meant to evoke, and more. This paper addresses the problem of classifying texts by style (along several different dimensions), developing a new type of lexical feature based on taxonomies of various semantic functions of different lexical items (words or phrases). We show the usefulness of such features for text classification by author, author personality, gender of literary characters, sentiment (positive/negative feeling), and scientific rhetorical styles. We further show how the use of such functional features aids in gaining insight about stylistic differences between texts.
∗ Casey Whitelaw was a visiting scholar at the IIT Linguistic Cognition Laboratory during November 2004.
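One plausible reading of "lexical features based on taxonomies of semantic functions" is sketched below. It is an illustration only: the mini-taxonomy `TAXONOMY`, its node names, and the choice to score each leaf by its relative frequency within its parent category are assumptions made for this example, not details given in the abstract.

```python
from collections import Counter

# Hypothetical mini-taxonomy: "parent/leaf" semantic function -> lexical items.
TAXONOMY = {
    "conjunction/addition": {"and", "moreover", "also"},
    "conjunction/contrast": {"but", "however", "yet"},
    "modality/possibility": {"may", "might", "could"},
    "modality/necessity": {"must", "should"},
}

# Reverse lookup from a word to its taxonomy node.
WORD_TO_NODE = {word: node for node, words in TAXONOMY.items() for word in words}

def functional_features(text):
    """Map a text to {node: relative frequency of that node within its parent}."""
    tokens = text.lower().split()
    leaf_counts = Counter(WORD_TO_NODE[t] for t in tokens if t in WORD_TO_NODE)
    feats = {}
    for node in TAXONOMY:
        parent = node.split("/")[0]
        parent_total = sum(
            count for name, count in leaf_counts.items()
            if name.split("/")[0] == parent
        )
        # Assumed normalisation: share of the parent category taken by this leaf.
        feats[node] = leaf_counts[node] / parent_total if parent_total else 0.0
    return feats

feats = functional_features("it may rain but we must go and we could stay")
```

Because each feature is tied to a named semantic function rather than to a single word, a difference between two sets of texts can be read directly as, for example, a preference for one kind of conjunction or modality over another.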