Results of a Single Blind Literary Taste Test with Short Anonymized Novel Fragments
It is an open question to what extent perceptions of literary quality are derived from text-intrinsic versus social factors. While supervised models can predict literary quality ratings from textual factors quite successfully, as shown in the Riddle of Literary Quality project (Koolen et al., 2020), this does not prove that social factors are unimportant, nor can we assume that readers judge literary quality in the same way and based on the same information as machine learning models. We report the results of a pilot study gauging the effect of textual features on literary ratings of Dutch-language novels in a controlled experiment with 48 participants. In an exploratory analysis, we compare these ratings to those from the large reader survey of the Riddle project, in which social factors were not excluded, and to machine learning predictions of those survey ratings. We find moderate to strong correlations between the questionnaire ratings and the survey ratings, but the machine learning predictions are closer to the survey ratings. Code and data: https://github.com/andreasvc/litquest. Comment: Accepted for LaTeCH 2020 @ COLING.
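The comparison described in this abstract can be illustrated with a minimal sketch. The ratings below are invented placeholders, not the study's data; the sketch simply computes Pearson and Spearman correlations between questionnaire ratings and the two other rating sources (survey ratings and model predictions).

# Minimal sketch of the correlation comparison; all values are invented
# placeholders standing in for mean literariness ratings per fragment.
import numpy as np
from scipy.stats import pearsonr, spearmanr

questionnaire = np.array([4.2, 5.1, 3.8, 6.0, 2.9, 5.5])
survey        = np.array([4.5, 5.4, 3.5, 6.2, 3.1, 5.0])
predictions   = np.array([4.4, 5.2, 3.6, 6.1, 3.0, 5.2])

for name, other in [("survey", survey), ("predictions", predictions)]:
    r, p_r = pearsonr(questionnaire, other)
    rho, p_rho = spearmanr(questionnaire, other)
    print(f"questionnaire vs {name}: Pearson r={r:.2f} (p={p_r:.3f}), "
          f"Spearman rho={rho:.2f} (p={p_rho:.3f})")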
Identifying literary texts with bigrams
We study perceptions of literariness in a set of contemporary Dutch novels. Experiments with machine learning models show that it is possible to automatically distinguish novels that are seen as highly literary from those that are seen as less literary, using surprisingly simple textual features. The most discriminating features of our classification model indicate that genre might be a confounding factor, but a regression model shows that we can also explain variation between highly literary and less literary novels within a genre.
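As an illustration of the kind of simple bigram-based setup the abstract refers to (not the authors' exact configuration), a word-bigram vectorizer can feed a linear classifier; the texts and labels below are short placeholders standing in for full novel texts.

# Illustrative sketch: word bigrams as features for a linear classifier,
# evaluated with cross-validation. Corpus and labels are invented.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

texts = [
    "the quiet of the house settled over her like a second skin",
    "he drew his gun and fired twice into the dark hallway",
    "memory is a house with too many locked rooms she thought",
    "the detective lit a cigarette and studied the body",
    "language itself seemed to dissolve at the edge of the field",
    "the bomb would go off in exactly seven minutes",
]
labels = [1, 0, 1, 0, 1, 0]  # 1 = rated highly literary, 0 = less literary

model = make_pipeline(
    CountVectorizer(ngram_range=(2, 2), min_df=1),  # word bigrams only
    LinearSVC(),
)
scores = cross_val_score(model, texts, labels, cv=3)
print("cross-validated accuracy:", scores.mean())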
These are not the Stereotypes You are Looking For: Bias and Fairness in Authorial Gender Attribution
Stylometric and text categorization results show that author gender can be discerned in texts with relatively high accuracy. However, it is difficult to explain what gives rise to these results, and there are many possible confounding factors, such as the domain, genre, and target audience of a text. More fundamentally, such classification efforts risk invoking stereotyping and essentialism. We explore this issue in two datasets of Dutch literary novels, using commonly used descriptive (LIWC, topic modeling) and predictive (machine learning) methods. Our results show the importance of controlling for variables in the corpus, and we argue for taking care not to overgeneralize from the results.
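One way to control for a confound such as genre, sketched below under assumed placeholder data, is to evaluate the classifier within each genre separately rather than pooling, so that genre itself cannot drive the reported accuracy. This is only one of several possible controls, not necessarily the procedure used in the paper.

# Hedged sketch: within-genre evaluation of author-gender classification.
# All records are invented for illustration.
from collections import defaultdict
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# (text, author gender label, genre)
records = [
    ("she walked along the canal thinking of her mother", "F", "literary"),
    ("the inspector examined the footprints by the door", "F", "thriller"),
    ("he remembered the farm and the long dutch winters", "M", "literary"),
    ("two shots rang out across the empty parking lot", "M", "thriller"),
    ("the letters arrived every week without a return address", "F", "literary"),
    ("the suspect had vanished before the police arrived", "M", "thriller"),
    ("grief she decided was a language without grammar", "F", "literary"),
    ("he loaded the pistol and waited in the stairwell", "M", "thriller"),
]

by_genre = defaultdict(list)
for text, gender, genre in records:
    by_genre[genre].append((text, gender))

# Score within each genre so genre membership cannot inflate accuracy.
for genre, items in by_genre.items():
    texts, labels = zip(*items)
    clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
    scores = cross_val_score(clf, list(texts), list(labels), cv=2)
    print(f"{genre}: within-genre accuracy = {scores.mean():.2f}")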