
Phraseological fingerprints: using habitual wordings to aid authorship attribution

By Andreas Buerki


Research into formulaic language suggests that habitual ways of putting things mark speakers and writers out as belonging to certain speech (sub-)communities, such as academic vs. non-academic communities, L1 vs. L2 speaker communities or speech communities at different points in time. It has further been shown that individuals' phraseological habits can be distinctive enough to mark them out individually: the expression "I entirely understand", for example, is or was characteristic of Tony Blair (Mollin 2009). Interest has recently grown in bringing this observation to bear on the task of authorship attribution for disputed texts in forensic settings (Johnson and Wright 2014; Larner 2014), but the typically small number of available texts and their often short length have proven a significant hurdle. This study presents and evaluates a number of approaches to exploiting phraseological choices to aid authorship attribution, ranging from the identification of distinctive phraseological sequences through linguistically informed close reading to automatic, n-gram-based techniques derived from work on information retrieval. Based on a corpus of multiple short texts (fewer than 280 words each) by each of a small sample of individuals, it is shown how phraseological indicators, on their own as well as in conjunction with other authorship markers, can be used to identify authors successfully even on the basis of a limited number of short texts.

References

Johnson, A., & Wright, D. (2014). Identifying idiolect in forensic authorship attribution: An n-gram textbite approach. Language and Law, 1(1), 37-69.
Larner, S. (2014). A preliminary investigation into the use of fixed formulaic sequences as a marker of authorship. IJSLL, 21(1).
Mollin, S. (2009). "I entirely understand" is a Blairism: The methodology of identifying idiolectal collocations. International Journal of Corpus Linguistics, 14(3), 367-392.
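
As a rough illustration of what an automatic, n-gram-based approach of the kind mentioned in the abstract might look like, the following minimal Python sketch compares the word trigrams of a disputed text with those found in known texts by each candidate author. It is not the method used in the study: the function names (`word_ngrams`, `profile`, `jaccard`, `attribute`), the choice of trigrams, the simple tokenisation and the use of Jaccard overlap as the similarity measure are all assumptions made for illustration, and the example sentences are invented.

```python
import re


def word_ngrams(text, n=3):
    """Return the word n-grams of a text (lower-cased, punctuation dropped)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def profile(texts, n=3):
    """Collect the set of n-grams found in any of an author's known texts."""
    grams = set()
    for text in texts:
        grams.update(word_ngrams(text, n))
    return grams


def jaccard(a, b):
    """Jaccard overlap of two sets (an assumed, illustrative similarity measure)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def attribute(disputed, known_texts_by_author, n=3):
    """Return the best-matching author and the score for every candidate."""
    disputed_grams = set(word_ngrams(disputed, n))
    scores = {
        author: jaccard(disputed_grams, profile(texts, n))
        for author, texts in known_texts_by_author.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Invented toy data, for illustration only.
known = {
    "A": ["I entirely understand the concern and I entirely understand the anger"],
    "B": ["to be honest with you I do not really know"],
}
best, scores = attribute("I entirely understand why people feel that way", known)
print(best, scores)
```

With texts as short as those described in the abstract, overlap scores of this kind will usually be small and sparse, which is one reason to combine such indicators with other authorship markers rather than rely on a single measure.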

Topics: P1
OAI identifier: oai:
