15 research outputs found

A Framework for Stylometric Similarity Detection in Online Settings

Author: Abbasi Ahmed
Chen Hsinchun
Publication venue: AIS Electronic Library (AISeL)
Publication date: 31/12/2007

Approaching Questions of Text Reuse in Ancient Greek Using Computational Syntactic Stylometry

Author: Gorman Robert J.
Gorman Vanessa
Publication venue: DigitalCommons@University of Nebraska - Lincoln
Publication date: 01/01/2016

We are investigating methods by which data from dependency syntax treebanks of ancient Greek can be applied to questions of authorship in ancient Greek historiography. Syntax words (sWords) were constructed from the Ancient Greek Dependency Treebank by tracing the shortest path from each leaf node to the root for each sentence tree. This paper presents the results of a preliminary test of the usefulness of the sWord as a stylometric discriminator. The sWord data was subjected to clustering analysis. The resultant groupings were in accord with traditional classifications. The use of sWords also allows a more fine-grained heuristic exploration of difficult questions of text reuse. A comparison of relative frequencies of sWords in the directly transmitted Polybius book 1 and the excerpted books 9–10 indicates that the measurements of the two texts are generally very close, but when frequencies do vary, the differences are surprisingly large. These differences reveal that a certain syntactic simplification is a salient characteristic of Polybius’ excerptor, who leaves conspicuous syntactic indicators of his modifications.

Crossref

DigitalCommons@University of Nebraska

Directory of Open Access Journals
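
The leaf-to-root construction described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation. The token format `(token_id, head_id, relation)`, the function name `leaf_to_root_paths`, the choice to join relation labels with a hyphen, and the toy labels are all assumptions made for illustration; the abstract does not say which node attributes make up an sWord.

```python
from collections import Counter

def leaf_to_root_paths(tokens):
    """Return one label path per leaf of a dependency tree.

    tokens: list of (token_id, head_id, relation) tuples (hypothetical format);
    head_id 0 marks the root. Assumes a well-formed tree without cycles.
    """
    heads = {tid: head for tid, head, _ in tokens}
    rels = {tid: rel for tid, _, rel in tokens}
    has_children = {head for _, head, _ in tokens}
    paths = []
    for tid, _, _ in tokens:
        if tid in has_children:
            continue  # not a leaf: some other token depends on it
        labels = []
        node = tid
        while node != 0:  # climb until we step past the root
            labels.append(rels[node])
            node = heads[node]
        paths.append("-".join(labels))
    return paths

# Toy sentence with four tokens; the relation labels are illustrative only.
sentence = [(1, 2, "ATR"), (2, 3, "SBJ"), (3, 0, "PRED"), (4, 3, "ADV")]
paths = leaf_to_root_paths(sentence)

# Relative frequencies of each path, the kind of measurement the abstract compares.
counts = Counter(paths)
rel_freq = {p: c / len(paths) for p, c in counts.items()}
```

In a tree the path from a leaf to the root is unique, so no shortest-path search is needed in this sketch.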

Stylometric analysis of the correspondence of Zsigmond Móricz

Author: Cséve Anna
Kalcsó Gyula
Mihály Eszter
Publication venue: 'Acta Universitatis de Carolo Eszterhay Nominatae. Sectio linguistica Hungarica'

This article reports on a study in which we used computational stylometric methods to examine the textual and stylometric features of the letters Zsigmond Móricz wrote to his wife and to others between 1902 and 1913. This experiment is the first stylometric attempt by the Centre for Digital Humanities of the Petőfi Literary Museum. The corpus is based on the digital scholarly edition prepared from the letters in the museum's Móricz special collection and contains 478 letters (220,268 words). We used an R package, Stylo, together with distance measures (classic Delta and Eder's simple Delta), to analyse these features. We visualized the results in two ways: cluster analysis (on a dendrogram) and principal component analysis. The classification of the letters was successful, although only the combined use of the two visualization methods produced a result. We were able to show that there are stylometrically measurable differences between the letters Móricz wrote to Janka and those he wrote to others.

EKE Repository of Publications
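
For readers unfamiliar with the classic Delta distance named in the abstract above, here is a minimal sketch. It assumes the usual formulation of Burrows's Delta: the mean absolute difference between z-scored relative word frequencies. It does not reproduce the Stylo package or the study's settings. The function name `burrows_delta`, the input layout, and the toy frequencies are hypothetical, and Eder's simple Delta is not shown.

```python
from statistics import mean, stdev

def burrows_delta(freqs, a, b):
    """Classic Delta between texts a and b.

    freqs: {text_name: {word: relative_frequency}} (hypothetical layout);
    every text must list the same words, and at least two texts are needed.
    """
    words = next(iter(freqs.values())).keys()
    total = 0.0
    used = 0
    for w in words:
        column = [freqs[t][w] for t in freqs]
        sd = stdev(column)  # sample standard deviation across the corpus
        if sd == 0:
            continue  # a word with identical frequency everywhere adds nothing
        mu = mean(column)
        z_a = (freqs[a][w] - mu) / sd
        z_b = (freqs[b][w] - mu) / sd
        total += abs(z_a - z_b)
        used += 1
    return total / used if used else 0.0

# Toy corpus: invented relative frequencies for two function words.
corpus = {
    "A": {"the": 0.06, "and": 0.03},
    "B": {"the": 0.04, "and": 0.03},
    "C": {"the": 0.05, "and": 0.06},
}
d_ab = burrows_delta(corpus, "A", "B")
```

Because each word is scaled by its own spread across the corpus, rare and common words contribute on a comparable footing, which is the main design idea behind the measure.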

Segmenting documents by stylistic character

Author: Bhaskara Marthi
Graeme Hirst
Neil Graham
Publication venue: 'Cambridge University Press (CUP)'

Crossref

Attributing Authorship in the Noisy Digitized Correspondence of Jacob and Wilhelm Grimm

Author: Emily Franzini
Gabriela Rotari
Greta Franzini
Jan Rybicki
Jeremi K. Ochab
Joanna Byszuk
Melina Jander
Mike Kestemont
Publication venue: 'Modern Language Association'
Publication date: 01/01/2018

This article presents the results of a multidisciplinary project aimed at better understanding the impact of different digitization strategies in computational text analysis. More specifically, it describes an effort to automatically discern the authorship of Jacob and Wilhelm Grimm in a body of uncorrected correspondence processed by HTR (Handwritten Text Recognition) and OCR (Optical Character Recognition), reporting on the effect this noise has on the analyses necessary to computationally identify the different writing styles of the two brothers. In summary, our findings show that OCR digitization serves as a reliable proxy for the more painstaking process of manual digitization, at least when it comes to authorship attribution. Our results suggest that attribution is viable even when using training and test sets from different digitization pipelines. With regard to HTR, this research demonstrates that even though automated transcription significantly increases the risk of text misclassification when compared to OCR, a cleanliness above ≈ 20% is already sufficient to achieve a higher-than-chance probability of correct binary attribution.

Crossref

PubliCatt

Directory of Open Access Journals

Frontiers - Publisher Connector

Humanities Commons

Institutional Repository Universiteit Antwerpen

Jagiellonian University Repository