6,626 research outputs found
Recommended from our members
Identifying idiolect in forensic authorship attribution: an n-gram textbite approach
Forensic authorship attribution is concerned with identifying authors of disputed or anonymous documents, which are potentially evidential in legal cases, through the analysis of linguistic clues left behind by writers. The forensic linguist âapproaches this problem of questioned authorship from the theoretical position that every native speaker has their own distinct and individual version of the language [. . . ], their own idiolectâ (Coulthard, 2004: 31). However, given the diXculty in empirically substantiating a theory of idiolect, there is growing concern in the Veld that it remains too abstract to be of practical use (Kredens, 2002; Grant, 2010; Turell, 2010). Stylistic, corpus, and computational approaches to text, however, are able to identify repeated collocational patterns, or n-grams, two to six word chunks of language, similar to the popular notion of soundbites: small segments of no more than a few seconds of speech that journalists are able to recognise as having news value and which characterise the important moments of talk. The soundbite oUers an intriguing parallel for authorship attribution studies, with the following question arising: looking at any set of texts by any author, is it possible to identify ân-gram textbitesâ, small textual segments that characterise that authorâs writing, providing DNA-like chunks of identifying material
ASAP: A Source Code Authorship Program
Source code authorship attribution is the task of determining who wrote a computer program, based on its source code, usually when the author is either unknown or under dispute. Areas where this can be applied include software forensics, cases of software copyright infringement, and detecting plagiarism. Numerous methods of source code authorship attribution have been proposed and studied. However, there are no known easily accessible and user-friendly programs that perform this task. Instead, researchers typically develop software in an ad hoc manner for use in their studies, and the software is rarely made publicly available. In this paper, we present a software tool called A Source Code Authorship Program (ASAP), which is suitable to be used by either the layperson or the expert. An author can be attributed to individual documents one at a time, or complex authorship attribution experiments can easily be performed on large datasets. In this paper, the interface and implementation of the ASAP tool is presented, and the tool is validated by using it to replicate previously published authorship attribution experiments
Drawing Elena Ferrante's Profile. Workshop Proceedings, Padova, 7 September 2017
Elena Ferrante is an internationally acclaimed Italian novelist whose real identity has been kept secret by E/O publishing house for more than 25 years. Owing to her popularity, major Italian and foreign newspapers have long tried to discover her real identity. However, only a few attempts have been made to foster a scientific debate on her work.
In 2016, Arjuna Tuzzi and Michele Cortelazzo led an Italian research team that conducted a preliminary study and collected a well-founded, large corpus of Italian novels comprising 150 works published in the last 30 years by 40 different authors. Moreover, they shared their data with a select group of international experts on authorship attribution, profiling, and analysis of textual data: Maciej Eder and Jan Rybicki (Poland), Patrick Juola (United States), Vittorio Loreto and his research team, Margherita Lalli and Francesca Tria (Italy), George Mikros (Greece), Pierre Ratinaud (France), and Jacques Savoy (Switzerland).
The chapters of this volume report the results of this endeavour that were first presented during the international workshop Drawing Elena Ferrante's Profile in Padua on 7 September 2017 as part of the 3rd IQLA-GIAT Summer School in Quantitative Analysis of Textual Data. The fascinating research findings suggest that Elena Ferrante\u2019s work definitely deserves \u201cmany hands\u201d as well as an extensive effort to understand her distinct writing style and the reasons for her worldwide success
Artificial Intelligence and Pattern Evidence: A Legal Application for AI
Artificial intelligence changes everything, and almost no jobs will be immune. The application of AI to the practice of law is well-known and well-understood. In this paper, we present some aspects of the related disciplines of forensic science and specifically the development and analysis of âpattern [and impression] evidence.â We show that pattern evidence has a great need for AI.We discuss several applications in detail but focus mostly on the application of AI-based text analysis technology to forensic linguistics.Sociedad Argentina de InformĂĄtica e InvestigaciĂłn Operativ
Approaching Questions of Text Reuse in Ancient Greek Using Computational Syntactic Stylometry
We are investigating methods by which data from dependency syntax treebanks of ancient Greek can be applied to questions of authorship in ancient Greek historiography. From the Ancient Greek Dependency Treebank were constructed syntax words (sWords) by tracing the shortest path from each leaf node to the root for each sentence tree. This paper presents the results of a preliminary test of the usefulness of the sWord as a stylometric discriminator. The sWord data was subjected to clustering analysis. The resultant groupings were in accord with traditional classifications. The use of sWords also allows a more fine-grained heuristic exploration of difficult questions of text reuse. A comparison of relative frequencies of sWords in the directly transmitted Polybius book 1 and the excerpted books 9â10 indicate that the measurements of the two texts are generally very close, but when frequencies do vary, the differences are surprisingly large. These differences reveal that a certain syntactic simplification is a salient characteristic of Polybiusâ excerptor, who leaves conspicuous syntactic indicators of his modifications
Approaching Questions of Text Reuse in Ancient Greek Using Computational Syntactic Stylometry
We are investigating methods by which data from dependency syntax treebanks of ancient Greek can be applied to questions of authorship in ancient Greek historiography. From the Ancient Greek Dependency Treebank were constructed syntax words (sWords) by tracing the shortest path from each leaf node to the root for each sentence tree. This paper presents the results of a preliminary test of the usefulness of the sWord as a stylometric discriminator. The sWord data was subjected to clustering analysis. The resultant groupings were in accord with traditional classifications. The use of sWords also allows a more fine-grained heuristic exploration of difficult questions of text reuse. A comparison of relative frequencies of sWords in the directly transmitted Polybius book 1 and the excerpted books 9â10 indicate that the measurements of the two texts are generally very close, but when frequencies do vary, the differences are surprisingly large. These differences reveal that a certain syntactic simplification is a salient characteristic of Polybiusâ excerptor, who leaves conspicuous syntactic indicators of his modifications
Visualization of Co-authorshipin DIT Arrow
With the popularization of information technology and the unprecedented development of online reading, the management and service of the library are facing severe challenges; the traditional library operation mode has been challenging to optimize the service. At the same time, there is also a fatal impact on library collection and systematic management, however, with the development of visualization techniques in management and service, the library can alleviate the eïŹect of the current network information basically, which achieves the intellectual development of library ïŹeld. This study empirically provides the evidence to indicate that the force directed layout has the statistically signiïŹcant performance than the radial layout for visualization of co-authorship in DIT Arrow repository based on the results of surveys
- âŠ