25 research outputs found

    Authorship Attribution: Using Rich Linguistic Features when Training Data is Scarce.

    We describe the technical details of our participation in PAN 2012's "traditional" authorship attribution tasks. The main originality of our approach lies in the use of a large quantity of varied features to represent textual data, processed by a maximum entropy machine learning tool. Most of these features make intensive use of natural language processing annotation techniques as well as generic language resources such as lexicons and other linguistic databases. Some of the features were even designed specifically for the target data type (contemporary fiction). Our belief is that richer features, which integrate external knowledge about language, have an advantage over knowledge-poorer ones (such as word and character n-gram frequencies) when training data is scarce (both in raw volume and in the number of training items per target author). Although overall results were average (66% accuracy over the main tasks for the best run), we focus in this paper on the differences between feature sets. While the "rich" linguistic features proved better than character trigrams and word frequencies, the most effective features vary widely from task to task. For the intrusive-paragraphs tasks, we obtained better results (73% and 93%) while still using the maximum entropy engine as an unsupervised clustering tool.
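The pipeline this abstract describes, document-level linguistic features fed to a maximum entropy learner, can be sketched in miniature. The feature set and toy data below are illustrative stand-ins (the paper's actual features rely on full NLP annotation and large resources); the learner is a plain binary logistic regression, which is the maximum entropy model for two classes:

```python
import math
from collections import Counter

# A tiny stand-in for the paper's rich feature set: function-word rates,
# mean word length, and type/token ratio. (The actual system used many
# more features derived from full NLP annotation.)
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it"]

def rich_features(text):
    words = text.lower().split()
    n = max(len(words), 1)
    counts = Counter(words)
    feats = [counts[w] / n for w in FUNCTION_WORDS]
    feats.append(sum(len(w) for w in words) / n)  # mean word length
    feats.append(len(counts) / n)                 # type/token ratio
    return feats

def train_maxent(X, y, epochs=500, lr=0.5):
    """Binary maximum-entropy model (logistic regression), trained by
    gradient ascent on the conditional log-likelihood."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-max(min(z, 30.0), -30.0)))
            g = t - p  # gradient of the log-likelihood w.r.t. z
            w = [wi + lr * g * xi for wi, xi in zip(w, x)]
            b += lr * g
    return w, b

def predict(model, x):
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
```

Even on toy data the point of the abstract is visible: with only a handful of training texts per author, a few knowledge-informed features (function-word rates, word length) can already separate stylistically distinct authors.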

    Monolingual Plagiarism Detection and Paraphrase Type Identification


    The Effect of Code Obfuscation on Authorship Attribution of Binary Computer Files

    In many forensic investigations, questions linger regarding the identity of the authors of a software specimen. Research has identified methods for the attribution of binary files that have not been obfuscated, but a significant percentage of malicious software has been obfuscated in an effort to hide both the details of its origin and its true intent. Little research has been done on analyzing obfuscated code for attribution. In part, the reason for this gap is that deobfuscation of an unknown program is a challenging task. Further, the additional transformation of the executable file introduced by the obfuscator modifies or removes features of the original executable that would have been used in the author attribution process. Existing research has demonstrated good success in attributing the authorship of an executable file of unknown provenance using methods based on static analysis of the specimen file. With the addition of file obfuscation, static analysis becomes difficult, time-consuming, and, in some cases, may lead to inaccurate findings. This paper presents a novel process for authorship attribution using dynamic analysis methods. A software-emulated system was fully instrumented to become a test harness for a specimen of unknown provenance, allowing for supervised control, monitoring, and trace data collection during execution. This trace data was used as input to a supervised machine learning algorithm trained to identify stylometric differences in the specimen under test and to predict who wrote it. The specimen files were also analyzed for authorship using static analysis methods, to compare their prediction accuracy with that of this new, dynamic-analysis-based method. Experiments indicate that the new method can provide better accuracy of author attribution for files of unknown provenance, especially when the specimen file has been obfuscated.
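As a minimal illustration of behaviour-based attribution, the sketch below builds bag-of-n-gram profiles from execution traces (lists of call names, hypothetical here) and attributes a specimen to the nearest author profile by cosine similarity. The paper's actual method trains a supervised stylometric classifier on instrumented trace data; this simple nearest-profile scheme only approximates that idea:

```python
from collections import Counter
from math import sqrt

def trace_ngrams(trace, n=2):
    """Bag of call n-grams from an execution trace (a list of API or
    syscall names), the kind of behavioural feature dynamic analysis
    collects regardless of how the binary was obfuscated."""
    return Counter(tuple(trace[i:i + n]) for i in range(len(trace) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def attribute(specimen_trace, author_profiles):
    """Nearest-profile attribution: compare the specimen's n-gram
    profile against per-author profiles built from known samples."""
    spec = trace_ngrams(specimen_trace)
    return max(author_profiles, key=lambda a: cosine(spec, author_profiles[a]))
```

The profitable property, per the abstract, is that these features come from observed runtime behaviour rather than from static file contents that an obfuscator can rewrite.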

    Geographic information extraction from texts

    A large volume of unstructured texts, containing valuable geographic information, is available online. This information – provided implicitly or explicitly – is useful not only for scientific studies (e.g., spatial humanities) but also for many practical applications (e.g., geographic information retrieval). Although substantial progress has been made in geographic information extraction from texts, there are still unsolved challenges and issues, ranging from methods, systems, and data to applications and privacy. This workshop will therefore provide a timely opportunity to discuss recent advances, new ideas, and concepts, and to identify research gaps in geographic information extraction.

    Psychotropes: Models of Authorship, Psychopathology, and Molecular Politics in Aldous Huxley and Philip K. Dick

    Among the so-called “anti-psychiatrists” of the 1960s and ’70s, it was Félix Guattari who first identified that psychiatry had undergone a “molecular revolution.” It was in fact in a book titled Molecular Revolution, published in 1984, that Guattari proposed that psychotherapy had become, in the decades following the Second World War, far less personal and increasingly alienating. The newly “molecular” practices of psychiatry, Guattari mourned, had served only to fundamentally distance both patients and practitioners from their own minds; they had largely restricted our access, he suggested, to human subjectivity and consciousness. This thesis resumes Guattari’s work on the “molecular” model of the subject. Extending Guattari’s various “schizoanalytic metamodels” of human consciousness and ontology, it rigorously meditates on a simple question: Should we now accept the likely finding that there is no neat, singular, reductive, utilitarian, or unifying “model” for thinking about the human subject, and more specifically the human “author”? Part 1 of this thesis carefully examines a range of psychoanalytic, psychiatric, philosophical, and biomedical models of the human. It studies and reformulates each of them in turn and, all the while, returns to a fundamental position: that no single model, nor combination of them, will suffice. What part 1 seeks to demonstrate, then, is that envisioning these models as different attempts to “know” the human is fruitless—a futile game. Instead, these models should be understood in much the same way as literary critics treat literary commonplaces or topoi; they are akin, I argue, to what Deleuze and Guattari called “images of thought.” In my terminology, they are “psychotropes”: images with their own particular symbolic and mythical functions. Having thus developed a range of theoretical footholds in part 1, part 2 of the thesis—beginning in chapter 4—will put into practice the work of this first part.
It will do so by examining various representations of authorship by two authors in particular: Aldous Huxley and Philip K. Dick. This part will thus demonstrate how these author figures function as “psychoactive scriveners”: they are fictionalising philosophers who both produce and quarrel with an array of paradigmatic psychotropes, disputing those of others and inventing their own to substitute for them. More than this, however, the second part offers a range of detailed and original readings of these authors’ psychobiographies; it argues that even individual authors such as Huxley and Dick can be seen as “psychotropic.” It offers, that is, a series of broad-ranging and speculative explanations for the ideas and themes that appear in their works—explanations rooted in the theoretical work of the first part. Finally, this thesis concludes by reaffirming the importance of these authors’ narcoliteratures—both for present-day and future literary studies, and beyond. For while Huxley and Dick allow us to countenance afresh the range of failures in the history and philosophy of science, they also promise to instruct us—and instruct science—about the ways in which we might move beyond our received mimetic models of the human.

    Tune your brown clustering, please

    Brown clustering, an unsupervised hierarchical clustering technique based on n-gram mutual information, has proven useful in many NLP applications. However, most uses of Brown clustering employ the same default configuration, and the appropriateness of this configuration has gone largely unexplored. Accordingly, we present information for practitioners on the behaviour of Brown clustering, in the form of a theoretical model of Brown clustering utility, in order to assist hyper-parameter tuning. This model is then evaluated empirically in two sequence labelling tasks over two text types. We explore the dynamic between input corpus size, the chosen number of classes, and the quality of the resulting clusters, which has implications for any approach using Brown clustering. In every scenario we examine, our results reveal that the values most commonly used for the clustering are sub-optimal.
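One reason the choice of cluster count matters downstream is the standard way Brown clusters are consumed: each word's bit-string path in the merge hierarchy is truncated at several prefix lengths, giving features at multiple granularities (short prefixes are coarse clusters, long prefixes fine ones). A minimal sketch, in which the token-to-path map and the prefix lengths are illustrative assumptions rather than anything prescribed by the paper:

```python
def cluster_prefix_features(token, paths, prefix_lengths=(4, 6, 10)):
    """Turn a word's Brown-cluster bit-string path into prefix features.

    Short prefixes yield coarse clusters shared by many words; longer
    prefixes yield fine-grained ones. Unknown tokens get no features.
    """
    path = paths.get(token)
    if path is None:
        return []
    return ["brown:%d:%s" % (p, path[:p]) for p in prefix_lengths]
```

Because the number of classes chosen at clustering time shapes how informative these prefixes are, the sub-optimality of default settings reported in the abstract propagates directly into any sequence labeller built on such features.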

    Exploring Written Artefacts

    This collection, presented to Michael Friedrich in honour of his academic career at the Centre for the Study of Manuscript Cultures, traces key concepts that scholars associated with the Centre have developed and refined for the systematic study of manuscript cultures. At the same time, the contributions showcase the possibilities of expanding the traditional subject of ‘manuscripts’ to the larger perspective of ‘written artefacts’.

    World Beats

    This fascinating book explores Beat Generation writing from a transnational perspective, using the concept of worlding to place Beat literature in conversation with a far-reaching network of cultural and political formations. Countering the charge that the Beats abroad were at best naïve tourists seeking exoticism for exoticism's sake, World Beats finds that these writers propelled a highly politicized agenda that sought to use the tools of the earlier avant-garde to undermine Cold War and postcolonial ideologies and offer a new vision of engaged literature. With fresh interpretations of central Beat authors Jack Kerouac, Allen Ginsberg, and William Burroughs - as well as usually marginalized writers like Philip Lamantia, Ted Joans, and Brion Gysin - World Beats moves beyond national, continental, or hemispheric frames to show that embedded within Beat writing is an essential universality that brought America to the world and the world to American literature.

    Keys to The Gift

    "Yuri Leving’s Keys to The Gift: A Guide to Vladimir Nabokov’s Novel is a new systematization of the main available data on Nabokov’s most complex Russian novel, The Gift (1934–1939). From notes in Nabokov’s private correspondence to scholarly articles accumulated during the seventy years since the novel’s first appearance in print, this work draws from a broad spectrum of existing material in a succinct and coherent way and provides innovative analyses. The first part of the monograph, “The Novel,” outlines the basic properties of The Gift (plot, characters, style, and motifs) and reconstructs its internal chronology. The second part, “The Text,” describes the creation of the novel and the history of its publication, public and critical reaction, challenges of English translation, and post-Soviet reception. Along with annotations to all five chapters of The Gift, the commentary provides insight into problems of paleography, featuring a unique textological analysis of the novel