11 research outputs found
Style Obfuscation by Invariance
The task of obfuscating writing style using sequence models has previously
been investigated under the framework of obfuscation-by-transfer, where the
input text is explicitly rewritten in another style. These approaches also
often lead to major alterations to the semantic content of the input. In this
work, we propose obfuscation-by-invariance, and investigate to what extent
models trained to be explicitly style-invariant preserve semantics. We evaluate
our architectures on parallel and non-parallel corpora, and compare automatic
and human evaluations on the obfuscated sentences. Our experiments show that
style classifier performance can be reduced to chance level, whilst the
automatic evaluation of the output is seemingly equal to models applying
style-transfer. However, based on human evaluation we demonstrate a trade-off
between the level of obfuscation and the observed quality of the output in
terms of meaning preservation and grammaticality.
Comment: Accepted for presentation at COLING1
Adversarial Stylometry in the Wild: Transferable Lexical Substitution Attacks on Author Profiling
Written language contains stylistic cues that can be exploited to
automatically infer a variety of potentially sensitive author information.
Adversarial stylometry intends to attack such models by rewriting an author's
text. Our research proposes several components to facilitate deployment of
these adversarial attacks in the wild, where neither data nor target models are
accessible. We introduce a transformer-based extension of a lexical replacement
attack, and show it achieves high transferability when trained on a weakly
labeled corpus -- decreasing target model performance below chance. While not
completely inconspicuous, our more successful attacks also prove notably less
detectable by humans. Our framework therefore provides a promising direction
for future privacy-preserving adversarial attacks.
Comment: Accepted to EACL 202
Authorship Attribution Through Words Surrounding Named Entities
In text analysis, authorship attribution occurs in a variety of ways. The field of computational linguistics becomes more important as the need for authorship attribution and text analysis becomes more widespread. For this research, pre-existing authorship attribution software, the Java Graphical Authorship Attribution Program (JGAAP), implements a named entity recognizer, specifically the Stanford Named Entity Recognizer, to probe into texts of similar genre and to aid in identifying the correct author. This research specifically examines the words authors use around named entities in order to test the ability of these words to attribute authorship.
Aplicación de estilometría para la atribución autorías en e-mails y documentos informáticos
Stylometry is the analysis by which the authorship of a written text can be determined, by examining distinctive features that a writer unconsciously leaves in their publications.
This integrative research paper presents an application that extracts various writing features. These features are compared against another profile, itself composed of features, to obtain a similarity percentage between two styles of composition that may come from the same or different authors.
This was achieved by incorporating several features considered relevant to a stylometric analysis, including statistical observations on the lexical, syntactic, semantic, and structural patterns present in the document, as applied to the Spanish language.
Analyzing Stylometric Approaches to Author Obfuscation
Part 2: FORENSIC TECHNIQUES
Authorship attribution is an important and emerging security tool. However, just as criminals may wear gloves to hide their fingerprints, so too may criminal authors mask their writing styles to escape detection. Most authorship studies have focused on cooperative and/or unaware authors who do not take such precautions. This paper analyzes the methods implemented in the Java Graphical Authorship Attribution Program (JGAAP) against essays in the Brennan-Greenstadt obfuscation corpus that were written in deliberate attempts to mask style. The results demonstrate that many of the more robust and accurate methods implemented in JGAAP are effective in the presence of active deception.
AIUCD2017 - Book of Abstracts
This volume collects the abstracts of the papers presented at the AIUCD 2017 conference.
AIUCD 2017 took place from 26 to 28 January 2017 in Rome and was organized by Digilab,
Università Sapienza, in cooperation with the ITN DiXiT network (Digital Scholarly Editions Initial Training Network). AIUCD 2017 also hosted the third edition of the EADH Day, held on 25 January 2017.
The abstracts published in this volume were approved by expert reviewers in the field through an anonymous review process under the responsibility of the International Programme Committee of AIUCD 2017.
Humanidades Digitales: Construcciones locales en contextos globales
Proceedings of the II International Conference of the Argentine Association of Digital Humanities / Asociación Argentina de Humanidades Digitales (AAHD), "Humanidades Digitales: Construcciones locales en contextos globales". 47 articles in Spanish and Portuguese.