11 research outputs found

    Style Obfuscation by Invariance

    The task of obfuscating writing style using sequence models has previously been investigated under the framework of obfuscation-by-transfer, where the input text is explicitly rewritten in another style. These approaches also often lead to major alterations to the semantic content of the input. In this work, we propose obfuscation-by-invariance, and investigate to what extent models trained to be explicitly style-invariant preserve semantics. We evaluate our architectures on parallel and non-parallel corpora, and compare automatic and human evaluations on the obfuscated sentences. Our experiments show that style classifier performance can be reduced to chance level, whilst the automatic evaluation of the output is seemingly equal to models applying style-transfer. However, based on human evaluation we demonstrate a trade-off between the level of obfuscation and the observed quality of the output in terms of meaning preservation and grammaticality. Comment: Accepted for presentation at COLING1

    Adversarial Stylometry in the Wild: Transferable Lexical Substitution Attacks on Author Profiling

    Written language contains stylistic cues that can be exploited to automatically infer a variety of potentially sensitive author information. Adversarial stylometry intends to attack such models by rewriting an author's text. Our research proposes several components to facilitate deployment of these adversarial attacks in the wild, where neither data nor target models are accessible. We introduce a transformer-based extension of a lexical replacement attack, and show it achieves high transferability when trained on a weakly labeled corpus -- decreasing target model performance below chance. While not completely inconspicuous, our more successful attacks also prove notably less detectable by humans. Our framework therefore provides a promising direction for future privacy-preserving adversarial attacks. Comment: Accepted to EACL 202

    Authorship Attribution Through Words Surrounding Named Entities

    In text analysis, authorship attribution occurs in a variety of ways. The field of computational linguistics becomes more important as the need for authorship attribution and text analysis becomes more widespread. For this research, pre-existing authorship attribution software, the Java Graphical Authorship Attribution Program (JGAAP), implements a named entity recognizer, specifically the Stanford Named Entity Recognizer, to probe texts of similar genre and to aid in identifying the correct author. This research specifically examines the words authors use around named entities in order to test how well these words attribute authorship.

    Aplicación de estilometría para la atribución autorías en e-mails y documentos informáticos

    Stylometry is the analysis by which the authorship of a written text can be determined, by examining characteristic features that a writer unconsciously leaves in their texts. This integrative research paper presents an application that extracts various writing features. These features are compared against another feature profile to obtain a similarity percentage between two styles of composition, which may come from the same or from different authors. This was achieved by incorporating several features considered relevant for stylometric analysis, including statistical observations about lexical, syntactic, semantic, and structural components, as applied to the Spanish language.

    Analyzing Stylometric Approaches to Author Obfuscation

    Part 2: FORENSIC TECHNIQUES. Authorship attribution is an important and emerging security tool. However, just as criminals may wear gloves to hide their fingerprints, so too may criminal authors mask their writing styles to escape detection. Most authorship studies have focused on cooperative and/or unaware authors who do not take such precautions. This paper analyzes the methods implemented in the Java Graphical Authorship Attribution Program (JGAAP) against essays in the Brennan-Greenstadt obfuscation corpus that were written in deliberate attempts to mask style. The results demonstrate that many of the more robust and accurate methods implemented in JGAAP are effective in the presence of active deception.

    AIUCD2017 - Book of Abstracts

    This volume collects the abstracts of the papers presented at the AIUCD 2017 conference. AIUCD 2017 took place from 26 to 28 January 2017 in Rome and was organized by Digilab, Sapienza University, in cooperation with the ITN DiXiT network (Digital Scholarly Editions Initial Training Network). AIUCD 2017 also hosted the third edition of the EADH Day, held on 25 January 2017. The abstracts published in this volume were approved by expert reviewers through an anonymous review process under the responsibility of the AIUCD 2017 International Programme Committee.

    Humanidades Digitales: Construcciones locales en contextos globales

    Proceedings of the II International Conference of the Argentine Association of Digital Humanities / Asociación Argentina de Humanidades Digitales (AAHD), "Humanidades Digitales: Construcciones locales en contextos globales". 47 articles in Spanish and Portuguese.