6 research outputs found

    SUDMAD: Sequential and unsupervised decomposition of a multi-author document based on a hidden markov model

    © 2017 ASIS & T. Decomposing a document written by more than one author into sentences based on authorship is of great significance due to the increasing demand for plagiarism detection, forensic analysis, civil law (i.e., disputed copyright issues), and intelligence issues that involve disputed anonymous documents. Among existing studies of document decomposition, some are limited to specific languages, depend on topic, or are restricted to documents by two authors, and their accuracy leaves considerable room for improvement. In this paper, we consider the contextual correlation hidden among sentences and propose an algorithm for Sequential and Unsupervised Decomposition of a Multi-Author Document (SUDMAD) written in any language, disregarding topics, through the construction of a Hidden Markov Model (HMM) reflecting the authors’ writing styles. To build and learn such a model, an unsupervised, statistical approach is first proposed to estimate the initial values of the HMM parameters of a preliminary model; it requires no information about the authors or the document’s context other than how many authors contributed to the document. To further improve performance, a boosted HMM learning procedure is then proposed, in which the initial classification results are used to create labeled training data for learning a more accurate HMM. Moreover, the contextual relationship among sentences is further utilized to refine the classification results. Our proposed approach is empirically evaluated on three benchmark datasets that are widely used for authorship analysis of documents. Comparisons with recent state-of-the-art approaches are also presented to demonstrate the significance of our new ideas and the superior performance of our approach.

    RETOS DE LA ESTILÍSTICA FORENSE EN EL ÁMBITO DEL DISCURSO ELECTRÓNICO DELICTIVO (Challenges of Forensic Stylistics in the Field of Criminal Electronic Discourse)

    Despite its benefits, the Internet provides an accessible, affordable, and anonymous way to disseminate offensive content and hate speech. Forensic Linguistics includes among its objects of study the authorship attribution of this type of message. This study examines the key methodological aspects to be considered in authorship attribution, among them the selection of the most appropriate features, the text size, and how to draw conclusions from the data. Some of the problems related to this methodology are still far from solved in this scientific field.

    Drawing Elena Ferrante's Profile. Workshop Proceedings, Padova, 7 September 2017

    Elena Ferrante is an internationally acclaimed Italian novelist whose real identity has been kept secret by E/O publishing house for more than 25 years. Owing to her popularity, major Italian and foreign newspapers have long tried to discover her real identity. However, only a few attempts have been made to foster a scientific debate on her work. In 2016, Arjuna Tuzzi and Michele Cortelazzo led an Italian research team that conducted a preliminary study and collected a well-founded, large corpus of Italian novels comprising 150 works published in the last 30 years by 40 different authors. Moreover, they shared their data with a select group of international experts on authorship attribution, profiling, and analysis of textual data: Maciej Eder and Jan Rybicki (Poland), Patrick Juola (United States), Vittorio Loreto and his research team, Margherita Lalli and Francesca Tria (Italy), George Mikros (Greece), Pierre Ratinaud (France), and Jacques Savoy (Switzerland). The chapters of this volume report the results of this endeavour that were first presented during the international workshop Drawing Elena Ferrante's Profile in Padua on 7 September 2017 as part of the 3rd IQLA-GIAT Summer School in Quantitative Analysis of Textual Data. The fascinating research findings suggest that Elena Ferrante’s work definitely deserves “many hands” as well as an extensive effort to understand her distinct writing style and the reasons for her worldwide success.