6,514 research outputs found

    An agent-driven semantical identifier using radial basis neural networks and reinforcement learning

    Due to the wide availability of documents in digital form, and the increased possibility of deception inherent in the nature of digital documents and the way they are spread, the authorship attribution problem has steadily grown in relevance. Nowadays, authorship attribution, for both information retrieval and analysis, has gained great importance in the context of security, trust and copyright preservation. This work proposes an innovative multi-agent driven machine learning technique that has been developed for authorship attribution. By means of preprocessing for word-grouping and time-period related analysis of the common lexicon, we determine a bias reference level for the recurrence frequency of the words within analysed texts, and then train a Radial Basis Neural Network (RBPNN)-based classifier to identify the correct author. The main advantage of the proposed approach lies in the generality of the semantic analysis, which can be applied to different contexts and lexical domains without requiring any modification. Moreover, the proposed system is able to incorporate an external input, meant to tune the classifier, and then self-adjust by means of continuous learning reinforcement. Comment: Published in: Proceedings of the XV Workshop "Dagli Oggetti agli Agenti" (WOA 2014), Catania, Italy, September 25-26, 201

    Detecting the Authors of Texts by Neural Network Committee Machines

    This paper proposes a means of using a boosting by filtering algorithm in artificial neural networks to identify the author of a text. This approach involves filtering the training examples by different versions of a weak learning algorithm. It assumes the availability of a large source of examples, with the examples being either discarded or kept during training. An advantage of this approach is its small memory requirement. Once the network has been trained, its hidden layer activations are recorded as a representation of the selected lexical descriptors of an author. This stored information can then be used to identify the texts written by the same author. The texts studied are literary works of two Bosnian writers, Ivo Andrić (1892-1975) and M. Meša Selimović (1910-1982). The data collected by counting syntactic characteristics in 1466 paragraphs of "Na Drini ćuprija" by Ivo Andrić, and "Derviš i smrt" by M. Meša Selimović each
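
    The filtering scheme described in this abstract can be sketched roughly as follows. This is a minimal illustration of three-expert boosting by filtering, not the paper's implementation: it swaps the neural networks for a simple threshold rule as the weak learner, and the names `train_stump`, `draw_where`, `boost_by_filtering` and `labelled_stream`, the toy two-feature data, and the `max_draws` budget are all assumptions made for the example.

```python
import random
from itertools import islice

def train_stump(examples):
    """Weak learner (assumed): the single-feature threshold rule with fewest training errors."""
    best = None
    for f in range(len(examples[0][0])):
        for x, _ in examples:
            for flip in (0, 1):
                t = x[f]
                errors = sum((int(xi[f] > t) ^ flip) != yi for xi, yi in examples)
                if best is None or errors < best[0]:
                    best = (errors, f, t, flip)
    _, f, t, flip = best
    return lambda x: int(x[f] > t) ^ flip

def draw_where(stream, keep, max_draws=1000):
    """Pull examples from the stream until one satisfies keep(x, y); discard the rest."""
    for x, y in islice(stream, max_draws):
        if keep(x, y):
            return (x, y)
    return None  # nothing suitable seen within the draw budget

def boost_by_filtering(stream, n, rng):
    # Expert 1: trained on the first n examples, unfiltered.
    h1 = train_stump(list(islice(stream, n)))

    # Expert 2: a fair coin decides whether the next kept example is one
    # that expert 1 misclassifies or one that it classifies correctly.
    second = []
    for _ in range(n):
        want_error = rng.random() < 0.5
        ex = draw_where(stream, lambda x, y: (h1(x) != y) == want_error)
        if ex is not None:
            second.append(ex)
    h2 = train_stump(second) if second else h1

    # Expert 3: keeps only examples on which experts 1 and 2 disagree.
    third = []
    for _ in range(n):
        ex = draw_where(stream, lambda x, y: h1(x) != h2(x))
        if ex is None:
            break
        third.append(ex)
    h3 = train_stump(third) if third else h1

    # Committee decision: majority vote of the three experts.
    return lambda x: int(h1(x) + h2(x) + h3(x) >= 2)

def labelled_stream(rng):
    """Toy example source (assumed): label is 1 when the two features sum to more than 1."""
    while True:
        x = (rng.random(), rng.random())
        yield x, int(x[0] + x[1] > 1.0)

rng = random.Random(0)
committee = boost_by_filtering(labelled_stream(rng), n=100, rng=rng)
test_set = list(islice(labelled_stream(random.Random(1)), 2000))
accuracy = sum(committee(x) == y for x, y in test_set) / len(test_set)
```

    Only `n` examples are held in memory per expert, which reflects the small memory requirement mentioned above; the price is that many examples are drawn from the source and thrown away.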

    An effective and scalable framework for authorship attribution query processing

    © 2018 The Authors. Published by IEEE. This is an open access article available under a Creative Commons licence. The published version can be accessed at the following link on the publisher’s website: https://ieeexplore.ieee.org/document/8457490 Authorship attribution aims at identifying the original author of an anonymous text from a given set of candidate authors and has a wide range of applications. The main challenge in the authorship attribution problem is that real-world applications tend to have hundreds of authors, while each author may have only a small number of text samples, e.g., 5-10 texts per author. As a result, building a predictive model that can accurately identify the author of an anonymous text is a challenging task. In fact, existing authorship attribution solutions based on long text focus on application scenarios where the number of candidate authors is limited to 50. These solutions generally report a significant performance reduction as the number of authors increases. To overcome this challenge, we propose a novel data representation model that captures stylistic variations within each document, which transforms the problem of authorship attribution into a similarity search problem. Based on this data representation model, we also propose a similarity query processing technique that can effectively handle outliers. We assess the accuracy of our proposed method against state-of-the-art authorship attribution methods using real-world data sets extracted from Project Gutenberg. Our data set contains 3000 novels from 500 authors. Experimental results from this paper show that our method significantly outperforms all competitors. Specifically, for both the closed-set and open-set authorship attribution problems, our method has achieved higher than 95% accuracy. This work was supported by the CityU Project under Grant 7200387 and Grant 6000511. Published versio
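
    To make the idea of treating attribution as a similarity search concrete, here is a minimal sketch. It does not reproduce the paper's data representation or its outlier handling; as a stand-in it assumes character trigram counts as the document profile and cosine similarity as the measure. The names `profile`, `cosine` and `attribute`, the `threshold` parameter, and the two-author toy corpus are hypothetical.

```python
from collections import Counter
import math

def profile(text, n=3):
    """Assumed stand-in representation: counts of character n-grams."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def attribute(query, corpus, threshold=0.0):
    """Return the author of the most similar known text, or None if no
    similarity exceeds the threshold (a crude open-set rule)."""
    q = profile(query)
    best_author, best_score = None, threshold
    for author, texts in corpus.items():
        for text in texts:
            score = cosine(q, profile(text))
            if score > best_score:
                best_author, best_score = author, score
    return best_author

# Toy corpus with hypothetical authors, for illustration only.
corpus = {
    "author_a": ["the river ran slowly under the old stone bridge"],
    "author_b": ["quarterly revenue exceeded projections by a wide margin"],
}
closed_set = attribute("the river ran slowly under the old stone bridge", corpus)
open_set = attribute("zzzz qqqq xxxx", corpus, threshold=0.2)
```

    With `threshold=0.0` every query that shares at least one trigram with the corpus is assigned to some candidate (the closed-set case); a positive threshold lets the search decline to answer when nothing in the corpus is similar enough (the open-set case).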