
    CEAI: CCM based Email Authorship Identification Model

    In this paper we present a model for email authorship identification (EAI) that employs a Cluster-based Classification (CCM) technique. Stylometric features have traditionally been employed with success in various authorship analysis tasks; we extend the traditional feature set with further features that are effective for email authorship identification (e.g. the last punctuation mark used in an email, the tendency of an author to capitalize the start of an email, or the punctuation after a greeting or farewell). We also include content features chosen by Info Gain feature selection. We observe that these features have a positive impact on the accuracy of the authorship identification task. We performed experiments to justify our arguments and compared the results with other baseline models. Experimental results reveal that the proposed CCM-based email authorship identification model, along with the proposed feature set, outperforms the state-of-the-art support vector machine (SVM)-based models, as well as the models proposed by Iqbal et al. [1, 2]. The proposed model attains an accuracy of 94% for 10 authors, 89% for 25 authors, and 81% for 50 authors on the Enron dataset, and 89.5% on a real email dataset constructed by the authors. The Enron results were obtained with a considerably larger number of authors than the models proposed by Iqbal et al. [1, 2].
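The three email-specific features named in the abstract above can be illustrated with a minimal sketch. The function name, the greeting list, and the rule that the greeting sits on the first line are all assumptions made for illustration; the abstract does not describe how the authors compute these features.

```python
import re
import string

# Hypothetical greeting vocabulary; the paper's actual list is not given.
GREETINGS = ("hi", "hello", "dear", "hey")


def email_style_features(body: str) -> dict:
    """Extract three simple stylometric features from an email body."""
    text = body.strip()

    # Last punctuation mark used anywhere in the email (None if there is none).
    last_punct = next(
        (ch for ch in reversed(text) if ch in string.punctuation), None
    )

    # Whether the first alphabetic character of the email is upper case.
    first_alpha = next((ch for ch in text if ch.isalpha()), None)
    starts_capitalized = bool(first_alpha and first_alpha.isupper())

    # Punctuation closing the greeting line, assuming (for this sketch only)
    # that the greeting is the first line and begins with a known greeting word.
    first_line = text.splitlines()[0].rstrip() if text else ""
    greeting_punct = None
    match = re.match(r"\s*(\w+)", first_line)
    if match and match.group(1).lower() in GREETINGS:
        if first_line[-1] in string.punctuation:
            greeting_punct = first_line[-1]

    return {
        "last_punct": last_punct,
        "starts_capitalized": starts_capitalized,
        "greeting_punct": greeting_punct,
    }
```

In a full pipeline, features like these would be encoded numerically and combined with the traditional stylometric and content features before classification; that step is not shown here.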

    On the Feasibility of Malware Authorship Attribution

    There are many occasions on which the security community is interested in discovering the authorship of malware binaries, either for digital forensic analysis of malware corpora or for thwarting live threats of malware invasion. Such discovery of authorship might be possible because of stylistic features inherent to software code written by human programmers. Existing studies of authorship attribution for general-purpose software mainly focus on source code and typically rely on features of programming style and environment. However, those features depend critically on the availability of the program source code, which is usually not the case when dealing with malware binaries. Such binaries often do not retain many semantic or stylistic features because of the compilation process. Authorship attribution for malware binaries, based on features and styles that survive compilation, is therefore challenging. This paper surveys the state of the art in this literature. Further, we analyze the features involved in those techniques. Using a case study, we identify features that can survive the compilation process. Finally, we analyze existing work on binary authorship attribution and study its applicability to real malware binaries. Comment: FPS 201

    The Propertisation of Science

    For thirty years scientific institutions have been engaged in a process of propertisation through the strengthening of intellectual property in science. In fact, the relationship between science, intellectual property rights and the economic sphere has never been either stable or continuous. A historical inquiry is therefore necessary to understand the meaning and the practice of scientific property from the middle of the 19th century to WW II. In this paper, the relationship between scientific authorship and property appears as a means to promote scientific work and its professionalization. Moreover, through the study of the French case, the place of science in the patent system is taken into account in order to understand, finally, the international controversy about scientific property during the interwar period. Keywords: Propertisation; Science; Intellectual Property; History; Scientific Authorship

    Mapping the Evolution of "Clusters": A Meta-analysis

    This paper presents a meta-analysis of the “cluster literature” contained in scientific journals from 1969 to 2007. Using an original database, we study the evolution of a stream of literature focused on a research object that is both a theoretical puzzle and a widespread empirical phenomenon. We identify different growth stages, from take-off to development and maturity. We test for the existence of a life-cycle within the authorships and find a substitutability relation between different collaborative behaviours. We study the relationship between a “spatial” and an “industrial” approach within the textual corpus of cluster literature and show the existence of a “predatory” interaction. We detect clustering behaviours in the location of authors working on clusters and measure the influence of geographical distance on co-authorship. We measure the extent to which the vocabulary of scientists working on clusters has converged. Keywords: Cluster, Life-Cycle, Cluster Literature, Textual Analysis, Agglomeration, Co-Authorship

    Studying the Emerging Global Brain: Analyzing and Visualizing the Impact of Co-Authorship Teams

    This paper introduces a suite of approaches and measures to study the impact of co-authorship teams based on the number of publications and their citations on a local and global scale. In particular, we present a novel weighted graph representation that encodes coupled author-paper networks as a weighted co-authorship graph. This weighted graph representation is applied to a dataset that captures the emergence of a new field of science and comprises 614 papers published by 1,036 unique authors between 1974 and 2004. To characterize the properties and evolution of this field, we first use four different measures of centrality to identify the impact of authors. A global statistical analysis is performed to characterize the distribution of paper production and paper citations and its correlation with co-authorship team size. The size of co-authorship clusters over time is examined. Finally, a novel local, author-centered measure based on entropy is applied to determine the global evolution of the field and to identify a single author's impact across all of their co-authorship relations. A visualization of the growth of the weighted co-author network and the results of the statistical analysis indicate a drift towards a more cooperative, global collaboration process as the main driver in the production of scientific knowledge. Comment: 13 pages, 9 figures
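As a rough illustration of turning an author-paper network into a weighted co-authorship graph, the sketch below weights each author pair by the number of papers they share and then computes each author's weighted degree. This weighting is an assumption chosen for simplicity; the abstract does not specify the paper's own scheme or its four centrality measures, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def coauthorship_graph(papers):
    """Map each unordered author pair to the number of papers they share.

    `papers` is a list of author lists, one per paper. The shared-paper
    count used as the edge weight is an illustrative assumption.
    """
    weights = defaultdict(int)
    for authors in papers:
        # Sort and deduplicate so each pair has one canonical key.
        for a, b in combinations(sorted(set(authors)), 2):
            weights[(a, b)] += 1
    return dict(weights)


def weighted_degree(weights):
    """Sum the weights of all edges incident to each author."""
    strength = defaultdict(int)
    for (a, b), w in weights.items():
        strength[a] += w
        strength[b] += w
    return dict(strength)
```

Single-author papers contribute no edges under this scheme, so an author who only publishes alone does not appear in the graph at all.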

    A story-in-the-making: an intertextual exploration of a multivoiced narrative

    The following study will explore the stories which are not told; that is, it will scrutinize the process of intertextual emergence of an ultimately open story: one which has neither discernible authorship nor agenda and which remains in-the-making rather than striving to achieve closure. The paper will discuss the process in which multifaceted and multidirectional organizational stories are created, in which plots and characters are exchanged and an ‘ending’ is defied. This lack of closure is perceived here as a breeding ground for networked meanings, which, if allowed to remain interdependent and plural, eschew the danger of a new organizational story becoming a universal carrier of inflexibly established contents. Since unifying semantic organizational frameworks (e.g. the ‘success story’) may be construed as impostors attempting to ascribe both authorship and agency to a non-agentical and non-authored ‘untold story’, this study proposes one way in which the multi-directedness and plurality of the story may be preserved.

    Fighting Authorship Linkability with Crowdsourcing

    Massive amounts of contributed content (including traditional literature, blogs, music, videos, reviews and tweets) are available on the Internet today, with authors numbering in the many millions. Textual information, such as product or service reviews, is an important and increasingly popular type of content that serves as the foundation of many community-based reviewing sites, such as TripAdvisor and Yelp. Some recent results have shown that, due partly to their specialized and topical nature, sets of reviews authored by the same person are readily linkable based on simple stylometric features. In practice, this means that individuals who author more than a few reviews under different accounts (whether within one site or across multiple sites) can be linked, which represents a significant loss of privacy. In this paper, we start by showing that the problem is actually worse than previously believed. We then explore ways to mitigate authorship linkability in community-based reviewing. We first attempt to harness the global power of crowdsourcing by engaging random strangers in the process of re-writing reviews. As our empirical results (obtained from Amazon Mechanical Turk) demonstrate, crowdsourcing yields sensible reviews whose stylometric characteristics differ enough that prior stylometric linkability techniques become largely ineffective. We also consider using machine translation to automatically re-write reviews. Contrary to what was previously believed, our results show that translation decreases authorship linkability as the number of intermediate languages grows. Finally, we explore the combination of crowdsourcing and machine translation and report on the results.