CEAI: CCM based Email Authorship Identification Model
In this paper we present a model for email authorship identification (EAI) by
employing a Cluster-based Classification (CCM) technique. Traditionally,
stylometric features have been successfully employed in various authorship
analysis tasks; we extend the traditional feature-set to include some more
interesting and effective features for email authorship identification (e.g.
the last punctuation mark used in an email, the tendency of an author to use
capitalization at the start of an email, or the punctuation after a greeting or
farewell). We also include content features selected by information gain.
It is observed that the use of such features in the authorship identification
process has a positive impact on the accuracy of the authorship identification
task. We performed experiments to justify our arguments and compared the
results with other baseline models. Experimental results reveal that the
proposed CCM-based email authorship identification model, along with the
proposed feature set, outperforms the state-of-the-art support vector machine
(SVM)-based models, as well as the models proposed by Iqbal et al. [1, 2]. The
proposed model attains an accuracy rate of 94% for 10 authors, 89% for 25
authors, and 81% for 50 authors, respectively, on the Enron dataset, while 89.5%
accuracy has been achieved on a real email dataset constructed by the authors.
The results on the Enron dataset have been achieved on a larger number of
authors than the models proposed by Iqbal et al. [1, 2].
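The email-specific features named in this abstract (last punctuation mark, capitalization at the start of an email, punctuation after a greeting) can be illustrated with a minimal sketch. The function name `email_style_features`, the greeting list, and the returned keys are assumptions made for illustration; the abstract does not describe the authors' actual extraction procedure.

```python
import re
import string

# Hypothetical greeting words; the abstract does not list the ones the authors used.
GREETING_RE = re.compile(r"(hi|hello|dear|hey)\b", re.IGNORECASE)


def email_style_features(body: str) -> dict:
    """Extract three illustrative email-specific stylometric features."""
    text = body.strip()
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]

    # Last punctuation mark used anywhere in the email (None if there is none).
    last_punct = next(
        (ch for ch in reversed(text) if ch in string.punctuation), None
    )

    # Whether the first alphabetic character of the email is upper case.
    first_alpha = next((ch for ch in text if ch.isalpha()), None)
    starts_capitalized = bool(first_alpha and first_alpha.isupper())

    # Punctuation that closes the greeting line, if the first line is a greeting.
    greeting_punct = None
    if lines and GREETING_RE.match(lines[0]):
        if lines[0][-1] in string.punctuation:
            greeting_punct = lines[0][-1]

    return {
        "last_punct": last_punct,
        "starts_capitalized": starts_capitalized,
        "greeting_punct": greeting_punct,
    }


features = email_style_features("Hi John,\nplease send the report.\nThanks!")
```

Features like these would be combined with the traditional stylometric and content features before classification; how the authors encode them is not stated here.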
On the Feasibility of Malware Authorship Attribution
There are many occasions in which the security community is interested in
discovering the authorship of malware binaries, either for digital forensics
analysis of malware corpora or for thwarting live threats of malware invasion.
Such a discovery of authorship might be possible due to stylistic features
inherent to software code written by human programmers. Existing studies of
authorship attribution of general purpose software mainly focus on source code,
which is typically based on the style of programs and environment. However,
those features critically depend on the availability of the program source
code, which is usually not the case when dealing with malware binaries. Such
program binaries often do not retain many semantic or stylistic features due to
the compilation process. Therefore, authorship attribution in the domain of
malware binaries based on features and styles that will survive the compilation
process is challenging. This paper surveys the state of the art in this
area. Further, we analyze the features involved in those techniques. By
using a case study, we identify features that can survive the compilation
process. Finally, we analyze existing works on binary authorship attribution
and study their applicability to real malware binaries.
Comment: FPS 201
The Propertisation of Science
For thirty years scientific institutions have been engaged in a process of propertisation through the strengthening of intellectual property in science. In fact, the relationship between science, intellectual property rights and the economic sphere has never been stable or continuous. A historical inquiry is therefore necessary to understand the meaning and practice of scientific property from the middle of the 19th century to WW II. In this paper, the relationship between scientific authorship and property appears as a means of promoting scientific work and its professionalization. Moreover, through the study of the French case, the place of science in the patent system is taken into account in order to understand the international controversy about scientific property during the interwar period.
Keywords: Propertisation; Science; Intellectual Property; History; Scientific Authorship
Mapping the Evolution of "Clusters": A Meta-analysis
This paper presents a meta-analysis of the “cluster literature” contained in scientific journals from 1969 to 2007. Using an original database, we study the evolution of a stream of literature that focuses on a research object which is both a theoretical puzzle and a widespread empirical phenomenon. We identify different growth stages, from take-off to development and maturity. We test for the existence of a life-cycle within the authorships and we find a substitutability relation between different collaborative behaviours. We study the relationships between a “spatial” and an “industrial” approach within the textual corpus of cluster literature and we show the existence of a “predatory” interaction. We detect the relevance of clustering behaviours in the location of authors working on clusters and in measuring the influence of geographical distance in co-authorship. We measure the extent of a convergence process in the vocabulary of scientists working on clusters.
Keywords: Cluster; Life-Cycle; Cluster Literature; Textual Analysis; Agglomeration; Co-Authorship
Studying the Emerging Global Brain: Analyzing and Visualizing the Impact of Co-Authorship Teams
This paper introduces a suite of approaches and measures to study the impact
of co-authorship teams based on the number of publications and their citations
on a local and global scale. In particular, we present a novel weighted graph
representation that encodes coupled author-paper networks as a weighted
co-authorship graph. This weighted graph representation is applied to a dataset
that captures the emergence of a new field of science and comprises 614 papers
published by 1,036 unique authors between 1974 and 2004. In order to
characterize the properties and evolution of this field we first use four
different measures of centrality to identify the impact of authors. A global
statistical analysis is performed to characterize the distribution of paper
production and paper citations and its correlation with the co-authorship team
size. The size of co-authorship clusters over time is examined. Finally, a
novel local, author-centered measure based on entropy is applied to determine
the global evolution of the field and the identification of the contribution of
a single author's impact across all of its co-authorship relations. A
visualization of the growth of the weighted co-author network and the results
obtained from the statistical analysis indicate a drift towards a more
cooperative, global collaboration process as the main drive in the production
of scientific knowledge.
Comment: 13 pages, 9 figure
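As a rough illustration of collapsing an author-paper network into a weighted co-authorship graph, the sketch below adds a share to each author pair for every paper they wrote together. The abstract does not specify the weighting; the per-paper share of 1/(n-1) for a paper with n authors is an assumption (a commonly used convention), and the name `coauthor_weights` is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def coauthor_weights(papers):
    """Build edge weights of a co-authorship graph from lists of paper authors.

    Assumed weighting (not taken from the abstract): a paper with n authors
    adds 1 / (n - 1) to the edge between every pair of its authors.
    """
    weights = defaultdict(float)
    for authors in papers:
        unique = sorted(set(authors))  # ignore duplicate names on one paper
        n = len(unique)
        if n < 2:
            continue  # single-author papers create no co-authorship edge
        share = 1.0 / (n - 1)
        for a, b in combinations(unique, 2):
            weights[(a, b)] += share
    return dict(weights)


edges = coauthor_weights([["A", "B"], ["A", "B", "C"], ["D"]])
# edges == {("A", "B"): 1.5, ("A", "C"): 0.5, ("B", "C"): 0.5}
```

Under this weighting, pairs that publish together often and in small teams get heavier edges, which is one way to let centrality measures reflect collaboration strength rather than mere co-occurrence.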
A story-in-the-making: an intertextual exploration of a multivoiced narrative
The following study will explore the stories which are not told – that is, it will scrutinize the process of intertextual emergence of an ultimately open story: one which has neither discernible authorship nor agenda and which remains in-the-making rather than strives to achieve closure. The paper will discuss the process in which multifaceted and multidirectional organizational stories are created, in which plots and characters exchange and ‘ending’ is defied. This lack of closure is perceived here as a breeding ground for networked meanings, which, if allowed to remain interdependent and plural, eschew the danger of a new organizational story becoming a universal carrier of inflexibly established contents. Since the unifying semantic organizational frameworks (e.g. ‘success story’) may be construed as impostors attempting to ascribe both authorship and agency to a non-agentical and non-authored ‘untold story’, this study proposes one way in which multi-directedness and plurality of the story may be preserved.
Fighting Authorship Linkability with Crowdsourcing
Massive amounts of contributed content -- including traditional literature,
blogs, music, videos, reviews and tweets -- are available on the Internet
today, with authors numbering in many millions. Textual information, such as
product or service reviews, is an important and increasingly popular type of
content that is being used as a foundation of many trendy community-based
reviewing sites, such as TripAdvisor and Yelp. Some recent results have shown
that, due partly to their specialized/topical nature, sets of reviews authored
by the same person are readily linkable based on simple stylometric features.
In practice, this means that individuals who author more than a few reviews
under different accounts (whether within one site or across multiple sites) can
be linked, which represents a significant loss of privacy.
In this paper, we start by showing that the problem is actually worse than
previously believed. We then explore ways to mitigate authorship linkability in
community-based reviewing. We first attempt to harness the global power of
crowdsourcing by engaging random strangers in the process of re-writing
reviews. As our empirical results (obtained from Amazon Mechanical Turk)
clearly demonstrate, crowdsourcing yields impressively sensible reviews that
reflect sufficiently different stylometric characteristics such that prior
stylometric linkability techniques become largely ineffective. We also consider
using machine translation to automatically re-write reviews. Contrary to what
was previously believed, our results show that translation decreases authorship
linkability as the number of intermediate languages grows. Finally, we explore
the combination of crowdsourcing and machine translation and report on the
results.
