Detecting Sockpuppets in Deceptive Opinion Spam
This paper explores the problem of sockpuppet detection in deceptive opinion
spam using authorship attribution and verification approaches. Two methods are
explored. The first is a feature subsampling scheme that uses the KL-Divergence
on stylistic language models of an author to find discriminative features. The
second is a transduction scheme, spy induction, that leverages the diversity of
authors in the unlabeled test set by sending a set of spies (positive samples)
from the training set to retrieve hidden samples in the unlabeled test set
using nearest and farthest neighbors. Experiments using ground truth sockpuppet
data show the effectiveness of the proposed schemes. Comment: 18 pages, Accepted at CICLing 2017, 18th International Conference on
Intelligent Text Processing and Computational Linguistics
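The KL-divergence idea in the abstract above can be illustrated with a minimal sketch. This is not the paper's feature subsampling scheme: it assumes plain unigram language models with add-one smoothing, and the function names (`unigram_model`, `kl_contributions`, `top_discriminative`) and the toy token lists are hypothetical. It ranks each term by its contribution to KL(author || background), so terms the author overuses relative to other writers rise to the top.

```python
import math
from collections import Counter

def unigram_model(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def kl_contributions(p, q):
    """Per-term values p(w) * log(p(w) / q(w)); their sum is KL(p || q)."""
    return {w: p[w] * math.log(p[w] / q[w]) for w in p}

def top_discriminative(author_tokens, background_tokens, k=3):
    """Return the k terms contributing most to KL(author || background)."""
    vocab = sorted(set(author_tokens) | set(background_tokens))
    p = unigram_model(author_tokens, vocab)      # assumed author model
    q = unigram_model(background_tokens, vocab)  # assumed background model
    contrib = kl_contributions(p, q)
    return sorted(vocab, key=lambda w: contrib[w], reverse=True)[:k]

# Toy data, invented for illustration only.
author = "honestly the room was honestly superb and honestly clean".split()
background = "the room was clean and the staff was nice and the food was fine".split()

print(top_discriminative(author, background, k=2))
```

Smoothing keeps every probability nonzero, so the logarithm is always defined even for terms that appear in only one of the two samples.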
English Bards and Unknown Reviewers: a Stylometric Analysis of Thomas Moore and the Christabel Review
Fraught relations between authors and critics are a commonplace of literary history. The particular case that we discuss in this article, a negative review of Samuel Taylor Coleridge's Christabel (1816), has an additional point of interest beyond the usual mixture of amusement and resentment that surrounds a critical rebuke: the authorship of the review remains, to this day, uncertain. The purpose of this article is to investigate the possible candidacy of Thomas Moore as the author of the provocative review. It seeks to solve a puzzle of almost two hundred years, and in the process clear a valuable scholarly path in Irish Studies, Romanticism, and in our understanding of Moore's role in a prominent literary controversy of the age.
Automatic Compositor Attribution in the First Folio of Shakespeare
Compositor attribution, the clustering of pages in a historical printed
document by the individual who set the type, is a bibliographic task that
relies on analysis of orthographic variation and inspection of visual details
of the printed page. In this paper, we introduce a novel unsupervised model
that jointly describes the textual and visual features needed to distinguish
compositors. Applied to images of Shakespeare's First Folio, our model predicts
attributions that agree with the manual judgements of bibliographers with an
accuracy of 87%, even on text that is the output of OCR. Comment: Short paper (6 pages) accepted at ACL 201
Payment in Credit: Copyright Law and Subcultural Creativity
Copyright lawyers talk and write a lot about the uncertainties of fair use and the deterrent effects of a clearance culture on publishers, teachers, filmmakers, and the like, but know less about the choices people make about copyright on a daily basis, especially when they are not working. Here, Tushnet examines one subcultural group that engages in a variety of practices, from pure copying and distribution of others' works to creation of new stories, art, and audiovisual works: the media-fan community. Among other things, she discusses some differences between fair use and fan practices, focused around attribution as an alternative to veto rights over uses of copyrighted works.
More blogging features for author identification
In this paper we present a novel improvement in authorship identification for personal blogs. The improvement comes from a hybrid collection of linguistic features that best capture the style of users in diary blogs. The feature sets comprise LIWC, with its grounding in psychology, a collection of syntactic and part-of-speech (POS) features, and misspelling-error features.
Furthermore, we analyze the contribution of each feature set to the final result and compare the outcomes of using different combinations of the selected feature sets. Our new categorization of misspelled words, which are mapped into numerical features, noticeably enhances the classification results. The paper also confirms the best ranges of several parameters that affect the final result of authorship identification, such as the number of authors, the number of words in each post, and the number of documents/posts for each author/user. The results and evaluation show that the utilized features are compact, while their performance is highly comparable with that of other, much larger feature sets.