Detecting Sockpuppets in Deceptive Opinion Spam

Author: Chih-Chung Chang
DH Fusilier
E Stamatatos
M Koppel
N Graham
T Qian
Vladimir N. Vapnik
Xinxing Xu
Publication date: 09/03/2017

This paper explores the problem of sockpuppet detection in deceptive opinion spam using authorship attribution and verification approaches. Two methods are explored. The first is a feature subsampling scheme that uses the KL-Divergence on stylistic language models of an author to find discriminative features. The second is a transduction scheme, spy induction, that leverages the diversity of authors in the unlabeled test set by sending a set of spies (positive samples) from the training set to retrieve hidden samples in the unlabeled test set using nearest and farthest neighbors. Experiments using ground truth sockpuppet data show the effectiveness of the proposed schemes. Comment: 18 pages, Accepted at CICLing 2017, 18th International Conference on Intelligent Text Processing and Computational Linguistic
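As a rough illustration of the first idea only, the sketch below ranks terms by their individual contribution to the KL divergence between one author's unigram distribution and a background distribution. It is not the paper's method: the unigram model, the add-alpha smoothing, and the function names (`unigram_probs`, `kl_feature_scores`, `top_discriminative`) are all assumptions made for this example.

```python
import math
from collections import Counter


def unigram_probs(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram probabilities over a shared vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_feature_scores(author_tokens, background_tokens, alpha=1.0):
    """Per-term contribution p(w) * log(p(w) / q(w)) to KL(author || background).

    The contributions sum to the full KL divergence; a large positive value
    marks a term the author uses far more often than the background does.
    """
    vocab = set(author_tokens) | set(background_tokens)
    p = unigram_probs(author_tokens, vocab, alpha)
    q = unigram_probs(background_tokens, vocab, alpha)
    return {w: p[w] * math.log(p[w] / q[w]) for w in vocab}


def top_discriminative(author_tokens, background_tokens, k=3, alpha=1.0):
    """Return the k terms with the largest KL contribution (ties broken by term)."""
    scores = kl_feature_scores(author_tokens, background_tokens, alpha)
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


if __name__ == "__main__":
    # Toy data invented for this example, not taken from the paper.
    author = "honestly honestly honestly the room was great".split()
    background = "the room was clean the staff was nice the room was great".split()
    print(top_discriminative(author, background, k=3))
```

Smoothing keeps every probability non-zero, so the logarithm is always defined, including for terms that appear in only one of the two samples.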

arXiv.org e-Print Archive

English Bards and Unknown Reviewers: a Stylometric Analysis of Thomas Moore and the Christabel Review

Author: Benatti Francesca
Tonra Justin
Publication date: 19/01/2015

Fraught relations between authors and critics are a commonplace of literary history. The particular case that we discuss in this article, a negative review of Samuel Taylor Coleridge's Christabel (1816), has an additional point of interest beyond the usual mixture of amusement and resentment that surrounds a critical rebuke: the authorship of the review remains, to this day, uncertain. The purpose of this article is to investigate the possible candidacy of Thomas Moore as the author of the provocative review. It seeks to solve a puzzle of almost two hundred years, and in the process clear a valuable scholarly path in Irish Studies, Romanticism, and in our understanding of Moore's role in a prominent literary controversy of the age.

Access to Research at National University of Ireland, Galway

Automatic Compositor Attribution in the First Folio of Shakespeare

Author: Alpert-Abrams Hannah
Berg-Kirkpatrick Taylor
Garrette Dan
Ryskina Maria
Publication date: 01/01/2017

Compositor attribution, the clustering of pages in a historical printed document by the individual who set the type, is a bibliographic task that relies on analysis of orthographic variation and inspection of visual details of the printed page. In this paper, we introduce a novel unsupervised model that jointly describes the textual and visual features needed to distinguish compositors. Applied to images of Shakespeare's First Folio, our model predicts attributions that agree with the manual judgements of bibliographers with an accuracy of 87%, even on text that is the output of OCR. Comment: Short paper (6 pages) accepted at ACL 201

arXiv.org e-Print Archive

Payment in Credit: Copyright Law and Subcultural Creativity

Author: Tushnet Rebecca
Publication venue: Duke University School of Law
Publication date: 01/04/2007

Copyright lawyers talk and write a lot about the uncertainties of fair use and the deterrent effects of a clearance culture on publishers, teachers, filmmakers, and the like, but know less about the choices people make about copyright on a daily basis, especially when they are not working. Here, Tushnet examines one subcultural group that engages in a variety of practices, from pure copying and distribution of others' works to creation of new stories, art, and audiovisual works: the media-fan community. Among other things, she discusses some differences between fair use and fan practices, focused around attribution as an alternative to veto rights over uses of copyrighted works.

More blogging features for author identification

Author: Ahmed Amr
Mohtasseb Haytham
Publication date: 01/01/2009

In this paper we present a novel improvement in the field of authorship identification in personal blogs. The improvement comes from utilizing a hybrid collection of linguistic features that best capture the style of users in diary blogs. The feature sets contain LIWC with its psychology background, a collection of syntactic and part-of-speech (POS) features, and misspelling error features. Furthermore, we analyze the contribution of each feature set to the final result and compare the outcomes of using different combinations of the selected feature sets. Our new categorization of misspelled words, which are mapped into numerical features, noticeably enhances the classification results. The paper also confirms the best ranges of several parameters that affect the final result of authorship identification, such as the number of authors, the number of words in each post, and the number of documents/posts for each author/user. The results and evaluation show that the utilized features are compact, while their performance is highly comparable with other, much larger feature sets.

CiteSeerX

Edge Hill University Research Information Repository