Search CORE

42,223 research outputs found

Detecting Sockpuppets in Deceptive Opinion Spam

Author: Chih-Chung Chang
DH Fusilier
E Stamatatos
M Koppel
N Graham
T Qian
Vladimir N. Vapnik
Xinxing Xu
Publication venue
Publication date: 09/03/2017
Field of study

This paper explores the problem of sockpuppet detection in deceptive opinion spam using authorship attribution and verification approaches. Two methods are explored. The first is a feature subsampling scheme that uses the KL-Divergence on stylistic language models of an author to find discriminative features. The second is a transduction scheme, spy induction that leverages the diversity of authors in the unlabeled test set by sending a set of spies (positive samples) from the training set to retrieve hidden samples in the unlabeled test set using nearest and farthest neighbors. Experiments using ground truth sockpuppet data show the effectiveness of the proposed schemes.Comment: 18 pages, Accepted at CICLing 2017, 18th International Conference on Intelligent Text Processing and Computational Linguistic

arXiv.org e-Print Archive

Crossref

A Linear Classifier Based on Entity Recognition Tools and a Statistical Approach to Method Extraction in the Protein-Protein Interaction Literature

Author: Conover Michael
Lourenço Anália
Nematzadeh Azadeh
Pan Fengxia
Rocha Luis M.
Shatkay Hagit
Wong Andrew
Publication venue
Publication date: 01/01/2011
Field of study

We participated, in the Article Classification and the Interaction Method subtasks (ACT and IMT, respectively) of the Protein-Protein Interaction task of the BioCreative III Challenge. For the ACT, we pursued an extensive testing of available Named Entity Recognition and dictionary tools, and used the most promising ones to extend our Variable Trigonometric Threshold linear classifier. For the IMT, we experimented with a primarily statistical approach, as opposed to employing a deeper natural language processing strategy. Finally, we also studied the benefits of integrating the method extraction approach that we have used for the IMT into the ACT pipeline. For the ACT, our linear article classifier leads to a ranking and classification performance significantly higher than all the reported submissions. For the IMT, our results are comparable to those of other systems, which took very different approaches. For the ACT, we show that the use of named entity recognition tools leads to a substantial improvement in the ranking and classification of articles relevant to protein-protein interaction. Thus, we show that our substantially expanded linear classifier is a very competitive classifier in this domain. Moreover, this classifier produces interpretable surfaces that can be understood as "rules" for human understanding of the classification. In terms of the IMT task, in contrast to other participants, our approach focused on identifying sentences that are likely to bear evidence for the application of a PPI detection method, rather than on classifying a document as relevant to a method. As BioCreative III did not perform an evaluation of the evidence provided by the system, we have conducted a separate assessment; the evaluators agree that our tool is indeed effective in detecting relevant evidence for PPI detection methods.Comment: BMC Bioinformatics. In Pres

arXiv.org e-Print Archive

Universidade do Minho: RepositoriUM

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Hybrid model using logit and nonparametric methods for predicting micro-entity failure

Author: Blanco Oliver Antonio Jesús
Irimia Diéguez Ana Isabel
Oliver Alfonso María Dolores
Vázquez Cueto María José
Publication venue: LLC “Consulting Publishing Company “Business Perspectives”
Publication date: 01/01/2016
Field of study

Following the calls from literature on bankruptcy, a parsimonious hybrid bankruptcy model is developed in this paper by combining parametric and non-parametric approaches.To this end, the variables with the highest predictive power to detect bankruptcy are selected using logistic regression (LR). Subsequently, alternative non-parametric methods (Multilayer Perceptron, Rough Set, and Classification-Regression Trees) are applied, in turn, to firms classified as either “bankrupt” or “not bankrupt”. Our findings show that hybrid models, particularly those combining LR and Multilayer Perceptron, offer better accuracy performance and interpretability and converge faster than each method implemented in isolation. Moreover, the authors demonstrate that the introduction of non-financial and macroeconomic variables complement financial ratios for bankruptcy prediction

idUS. Depósito de Investigación Universidad de Sevilla

Remedy for Now but Prohibit for Tomorrow: The Deterrence Effects of Merger Policy Tools

Author: Barros Pedro Pita
Clougherty Joseph A.
Seldeslachts Jo
Publication venue
Publication date: 01/01/2007
Field of study

Antitrust policy involves not just the regulation of anti-competitive behavior, but also an important deterrence effect. Neither scholars nor policymakers have fully researched the deterrence effects of merger policy tools, as they have been unable to empirically measure these effects. We consider the ability of different antitrust actions – Prohibitions, Remedies, and Monitorings – to deter firms from engaging in mergers. We employ cross-jurisdiction/pan-time data on merger policy to empirically estimate the impact of antitrust actions on future merger frequencies. We find merger prohibitions to lead to decreased merger notifications in subsequent periods, and remedies to weakly increase future merger notifications: in other words, prohibitions involve a deterrence effect but remedies do not

CiteSeerX

Open Access LMU

Political Text Scaling Meets Computational Semantics

Author: Glavas Goran
Nanni Federico
Ponzetto Simone Paolo
Rehbein Ines
Stuckenschmidt Heiner
Publication venue
Publication date: 01/01/2021
Field of study

During the last fifteen years, automatic text scaling has become one of the key tools of the Text as Data community in political science. Prominent text scaling algorithms, however, rely on the assumption that latent positions can be captured just by leveraging the information about word frequencies in documents under study. We challenge this traditional view and present a new, semantically aware text scaling algorithm, SemScale, which combines recent developments in the area of computational linguistics with unsupervised graph-based clustering. We conduct an extensive quantitative analysis over a collection of speeches from the European Parliament in five different languages and from two different legislative terms, and show that a scaling approach relying on semantic document representations is often better at capturing known underlying political dimensions than the established frequency-based (i.e., symbolic) scaling method. We further validate our findings through a series of experiments focused on text preprocessing and feature selection, document representation, scaling of party manifestos, and a supervised extension of our algorithm. To catalyze further research on this new branch of text scaling methods, we release a Python implementation of SemScale with all included data sets and evaluation procedures.Comment: Updated version - accepted for Transactions on Data Science (TDS

arXiv.org e-Print Archive

MAnnheim DOCument Server