Detecting Sockpuppets in Deceptive Opinion Spam
This paper explores the problem of sockpuppet detection in deceptive opinion
spam using authorship attribution and verification approaches. Two methods are
explored. The first is a feature subsampling scheme that uses the KL-Divergence
on stylistic language models of an author to find discriminative features. The
second is a transduction scheme, spy induction, that leverages the diversity of
authors in the unlabeled test set by sending a set of spies (positive samples)
from the training set to retrieve hidden samples in the unlabeled test set
using nearest and farthest neighbors. Experiments using ground truth sockpuppet
data show the effectiveness of the proposed schemes.
Comment: 18 pages, accepted at CICLing 2017, 18th International Conference on
Intelligent Text Processing and Computational Linguistics
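The KL-divergence feature selection mentioned in the abstract above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes unigram language models with add-one smoothing, and the function names (`unigram_model`, `kl_contributions`, `top_discriminative_features`) are hypothetical. Each word is ranked by its individual contribution to KL(author || background).

```python
import math
from collections import Counter


def unigram_model(tokens, vocab, alpha=1.0):
    """Smoothed unigram probabilities over a shared vocabulary (assumed model)."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_contributions(p, q):
    """Per-word terms p(w) * log(p(w) / q(w)); their sum is KL(p || q)."""
    return {w: p[w] * math.log(p[w] / q[w]) for w in p}


def top_discriminative_features(author_tokens, background_tokens, k=3):
    """Return the k words that contribute most to KL(author || background)."""
    vocab = set(author_tokens) | set(background_tokens)
    p = unigram_model(author_tokens, vocab)
    q = unigram_model(background_tokens, vocab)
    contrib = kl_contributions(p, q)
    # Sort by descending contribution; break ties alphabetically.
    return sorted(contrib, key=lambda w: (-contrib[w], w))[:k]


if __name__ == "__main__":
    author = "honestly honestly honestly the room was great".split()
    background = "the room was fine the staff was fine".split()
    print(top_discriminative_features(author, background, k=2))
```

Words the author overuses relative to the background receive large positive terms, so they surface first. A real system would likely use richer stylistic features than unigrams.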
A Linear Classifier Based on Entity Recognition Tools and a Statistical Approach to Method Extraction in the Protein-Protein Interaction Literature
We participated in the Article Classification and the Interaction Method
subtasks (ACT and IMT, respectively) of the Protein-Protein Interaction task of
the BioCreative III Challenge. For the ACT, we conducted extensive testing of
available Named Entity Recognition and dictionary tools, and used the most
promising ones to extend our Variable Trigonometric Threshold linear
classifier. For the IMT, we experimented with a primarily statistical approach,
as opposed to employing a deeper natural language processing strategy. Finally,
we also studied the benefits of integrating the method extraction approach that
we have used for the IMT into the ACT pipeline. For the ACT, our linear article
classifier leads to a ranking and classification performance significantly
higher than all the reported submissions. For the IMT, our results are
comparable to those of other systems, which took very different approaches. For
the ACT, we show that the use of named entity recognition tools leads to a
substantial improvement in the ranking and classification of articles relevant
to protein-protein interaction. Thus, we show that our substantially expanded
linear classifier is a very competitive classifier in this domain. Moreover,
this classifier produces interpretable surfaces that can be understood as
"rules" for human understanding of the classification. In terms of the IMT
task, in contrast to other participants, our approach focused on identifying
sentences that are likely to bear evidence for the application of a PPI
detection method, rather than on classifying a document as relevant to a
method. As BioCreative III did not perform an evaluation of the evidence
provided by the system, we have conducted a separate assessment; the evaluators
agree that our tool is indeed effective in detecting relevant evidence for PPI
detection methods.
Comment: BMC Bioinformatics. In Press
Hybrid model using logit and nonparametric methods for predicting micro-entity failure
Following the calls from literature on bankruptcy, a parsimonious hybrid bankruptcy model is developed in this paper
by combining parametric and non-parametric approaches. To this end, the variables with the highest predictive power to
detect bankruptcy are selected using logistic regression (LR). Subsequently, alternative non-parametric methods
(Multilayer Perceptron, Rough Set, and Classification-Regression Trees) are applied, in turn, to firms classified as
either “bankrupt” or “not bankrupt”. Our findings show that hybrid models, particularly those combining LR and
Multilayer Perceptron, offer better accuracy performance and interpretability and converge faster than each method
implemented in isolation. Moreover, the authors demonstrate that the introduction of non-financial and macroeconomic
variables complements financial ratios for bankruptcy prediction.
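The two-stage idea in the abstract above (logistic regression to pick variables, then a non-parametric method on the selected ones) can be sketched as follows. This is an illustration under stated assumptions, not the authors' model: variables are ranked by the absolute value of their standardized LR coefficient, and a depth-1 classification tree (a stump) stands in for the Multilayer Perceptron, Rough Set, and Classification-Regression Tree methods named in the abstract. All function names and the toy data are hypothetical.

```python
import math


def standardize(rows):
    """Scale each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def fit_logistic(rows, labels, lr=0.1, epochs=500):
    """Plain batch gradient descent on the log-loss; returns (weights, bias)."""
    n, d = len(rows), len(rows[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(rows, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            for j in range(d):
                grad_w[j] += (p - y) * x[j]
            grad_b += p - y
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def select_features(rows, labels, k):
    """Stage 1 (assumed criterion): keep the k variables with largest |coefficient|."""
    w, _ = fit_logistic(standardize(rows), labels)
    ranked = sorted(range(len(w)), key=lambda j: -abs(w[j]))
    return sorted(ranked[:k])


def fit_stump(rows, labels, features):
    """Stage 2 (stand-in): best single-threshold split over the selected variables."""
    best = None
    for j in features:
        values = sorted({r[j] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            for above in (0, 1):
                errors = sum((above if r[j] > t else 1 - above) != y
                             for r, y in zip(rows, labels))
                if best is None or errors < best[0]:
                    best = (errors, j, t, above)
    _, j, t, above = best
    return lambda r: above if r[j] > t else 1 - above


if __name__ == "__main__":
    # Toy data: column 0 separates the classes, column 1 is uninformative.
    X = [[1, 5], [2, 3], [3, 5], [4, 3], [6, 5], [7, 3], [8, 5], [9, 3]]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    chosen = select_features(X, y, k=1)
    predict = fit_stump(X, y, chosen)
    print(chosen, predict([5.5, 3]))
```

Restricting the second stage to the variables chosen in the first keeps the model parsimonious, which is the property the abstract emphasizes.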
Remedy for Now but Prohibit for Tomorrow: The Deterrence Effects of Merger Policy Tools
Antitrust policy involves not just the regulation of anti-competitive behavior, but also an important deterrence effect. Neither scholars nor policymakers have fully researched the deterrence effects of merger policy tools, as they have been unable to empirically measure these effects. We consider the ability of different antitrust actions – Prohibitions, Remedies, and Monitorings – to deter firms from engaging in mergers. We employ cross-jurisdiction/pan-time data on merger policy to empirically estimate the impact of antitrust actions on future merger frequencies. We find that merger prohibitions lead to fewer merger notifications in subsequent periods, while remedies weakly increase future merger notifications: in other words, prohibitions involve a deterrence effect but remedies do not.
Political Text Scaling Meets Computational Semantics
During the last fifteen years, automatic text scaling has become one of the
key tools of the Text as Data community in political science. Prominent text
scaling algorithms, however, rely on the assumption that latent positions can
be captured just by leveraging the information about word frequencies in
documents under study. We challenge this traditional view and present a new,
semantically aware text scaling algorithm, SemScale, which combines recent
developments in the area of computational linguistics with unsupervised
graph-based clustering. We conduct an extensive quantitative analysis over a
collection of speeches from the European Parliament in five different languages
and from two different legislative terms, and show that a scaling approach
relying on semantic document representations is often better at capturing known
underlying political dimensions than the established frequency-based (i.e.,
symbolic) scaling method. We further validate our findings through a series of
experiments focused on text preprocessing and feature selection, document
representation, scaling of party manifestos, and a supervised extension of our
algorithm. To catalyze further research on this new branch of text scaling
methods, we release a Python implementation of SemScale with all included data
sets and evaluation procedures.
Comment: Updated version - accepted for Transactions on Data Science (TDS)
- …