539 research outputs found
VEWS: A Wikipedia Vandal Early Warning System
We study the problem of detecting vandals on Wikipedia before any human or
known vandalism detection system reports flagging potential vandals so that
such users can be presented early to Wikipedia administrators. We leverage
multiple classical ML approaches, but develop 3 novel sets of features. Our
Wikipedia Vandal Behavior (WVB) approach uses a novel set of user editing
patterns as features to classify some users as vandals. Our Wikipedia
Transition Probability Matrix (WTPM) approach uses a set of features derived
from a transition probability matrix and then reduces it via a neural net
auto-encoder to classify some users as vandals. The VEWS approach merges the
previous two approaches. Without using any information (e.g. reverts) provided
by other users, these algorithms each have over 85% classification accuracy.
Moreover, when temporal recency is considered, accuracy goes to almost 90%. We
carry out detailed experiments on a new data set we have created consisting of
about 33K Wikipedia users (including both a black list and a white list of
editors) and containing 770K edits. We describe specific behaviors that
distinguish between vandals and non-vandals. We show that VEWS beats ClueBot NG
and STiki, the best known algorithms today for vandalism detection. Moreover,
VEWS detects far more vandals than ClueBot NG and on average, detects them 2.39
edits before ClueBot NG when both detect the vandal. However, we show that the
combination of VEWS and ClueBot NG can give a fully automated vandal early
warning system with even higher accuracy.Comment: To appear in Proceedings of the 21st ACM SIGKDD Conference of
Knowledge Discovery and Data Mining (KDD 2015
Abduction in Annotated Probabilistic Temporal Logic
Annotated Probabilistic Temporal (APT) logic programs are a form of logic programs that allow users to state (or systems to automatically learn)rules of the form ``formula G becomes true K time units after formula F became true with L to U% probability.\u27\u27
In this paper, we develop a theory of abduction for APT logic programs. Specifically, given an APT logic program Pi, a set of formulas H that can be ``added\u27\u27 to Pi, and a goal G, is there a subset S of H such that Pi cup S is consistent and entails the goal G? In this paper, we study the complexity of the Basic APT Abduction Problem (BAAP). We then leverage a geometric characterization of BAAP to suggest a set of pruning strategies when solving BAAP and use these intuitions to develop a sound and complete algorithm
Deception Detection in Videos
We present a system for covert automated deception detection in real-life
courtroom trial videos. We study the importance of different modalities like
vision, audio and text for this task. On the vision side, our system uses
classifiers trained on low level video features which predict human
micro-expressions. We show that predictions of high-level micro-expressions can
be used as features for deception prediction. Surprisingly, IDT (Improved Dense
Trajectory) features which have been widely used for action recognition, are
also very good at predicting deception in videos. We fuse the score of
classifiers trained on IDT features and high-level micro-expressions to improve
performance. MFCC (Mel-frequency Cepstral Coefficients) features from the audio
domain also provide a significant boost in performance, while information from
transcripts is not very beneficial for our system. Using various classifiers,
our automated system obtains an AUC of 0.877 (10-fold cross-validation) when
evaluated on subjects which were not part of the training set. Even though
state-of-the-art methods use human annotations of micro-expressions for
deception detection, our fully automated approach outperforms them by 5%. When
combined with human annotations of micro-expressions, our AUC improves to
0.922. We also present results of a user-study to analyze how well do average
humans perform on this task, what modalities they use for deception detection
and how they perform if only one modality is accessible. Our project page can
be found at \url{https://doubaibai.github.io/DARE/}.Comment: AAAI 2018, project page: https://doubaibai.github.io/DARE
Hybrid knowledge systems
An architecture called hybrid knowledge system (HKS) is described that can be used to interoperate between a specification of the control laws describing a physical system, a collection of databases, knowledge bases and/or other data structures reflecting information about the world in which the physical system controlled resides, observations (e.g. sensor information) from the external world, and actions that must be taken in response to external observations
Mitigating Adversarial Gray-Box Attacks Against Phishing Detectors
Although machine learning based algorithms have been extensively used for
detecting phishing websites, there has been relatively little work on how
adversaries may attack such "phishing detectors" (PDs for short). In this
paper, we propose a set of Gray-Box attacks on PDs that an adversary may use
which vary depending on the knowledge that he has about the PD. We show that
these attacks severely degrade the effectiveness of several existing PDs. We
then propose the concept of operation chains that iteratively map an original
set of features to a new set of features and develop the "Protective Operation
Chain" (POC for short) algorithm. POC leverages the combination of random
feature selection and feature mappings in order to increase the attacker's
uncertainty about the target PD. Using 3 existing publicly available datasets
plus a fourth that we have created and will release upon the publication of
this paper, we show that POC is more robust to these attacks than past
competing work, while preserving predictive performance when no adversarial
attacks are present. Moreover, POC is robust to attacks on 13 different
classifiers, not just one. These results are shown to be statistically
significant at the p < 0.001 level
Strong Completeness Results for Paraconsistent Logic Programming
In [6], we introduced a means of allowing logic programs to contain negations in both the head and the body of a clause. Such programs were called generally Horn programs (GHPs, for short). The model-theoretic semantics of GHPs were defined in terms of four-valued Belnap lattices [5]. For a class of programs called well-behaved programs, an SLD-resolution like proof procedure was introduced. This procedure was proven (under certain restrictions) to be sound (for existential queries) and complete (for ground queries). In this paper, we remove the restriction that programs be well-behaved and extend our soundness and completeness results to apply to arbitrary existential queries and to arbitrary GHPs. This is the strongest possible completeness result for GHPs. The results reported here apply to the design of very large knowledge bases and in processing queries to knowledge bases that possibly contain erroneous information
EC2:Ensemble Clustering and Classification for Predicting Android Malware Families
As the most widely used mobile platform, Android is also the biggest target for mobile malware. Given the increasing number of Android malware variants, detecting malware families is crucial so that security analysts can identify situations where signatures of a known malware family can be adapted as opposed to manually inspecting behavior of all samples. We present EC2 ( E nsemble C lustering and C lassification), a novel algorithm for discovering Android malware families of varying sizes —ranging from very large to very small families (even if previously unseen). We present a performance comparison of several traditional classification and clustering algorithms for Android malware family identification on DREBIN, the largest public Android malware dataset with labeled families. We use the output of both supervised classifiers and unsupervised clustering to design EC2 . Experimental results on both the DREBIN and the more recent Koodous malware datasets show that EC2 accurately detects both small and large families, outperforming several comparative baselines. Furthermore, we show how to automatically characterize and explain unique behaviors of specific malware families, such as FakeInstaller , MobileTx , Geinimi . In short, EC2 presents an early warning system for emerging new malware families, as well as a robust predictor of the family (when it is not new) to which a new malware sample belongs, and the design of novel strategies for data-driven understanding of malware behaviors
- …
