Search CORE

539 research outputs found

VEWS: A Wikipedia Vandal Early Warning System

Author: Kumar Srijan
Spezzano Francesca
Subrahmanian V. S.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 05/07/2015
Field of study

We study the problem of detecting vandals on Wikipedia before any human or known vandalism detection system reports flagging potential vandals so that such users can be presented early to Wikipedia administrators. We leverage multiple classical ML approaches, but develop 3 novel sets of features. Our Wikipedia Vandal Behavior (WVB) approach uses a novel set of user editing patterns as features to classify some users as vandals. Our Wikipedia Transition Probability Matrix (WTPM) approach uses a set of features derived from a transition probability matrix and then reduces it via a neural net auto-encoder to classify some users as vandals. The VEWS approach merges the previous two approaches. Without using any information (e.g. reverts) provided by other users, these algorithms each have over 85% classification accuracy. Moreover, when temporal recency is considered, accuracy goes to almost 90%. We carry out detailed experiments on a new data set we have created consisting of about 33K Wikipedia users (including both a black list and a white list of editors) and containing 770K edits. We describe specific behaviors that distinguish between vandals and non-vandals. We show that VEWS beats ClueBot NG and STiki, the best known algorithms today for vandalism detection. Moreover, VEWS detects far more vandals than ClueBot NG and on average, detects them 2.39 edits before ClueBot NG when both detect the vandal. However, we show that the combination of VEWS and ClueBot NG can give a fully automated vandal early warning system with even higher accuracy.Comment: To appear in Proceedings of the 21st ACM SIGKDD Conference of Knowledge Discovery and Data Mining (KDD 2015

arXiv.org e-Print Archive

Crossref

Abduction in Annotated Probabilistic Temporal Logic

Author: Molinaro Cristian
Sliva Amy
Subrahmanian V. S.
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. Technical Communications of the 27th International Conference on Logic Programming (ICLP\u2711)
Publication date: 01/01/2011
Field of study

Annotated Probabilistic Temporal (APT) logic programs are a form of logic programs that allow users to state (or systems to automatically learn)rules of the form ``formula G becomes true K time units after formula F became true with L to U% probability.\u27\u27 In this paper, we develop a theory of abduction for APT logic programs. Specifically, given an APT logic program Pi, a set of formulas H that can be ``added\u27\u27 to Pi, and a goal G, is there a subset S of H such that Pi cup S is consistent and entails the goal G? In this paper, we study the complexity of the Basic APT Abduction Problem (BAAP). We then leverage a geometric characterization of BAAP to suggest a set of pruning strategies when solving BAAP and use these intuitions to develop a sound and complete algorithm

DROPS Dagstuhl Research Online Publication Server

Deception Detection in Videos

Author: Davis Larry S.
Singh Bharat
Subrahmanian V. S.
Wu Zhe
Publication venue
Publication date: 12/12/2017
Field of study

We present a system for covert automated deception detection in real-life courtroom trial videos. We study the importance of different modalities like vision, audio and text for this task. On the vision side, our system uses classifiers trained on low level video features which predict human micro-expressions. We show that predictions of high-level micro-expressions can be used as features for deception prediction. Surprisingly, IDT (Improved Dense Trajectory) features which have been widely used for action recognition, are also very good at predicting deception in videos. We fuse the score of classifiers trained on IDT features and high-level micro-expressions to improve performance. MFCC (Mel-frequency Cepstral Coefficients) features from the audio domain also provide a significant boost in performance, while information from transcripts is not very beneficial for our system. Using various classifiers, our automated system obtains an AUC of 0.877 (10-fold cross-validation) when evaluated on subjects which were not part of the training set. Even though state-of-the-art methods use human annotations of micro-expressions for deception detection, our fully automated approach outperforms them by 5%. When combined with human annotations of micro-expressions, our AUC improves to 0.922. We also present results of a user-study to analyze how well do average humans perform on this task, what modalities they use for deception detection and how they perform if only one modality is accessible. Our project page can be found at \url{https://doubaibai.github.io/DARE/}.Comment: AAAI 2018, project page: https://doubaibai.github.io/DARE

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Hybrid knowledge systems

Author: Subrahmanian V. S.
Publication venue
Publication date
Field of study

An architecture called hybrid knowledge system (HKS) is described that can be used to interoperate between a specification of the control laws describing a physical system, a collection of databases, knowledge bases and/or other data structures reflecting information about the world in which the physical system controlled resides, observations (e.g. sensor information) from the external world, and actions that must be taken in response to external observations

NASA Technical Reports Server

Mitigating Adversarial Gray-Box Attacks Against Phishing Detectors

Author: Apruzzese Giovanni
Subrahmanian V. S.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 10/12/2022
Field of study

Although machine learning based algorithms have been extensively used for detecting phishing websites, there has been relatively little work on how adversaries may attack such "phishing detectors" (PDs for short). In this paper, we propose a set of Gray-Box attacks on PDs that an adversary may use which vary depending on the knowledge that he has about the PD. We show that these attacks severely degrade the effectiveness of several existing PDs. We then propose the concept of operation chains that iteratively map an original set of features to a new set of features and develop the "Protective Operation Chain" (POC for short) algorithm. POC leverages the combination of random feature selection and feature mappings in order to increase the attacker's uncertainty about the target PD. Using 3 existing publicly available datasets plus a fourth that we have created and will release upon the publication of this paper, we show that POC is more robust to these attacks than past competing work, while preserving predictive performance when no adversarial attacks are present. Moreover, POC is robust to attacks on 13 different classifiers, not just one. These results are shown to be statistically significant at the p < 0.001 level

arXiv.org e-Print Archive

Strong Completeness Results for Paraconsistent Logic Programming

Author: Blair Howard A.
Subrahmanian V. S.
Publication venue: SURFACE at Syracuse University
Publication date: 01/05/1990
Field of study

In [6], we introduced a means of allowing logic programs to contain negations in both the head and the body of a clause. Such programs were called generally Horn programs (GHPs, for short). The model-theoretic semantics of GHPs were defined in terms of four-valued Belnap lattices [5]. For a class of programs called well-behaved programs, an SLD-resolution like proof procedure was introduced. This procedure was proven (under certain restrictions) to be sound (for existential queries) and complete (for ground queries). In this paper, we remove the restriction that programs be well-behaved and extend our soundness and completeness results to apply to arbitrary existential queries and to arbitrary GHPs. This is the strongest possible completeness result for GHPs. The results reported here apply to the design of very large knowledge bases and in processing queries to knowledge bases that possibly contain erroneous information

Syracuse University Research Facility and Collaborative Environment

EC2:Ensemble Clustering and Classification for Predicting Android Malware Families

Author: Chakraborty Tanmoy
Pierazzi Fabio
Subrahmanian V. S.
Publication venue
Publication date: 01/01/2020
Field of study

As the most widely used mobile platform, Android is also the biggest target for mobile malware. Given the increasing number of Android malware variants, detecting malware families is crucial so that security analysts can identify situations where signatures of a known malware family can be adapted as opposed to manually inspecting behavior of all samples. We present EC2 ( E nsemble C lustering and C lassification), a novel algorithm for discovering Android malware families of varying sizes —ranging from very large to very small families (even if previously unseen). We present a performance comparison of several traditional classification and clustering algorithms for Android malware family identification on DREBIN, the largest public Android malware dataset with labeled families. We use the output of both supervised classifiers and unsupervised clustering to design EC2 . Experimental results on both the DREBIN and the more recent Koodous malware datasets show that EC2 accurately detects both small and large families, outperforming several comparative baselines. Furthermore, we show how to automatically characterize and explain unique behaviors of specific malware families, such as FakeInstaller , MobileTx , Geinimi . In short, EC2 presents an early warning system for emerging new malware families, as well as a robust predictor of the family (when it is not new) to which a new malware sample belongs, and the design of novel strategies for data-driven understanding of malware behaviors

Crossref

King's Research Portal

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia