Search CORE

124 research outputs found

Detecting Sockpuppets in Deceptive Opinion Spam

Author: Chih-Chung Chang
DH Fusilier
E Stamatatos
M Koppel
N Graham
T Qian
Vladimir N. Vapnik
Xinxing Xu
Publication venue
Publication date: 09/03/2017
Field of study

This paper explores the problem of sockpuppet detection in deceptive opinion spam using authorship attribution and verification approaches. Two methods are explored. The first is a feature subsampling scheme that uses the KL-Divergence on stylistic language models of an author to find discriminative features. The second is a transduction scheme, spy induction that leverages the diversity of authors in the unlabeled test set by sending a set of spies (positive samples) from the training set to retrieve hidden samples in the unlabeled test set using nearest and farthest neighbors. Experiments using ground truth sockpuppet data show the effectiveness of the proposed schemes.Comment: 18 pages, Accepted at CICLing 2017, 18th International Conference on Intelligent Text Processing and Computational Linguistic

arXiv.org e-Print Archive

Crossref

Near-optimal Linear Decision Trees for k-SUM and Related Problems

Author: Cardinal Jean
Ezra Esther
Gold Omer
Goto Eiichi
Pettie Seth
Vapnik Vladimir N
Vapnik VN
Williams Virginia Vassilevska
Publication venue: eScholarship, University of California
Publication date: 04/05/2017
Field of study

We construct near-optimal linear decision trees for a variety of decision problems in combinatorics and discrete geometry. For example, for any constant k , we construct linear decision trees that solve the k -SUM problem on n elements using O ( n log 2 n ) linear queries. Moreover, the queries we use are comparison queries, which compare the sums of two k -subsets; when viewed as linear queries, comparison queries are 2 k -sparse and have only { −1,0,1} coefficients. We give similar constructions for sorting sumsets A+B and for solving the SUBSET-SUM problem, both with optimal number of queries, up to poly-logarithmic terms. Our constructions are based on the notion of “inference dimension,” recently introduced by the authors in the context of active classification with comparison queries. This can be viewed as another contribution to the fruitful link between machine learning and discrete geometry, which goes back to the discovery of the VC dimension

arXiv.org e-Print Archive

Crossref

eScholarship - University of California

MCRapper: Monte-Carlo Rademacher Averages for Poset Families and Approximate Pattern Mining

Author: Bartlett P. L.
De L.
Koltchinskii V.
Toivonen H.
Vapnik Vladimir N.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2020
Field of study

We present MCRapper, an algorithm for efficient computation of Monte-Carlo Empirical Rademacher Averages (MCERA) for families of functions exhibiting poset (e.g., lattice) structure, such as those that arise in many pattern mining tasks. The MCERA allows us to compute upper bounds to the maximum deviation of sample means from their expectations, thus it can be used to find both statistically-significant functions (i.e., patterns) when the available data is seen as a sample from an unknown distribution, and approximations of collections of high-expectation functions (e.g., frequent patterns) when the available data is a small sample from a large dataset. This feature is a strong improvement over previously proposed solutions that could only achieve one of the two. MCRapper uses upper bounds to the discrepancy of the functions to efficiently explore and prune the search space, a technique borrowed from pattern mining itself. To show the practical use of MCRapper, we employ it to develop an algorithm TFP-R for the task of True Frequent Pattern (TFP) mining. TFP-R gives guarantees on the probability of including any false positives (precision) and exhibits higher statistical power (recall) than existing methods offering the same guarantees. We evaluate MCRapper and TFP-R and show that they outperform the state-of-the-art for their respective tasks

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Università di Padova

Do Prices Coordinate Markets?

Author: Babaioff Moshe
Balcan Maria-Florina
Blum Avrim
Blum Avrim
Cesa-Bianchi Nicolò
Daniely Amit
Dhangwatnotai Peerapong
Elkind Edith
Littlestone Nick
Medina Andres Munoz
Morgenstern Jamie
Murota Kazuo
Oxley James G
Vapnik Vladimir N
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 22/06/2016
Field of study

Walrasian equilibrium prices can be said to coordinate markets: They support a welfare optimal allocation in which each buyer is buying bundle of goods that is individually most preferred. However, this clean story has two caveats. First, the prices alone are not sufficient to coordinate the market, and buyers may need to select among their most preferred bundles in a coordinated way to find a feasible allocation. Second, we don't in practice expect to encounter exact equilibrium prices tailored to the market, but instead only approximate prices, somehow encoding "distributional" information about the market. How well do prices work to coordinate markets when tie-breaking is not coordinated, and they encode only distributional information? We answer this question. First, we provide a genericity condition such that for buyers with Matroid Based Valuations, overdemand with respect to equilibrium prices is at most 1, independent of the supply of goods, even when tie-breaking is done in an uncoordinated fashion. Second, we provide learning-theoretic results that show that such prices are robust to changing the buyers in the market, so long as all buyers are sampled from the same (unknown) distribution

arXiv.org e-Print Archive

Crossref

On the Power of Democratic Networks

Author: E. N. Mayoraz
Muroga Saburo
Nabutovsky Dimitry
Vapnik Vladimir
Zuev Y. A.
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date
Field of study

Crossref

Identifying Human Kinase-Specific Protein Phosphorylation Sites by Integrating Heterogeneous Information from Various Sources

Author: A Kreegipuu
AD Sharrocks
B Stillman
C-CCaC-J Lin
EJ Chang
F Diella
F Diella
F Gnad
G Manning
JA Ubersax
JH Kim
LA Pinna
LM Iakoucheva
M Wagner
MB Yaffe
N Blom
N Blom
Nanfang Xu
P Akamine
Pufeng Du
R Linding
S Yao
T Li
T Oelgeschlager
T Pawson
TH Dang
Tingting Li
V Andres
Vladimir N. Uversky
VN Vapnik
W Zachariae
Y Xue
Publication venue: Public Library of Science
Publication date: 15/11/2010
Field of study

Phosphorylation is an important type of protein post-translational modification. Identification of possible phosphorylation sites of a protein is important for understanding its functions. Unbiased screening for phosphorylation sites by in vitro or in vivo experiments is time consuming and expensive; in silico prediction can provide functional candidates and help narrow down the experimental efforts. Most of the existing prediction algorithms take only the polypeptide sequence around the phosphorylation sites into consideration. However, protein phosphorylation is a very complex biological process in vivo. The polypeptide sequences around the potential sites are not sufficient to determine the phosphorylation status of those residues. In the current work, we integrated various data sources such as protein functional domains, protein subcellular location and protein-protein interactions, along with the polypeptide sequences to predict protein phosphorylation sites. The heterogeneous information significantly boosted the prediction accuracy for some kinase families. To demonstrate potential application of our method, we scanned a set of human proteins and predicted putative phosphorylation sites for Cyclin-dependent kinases, Casein kinase 2, Glycogen synthase kinase 3, Mitogen-activated protein kinases, protein kinase A, and protein kinase C families (avaiable at http://cmbi.bjmu.edu.cn/huphospho). The predicted phosphorylation sites can serve as candidates for further experimental validation. Our strategy may also be applicable for the in silico identification of other post-translational modification substrates

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central