
    From average case complexity to improper learning complexity

    The basic problem in the PAC model of computational learning theory is to determine which hypothesis classes are efficiently learnable. There is presently a dearth of results showing hardness of learning problems, and the existing lower bounds fall short of the best known algorithms. The biggest challenge in proving complexity results is to establish hardness of improper learning (a.k.a. representation-independent learning). The difficulty in proving lower bounds for improper learning is that the standard reductions from NP-hard problems do not seem to apply in this context. There is essentially only one known approach to proving lower bounds on improper learning; it was initiated by Kearns and Valiant (1989) and relies on cryptographic assumptions. We introduce a new technique for proving hardness of improper learning, based on reductions from problems that are hard on average. We put forward a (fairly strong) generalization of Feige's assumption (Feige 2002) about the complexity of refuting random constraint satisfaction problems. Combining this assumption with our new technique yields far-reaching implications. In particular: 1. Learning DNFs is hard. 2. Agnostically learning halfspaces with a constant approximation ratio is hard. 3. Learning an intersection of ω(1) halfspaces is hard. Comment: 34 pages

    Fake View Analytics in Online Video Services

    Online video-on-demand (VoD) services invariably maintain a view count for each video they serve, and this count has become an important currency for various stakeholders, from viewers to content owners, advertisers, and the online service providers themselves. There is often a significant financial incentive to use a robot (or a botnet) to artificially create fake views. How can we detect fake views? Can we detect them (and stop them) with online algorithms, as they occur? What is the extent of fake views with current VoD service providers? These are the questions we study in this paper. We develop several algorithms and show that they are quite effective for this problem. Comment: 25 pages, 15 figures
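
    The abstract does not spell out its detectors, so the following is only an illustrative sketch of the kind of online algorithm it alludes to: scoring each new per-minute view count against a running estimate of a video's typical traffic (Welford's online mean/variance). The class name, warm-up length, and z-score threshold are all hypothetical.

```python
# Hedged sketch (not the paper's algorithm): flag suspicious view bursts
# online by scoring each new per-minute count against running statistics
# maintained with Welford's algorithm.

class OnlineViewAnomalyDetector:
    def __init__(self, z_threshold=6.0):  # hypothetical threshold
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations
        self.z_threshold = z_threshold

    def update(self, views_this_minute):
        """Return True if the new count looks anomalous, then absorb it."""
        anomalous = False
        if self.n >= 10:  # hypothetical warm-up period before scoring
            std = (self.m2 / (self.n - 1)) ** 0.5
            if std > 0:
                z = (views_this_minute - self.mean) / std
                anomalous = z > self.z_threshold
        # Welford update of mean and variance
        self.n += 1
        delta = views_this_minute - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (views_this_minute - self.mean)
        return anomalous

detector = OnlineViewAnomalyDetector()
stream = [12, 9, 14, 11, 10, 13, 12, 9, 11, 10, 12, 480, 510]  # toy data
print([detector.update(v) for v in stream])  # the jump to 480 is flagged
```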

    Second-Generation Objects in the Universe: Radiative Cooling and Collapse of Halos with Virial Temperatures Above 10^4 Kelvin

    The first generation of protogalaxies likely formed out of primordial gas via H2 cooling in cosmological minihalos with virial temperatures of a few thousand Kelvin. However, their abundance is likely to have been severely limited by feedback processes that suppressed H2 formation. The formation of the protogalaxies responsible for reionization and metal enrichment of the intergalactic medium then had to await the collapse of larger halos. Here we investigate the radiative cooling and collapse of gas in halos with virial temperatures Tvir > 10^4 K. In these halos, efficient atomic line radiation allows rapid cooling of the gas to 8000 K; subsequently the gas can contract nearly isothermally at this temperature. Without an additional coolant, the gas would likely settle into a locally gravitationally stable disk; only disks with unusually low spin would be unstable. However, we find that the initial atomic line cooling leaves a large, out-of-equilibrium residual free-electron fraction. This allows the molecular fraction to build up to a universal value of about x(H2) = 10^-3, almost independently of the initial density and temperature. We show that this is a non-equilibrium freezeout value that can be understood in terms of timescale arguments. Furthermore, unlike in less massive halos, H2 formation is largely impervious to feedback from external UV fields, owing to the high initial densities achieved by atomic cooling. The H2 molecules cool the gas further to about 100 K and allow the gas to fragment on scales of a few hundred solar masses. We investigate the importance of various feedback effects, such as H2 photodissociation by internal UV fields and radiation pressure due to Ly-alpha photon trapping, which are likely to regulate the efficiency of star formation. Comment: Revised version accepted by ApJ; some reorganization for clarity
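
    To make the Tvir > 10^4 K threshold concrete, the sketch below translates it into a rough halo mass scale using the standard virial-temperature approximation of Barkana & Loeb (2001); this formula is not from the paper above, and the cosmological parameters (mu, h) are illustrative defaults.

```python
# Hedged sketch: convert the abstract's Tvir > 10^4 K threshold into a
# halo mass scale using the standard approximation of Barkana & Loeb
# (2001), dropping order-unity overdensity factors.

def virial_temperature(mass_msun, z, mu=0.6, h=0.7):
    """Approximate virial temperature [K] of a halo of mass M at redshift z."""
    return 1.98e4 * (mu / 0.6) * (mass_msun * h / 1e8) ** (2.0 / 3.0) * (1 + z) / 10.0

# Roughly, halos above ~10^8 Msun at z ~ 10 cross the atomic-cooling threshold.
for m in (1e6, 1e7, 1e8, 1e9):
    print(f"M = {m:.0e} Msun -> Tvir ~ {virial_temperature(m, z=10):.2e} K")
```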

    Detecting Sockpuppets in Deceptive Opinion Spam

    This paper explores the problem of sockpuppet detection in deceptive opinion spam using authorship attribution and verification approaches. Two methods are explored. The first is a feature subsampling scheme that uses the KL divergence on stylistic language models of an author to find discriminative features. The second is a transduction scheme, spy induction, that leverages the diversity of authors in the unlabeled test set by sending a set of spies (positive samples) from the training set to retrieve hidden samples in the unlabeled test set using nearest and farthest neighbors. Experiments on ground-truth sockpuppet data show the effectiveness of the proposed schemes. Comment: 18 pages. Accepted at CICLing 2017, the 18th International Conference on Intelligent Text Processing and Computational Linguistics
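
    As a hedged illustration of the first idea (not the authors' exact pipeline), the sketch below scores features by their pointwise KL-divergence contribution between two authors' Laplace-smoothed unigram language models; the helper names and toy data are invented for the example.

```python
# Hedged sketch: rank features by how much they contribute to the KL
# divergence between two authors' smoothed unigram language models.

import math
from collections import Counter

def unigram_model(tokens, vocab, alpha=1.0):
    """Laplace-smoothed unigram probabilities over a shared vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def per_feature_kl(p, q):
    """Pointwise KL contributions p(w) log(p(w)/q(w)); large = discriminative."""
    return {w: p[w] * math.log(p[w] / q[w]) for w in p}

a = "yeah yeah I totally agree great great product".split()   # toy author A
b = "the evaluation methodology is described in section three".split()  # toy author B
vocab = set(a) | set(b)
p, q = unigram_model(a, vocab), unigram_model(b, vocab)
top = sorted(per_feature_kl(p, q).items(), key=lambda kv: -kv[1])[:3]
print(top)  # features most characteristic of author A relative to B
```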

    Near-optimal Linear Decision Trees for k-SUM and Related Problems

    We construct near-optimal linear decision trees for a variety of decision problems in combinatorics and discrete geometry. For example, for any constant k, we construct linear decision trees that solve the k-SUM problem on n elements using O(n log^2 n) linear queries. Moreover, the queries we use are comparison queries, which compare the sums of two k-subsets; when viewed as linear queries, comparison queries are 2k-sparse and have only {−1, 0, 1} coefficients. We give similar constructions for sorting sumsets A+B and for solving the SUBSET-SUM problem, both with an optimal number of queries up to poly-logarithmic terms. Our constructions are based on the notion of “inference dimension,” recently introduced by the authors in the context of active classification with comparison queries. This can be viewed as another contribution to the fruitful link between machine learning and discrete geometry, which goes back to the discovery of the VC dimension.
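
    The decision tree itself is far more involved, but the query primitive the abstract describes is simple enough to state in code: a comparison query evaluates the sign of a 2k-sparse linear form with coefficients in {−1, 0, 1}. The sketch below just illustrates that primitive; the function name and toy input are invented.

```python
# Hedged sketch of a single comparison query: compare the sums of two
# k-subsets A and B, i.e. take the sign of a 2k-sparse {-1,0,1} linear form.

def comparison_query(x, subset_a, subset_b):
    """Sign of sum(x[i] for i in A) - sum(x[j] for j in B)."""
    s = sum(x[i] for i in subset_a) - sum(x[j] for j in subset_b)
    return (s > 0) - (s < 0)  # -1, 0, or +1

# Viewed as a linear query <c, x>: c has k coefficients +1 and k coefficients -1.
x = [3.0, -1.5, 2.5, -4.0, 1.0]
print(comparison_query(x, subset_a=(0, 2), subset_b=(1, 4)))  # sign of 5.5 - (-0.5)
```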

    Subsampling in Smoothed Range Spaces

    We consider smoothed versions of geometric range spaces, so that an element of the ground set (e.g. a point) can be contained in a range with a non-binary value in [0,1]. Similar notions have been considered for kernels; we extend them to more general types of ranges. We then consider approximations of these range spaces through ε-nets and ε-samples (a.k.a. ε-approximations). We characterize when size bounds for ε-samples on kernels can be extended to these more general smoothed range spaces. We also describe new generalizations of ε-nets for these range spaces and show when results from binary range spaces can carry over to the smoothed ones. Comment: This is the full version of the paper which appeared in ALT 2015. 16 pages, 3 figures. In Algorithmic Learning Theory, pp. 224-238. Springer International Publishing, 2015
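
    For intuition, one simple instance of a smoothed range (not taken from the paper, which treats more general kernels and ranges) is a disk whose binary membership is replaced by a linear ramp of width w across the boundary:

```python
# Hedged sketch of one smoothed range: a disk where membership is a value
# in [0, 1] that decays linearly across a band of width w at the boundary.

import math

def smoothed_disk(point, center, radius, w=0.5):
    """1 well inside the disk, 0 well outside, linear ramp near the boundary."""
    d = math.dist(point, center)
    if d <= radius - w / 2:
        return 1.0
    if d >= radius + w / 2:
        return 0.0
    return (radius + w / 2 - d) / w

for p in [(0.0, 0.0), (0.9, 0.0), (1.1, 0.0), (2.0, 0.0)]:
    print(p, smoothed_disk(p, center=(0.0, 0.0), radius=1.0))
```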

    Optimal estimation for Large-Eddy Simulation of turbulence and application to the analysis of subgrid models

    The tools of optimal estimation are applied to the study of subgrid models for Large-Eddy Simulation of turbulence. The concept of an optimal estimator is introduced and its properties are analyzed in the context of a priori tests of subgrid models. Attention is focused on the Cook and Riley model in the case of a scalar field in isotropic turbulence. Using DNS data, the relevance of the beta assumption is estimated by computing (i) generalized optimal estimators and (ii) the error introduced by this assumption alone. Optimal estimators are computed for the subgrid variance using various sets of variables and various techniques (histograms and neural networks). It is shown that optimal estimators allow a thorough exploration of models. Neural networks prove to be relevant and very efficient in this framework, and further uses are suggested
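
    For a quadratic error, the optimal estimator of a subgrid quantity y from resolved variables x is the conditional mean E[y | x], which the abstract says is approximated with histograms and neural networks. The sketch below shows the 1-D histogram version on synthetic stand-in data; the function name and data are invented for illustration.

```python
# Hedged sketch of the optimal-estimator idea: approximate E[y | x] by
# binning x and averaging y within each bin (the histogram technique).

import numpy as np

def histogram_optimal_estimator(x, y, n_bins=32):
    """Binned estimate of E[y | x]; returns bin edges and per-bin means."""
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    idx = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
    means = np.array([y[idx == b].mean() if np.any(idx == b) else np.nan
                      for b in range(n_bins)])
    return edges, means

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 10_000)             # stand-in for a resolved variable
y = x**2 + 0.1 * rng.normal(size=x.size)   # stand-in for a subgrid quantity
edges, means = histogram_optimal_estimator(x, y)
# The irreducible error of ANY model built on x alone is E[(y - E[y|x])^2].
```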

    A preliminary approach to the multilabel classification problem of Portuguese juridical documents

    Portuguese juridical documents from Supreme Courts and the Attorney General’s Office are manually classified by juridical experts into a set of classes belonging to a taxonomy of concepts. In this paper, a preliminary approach to developing techniques for automatically classifying these juridical documents is proposed. The basic strategy is to integrate natural language processing techniques with machine learning ones. Support Vector Machines (SVMs) are used as the learning algorithm, and the results obtained are presented and compared with those of other approaches, such as C4.5 and Naive Bayes.
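
    A minimal sketch of the basic strategy, assuming the standard one-vs-rest reduction of multilabel classification to binary SVMs over TF-IDF features (the abstract does not specify the features or reduction; the toy documents and labels below are invented):

```python
# Hedged sketch: TF-IDF features plus one linear SVM per taxonomy class
# (one-vs-rest), the usual reduction of multilabel classification.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

docs = ["contract breach damages", "tax appeal ruling",
        "contract tax dispute"]                        # toy documents
labels = [{"civil"}, {"fiscal"}, {"civil", "fiscal"}]  # toy taxonomy labels

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)          # multilabel targets as a 0/1 matrix
X = TfidfVectorizer().fit_transform(docs)

clf = OneVsRestClassifier(LinearSVC()).fit(X, Y)
print(mlb.inverse_transform(clf.predict(X)))  # predicted label sets
```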

    Learning from Minimum Entropy Queries in a Large Committee Machine

    In supervised learning, the redundancy contained in random examples can be avoided by learning from queries. Using statistical mechanics, we study learning from minimum entropy queries in a large tree-committee machine. The generalization error decreases exponentially with the number of training examples, providing a significant improvement over the algebraic decay for random examples. The connection between entropy and generalization error in multi-layer networks is discussed, and a computationally cheap algorithm for constructing queries is suggested and analysed. Comment: 4 pages, REVTeX, multicol, epsf, two PostScript figures. To appear in Physical Review E (Rapid Communications)
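
    A generic illustration of the query-selection principle (not the paper's tree-committee analysis): among candidate inputs, query the one on which a committee of hypotheses disagrees most, i.e. whose vote split has maximum entropy. The committee of random perceptrons and all sizes below are invented for the example.

```python
# Hedged sketch: pick the next query as the candidate input that maximizes
# the entropy of a hypothesis committee's vote split (maximum disagreement).

import math
import random

random.seed(0)
dim, committee_size = 8, 15
committee = [[random.gauss(0, 1) for _ in range(dim)]
             for _ in range(committee_size)]  # toy perceptron committee

def vote_entropy(x):
    """Binary entropy of the committee's +/- vote split on input x."""
    votes = sum(sum(w * xi for w, xi in zip(member, x)) > 0
                for member in committee)
    p = votes / committee_size
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

candidates = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(200)]
query = max(candidates, key=vote_entropy)  # most informative next example
print(round(vote_entropy(query), 3))       # ~1.0 for a near-even split
```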