From average case complexity to improper learning complexity
The basic problem in the PAC model of computational learning theory is to
determine which hypothesis classes are efficiently learnable. There is
presently a dearth of results showing hardness of learning problems. Moreover,
the existing lower bounds fall short of the best known algorithms.
The biggest challenge in proving complexity results is to establish hardness
of {\em improper learning} (a.k.a. representation independent learning). The
difficulty in proving lower bounds for improper learning is that the standard
reductions from NP-hard problems do not seem to apply in this
context. There is essentially only one known approach to proving lower bounds
on improper learning. It was initiated in (Kearns and Valiant 89) and relies on
cryptographic assumptions.
We introduce a new technique for proving hardness of improper learning, based
on reductions from problems that are hard on average. We put forward a (fairly
strong) generalization of Feige's assumption (Feige 02) about the complexity of
refuting random constraint satisfaction problems. Combining this assumption
with our new technique yields far reaching implications. In particular,
1. Learning DNF's is hard.
2. Agnostically learning halfspaces with a constant approximation ratio is
hard.
3. Learning an intersection of halfspaces is hard.
Comment: 34 pages
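For readers who want the criterion being made hard, this is the standard PAC requirement (a textbook definition in notation of our own choosing, not anything specific to this paper's reductions):

    % PAC learning, stated schematically. A hypothesis class H over a domain X is
    % (improperly) PAC learnable if some efficient algorithm A, given m(\epsilon,\delta)
    % i.i.d. examples (x, h^*(x)) with x ~ D and h^* \in H, outputs a hypothesis h --
    % not necessarily a member of H, which is what "improper" means -- satisfying
    \Pr_{S \sim D^m} \big[ \mathrm{err}_D(h) \le \epsilon \big] \ge 1 - \delta,
    \qquad \mathrm{err}_D(h) = \Pr_{x \sim D} \big[ h(x) \ne h^*(x) \big].

Hardness of improper learning means ruling out every such algorithm, whatever representation it uses for h; this is why reductions that exploit the structure of the class H do not directly apply.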
Fake View Analytics in Online Video Services
Online video-on-demand (VoD) services invariably maintain a view count for
each video they serve, and it has become an important currency for various
stakeholders, from viewers, to content owners, advertisers, and the online
service providers themselves. There is often significant financial incentive to
use a robot (or a botnet) to artificially create fake views. How can we detect
the fake views? Can we detect them (and stop them) using online algorithms as
they occur? What is the extent of fake views with current VoD service
providers? These are the questions we study in the paper. We develop some
algorithms and show that they are quite effective for this problem.
Comment: 25 pages, 15 figures
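As a purely illustrative sketch of what an online detector of this flavor might look like (this is our toy example, not the algorithms from the paper; the EWMA scheme, warm-up length, and threshold are all assumptions):

    # Toy online view-rate monitor: flag a video when its incoming view rate jumps
    # far above an exponentially weighted moving average (EWMA) of its history.
    # Hypothetical illustration only -- not the paper's detection algorithm.
    class ViewRateMonitor:
        def __init__(self, alpha=0.1, k=5.0, warmup=5):
            self.alpha = alpha    # EWMA smoothing factor
            self.k = k            # flag when rate > mean + k * std
            self.warmup = warmup  # samples to observe before flagging anything
            self.n, self.mean, self.var = 0, 0.0, 0.0

        def update(self, rate):
            """Feed one view-rate sample; return True if it looks anomalous."""
            self.n += 1
            if self.n == 1:
                self.mean = rate
                return False
            dev = rate - self.mean
            flagged = self.n > self.warmup and dev > self.k * (self.var ** 0.5 + 1e-9)
            # EWMA updates of the running mean and variance
            self.mean += self.alpha * dev
            self.var = (1 - self.alpha) * (self.var + self.alpha * dev * dev)
            return flagged

    monitor = ViewRateMonitor()
    for rate in [12, 15, 11, 14, 13, 400, 12]:  # a sudden bot-like burst at 400
        if monitor.update(rate):
            print("suspicious view spike:", rate)

Anything beyond this toy threshold test, and in particular how well online detection can work at VoD scale, is what the paper actually studies.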
A preliminary approach to the multilabel classification problem of Portuguese juridical documents
Portuguese juridical documents from Supreme Courts and the Attorney General’s Office are manually classified by juridical experts into a set of classes belonging to a taxonomy of concepts. In this paper, a preliminary approach to develop techniques to automatically classify these juridical documents is proposed. The basic strategy is to integrate natural language processing techniques with machine learning ones. Support Vector Machines (SVM) are used as the learning algorithm, and the obtained results are presented and compared with other approaches, such as C4.5 and Naive Bayes.
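The general recipe (multilabel text classification with one binary SVM per taxonomy class) can be sketched as follows; the pipeline, toy corpus, and labels below are our own illustration, not the paper's data or feature set:

    # Sketch of multilabel classification with SVMs, in the spirit described above.
    # Toy corpus and labels are invented for illustration.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.preprocessing import MultiLabelBinarizer
    from sklearn.svm import LinearSVC

    docs = [
        "appeal against a tax assessment decision",
        "labour contract termination dispute",
        "tax fraud investigation with labour claims",
    ]
    labels = [{"tax"}, {"labour"}, {"tax", "labour"}]  # a document may carry several classes

    X = TfidfVectorizer().fit_transform(docs)      # bag-of-words TF-IDF features
    mlb = MultiLabelBinarizer()
    Y = mlb.fit_transform(labels)                  # one binary indicator column per class

    clf = OneVsRestClassifier(LinearSVC()).fit(X, Y)  # one binary SVM per class
    print(mlb.inverse_transform(clf.predict(X)))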
Subsampling in Smoothed Range Spaces
We consider smoothed versions of geometric range spaces, so an element of the ground set (e.g. a point) can be contained in a range with a non-binary value in $[0,1]$. Similar notions have been considered for kernels; we extend them to more general types of ranges. We then consider approximations of these range spaces through $\varepsilon$-nets and $\varepsilon$-samples (a.k.a. $\varepsilon$-approximations). We characterize when size bounds for $\varepsilon$-samples on kernels can be extended to these more general smoothed range spaces. We also describe new generalizations for $\varepsilon$-nets to these range spaces and show when results from binary range spaces can carry over to these smoothed ones.
Comment: This is the full version of the paper which appeared in ALT 2015. 16 pages, 3 figures. In Algorithmic Learning Theory, pp. 224-238. Springer International Publishing, 2015.
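Concretely, the $\varepsilon$-sample condition being generalized reads as follows in the smoothed setting (our notation; in the binary case the range values take only 0 and 1):

    % Q \subseteq P is an \varepsilon-sample for the smoothed range space when the
    % sample average of every range's values tracks the average over the full ground set:
    \max_{R} \left| \frac{1}{|P|} \sum_{p \in P} v_R(p)
                  - \frac{1}{|Q|} \sum_{q \in Q} v_R(q) \right| \le \varepsilon,
    \qquad v_R(p) \in [0,1].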
Optimal estimation for Large-Eddy Simulation of turbulence and application to the analysis of subgrid models
The tools of optimal estimation are applied to the study of subgrid models
for Large-Eddy Simulation of turbulence. The concept of optimal estimator is
introduced and its properties are analyzed in the context of applications to a
priori tests of subgrid models. Attention is focused on the Cook and Riley
model in the case of a scalar field in isotropic turbulence. Using DNS data,
the relevance of the beta assumption is estimated by computing (i) generalized
optimal estimators and (ii) the error brought by this assumption alone. Optimal
estimators are computed for the subgrid variance using various sets of
variables and various techniques (histograms and neural networks). It is shown
that optimal estimators allow a thorough exploration of models. Neural networks
prove to be relevant and very efficient in this framework, and further
uses are suggested.
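The notion at the heart of this abstract is a one-line least-squares fact (standard, stated here in our own notation): among all functions of a chosen set of resolved variables $\phi$, the quadratic error in estimating a subgrid quantity $s$ is minimized by the conditional average,

    % optimal estimator of s given \phi, in the mean-square sense
    f^{\mathrm{opt}}(\phi) = \langle s \mid \phi \rangle,
    \qquad
    \big\langle (s - f^{\mathrm{opt}}(\phi))^2 \big\rangle
    \le \big\langle (s - f(\phi))^2 \big\rangle \quad \text{for every } f.

Its residual error is therefore an irreducible floor for any subgrid model built on the same variables, which is what makes a priori tests against it informative.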
Competing with stationary prediction strategies
In this paper we introduce the class of stationary prediction strategies and
construct a prediction algorithm that asymptotically performs as well as the
best continuous stationary strategy. We make mild compactness assumptions but
no stochastic assumptions about the environment. In particular, no assumption
of stationarity is made about the environment, and the stationarity of the
considered strategies only means that they do not depend explicitly on time; we
argue that it is natural to consider only stationary strategies even for highly
non-stationary environments.
Comment: 20 pages
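Schematically, the guarantee has the familiar no-regret form (our notation, suppressing the paper's exact compactness conditions): for each continuous stationary strategy $D$,

    % learner's predictions \gamma_n, outcomes \omega_n, loss \lambda; D is applied
    % to the current history but never to the time index itself (stationarity)
    \limsup_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N}
      \Big( \lambda(\gamma_n, \omega_n)
            - \lambda\big( D(\omega_{n-1}, \omega_{n-2}, \dots), \omega_n \big) \Big) \le 0.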
MCRapper: Monte-Carlo Rademacher Averages for Poset Families and Approximate Pattern Mining
We present MCRapper, an algorithm for efficient computation of Monte-Carlo
Empirical Rademacher Averages (MCERA) for families of functions exhibiting
poset (e.g., lattice) structure, such as those that arise in many pattern
mining tasks. The MCERA allows us to compute upper bounds to the maximum
deviation of sample means from their expectations, thus it can be used to find
both statistically-significant functions (i.e., patterns) when the available
data is seen as a sample from an unknown distribution, and approximations of
collections of high-expectation functions (e.g., frequent patterns) when the
available data is a small sample from a large dataset. This feature is a strong
improvement over previously proposed solutions that could only achieve one of
the two. MCRapper uses upper bounds to the discrepancy of the functions to
efficiently explore and prune the search space, a technique borrowed from
pattern mining itself. To show the practical use of MCRapper, we employ it to
develop an algorithm TFP-R for the task of True Frequent Pattern (TFP) mining.
TFP-R gives guarantees on the probability of including any false positives
(precision) and exhibits higher statistical power (recall) than existing
methods offering the same guarantees. We evaluate MCRapper and TFP-R and show
that they outperform the state-of-the-art for their respective tasks.
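For reference, the quantity MCRapper computes is the following (the standard MCERA definition, in our notation): given a sample $S = \{x_1, \dots, x_m\}$ and $n$ independent vectors of Rademacher signs $\sigma_{j,i} \in \{-1, +1\}$,

    % n-trial Monte-Carlo Empirical Rademacher Average of a family F on sample S
    \hat{\mathsf{R}}^{n}_{m}(\mathcal{F}, S) =
      \frac{1}{n} \sum_{j=1}^{n} \sup_{f \in \mathcal{F}}
      \frac{1}{m} \sum_{i=1}^{m} \sigma_{j,i}\, f(x_i).

The supremum over $\mathcal{F}$ is what makes the computation expensive, and it is what the poset structure and the discrepancy upper bounds let MCRapper prune.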
Learning from Minimum Entropy Queries in a Large Committee Machine
In supervised learning, the redundancy contained in random examples can be
avoided by learning from queries. Using statistical mechanics, we study
learning from minimum entropy queries in a large tree-committee machine. The
generalization error decreases exponentially with the number of training
examples, providing a significant improvement over the algebraic decay for
random examples. The connection between entropy and generalization error in
multi-layer networks is discussed, and a computationally cheap algorithm for
constructing queries is suggested and analysed.
Comment: 4 pages, REVTeX, multicol, epsf, two postscript figures. To appear in Physical Review E (Rapid Communications).
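The speed-up the abstract describes is, schematically (the constants $c_i$ are placeholders, not values from the paper):

    % generalization error vs. number of training examples p
    \epsilon_{\mathrm{random}}(p) \sim \frac{c_1}{p}
      \quad \text{(algebraic decay, random examples)},
    \qquad
    \epsilon_{\mathrm{query}}(p) \sim c_2\, e^{-c_3 p}
      \quad \text{(exponential decay, minimum-entropy queries)}.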
Learning Kernel Perceptrons on Noisy Data and Random Projections
In this paper, we address the issue of learning nonlinearly separable concepts with a kernel classifier in the situation where the data at hand are altered by uniform classification noise. Our proposed approach relies on the combination of the technique of random or deterministic projections with a classification noise tolerant perceptron learning algorithm that assumes distributions defined over finite-dimensional spaces. Provided a sufficient separation margin characterizes the problem, this strategy makes it possible to envision learning from a noisy distribution in any separable Hilbert space, regardless of its dimension; learning with any appropriate Mercer kernel is therefore possible. We prove that the required sample complexity and running time of our algorithm are polynomial in the classical PAC learning parameters. Numerical simulations on toy datasets and on data from the UCI repository support the validity of our approach.
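The projection-then-learn strategy can be caricatured in a few lines; the sketch below uses a plain Gaussian random projection and an ordinary perceptron on synthetic data, so it illustrates the shape of the approach, not the paper's noise-tolerant algorithm or its kernel machinery:

    # Generic illustration of projection-then-perceptron (our sketch, not the
    # paper's algorithm): project the data to a low dimension with a Gaussian
    # random projection, then run a mistake-driven perceptron.
    import numpy as np

    rng = np.random.default_rng(0)

    # Toy linearly separable data in R^50, with a fraction of labels flipped
    # uniformly at random (the classification-noise model mentioned above).
    X = rng.normal(size=(200, 50))
    w_true = rng.normal(size=50)
    y = np.sign(X @ w_true)
    flips = rng.random(200) < 0.05
    y[flips] *= -1

    # Johnson-Lindenstrauss style Gaussian random projection to k dimensions.
    k = 10
    R = rng.normal(size=(50, k)) / np.sqrt(k)
    Z = X @ R

    # Plain perceptron on the projected data.
    w = np.zeros(k)
    for _ in range(50):                     # epochs
        for z, label in zip(Z, y):
            if label * (w @ z) <= 0:        # mistake-driven update
                w += label * z

    accuracy = np.mean(np.sign(Z @ w) == np.sign(X @ w_true))
    print(f"agreement with noiseless labels: {accuracy:.2f}")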