From average case complexity to improper learning complexity
The basic problem in the PAC model of computational learning theory is to
determine which hypothesis classes are efficiently learnable. There is
presently a dearth of results showing hardness of learning problems. Moreover,
the existing lower bounds fall short of the best known algorithms.
The biggest challenge in proving complexity results is to establish hardness
of {\em improper learning} (a.k.a. representation independent learning). The
difficulty in proving lower bounds for improper learning is that the standard
reductions from NP-hard problems do not seem to apply in this
context. There is essentially only one known approach to proving lower bounds
on improper learning. It was initiated in (Kearns and Valiant 89) and relies on
cryptographic assumptions.
We introduce a new technique for proving hardness of improper learning, based
on reductions from problems that are hard on average. We put forward a (fairly
strong) generalization of Feige's assumption (Feige 02) about the complexity of
refuting random constraint satisfaction problems. Combining this assumption
with our new technique yields far reaching implications. In particular,
1. Learning DNF's is hard.
2. Agnostically learning halfspaces with a constant approximation ratio is
hard.
3. Learning an intersection of halfspaces is hard.
Comment: 34 pages
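For readers who want the criterion being made hard, this is the standard PAC requirement (a textbook definition in notation of our own choosing, not anything specific to this paper's reductions):

    % PAC learning, stated schematically. A hypothesis class H over a domain X is
    % (improperly) PAC learnable if some efficient algorithm A, given m(\epsilon,\delta)
    % i.i.d. examples (x, h^*(x)) with x ~ D and h^* \in H, outputs a hypothesis h --
    % not necessarily a member of H, which is what "improper" means -- satisfying
    \Pr_{S \sim D^m} \big[ \mathrm{err}_D(h) \le \epsilon \big] \ge 1 - \delta,
    \qquad \mathrm{err}_D(h) = \Pr_{x \sim D} \big[ h(x) \ne h^*(x) \big].

Hardness of improper learning means ruling out every such algorithm, whatever representation it uses for h; this is why reductions that exploit the structure of the class H do not directly apply.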
Fake View Analytics in Online Video Services
Online video-on-demand (VoD) services invariably maintain a view count for
each video they serve, and it has become an important currency for various
stakeholders, from viewers, to content owners, advertisers, and the online
service providers themselves. There is often significant financial incentive to
use a robot (or a botnet) to artificially create fake views. How can we detect
the fake views? Can we detect them (and stop them) using online algorithms as
they occur? What is the extent of fake views with current VoD service
providers? These are the questions we study in the paper. We develop some
algorithms and show that they are quite effective for this problem.
Comment: 25 pages, 15 figures
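As a purely illustrative sketch of what an online detector of this flavor might look like (this is our toy example, not the algorithms from the paper; the EWMA scheme, warm-up length, and threshold are all assumptions):

    # Toy online view-rate monitor: flag a video when its incoming view rate jumps
    # far above an exponentially weighted moving average (EWMA) of its history.
    # Hypothetical illustration only -- not the paper's detection algorithm.
    class ViewRateMonitor:
        def __init__(self, alpha=0.1, k=5.0, warmup=5):
            self.alpha = alpha    # EWMA smoothing factor
            self.k = k            # flag when rate > mean + k * std
            self.warmup = warmup  # samples to observe before flagging anything
            self.n, self.mean, self.var = 0, 0.0, 0.0

        def update(self, rate):
            """Feed one view-rate sample; return True if it looks anomalous."""
            self.n += 1
            if self.n == 1:
                self.mean = rate
                return False
            dev = rate - self.mean
            flagged = self.n > self.warmup and dev > self.k * (self.var ** 0.5 + 1e-9)
            # EWMA updates of the running mean and variance
            self.mean += self.alpha * dev
            self.var = (1 - self.alpha) * (self.var + self.alpha * dev * dev)
            return flagged

    monitor = ViewRateMonitor()
    for rate in [12, 15, 11, 14, 13, 400, 12]:  # a sudden bot-like burst at 400
        if monitor.update(rate):
            print("suspicious view spike:", rate)

Anything beyond this toy threshold test, and in particular how well online detection can work at VoD scale, is what the paper actually studies.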
A preliminary approach to the multilabel classification problem of Portuguese juridical documents
Portuguese juridical documents from Supreme Courts and the Attorney General’s Office are manually classified by juridical experts into a set of classes belonging to a taxonomy of concepts. In this paper, a preliminary approach to develop techniques to automatically classify these juridical documents is proposed. The basic strategy is to integrate natural language processing techniques with machine learning ones. Support Vector Machines (SVM) are used as the learning algorithm, and the obtained results are presented and compared with other approaches, such as C4.5 and Naive Bayes.
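The general recipe (multilabel text classification with one binary SVM per taxonomy class) can be sketched as follows; the pipeline, toy corpus, and labels below are our own illustration, not the paper's data or feature set:

    # Sketch of multilabel classification with SVMs, in the spirit described above.
    # Toy corpus and labels are invented for illustration.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.preprocessing import MultiLabelBinarizer
    from sklearn.svm import LinearSVC

    docs = [
        "appeal against a tax assessment decision",
        "labour contract termination dispute",
        "tax fraud investigation with labour claims",
    ]
    labels = [{"tax"}, {"labour"}, {"tax", "labour"}]  # a document may carry several classes

    X = TfidfVectorizer().fit_transform(docs)      # bag-of-words TF-IDF features
    mlb = MultiLabelBinarizer()
    Y = mlb.fit_transform(labels)                  # one binary indicator column per class

    clf = OneVsRestClassifier(LinearSVC()).fit(X, Y)  # one binary SVM per class
    print(mlb.inverse_transform(clf.predict(X)))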
Subsampling in Smoothed Range Spaces
We consider smoothed versions of geometric range spaces, so an element of the ground set (e.g. a point) can be contained in a range with a non-binary value in $[0,1]$. Similar notions have been considered for kernels; we extend them to more general types of ranges. We then consider approximations of these range spaces through $\varepsilon$-nets and $\varepsilon$-samples (a.k.a. $\varepsilon$-approximations). We characterize when size bounds for $\varepsilon$-samples on kernels can be extended to these more general smoothed range spaces. We also describe new generalizations for $\varepsilon$-nets to these range spaces and show when results from binary range spaces can carry over to these smoothed ones.
Comment: This is the full version of the paper which appeared in ALT 2015. 16 pages, 3 figures. In Algorithmic Learning Theory, pp. 224-238. Springer International Publishing, 2015.
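Concretely, the $\varepsilon$-sample condition being generalized reads as follows in the smoothed setting (our notation; in the binary case the range values take only 0 and 1):

    % Q \subseteq P is an \varepsilon-sample for the smoothed range space when the
    % sample average of every range's values tracks the average over the full ground set:
    \max_{R} \left| \frac{1}{|P|} \sum_{p \in P} v_R(p)
                  - \frac{1}{|Q|} \sum_{q \in Q} v_R(q) \right| \le \varepsilon,
    \qquad v_R(p) \in [0,1].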
Optimal estimation for Large-Eddy Simulation of turbulence and application to the analysis of subgrid models
The tools of optimal estimation are applied to the study of subgrid models
for Large-Eddy Simulation of turbulence. The concept of optimal estimator is
introduced and its properties are analyzed in the context of applications to a
priori tests of subgrid models. Attention is focused on the Cook and Riley
model in the case of a scalar field in isotropic turbulence. Using DNS data,
the relevance of the beta assumption is estimated by computing (i) generalized
optimal estimators and (ii) the error brought by this assumption alone. Optimal
estimators are computed for the subgrid variance using various sets of
variables and various techniques (histograms and neural networks). It is shown
that optimal estimators allow a thorough exploration of models. Neural networks
prove to be relevant and very efficient in this framework, and further
uses are suggested.
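The notion at the heart of this abstract is a one-line least-squares fact (standard, stated here in our own notation): among all functions of a chosen set of resolved variables $\phi$, the quadratic error in estimating a subgrid quantity $s$ is minimized by the conditional average,

    % optimal estimator of s given \phi, in the mean-square sense
    f^{\mathrm{opt}}(\phi) = \langle s \mid \phi \rangle,
    \qquad
    \big\langle (s - f^{\mathrm{opt}}(\phi))^2 \big\rangle
    \le \big\langle (s - f(\phi))^2 \big\rangle \quad \text{for every } f.

Its residual error is therefore an irreducible floor for any subgrid model built on the same variables, which is what makes a priori tests against it informative.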
Competing with stationary prediction strategies
In this paper we introduce the class of stationary prediction strategies and
construct a prediction algorithm that asymptotically performs as well as the
best continuous stationary strategy. We make mild compactness assumptions but
no stochastic assumptions about the environment. In particular, no assumption
of stationarity is made about the environment, and the stationarity of the
considered strategies only means that they do not depend explicitly on time; we
argue that it is natural to consider only stationary strategies even for highly
non-stationary environments.
Comment: 20 pages
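Schematically, the guarantee has the familiar no-regret form (our notation, suppressing the paper's exact compactness conditions): for each continuous stationary strategy $D$,

    % learner's predictions \gamma_n, outcomes \omega_n, loss \lambda; D is applied
    % to the current history but never to the time index itself (stationarity)
    \limsup_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N}
      \Big( \lambda(\gamma_n, \omega_n)
            - \lambda\big( D(\omega_{n-1}, \omega_{n-2}, \dots), \omega_n \big) \Big) \le 0.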
MCRapper: Monte-Carlo Rademacher Averages for Poset Families and Approximate Pattern Mining
We present MCRapper, an algorithm for efficient computation of Monte-Carlo
Empirical Rademacher Averages (MCERA) for families of functions exhibiting
poset (e.g., lattice) structure, such as those that arise in many pattern
mining tasks. The MCERA allows us to compute upper bounds to the maximum
deviation of sample means from their expectations, thus it can be used to find
both statistically-significant functions (i.e., patterns) when the available
data is seen as a sample from an unknown distribution, and approximations of
collections of high-expectation functions (e.g., frequent patterns) when the
available data is a small sample from a large dataset. This feature is a strong
improvement over previously proposed solutions that could only achieve one of
the two. MCRapper uses upper bounds to the discrepancy of the functions to
efficiently explore and prune the search space, a technique borrowed from
pattern mining itself. To show the practical use of MCRapper, we employ it to
develop an algorithm TFP-R for the task of True Frequent Pattern (TFP) mining.
TFP-R gives guarantees on the probability of including any false positives
(precision) and exhibits higher statistical power (recall) than existing
methods offering the same guarantees. We evaluate MCRapper and TFP-R and show
that they outperform the state-of-the-art for their respective tasks.
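For reference, the quantity MCRapper computes is the following (the standard MCERA definition, in our notation): given a sample $S = \{x_1, \dots, x_m\}$ and $n$ independent vectors of Rademacher signs $\sigma_{j,i} \in \{-1, +1\}$,

    % n-trial Monte-Carlo Empirical Rademacher Average of a family F on sample S
    \hat{\mathsf{R}}^{n}_{m}(\mathcal{F}, S) =
      \frac{1}{n} \sum_{j=1}^{n} \sup_{f \in \mathcal{F}}
      \frac{1}{m} \sum_{i=1}^{m} \sigma_{j,i}\, f(x_i).

The supremum over $\mathcal{F}$ is what makes the computation expensive, and it is what the poset structure and the discrepancy upper bounds let MCRapper prune.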
Learning from Minimum Entropy Queries in a Large Committee Machine
In supervised learning, the redundancy contained in random examples can be
avoided by learning from queries. Using statistical mechanics, we study
learning from minimum entropy queries in a large tree-committee machine. The
generalization error decreases exponentially with the number of training
examples, providing a significant improvement over the algebraic decay for
random examples. The connection between entropy and generalization error in
multi-layer networks is discussed, and a computationally cheap algorithm for
constructing queries is suggested and analysed.
Comment: 4 pages, REVTeX, multicol, epsf, two postscript figures. To appear in Physical Review E (Rapid Communications).
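The speed-up the abstract describes is, schematically (the constants $c_i$ are placeholders, not values from the paper):

    % generalization error vs. number of training examples p
    \epsilon_{\mathrm{random}}(p) \sim \frac{c_1}{p}
      \quad \text{(algebraic decay, random examples)},
    \qquad
    \epsilon_{\mathrm{query}}(p) \sim c_2\, e^{-c_3 p}
      \quad \text{(exponential decay, minimum-entropy queries)}.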
Learning Kernel Perceptrons on Noisy Data and Random Projections
In this paper, we address the issue of learning nonlinearly separable concepts with a kernel classifier in the situation where the data at hand are altered by uniform classification noise. Our proposed approach relies on the combination of the technique of random or deterministic projections with a classification noise tolerant perceptron learning algorithm that assumes distributions defined over finite-dimensional spaces. Provided a sufficient separation margin characterizes the problem, this strategy makes it possible to envision learning from a noisy distribution in any separable Hilbert space, regardless of its dimension; learning with any appropriate Mercer kernel is therefore possible. We prove that the required sample complexity and running time of our algorithm are polynomial in the classical PAC learning parameters. Numerical simulations on toy datasets and on data from the UCI repository support the validity of our approach.
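The projection-then-learn strategy can be caricatured in a few lines; the sketch below uses a plain Gaussian random projection and an ordinary perceptron on synthetic data, so it illustrates the shape of the approach, not the paper's noise-tolerant algorithm or its kernel machinery:

    # Generic illustration of projection-then-perceptron (our sketch, not the
    # paper's algorithm): project the data to a low dimension with a Gaussian
    # random projection, then run a mistake-driven perceptron.
    import numpy as np

    rng = np.random.default_rng(0)

    # Toy linearly separable data in R^50, with a fraction of labels flipped
    # uniformly at random (the classification-noise model mentioned above).
    X = rng.normal(size=(200, 50))
    w_true = rng.normal(size=50)
    y = np.sign(X @ w_true)
    flips = rng.random(200) < 0.05
    y[flips] *= -1

    # Johnson-Lindenstrauss style Gaussian random projection to k dimensions.
    k = 10
    R = rng.normal(size=(50, k)) / np.sqrt(k)
    Z = X @ R

    # Plain perceptron on the projected data.
    w = np.zeros(k)
    for _ in range(50):                     # epochs
        for z, label in zip(Z, y):
            if label * (w @ z) <= 0:        # mistake-driven update
                w += label * z

    accuracy = np.mean(np.sign(Z @ w) == np.sign(X @ w_true))
    print(f"agreement with noiseless labels: {accuracy:.2f}")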