Generalization Properties of Learning with Random Features
We study the generalization properties of ridge regression with random features
in the statistical learning framework. We show for the first time that O(1/√n)
learning bounds can be achieved with only O(√n log n) random features rather
than O(n) as suggested by previous results. Further, we prove faster learning
rates and show that they might require more random features, unless they are
sampled according to a possibly problem dependent distribution. Our results
shed light on the statistical computational trade-offs in large scale kernelized
learning, showing the potential effectiveness of random features in reducing the
computational complexity while keeping optimal generalization properties.
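
To make the regime concrete, here is a minimal sketch of ridge regression with random Fourier features (our own illustration in NumPy, assuming a Gaussian kernel; the names rff_features and rff_ridge_fit and the bandwidth/reg parameters are ours, not the paper's):

    import numpy as np

    def rff_features(X, W, b):
        # Random Fourier feature map z(x) = sqrt(2/D) * cos(W x + b), whose
        # inner products approximate a Gaussian kernel in expectation.
        D = W.shape[0]
        return np.sqrt(2.0 / D) * np.cos(X @ W.T + b)

    def rff_ridge_fit(X, y, n_features, bandwidth=1.0, reg=1e-3, seed=0):
        rng = np.random.default_rng(seed)
        n, d = X.shape
        # Frequencies from N(0, 1/bandwidth^2), offsets uniform on [0, 2*pi).
        W = rng.normal(scale=1.0 / bandwidth, size=(n_features, d))
        b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
        Z = rff_features(X, W, b)  # n x D feature matrix, with D << n
        # Ridge regression in feature space: solve (Z'Z + n*reg*I) w = Z'y.
        A = Z.T @ Z + n * reg * np.eye(n_features)
        w = np.linalg.solve(A, Z.T @ y)
        return W, b, w

    # D ~ sqrt(n) log n random features, the regime the bound above refers to.
    n, d = 2000, 5
    rng = np.random.default_rng(1)
    X = rng.normal(size=(n, d))
    y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)
    D = int(np.sqrt(n) * np.log(n))
    W, b, w = rff_ridge_fit(X, y, n_features=D)
    y_hat = rff_features(X, W, b) @ w

With D ≈ √n log n features, fitting costs O(nD² + D³) rather than the O(n³) of exact kernel ridge regression, which is the statistical-computational trade-off the abstract points to.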
Optimal Rates for Distributed Learning with Random Features
In recent studies, the generalization properties for distributed learning and
random features assumed the existence of the target concept over the hypothesis
space. However, this strict condition is not applicable to the more common
non-attainable case. In this paper, using refined proof techniques, we first
extend the optimal rates for distributed learning with random features to the
non-attainable case. Then, we reduce the number of required random features via
data-dependent generating strategy, and improve the allowed number of
partitions with additional unlabeled data. Theoretical analysis shows these
techniques remarkably reduce computational cost while preserving the optimal
generalization accuracy under standard assumptions. Finally, we conduct several
experiments on both simulated and real-world datasets, and the empirical
results validate our theoretical findings.
Comment: Accepted at IJCAI 202
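
The divide-and-conquer baseline underlying this analysis can be sketched in a few lines (again our illustration, reusing rff_features and rff_ridge_fit from the sketch above; the paper's refinements, data-dependent feature generation and extra unlabeled data, are not modeled here):

    import numpy as np

    def distributed_rff_predict(X_train, y_train, X_test, m, n_features, **kw):
        # Divide-and-conquer: fit an independent random-feature ridge
        # estimator on each of m partitions, then average the predictions.
        preds = []
        for Xp, yp in zip(np.array_split(X_train, m), np.array_split(y_train, m)):
            W, b, w = rff_ridge_fit(Xp, yp, n_features, **kw)
            preds.append(rff_features(X_test, W, b) @ w)
        return np.mean(preds, axis=0)  # uniform averaging of local estimators

    y_distributed = distributed_rff_predict(X, y, X, m=8, n_features=D)

Each worker solves a small D×D system on only n/m points, and uniform averaging recovers, under assumptions like those studied here, the accuracy of the centralized estimator.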
Generalization properties of finite size polynomial Support Vector Machines
The learning properties of finite size polynomial Support Vector Machines are
analyzed in the case of realizable classification tasks. The normalization of
the high order features acts as a squeezing factor, introducing a strong
anisotropy in the patterns distribution in feature space. As a function of the
training set size, the corresponding generalization error presents a crossover,
more or less abrupt depending on the distribution's anisotropy and on the task
to be learned, between a fast-decreasing and a slowly decreasing regime. This
behaviour corresponds to the stepwise decrease found by Dietrich et al. [Phys.
Rev. Lett. 82 (1999) 2975-2978] in the thermodynamic limit. The theoretical
results are in excellent agreement with the numerical simulations.
Comment: 12 pages, 7 figures
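
A toy version of this kind of experiment is easy to set up (a sketch with scikit-learn's SVC, not the authors' statistical-mechanics calculation; the linear teacher below is just one convenient realizable task for a quadratic machine):

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    d = 20
    teacher = rng.normal(size=d)  # fixed rule that makes the task realizable

    def sample(n):
        X = rng.normal(size=(n, d))
        y = np.where(X @ teacher >= 0, 1, -1)
        return X, y

    X_test, y_test = sample(5000)
    for n in [50, 100, 200, 400, 800, 1600]:
        X_train, y_train = sample(n)
        clf = SVC(kernel="poly", degree=2, coef0=1.0).fit(X_train, y_train)
        err = np.mean(clf.predict(X_test) != y_test)
        print(f"n={n:5d}  test error {err:.3f}")

Plotting the error against n makes the fast- and slow-decrease regimes visible, though the sharpness of the crossover depends, as the abstract notes, on the anisotropy of the feature-space distribution and on the task.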
Programmable Agents
We build deep RL agents that execute declarative programs expressed in formal
language. The agents learn to ground the terms in this language in their
environment, and can generalize their behavior at test time to execute new
programs that refer to objects that were not referenced during training. The
agents develop disentangled interpretable representations that allow them to
generalize to a wide variety of zero-shot semantic tasks.
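
The agents here are learned end to end, but the interface they implement can be illustrated without any learning (a hypothetical sketch; Obj, holds, and the tuple encoding of programs are our inventions): a declarative program acts as a predicate that is grounded against whatever objects the environment currently contains, so the same program applies to objects unseen at training time.

    from dataclasses import dataclass

    @dataclass
    class Obj:
        color: str
        shape: str

    def holds(prog, obj):
        # A program is ("and", p, q), ("or", p, q), or an atomic
        # (attribute, value) test grounded against a concrete object.
        if prog[0] == "and":
            return holds(prog[1], obj) and holds(prog[2], obj)
        if prog[0] == "or":
            return holds(prog[1], obj) or holds(prog[2], obj)
        attr, value = prog
        return getattr(obj, attr) == value

    # "red AND ball" picks out goal objects in any scene, including scenes
    # containing objects that did not exist when the program was written.
    program = ("and", ("color", "red"), ("shape", "ball"))
    scene = [Obj("red", "ball"), Obj("blue", "ball"), Obj("red", "cube")]
    goals = [o for o in scene if holds(program, o)]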