Fast rates for noisy clustering
The effect of errors in variables on empirical minimization is investigated.
Given a loss and a set of decision rules, we prove a general
upper bound for empirical minimization based on a deconvolution kernel and a
noisy sample. We apply this general upper bound
to derive the rate of convergence of the expected excess risk in noisy
clustering. A recent bound from Levrard gives this rate in the direct
(noise-free) case, under Pollard's regularity assumptions.
Here, noisy measurements lead to a rate that depends on the
Hölder regularity of the density of the noise-free variable and on the degree of
ill-posedness.
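The deconvolution-based empirical minimization can be illustrated in one dimension. This is a minimal sketch under assumptions that are not in the abstract: Laplace noise with known scale `b`, a standard Gaussian kernel (for which the deconvolution kernel has the closed form phi(u) * (1 - (b/h)^2 * (u^2 - 1))), a hand-picked bandwidth `h`, and the k-means distortion with k = 2 as the loss, minimized by brute-force search over a grid of candidate centres. The empirical risk is computed against the deconvolved density estimate instead of the raw noisy sample. All function names are hypothetical.

```python
import numpy as np

def deconvolution_kernel(u, b, h):
    """Deconvolution kernel for a standard Gaussian kernel and Laplace(0, b) noise.

    The Laplace characteristic function is 1 / (1 + b^2 t^2), so dividing by it gives
    K*(u) = K(u) - (b / h)^2 K''(u) = phi(u) * (1 - (b / h)^2 * (u^2 - 1)).
    """
    phi = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return phi * (1.0 - (b / h) ** 2 * (u**2 - 1.0))

def deconvolved_density(grid, z, b, h):
    """Deconvolution kernel estimate of the noise-free density, evaluated on `grid`."""
    u = (grid[:, None] - z[None, :]) / h          # shape (len(grid), n)
    return deconvolution_kernel(u, b, h).mean(axis=1) / h

def noisy_kmeans_2(z, b, h, candidates, grid):
    """Minimize the deconvolved k-means risk over pairs of candidate centres."""
    dx = grid[1] - grid[0]
    weights = deconvolved_density(grid, z, b, h) * dx   # can be slightly negative in the tails
    best, best_risk = None, np.inf
    for i, c1 in enumerate(candidates):
        for c2 in candidates[i + 1:]:
            # squared distance from each grid point to its nearest centre
            dist = np.minimum((grid - c1) ** 2, (grid - c2) ** 2)
            risk = np.sum(dist * weights)
            if risk < best_risk:
                best, best_risk = (c1, c2), risk
    return best, best_risk

# Hypothetical data: two clusters at -2 and 2, observed through Laplace noise Z = X + eps.
rng = np.random.default_rng(0)
n, b = 2000, 0.5
x = np.where(rng.random(n) < 0.5, -2.0, 2.0) + 0.3 * rng.standard_normal(n)
z = x + rng.laplace(0.0, b, n)
grid = np.linspace(-6.0, 6.0, 601)
cands = np.linspace(-4.0, 4.0, 81)
centres, risk = noisy_kmeans_2(z, b, h=0.5, candidates=cands, grid=grid)
```

Because the candidates are sorted, the first returned centre is always the smaller one. The closed-form kernel is what makes the Laplace assumption convenient; other noise distributions would require a numerical Fourier inversion.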
Elimination of All Bad Local Minima in Deep Learning
In this paper, we theoretically prove that adding one special neuron per
output unit eliminates all suboptimal local minima of any deep neural network,
for multi-class classification, binary classification, and regression with an
arbitrary loss function, under practical assumptions. At every local minimum of
any deep neural network with these added neurons, the set of parameters of the
original neural network (without added neurons) is guaranteed to be a global
minimum of the original neural network. The effects of the added neurons are
proven to automatically vanish at every local minimum. Moreover, we provide a
novel theoretical characterization of a failure mode of eliminating suboptimal
local minima via an additional theorem and several examples. This paper also
introduces a novel proof technique based on the perturbable gradient basis
(PGB) necessary condition of local minima, which provides new insight into the
elimination of local minima and is applicable to analyze various models and
transformations of objective functions beyond the elimination of local minima.
Comment: Accepted to appear in AISTATS 202
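To make the construction concrete, the sketch below augments an arbitrary network with one extra neuron per output unit. The abstract does not specify what the special neuron computes; here we assume, for illustration only, an exponential unit a[k] * exp(w[:, k] . x + c[k]) with an L2 penalty on the output weights a, and a squared loss. All names are hypothetical. The check only confirms that setting every a[k] to zero recovers the original network and its loss, which is the state in which, according to the abstract, the added neurons' effects vanish at every local minimum.

```python
import numpy as np

def base_model(x, W1, W2):
    """Original network (hypothetical): one tanh hidden layer, linear outputs."""
    return np.tanh(x @ W1) @ W2                      # shape (n, K)

def augmented_model(x, W1, W2, a, w, c):
    """Add one special neuron per output unit k: a[k] * exp(x @ w[:, k] + c[k]).

    ASSUMPTION: the exponential form is illustrative and not taken from the abstract.
    """
    extra = a[None, :] * np.exp(x @ w + c[None, :])  # shape (n, K)
    return base_model(x, W1, W2) + extra

def squared_loss(pred, y):
    return 0.5 * np.mean(np.sum((pred - y) ** 2, axis=1))

def augmented_loss(x, y, W1, W2, a, w, c, lam=0.1):
    # The penalty on a pulls the added neurons' output weights toward zero.
    return squared_loss(augmented_model(x, W1, W2, a, w, c), y) + 0.5 * lam * np.sum(a**2)

rng = np.random.default_rng(0)
n, d, hdim, K = 32, 3, 5, 2
x, y = rng.standard_normal((n, d)), rng.standard_normal((n, K))
W1, W2 = rng.standard_normal((d, hdim)), rng.standard_normal((hdim, K))
w, c = rng.standard_normal((d, K)), rng.standard_normal(K)

a_zero = np.zeros(K)
# With all added output weights at zero, output and loss match the original network.
same_output = np.allclose(augmented_model(x, W1, W2, a_zero, w, c), base_model(x, W1, W2))
same_loss = np.isclose(augmented_loss(x, y, W1, W2, a_zero, w, c),
                       squared_loss(base_model(x, W1, W2), y))
```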
Regret bounds for meta Bayesian optimization with an unknown Gaussian process prior
Bayesian optimization usually assumes that a Bayesian prior is given.
However, the strong theoretical guarantees in Bayesian optimization are often
regrettably compromised in practice because of unknown parameters in the prior.
In this paper, we adopt a variant of empirical Bayes and show that, by
estimating the Gaussian process prior from offline data sampled from the same
prior and constructing unbiased estimators of the posterior, variants of both
GP-UCB and probability of improvement achieve a near-zero regret bound, which
decreases to a constant proportional to the observational noise as the amount
of offline data and the number of online evaluations increase. Empirically, we
have verified our approach on challenging simulated robotic problems featuring
task and motion planning.
Comment: Proceedings of the Thirty-second Conference on Neural Information
Processing Systems, 201
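The empirical-Bayes idea can be sketched on a finite set of candidate inputs: estimate the prior mean and covariance from offline functions drawn from the same prior, then run GP-UCB with the estimated prior. Assumptions not in the abstract: a finite domain, offline functions observed without noise at every candidate, plain sample mean and covariance, a fixed exploration weight `beta`, and a squared-exponential "true" prior used only to generate data. The paper's unbiased posterior estimators and its probability-of-improvement variant are not reproduced. All names are hypothetical.

```python
import numpy as np

def estimate_prior(offline):
    """offline: (N, M) array of N functions evaluated at the same M candidate points."""
    mu = offline.mean(axis=0)
    centred = offline - mu
    cov = centred.T @ centred / (offline.shape[0] - 1)   # sample covariance
    return mu, cov

def posterior(mu, cov, obs_idx, obs_y, noise_var):
    """GP posterior mean and variance on all M candidates given online observations."""
    if len(obs_idx) == 0:
        return mu.copy(), np.diag(cov).copy()
    K_oo = cov[np.ix_(obs_idx, obs_idx)] + noise_var * np.eye(len(obs_idx))
    K_ao = cov[:, obs_idx]                                # shape (M, n_obs)
    mean = mu + K_ao @ np.linalg.solve(K_oo, obs_y - mu[obs_idx])
    var = np.diag(cov) - np.sum(K_ao * np.linalg.solve(K_oo, K_ao.T).T, axis=1)
    return mean, np.maximum(var, 0.0)

def gp_ucb_next(mu, cov, obs_idx, obs_y, noise_var, beta=2.0):
    """Index of the candidate maximizing the upper confidence bound."""
    mean, var = posterior(mu, cov, obs_idx, obs_y, noise_var)
    return int(np.argmax(mean + np.sqrt(beta * var)))

rng = np.random.default_rng(1)
M = 30
xs = np.linspace(0.0, 1.0, M)
true_cov = np.exp(-0.5 * (xs[:, None] - xs[None, :]) ** 2 / 0.1**2)
L = np.linalg.cholesky(true_cov + 1e-6 * np.eye(M))
offline = (L @ rng.standard_normal((M, 2000))).T          # 2000 offline functions
mu_hat, cov_hat = estimate_prior(offline)

f = L @ rng.standard_normal(M)                            # new function from the same prior
noise_var = 0.01
obs_idx, obs_y = [], []
for _ in range(10):
    i = gp_ucb_next(mu_hat, cov_hat, obs_idx, np.array(obs_y), noise_var)
    obs_idx.append(i)
    obs_y.append(f[i] + np.sqrt(noise_var) * rng.standard_normal())
```

The noise term added to `K_oo` keeps the linear solves well posed even if the same candidate is selected twice.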
Every Local Minimum Value is the Global Minimum Value of Induced Model in Non-convex Machine Learning
For nonconvex optimization in machine learning, this article proves that
every local minimum achieves the globally optimal value of the perturbable
gradient basis model at any differentiable point. As a result, nonconvex
machine learning is as well supported theoretically as convex machine learning with
a handcrafted basis in terms of the loss at differentiable local minima, except
in the case when a preference is given to the handcrafted basis over the
perturbable gradient basis. The proofs of these results are derived under mild
assumptions. Accordingly, the proven results are directly applicable to many
machine learning models, including practical deep neural networks, without any
modification of practical methods. Furthermore, as special cases of our general
results, this article improves or complements several state-of-the-art
theoretical results on deep neural networks, deep residual networks, and
overparameterized deep neural networks with a unified proof technique and novel
geometric insights. A special case of our results also contributes to the
theoretical foundation of representation learning.
Comment: Neural Computation, MIT Press
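For the squared loss, the mechanism behind this statement can be checked numerically in a simple special case. At a differentiable stationary point the residual is orthogonal to every basis function df(x; theta)/dtheta_j, and if f(.; theta) itself lies in the span of those functions (true for the bias-free two-layer linear model below, since theta . grad_theta f = 2f), the training loss equals the best loss a linear model over that gradient basis can reach. This is an illustration under those assumptions, using only the unperturbed gradient basis and plain gradient descent; it is not the article's general proof.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, h = 50, 3, 4
X = rng.standard_normal((n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(n)

# Two-layer linear model f(x; W, v) = v^T W x (no biases), trained by gradient descent
# on L = (1 / 2n) * sum_i (f(x_i) - y_i)^2.
W = 0.5 * rng.standard_normal((h, d))
v = 0.5 * rng.standard_normal(h)
lr = 0.02
for _ in range(10000):
    r = X @ W.T @ v - y                 # residuals f(x_i) - y_i
    grad_v = W @ X.T @ r / n            # dL/dv
    grad_W = np.outer(v, X.T @ r) / n   # dL/dW
    v -= lr * grad_v
    W -= lr * grad_W

loss_gd = 0.5 * np.mean((X @ W.T @ v - y) ** 2)

# Gradient basis at the final parameters: one column per parameter.
basis_v = X @ W.T                                                # df/dv_k = (W x)_k
basis_W = (v[None, :, None] * X[:, None, :]).reshape(n, h * d)   # df/dW_kj = v_k x_j
Phi = np.hstack([basis_v, basis_W])
alpha, *_ = np.linalg.lstsq(Phi, y, rcond=None)
loss_basis = 0.5 * np.mean((Phi @ alpha - y) ** 2)   # global optimum over the gradient basis
```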
Effect of Depth and Width on Local Minima in Deep Learning
In this paper, we analyze the effects of depth and width on the quality of
local minima, without the strong over-parameterization and simplification
assumptions common in the literature. Without any simplification assumption, for deep
nonlinear neural networks with the squared loss, we theoretically show that the
quality of local minima tends to improve towards the global minimum value as
depth and width increase. Furthermore, with a locally-induced structure on deep
nonlinear neural networks, the values of local minima of neural networks are
theoretically proven to be no worse than the globally optimal values of
corresponding classical machine learning models. We empirically support our
theoretical observations with a synthetic dataset as well as the MNIST, CIFAR-10, and
SVHN datasets. In contrast to previous studies with strong
over-parameterization assumptions, the results in this paper do not require
over-parameterization; instead, they show the gradual effects of
over-parameterization as consequences of general results.
Generalization in Deep Learning
This paper provides theoretical insights into why and how deep learning can
generalize well, despite its large capacity, complexity, possible algorithmic
instability, nonrobustness, and sharp minima, responding to an open question in
the literature. We also discuss approaches to provide non-vacuous
generalization guarantees for deep learning. Based on theoretical observations,
we propose new open problems and discuss the limitations of our results.
Comment: To appear in Mathematics of Deep Learning, Cambridge University
Press. All previous results remain unchanged
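One standard route to a non-vacuous guarantee, which may differ from the approaches discussed in the chapter, bounds the expected 0-1 error of a trained model with held-out data via Hoeffding's inequality: with probability at least 1 - delta over m independent validation examples, true error <= validation error + sqrt(ln(1/delta) / (2m)). If the model is selected from K candidates using the same validation set, a union bound replaces ln(1/delta) with ln(K/delta). The function name is hypothetical.

```python
import math

def validation_bound(val_errors: int, m: int, delta: float = 0.05, num_models: int = 1) -> float:
    """Hoeffding upper bound on expected 0-1 error, valid with probability >= 1 - delta.

    num_models > 1 applies a union bound over models compared on the same validation set.
    """
    if m <= 0 or not 0.0 < delta < 1.0 or num_models < 1:
        raise ValueError("need m > 0, 0 < delta < 1 and num_models >= 1")
    empirical = val_errors / m
    slack = math.sqrt(math.log(num_models / delta) / (2.0 * m))
    return min(1.0, empirical + slack)

# Example: 120 mistakes on 10,000 held-out examples.
bound = validation_bound(120, 10_000, delta=0.05)   # about 0.0242
```

The bound is non-vacuous whenever the validation set is large enough for the slack term to be small, regardless of the model's capacity, because the model is fixed before the validation data are used.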
Integrating planning and reactive control
Artificial intelligence research on planning is concerned with designing control systems that choose actions by manipulating explicit descriptions of the world state, the goal to be achieved, and the effects of elementary operations available to the system. Because planning shifts much of the burden of reasoning to the machine, it holds great appeal as a high-level programming method. Experience shows, however, that it cannot be used indiscriminately because even moderately rich languages for describing goals, states, and the elementary operators lead to computational inefficiencies that render the approach unsuitable for realistic applications. This inadequacy has spawned a recent wave of research on reactive control, or situated activity, in which control systems are modeled as reacting directly to the current situation rather than as reasoning about the future effects of alternative action sequences. While this research has confronted the issue of run-time tractability head-on, in many cases it has done so by sacrificing the advantages of declarative planning techniques. Ways in which the two approaches can be unified are discussed. The authors begin by modeling reactive control systems as state machines that map a stream of sensory inputs to a stream of control outputs. These machines can be decomposed into two continuously active subsystems: the planner and the execution module. The planner computes a plan, which can be seen as a set of bits that control the behavior of the execution module. An important element of this work is the formulation of a precise semantic interpretation for the inputs and outputs of the planning system. They show that the distinction between planned and reactive behavior is largely in the eye of the beholder: systems that seem to compute explicit plans can be redescribed in situation-action terms and vice versa.
They also discuss practical programming techniques that allow the advantages of declarative programming and guaranteed reactive response to be achieved simultaneously.
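The planner/execution-module decomposition can be sketched as a state machine from sensory inputs to control outputs in which the plan is simply the data ("plan bits") that parameterize the execution module, here a situation-to-action table. The execution module always answers immediately by table lookup; the planner revises the table. For simplicity the planner runs between inputs rather than concurrently, whereas the abstract describes both subsystems as continuously active. The situations, actions, and all names are hypothetical and are not the authors' formalism.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Iterator

Situation = str
Action = str

@dataclass
class ExecutionModule:
    """Reactive layer: a table lookup, so every input gets an immediate output."""
    plan: Dict[Situation, Action] = field(default_factory=dict)  # the "plan bits"
    default: Action = "stop"

    def step(self, situation: Situation) -> Action:
        return self.plan.get(situation, self.default)

@dataclass
class Planner:
    """Deliberative layer: computes new situation-action entries for the executor."""
    compute: Callable[[Situation], Dict[Situation, Action]]

    def update(self, situation: Situation, executor: ExecutionModule) -> None:
        executor.plan.update(self.compute(situation))

def run(inputs: Iterable[Situation], planner: Planner,
        executor: ExecutionModule) -> Iterator[Action]:
    """Map a stream of sensory inputs to a stream of control outputs."""
    for situation in inputs:
        yield executor.step(situation)        # respond at once with the current plan
        planner.update(situation, executor)   # then let the planner revise the plan bits

def toy_planner(situation: Situation) -> Dict[Situation, Action]:
    # Hypothetical deliberation: after seeing a closed door, plan to open it and go through.
    if situation == "door-closed":
        return {"door-closed": "open-door", "door-open": "go-through"}
    return {}

executor = ExecutionModule(plan={"hallway": "move-forward"})
outputs = list(run(["hallway", "door-closed", "door-closed", "door-open"],
                   Planner(toy_planner), executor))
```

The first "door-closed" input gets the default reaction because the planner has not yet filled in that entry; afterwards the same situation maps directly to an action, which is the sense in which a computed plan can be redescribed in situation-action terms.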