Search CORE

1,301 research outputs found

Fast rates for noisy clustering

Authors: Pack Kaelbling; Sébastien Loustau
Publication date: 01/01/2012

The effect of errors in variables in empirical minimization is investigated. Given a loss $l$ and a set of decision rules $\mathcal{G}$, we prove a general upper bound for an empirical minimization based on a deconvolution kernel and a noisy sample $Z_i = X_i + \epsilon_i$, $i = 1, \dots, n$. We apply this general upper bound to give the rate of convergence for the expected excess risk in noisy clustering. A recent bound by Levrard proves that this rate is $\mathcal{O}(1/n)$ in the direct case, under Pollard's regularity assumptions. Here, the effect of noisy measurements gives a rate of the form $\mathcal{O}\big(1/n^{\gamma/(\gamma+2\beta)}\big)$, where $\gamma$ is the Hölder regularity of the density of $X$, whereas $\beta$ is the degree of ill-posedness.
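To make the comparison between the noisy and direct rates concrete, the exponent can be tabulated directly. This is a minimal sketch that assumes only the rate formula as stated in the abstract. The function names `noisy_rate_exponent` and `rate_order` are hypothetical, the constants hidden in $\mathcal{O}(\cdot)$ are ignored, and nothing here implements the deconvolution-kernel estimator itself.

```python
from fractions import Fraction


def noisy_rate_exponent(gamma, beta):
    """Exponent r in the O(n^{-r}) excess-risk rate quoted in the abstract,
    r = gamma / (gamma + 2 * beta), returned as an exact fraction.

    gamma: Hölder regularity of the density of X (must be > 0).
    beta: degree of ill-posedness of the noise (must be >= 0).
    """
    if gamma <= 0 or beta < 0:
        raise ValueError("expected gamma > 0 and beta >= 0")
    g, b = Fraction(gamma), Fraction(beta)
    return g / (g + 2 * b)


def rate_order(n, gamma, beta):
    """n^{-r}: the order of the bound, with all constants dropped."""
    return n ** -float(noisy_rate_exponent(gamma, beta))


if __name__ == "__main__":
    # Formally setting beta = 0 gives exponent 1, matching the direct-case O(1/n).
    for gamma, beta in [(2, 0), (2, 1), (4, 1), (2, 2)]:
        r = noisy_rate_exponent(gamma, beta)
        print(f"gamma={gamma}, beta={beta}: r={r}, "
              f"n=10^4 -> {rate_order(10**4, gamma, beta):.2e}")
```

A larger $\beta$ (more ill-posed noise) or a smaller $\gamma$ (a rougher density) shrinks the exponent, so the bound decays more slowly than $1/n$.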

arXiv.org e-Print Archive

CiteSeerX

Elimination of All Bad Local Minima in Deep Learning

Authors: Kaelbling Leslie Pack; Kawaguchi Kenji
Publication date: 15/01/2020

In this paper, we theoretically prove that adding one special neuron per output unit eliminates all suboptimal local minima of any deep neural network, for multi-class classification, binary classification, and regression with an arbitrary loss function, under practical assumptions. At every local minimum of any deep neural network with these added neurons, the set of parameters of the original neural network (without added neurons) is guaranteed to be a global minimum of the original neural network. The effects of the added neurons are proven to automatically vanish at every local minimum. Moreover, we provide a novel theoretical characterization of a failure mode of eliminating suboptimal local minima via an additional theorem and several examples. This paper also introduces a novel proof technique based on the perturbable gradient basis (PGB) necessary condition of local minima, which provides new insight into the elimination of local minima and is applicable to analyze various models and transformations of objective functions beyond the elimination of local minima. Comment: Accepted to appear in AISTATS 202

arXiv.org e-Print Archive

DSpace@MIT

Regret bounds for meta Bayesian optimization with an unknown Gaussian process prior

Authors: Kaelbling Leslie Pack; Kim Beomjoon; Wang Zi
Publication date: 23/11/2018

Bayesian optimization usually assumes that a Bayesian prior is given. However, the strong theoretical guarantees in Bayesian optimization are often regrettably compromised in practice because of unknown parameters in the prior. In this paper, we adopt a variant of empirical Bayes and show that, by estimating the Gaussian process prior from offline data sampled from the same prior and constructing unbiased estimators of the posterior, variants of both GP-UCB and probability of improvement achieve a near-zero regret bound, which decreases to a constant proportional to the observational noise as the amount of offline data and the number of online evaluations increase. Empirically, we have verified our approach on challenging simulated robotic problems featuring task and motion planning. Comment: Proceedings of the Thirty-second Conference on Neural Information Processing Systems, 201
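The empirical-Bayes idea in this abstract (estimate the GP prior from offline data, then run GP-UCB online) can be sketched on a small finite domain. This is a simplified plug-in version under stated assumptions: each offline record is a full vector of function values over the domain, and the estimated mean and covariance are substituted directly into the standard GP posterior formulas. It does not include the paper's unbiased posterior estimators or its regret constants. The helpers `estimate_prior`, `solve`, `posterior`, and `ucb_choice`, the exploration weight `zeta`, and the synthetic linear-function data are all hypothetical illustrations.

```python
import random


def estimate_prior(offline):
    """Plug-in empirical-Bayes estimate of a GP prior on a finite domain.

    offline: N >= 2 records, each a list of M function values (one per point).
    Returns the sample mean (length M) and the unbiased sample covariance (M x M).
    """
    n, m = len(offline), len(offline[0])
    mean = [sum(f[j] for f in offline) / n for j in range(m)]
    cov = [[sum((f[i] - mean[i]) * (f[j] - mean[j]) for f in offline) / (n - 1)
            for j in range(m)] for i in range(m)]
    return mean, cov


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    m = len(a)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * m
    for i in reversed(range(m)):
        tail = sum(aug[i][c] * x[c] for c in range(i + 1, m))
        x[i] = (aug[i][m] - tail) / aug[i][i]
    return x


def posterior(mean, cov, obs_idx, obs_y, noise_var):
    """Standard GP posterior mean/variance at every domain point, using the
    estimated prior in place of the true one (no unbiasedness correction)."""
    m = len(mean)
    if not obs_idx:
        return list(mean), [cov[i][i] for i in range(m)]
    # Covariance of the noisy observations: K(t, t) + noise_var * I.
    k_tt = [[cov[i][j] + (noise_var if a == b else 0.0)
             for b, j in enumerate(obs_idx)]
            for a, i in enumerate(obs_idx)]
    resid = [y - mean[i] for i, y in zip(obs_idx, obs_y)]
    alpha = solve(k_tt, resid)
    mu, var = [], []
    for x in range(m):
        k_xt = [cov[x][i] for i in obs_idx]
        mu.append(mean[x] + sum(k * w for k, w in zip(k_xt, alpha)))
        w_x = solve(k_tt, k_xt)
        # Clamp tiny negative values caused by floating-point error.
        var.append(max(cov[x][x] - sum(k * w for k, w in zip(k_xt, w_x)), 0.0))
    return mu, var


def ucb_choice(mu, var, zeta):
    """GP-UCB rule: pick the index maximising mean + zeta * standard deviation."""
    return max(range(len(mu)), key=lambda x: mu[x] + zeta * var[x] ** 0.5)


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    # Hypothetical offline data: 200 random linear functions plus noise.
    offline = []
    for _ in range(200):
        a, b = rng.gauss(0.0, 1.0), rng.gauss(0.5, 0.3)
        offline.append([a + b * x + rng.gauss(0.0, 0.1) for x in xs])
    mean, cov = estimate_prior(offline)
    mu, var = posterior(mean, cov, obs_idx=[0], obs_y=[0.2], noise_var=0.01)
    print("next query index:", ucb_choice(mu, var, zeta=2.0))
```

Working on a finite domain keeps the prior estimate a plain sample mean and covariance, which is why no kernel hyperparameters appear anywhere in the sketch.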

arXiv.org e-Print Archive

DSpace@MIT

Every Local Minimum Value is the Global Minimum Value of Induced Model in Non-convex Machine Learning

Authors: Huang Jiaoyang; Kaelbling Leslie Pack; Kawaguchi Kenji
Publication venue: MIT Press - Journals
Publication date: 15/11/2019

For nonconvex optimization in machine learning, this article proves that every local minimum achieves the globally optimal value of the perturbable gradient basis model at any differentiable point. As a result, nonconvex machine learning is theoretically as supported as convex machine learning with a handcrafted basis in terms of the loss at differentiable local minima, except in the case when a preference is given to the handcrafted basis over the perturbable gradient basis. The proofs of these results are derived under mild assumptions. Accordingly, the proven results are directly applicable to many machine learning models, including practical deep neural networks, without any modification of practical methods. Furthermore, as special cases of our general results, this article improves or complements several state-of-the-art theoretical results on deep neural networks, deep residual networks, and overparameterized deep neural networks with a unified proof technique and novel geometric insights. A special case of our results also contributes to the theoretical foundation of representation learning. Comment: Neural Computation, MIT Press

arXiv.org e-Print Archive

DSpace@MIT

Effect of Depth and Width on Local Minima in Deep Learning

Authors: Huang Jiaoyang; Kaelbling Leslie Pack; Kawaguchi Kenji
Publication venue: MIT Press - Journals
Publication date: 04/06/2019

In this paper, we analyze the effects of depth and width on the quality of local minima, without strong over-parameterization and simplification assumptions in the literature. Without any simplification assumption, for deep nonlinear neural networks with the squared loss, we theoretically show that the quality of local minima tends to improve towards the global minimum value as depth and width increase. Furthermore, with a locally induced structure on deep nonlinear neural networks, the values of local minima of neural networks are theoretically proven to be no worse than the globally optimal values of corresponding classical machine learning models. We empirically support our theoretical observation with a synthetic dataset as well as the MNIST, CIFAR-10, and SVHN datasets. When compared to previous studies with strong over-parameterization assumptions, the results in this paper do not require over-parameterization, and instead show the gradual effects of over-parameterization as consequences of general results.

arXiv.org e-Print Archive

DSpace@MIT

Generalization in Deep Learning

Authors: Bengio Yoshua; Kaelbling Leslie Pack; Kawaguchi Kenji
Publication date: 27/07/2020

This paper provides theoretical insights into why and how deep learning can generalize well, despite its large capacity, complexity, possible algorithmic instability, nonrobustness, and sharp minima, responding to an open question in the literature. We also discuss approaches to provide non-vacuous generalization guarantees for deep learning. Based on theoretical observations, we propose new open problems and discuss the limitations of our results. Comment: To appear in Mathematics of Deep Learning, Cambridge University Press. All previous results remain unchanged.

arXiv.org e-Print Archive

ScholarBank@NUS

Integrating planning and reactive control

Authors: Kaelbling Leslie Pack; Rosenschein Stanley J.

Artificial intelligence research on planning is concerned with designing control systems that choose actions by manipulating explicit descriptions of the world state, the goal to be achieved, and the effects of elementary operations available to the system. Because planning shifts much of the burden of reasoning to the machine, it holds great appeal as a high-level programming method. Experience shows, however, that it cannot be used indiscriminately because even moderately rich languages for describing goals, states, and the elementary operators lead to computational inefficiencies that render the approach unsuitable for realistic applications. This inadequacy has spawned a recent wave of research on reactive control or situated activity in which control systems are modeled as reacting directly to the current situation rather than as reasoning about the future effects of alternative action sequences. While this research has confronted the issue of run-time tractability head on, in many cases it has done so by sacrificing the advantages of declarative planning techniques. Ways in which the two approaches can be unified are discussed. The authors begin by modeling reactive control systems as state machines that map a stream of sensory inputs to a stream of control outputs. These machines can be decomposed into two continuously active subsystems: the planner and the execution module. The planner computes a plan, which can be seen as a set of bits that control the behavior of the execution module. An important element of this work is the formulation of a precise semantic interpretation for the inputs and outputs of the planning system. They show that the distinction between planned and reactive behavior is largely in the eye of the beholder: systems that seem to compute explicit plans can be redescribed in situation-action terms and vice versa. 
They also discuss practical programming techniques that allow the advantages of declarative programming and guaranteed reactive response to be achieved simultaneously.
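The planner/execution-module decomposition described in this abstract can be illustrated with a toy controller. This is a hypothetical sketch, not the authors' formalism: the 1-D track, the `Percept` type, and the plan bits `(move_needed, move_right)` are invented for illustration. Both subsystems run on every tick, the plan is just a pair of bits, and the execution module reacts to an obstacle without consulting the planner.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple

Plan = Tuple[bool, bool]  # hypothetical plan bits: (move_needed, move_right)


@dataclass(frozen=True)
class Percept:
    position: int          # sensed position on a 1-D track
    obstacle_ahead: bool   # sensed obstacle in the direction of travel


def planner(goal: int, p: Percept) -> Plan:
    """Planner: turns the goal and the current percept into plan bits."""
    return (p.position != goal, goal > p.position)


def executor(plan: Plan, p: Percept) -> str:
    """Execution module: maps plan bits plus the current percept to an action."""
    move_needed, move_right = plan
    if not move_needed:
        return "stop"
    if p.obstacle_ahead:
        return "wait"  # immediate reactive response, whatever the plan says
    return "right" if move_right else "left"


def controller(goal: int, percepts: Iterable[Percept]) -> Iterator[str]:
    """Maps a stream of percepts to a stream of control outputs (a degenerate,
    memoryless state machine); planner and executor both run on every tick."""
    for p in percepts:
        yield executor(planner(goal, p), p)


if __name__ == "__main__":
    stream = [Percept(0, False), Percept(1, True), Percept(1, False), Percept(3, False)]
    print(list(controller(3, stream)))  # ['right', 'wait', 'right', 'stop']
```

Because `executor(planner(goal, p), p)` is itself a function from the current percept to an action, the same machine can be redescribed purely in situation-action terms, which echoes the abstract's point that the planned/reactive distinction depends on how the system is described.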

NASA Technical Reports Server