Elimination of All Bad Local Minima in Deep Learning
In this paper, we theoretically prove that adding one special neuron per
output unit eliminates all suboptimal local minima of any deep neural network,
for multi-class classification, binary classification, and regression with an
arbitrary loss function, under practical assumptions. At every local minimum of
any deep neural network with these added neurons, the set of parameters of the
original neural network (without added neurons) is guaranteed to be a global
minimum of the original neural network. The effects of the added neurons are
proven to automatically vanish at every local minimum. Moreover, we provide a
novel theoretical characterization of a failure mode of eliminating suboptimal
local minima via an additional theorem and several examples. This paper also
introduces a novel proof technique based on the perturbable gradient basis
(PGB) necessary condition of local minima, which provides new insight into the
elimination of local minima and is applicable to analyzing various models and
transformations of objective functions beyond the elimination of local minima.
Comment: Accepted to appear in AISTATS 2020
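A minimal sketch of the construction, assuming the added neuron takes the exponential form a·exp(b) with a quadratic regularizer as in related constructions (the paper's exact architecture and regularizer may differ):

```python
import numpy as np

def augmented_loss(loss_fn, y_pred, y_true, a, b, lam=1.0):
    """Loss of the augmented network. Each output unit gets one extra
    'special neuron' contributing a * exp(b); the exponential form and the
    quadratic regularizer on a are assumed details, not the paper's exact
    recipe."""
    y_aug = y_pred + a * np.exp(b)        # one added neuron per output unit
    return loss_fn(y_aug, y_true) + lam * np.sum(a ** 2)

# The paper proves the added neurons' effect vanishes at every local minimum
# of the augmented objective (here: a == 0), so y_aug == y_pred and the
# remaining parameters form a global minimum of the original network's loss.
```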
Integrated robot task and motion planning in belief space
In this paper, we describe an integrated strategy for planning, perception, state estimation, and action in complex mobile manipulation domains. The strategy is based on planning in the belief space of probability distributions over states, using hierarchical goal regression (pre-image back-chaining). We develop a vocabulary of fluents that describe sets of belief states, which serve as goals and subgoals in the planning process. We show that a relatively small set of symbolic operators leads to task-oriented perception in support of the manipulation goals. An implementation of this method is demonstrated in simulation and on a real PR2 robot, showing robust, flexible solutions of mobile manipulation problems with multiple objects and substantial uncertainty.
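A minimal sketch of goal regression (pre-image back-chaining) over belief fluents; the fluent vocabulary (BPoseKnown, BHolding, etc.) and operator definitions are hypothetical stand-ins for the paper's:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Operator:
    name: str
    preconditions: frozenset   # belief fluents that must hold before acting
    effects: frozenset         # belief fluents made true by acting

def regress(goal, op):
    """Pre-image of `goal` under `op`: the subgoal that must hold before
    executing op so that goal holds afterwards."""
    if not op.effects & goal:
        return None                          # op does not help achieve goal
    return (goal - op.effects) | op.preconditions

look = Operator("Look", frozenset({"BVisible(cup)"}),
                frozenset({"BPoseKnown(cup, eps)"}))
pick = Operator("Pick", frozenset({"BPoseKnown(cup, eps)"}),
                frozenset({"BHolding(cup)"}))

subgoal = regress(frozenset({"BHolding(cup)"}), pick)
# -> {"BPoseKnown(cup, eps)"}: back-chaining produces task-oriented
# perception, since this subgoal regresses further through Look to its
# precondition BVisible(cup).
```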
Integrating planning and reactive control
Artificial intelligence research on planning is concerned with designing control systems that choose actions by manipulating explicit descriptions of the world state, the goal to be achieved, and the effects of elementary operations available to the system. Because planning shifts much of the burden of reasoning to the machine, it holds great appeal as a high-level programming method. Experience shows, however, that it cannot be used indiscriminately, because even moderately rich languages for describing goals, states, and the elementary operators lead to computational inefficiencies that render the approach unsuitable for realistic applications. This inadequacy has spawned a recent wave of research on reactive control, or situated activity, in which control systems are modeled as reacting directly to the current situation rather than as reasoning about the future effects of alternative action sequences. While this research has confronted the issue of run-time tractability head-on, in many cases it has done so by sacrificing the advantages of declarative planning techniques. Ways in which the two approaches can be unified are discussed. The authors begin by modeling reactive control systems as state machines that map a stream of sensory inputs to a stream of control outputs. These machines can be decomposed into two continuously active subsystems: the planner and the execution module. The planner computes a plan, which can be seen as a set of bits that control the behavior of the execution module. An important element of this work is the formulation of a precise semantic interpretation for the inputs and outputs of the planning system. The authors show that the distinction between planned and reactive behavior is largely in the eye of the beholder: systems that seem to compute explicit plans can be redescribed in situation-action terms, and vice versa. They also discuss practical programming techniques that allow the advantages of declarative programming and guaranteed reactive response to be achieved simultaneously.
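A minimal sketch of the planner/execution-module decomposition described above; the plan-bit names and the toy planner are illustrative assumptions, not the paper's formulation:

```python
# The planner emits a plan (a set of bits); the execution module is a pure
# situation-action mapping from the current percept plus those bits to a
# control output, with no reasoning about future effects.

def execution_module(plan_bits, percept):
    if plan_bits.get("avoid_obstacles") and percept["obstacle_ahead"]:
        return "turn"
    if plan_bits.get("go_to_goal"):
        return "forward"
    return "stop"

def control_loop(planner, percepts):
    plan_bits = {}
    for percept in percepts:                        # both subsystems run
        plan_bits = planner(percept, plan_bits)     # deliberative layer
        yield execution_module(plan_bits, percept)  # reactive layer

planner = lambda percept, bits: {"avoid_obstacles": True, "go_to_goal": True}
percepts = [{"obstacle_ahead": False}, {"obstacle_ahead": True}]
print(list(control_loop(planner, percepts)))        # ['forward', 'turn']
```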
Integrated Robot Task and Motion Planning in the Now
This paper provides an approach to integrating geometric motion planning with logical task planning for long-horizon tasks in domains with many objects. We propose a tight integration between the logical and geometric aspects of planning. We use a logical representation which includes entities that refer to poses, grasps, paths and regions, without the need for a priori discretization. Given this representation and some simple mechanisms for geometric inference, we characterize the pre-conditions and effects of robot actions in terms of these logical entities. We then reason about the interaction of the geometric and non-geometric aspects of our domains using the general-purpose mechanism of goal regression (also known as pre-image backchaining). We propose an aggressive mechanism for temporal hierarchical decomposition, which postpones the pre-conditions of actions to create an abstraction hierarchy that both limits the lengths of plans that need to be generated and limits the set of objects relevant to each plan. We describe an implementation of this planning method and demonstrate it in a simulated kitchen environment in which it solves problems that require approximately 100 individual pick or place operations for moving multiple objects in a complex domain.
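A minimal sketch of the temporal hierarchical decomposition: an abstract plan postpones some preconditions of each action, and each postponed precondition becomes a subgoal planned for "in the now", just before that action runs. Operator names and the stub subplanner are hypothetical, and the geometric inference the real system interleaves is omitted:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str
    postponed: tuple   # preconditions deferred below this abstraction level

def execute(plan, solve, do):
    """Run an abstract plan, planning recursively for deferred preconds."""
    for action in plan:
        for pre in action.postponed:
            execute(solve(pre), solve, do)   # subplan for the precondition
        do(action)                           # primitive pick/place etc.

# Example: Pick(cup) postpones ClearPath(cup); solving it yields a subplan.
pick = Action("Pick(cup)", ("ClearPath(cup)",))
place = Action("Place(cup, sink)", ())
solve = lambda goal: [Action(f"Achieve[{goal}]", ())]   # stub subplanner
execute([pick, place], solve, do=lambda a: print("exec", a.name))
```

Postponing preconditions keeps each individual plan short and restricts attention to the objects relevant to that plan, which is what makes ~100-step mobile manipulation problems tractable.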
Every Local Minimum Value is the Global Minimum Value of Induced Model in Non-convex Machine Learning
For nonconvex optimization in machine learning, this article proves that, at
any differentiable point, every local minimum achieves the globally optimal
value of the perturbable gradient basis model. As a result, nonconvex
machine learning is theoretically as supported as convex machine learning with
a handcrafted basis in terms of the loss at differentiable local minima, except
in the case when a preference is given to the handcrafted basis over the
perturbable gradient basis. The proofs of these results are derived under mild
assumptions. Accordingly, the proven results are directly applicable to many
machine learning models, including practical deep neural networks, without any
modification of practical methods. Furthermore, as special cases of our general
results, this article improves or complements several state-of-the-art
theoretical results on deep neural networks, deep residual networks, and
overparameterized deep neural networks with a unified proof technique and novel
geometric insights. A special case of our results also contributes to the
theoretical foundation of representation learning.
Comment: Neural Computation, MIT Press
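One hedged way to read the headline claim in symbols; the paper's exact definition of the perturbable gradient basis (PGB) model and its admissible perturbation set may differ from this paraphrase:

```latex
% At any differentiable local minimum \theta^* of the training loss, the loss
% value matches the global optimum of the linear model spanned by the
% gradient basis evaluated near \theta^*:
\[
  \sum_{i=1}^{n} \ell\big(f(x_i;\theta^{\ast}),\,y_i\big)
  \;=\;
  \min_{\alpha}\;
  \sum_{i=1}^{n}
  \ell\Big(\sum_{k} \alpha_k\,
    \tfrac{\partial f}{\partial \theta_k}\big(x_i;\theta^{\ast}+\epsilon\big),\;
    y_i\Big),
\]
% for the small perturbations \epsilon admitted by the PGB construction.
```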
Regret bounds for meta Bayesian optimization with an unknown Gaussian process prior
Bayesian optimization usually assumes that a Bayesian prior is given.
However, the strong theoretical guarantees in Bayesian optimization are often
regrettably compromised in practice because of unknown parameters in the prior.
In this paper, we adopt a variant of empirical Bayes and show that, by
estimating the Gaussian process prior from offline data sampled from the same
prior and constructing unbiased estimators of the posterior, variants of both
GP-UCB and probability of improvement achieve a near-zero regret bound, which
decreases to a constant proportional to the observational noise as the number
of offline data points and the number of online evaluations increase. Empirically, we
have verified our approach on challenging simulated robotic problems featuring
task and motion planning.
Comment: Proceedings of the Thirty-second Conference on Neural Information Processing Systems, 2018
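A hedged sketch of the empirical-Bayes recipe on a fixed candidate grid: estimate the GP prior's mean and covariance from offline functions sampled from the same prior, then run a GP-UCB-style rule with the estimated posterior. The plug-in estimators below are the simplest empirical versions, not the paper's bias-corrected constructions:

```python
import numpy as np

def estimate_prior(F):
    """F: (m, n) matrix of m offline functions observed on a shared grid of
    n candidate points. Returns empirical prior mean and covariance."""
    mu = F.mean(axis=0)
    K = np.cov(F, rowvar=False)            # unbiased empirical covariance
    return mu, K

def gp_ucb_step(mu, K, X_idx, y, beta=2.0, noise=1e-3):
    """One GP-UCB pick over the grid, given observations y at indices X_idx."""
    Kxx = K[np.ix_(X_idx, X_idx)] + noise * np.eye(len(X_idx))
    Ksx = K[:, X_idx]
    w = np.linalg.solve(Kxx, y - mu[X_idx])
    post_mean = mu + Ksx @ w
    post_var = np.diag(K) - np.einsum('ij,ji->i', Ksx,
                                      np.linalg.solve(Kxx, Ksx.T))
    # Upper confidence bound with the estimated (rather than true) prior.
    return int(np.argmax(post_mean + beta * np.sqrt(np.maximum(post_var, 0))))
```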
Effect of Depth and Width on Local Minima in Deep Learning
In this paper, we analyze the effects of depth and width on the quality of
local minima, without the strong over-parameterization and simplification
assumptions common in the literature. Without any simplification assumption, for deep
nonlinear neural networks with the squared loss, we theoretically show that the
quality of local minima tends to improve towards the global minimum value as
depth and width increase. Furthermore, with a locally-induced structure on deep
nonlinear neural networks, the values of local minima of neural networks are
theoretically proven to be no worse than the globally optimal values of
corresponding classical machine learning models. We empirically support our
theoretical observations with a synthetic dataset as well as the MNIST, CIFAR-10, and
SVHN datasets. Compared to previous studies with strong
over-parameterization assumptions, the results in this paper do not require
over-parameterization, and instead show the gradual effects of
over-parameterization as consequences of general results.
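A hedged sketch of the kind of synthetic-data comparison the abstract mentions: train networks of varying depth and width from random initializations and compare the converged loss values. The data, hyperparameters, and use of scikit-learn are arbitrary choices, not the paper's protocol:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(512, 10))
y = np.sin(X @ rng.normal(size=10)) + 0.1 * rng.normal(size=512)

# The prediction: the converged squared loss tends to decrease as depth
# and width increase, even without over-parameterization.
for width in (4, 32, 256):
    for depth in (1, 2, 4):
        net = MLPRegressor(hidden_layer_sizes=(width,) * depth,
                           max_iter=2000, random_state=0)
        net.fit(X, y)
        print(f"depth={depth} width={width} loss={net.loss_:.4f}")
```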
Generalization in Deep Learning
This paper provides theoretical insights into why and how deep learning can
generalize well, despite its large capacity, complexity, possible algorithmic
instability, nonrobustness, and sharp minima, responding to an open question in
the literature. We also discuss approaches to provide non-vacuous
generalization guarantees for deep learning. Based on theoretical observations,
we propose new open problems and discuss the limitations of our results.
Comment: To appear in Mathematics of Deep Learning, Cambridge University Press. All previous results remain unchanged.