
    Elimination of All Bad Local Minima in Deep Learning

    In this paper, we theoretically prove that adding one special neuron per output unit eliminates all suboptimal local minima of any deep neural network, for multi-class classification, binary classification, and regression with an arbitrary loss function, under practical assumptions. At every local minimum of any deep neural network with these added neurons, the set of parameters of the original neural network (without added neurons) is guaranteed to be a global minimum of the original neural network. The effects of the added neurons are proven to vanish automatically at every local minimum. Moreover, we provide a novel theoretical characterization of a failure mode of eliminating suboptimal local minima via an additional theorem and several examples. This paper also introduces a novel proof technique based on the perturbable gradient basis (PGB) necessary condition for local minima, which provides new insight into the elimination of local minima and is applicable to the analysis of various models and transformations of objective functions beyond the elimination of local minima.

    Comment: Accepted to appear in AISTATS 2020
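
    To make the construction concrete, here is a minimal PyTorch sketch of the augmented objective, assuming a regression network `base`, an exponential form for the auxiliary neurons, and a hypothetical regularization weight `lam`; it illustrates the idea rather than reproducing the paper's exact setup.

```python
import torch
import torch.nn as nn

class AugmentedModel(nn.Module):
    """Wrap a base network and add one auxiliary neuron per output unit:
    f_tilde(x) = f(x) + a * exp(x @ w + b)."""
    def __init__(self, base: nn.Module, in_dim: int, out_dim: int):
        super().__init__()
        self.base = base
        self.a = nn.Parameter(torch.zeros(out_dim))     # scale of each added neuron
        self.w = nn.Parameter(torch.zeros(in_dim, out_dim))
        self.b = nn.Parameter(torch.zeros(out_dim))

    def forward(self, x):
        aux = self.a * torch.exp(x @ self.w + self.b)   # one neuron per output unit
        return self.base(x) + aux

def augmented_loss(model, x, y, lam=1.0):
    # The quadratic penalty on `a` is what makes the auxiliary neurons'
    # effects vanish (a -> 0) at every local minimum of the augmented loss.
    return nn.functional.mse_loss(model(x), y) + lam * (model.a ** 2).sum()
```

    At any local minimum of `augmented_loss`, the claim is that `model.a` is zero, so the base network alone attains the value, and that value is a global minimum of the original loss.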

    Integrated robot task and motion planning in belief space

    In this paper, we describe an integrated strategy for planning, perception, state estimation, and action in complex mobile manipulation domains. The strategy is based on planning in the belief space of probability distributions over states. Our planning approach is based on hierarchical goal regression (pre-image back-chaining). We develop a vocabulary of fluents that describe sets of belief states, which serve as goals and subgoals in the planning process. We show that a relatively small set of symbolic operators leads to task-oriented perception in support of the manipulation goals. An implementation of this method is demonstrated in simulation and on a real PR2 robot, showing robust, flexible solution of mobile manipulation problems with multiple objects and substantial uncertainty.

    This work was supported in part by the NSF under Grant No. 1117325. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. We also gratefully acknowledge support from ONR MURI grant N00014-09-1-1051, from AFOSR grant AOARD-104135, and from the Singapore Ministry of Education under a grant to the Singapore-MIT International Design Center. We thank Willow Garage for the use of the PR2 robot as part of the PR2 Beta Program.
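
    As a toy illustration of planning over belief-state fluents, the Python sketch below implements a discrete Bayes filter and a threshold fluent; the fluent name `BIn`, the observation model, and the threshold are illustrative assumptions, not the paper's implementation.

```python
def bayes_update(belief, likelihood):
    """Posterior over discrete states, given P(obs | state) for one observation."""
    post = {s: p * likelihood(s) for s, p in belief.items()}
    z = sum(post.values())
    return {s: p / z for s, p in post.items()}

def BIn(belief, region, eps):
    """Fluent: the belief that the state lies in `region` is at least 1 - eps."""
    return sum(p for s, p in belief.items() if s in region) >= 1.0 - eps

# A look action whose observation makes the fluent (a planning subgoal) true.
belief = {"tableA": 0.5, "tableB": 0.5}
belief = bayes_update(belief, lambda s: 0.9 if s == "tableA" else 0.1)
assert BIn(belief, {"tableA"}, eps=0.2)    # belief in tableA is now 0.9
```

    A regression planner over such fluents can then select observation actions exactly when a subgoal of this form is not yet satisfied.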

    Integrating planning and reactive control

    Artificial intelligence research on planning is concerned with designing control systems that choose actions by manipulating explicit descriptions of the world state, the goal to be achieved, and the effects of elementary operations available to the system. Because planning shifts much of the burden of reasoning to the machine, it holds great appeal as a high-level programming method. Experience shows, however, that it cannot be used indiscriminately, because even moderately rich languages for describing goals, states, and the elementary operators lead to computational inefficiencies that render the approach unsuitable for realistic applications. This inadequacy has spawned a recent wave of research on reactive control, or situated activity, in which control systems are modeled as reacting directly to the current situation rather than as reasoning about the future effects of alternative action sequences. While this research has confronted the issue of run-time tractability head-on, in many cases it has done so by sacrificing the advantages of declarative planning techniques. Ways in which the two approaches can be unified are discussed. The authors begin by modeling reactive control systems as state machines that map a stream of sensory inputs to a stream of control outputs. These machines can be decomposed into two continuously active subsystems: the planner and the execution module. The planner computes a plan, which can be seen as a set of bits that control the behavior of the execution module. An important element of this work is the formulation of a precise semantic interpretation for the inputs and outputs of the planning system. They show that the distinction between planned and reactive behavior is largely in the eye of the beholder: systems that seem to compute explicit plans can be redescribed in situation-action terms, and vice versa. They also discuss practical programming techniques that allow the advantages of declarative programming and guaranteed reactive response to be achieved simultaneously.
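
    The planner/execution-module decomposition can be sketched in a few lines of Python; the one-dimensional corridor domain and the policy-table interface below are illustrative assumptions, meant only to show a plan acting as data consulted by a reactive executor.

```python
from collections import deque

def plan(goal, neighbors):
    """Planner: breadth-first regression from the goal, producing a policy
    table (state -> action); the table is the 'plan', i.e. the set of bits
    that parameterizes the execution module's behavior."""
    policy, frontier = {goal: None}, deque([goal])
    while frontier:
        s = frontier.popleft()
        for a, s_prev in neighbors(s):      # (action, predecessor) arcs into s
            if s_prev not in policy:
                policy[s_prev] = a
                frontier.append(s_prev)
    return policy

def execute(sense, act, policy):
    """Execution module: a reactive loop mapping the sensed situation
    directly to a control output via the precomputed table."""
    while (s := sense()) in policy and policy[s] is not None:
        act(policy[s])

# One-dimensional corridor, states 0..4, goal 4; action +1/-1 moves right/left.
nbrs = lambda s: [(a, s - a) for a in (-1, 1) if 0 <= s - a <= 4]
print(plan(4, nbrs)[0])   # first action from state 0: 1 (move right)
```

    Read one way, `policy` is an explicit plan; read the other way, `execute` is a pure situation-action machine, which is the redescription point made above.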

    Integrated Robot Task and Motion Planning in the Now

    This paper provides an approach to integrating geometric motion planning with logical task planning for long-horizon tasks in domains with many objects. We propose a tight integration between the logical and geometric aspects of planning. We use a logical representation which includes entities that refer to poses, grasps, paths, and regions, without the need for a priori discretization. Given this representation and some simple mechanisms for geometric inference, we characterize the preconditions and effects of robot actions in terms of these logical entities. We then reason about the interaction of the geometric and non-geometric aspects of our domains using the general-purpose mechanism of goal regression (also known as pre-image backchaining). We propose an aggressive mechanism for temporal hierarchical decomposition, which postpones the preconditions of actions to create an abstraction hierarchy that both limits the lengths of plans that need to be generated and limits the set of objects relevant to each plan. We describe an implementation of this planning method and demonstrate it in a simulated kitchen environment, in which it solves problems that require approximately 100 individual pick or place operations for moving multiple objects in a complex domain.

    This work was supported in part by the NSF under Grant No. 1117325. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. We also gratefully acknowledge support from ONR MURI grant N00014-09-1-1051, from AFOSR grant AOARD-104135, and from the Singapore Ministry of Education under a grant to the Singapore-MIT International Design Center. We thank Willow Garage for the use of the PR2 robot as part of the PR2 Beta Program.
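
    A stripped-down Python sketch of pre-image backchaining over symbolic fluents, assuming operators given as precondition/effect sets with no delete effects; the paper's operators additionally carry geometric entities (poses, grasps, paths, regions) supplied by geometric inference, which this toy omits.

```python
def regress(goal, op):
    """Pre-image of `goal` under `op`: the fluents that must hold beforehand
    so that executing `op` achieves `goal` (no delete effects assumed)."""
    if not (goal & op["eff"]):
        return None                        # op contributes nothing to the goal
    return (goal - op["eff"]) | op["pre"]

def backchain(goal, state, ops, depth=10):
    """Depth-bounded goal regression; returns a plan as a list of op names."""
    if goal <= state:
        return []
    if depth == 0:
        return None
    for op in ops:
        subgoal = regress(goal, op)
        if subgoal is not None and subgoal != goal:
            rest = backchain(subgoal, state, ops, depth - 1)
            if rest is not None:
                return rest + [op["name"]]
    return None

ops = [
    {"name": "pick(A)",  "pre": {"clear(A)"},   "eff": {"holding(A)"}},
    {"name": "place(A)", "pre": {"holding(A)"}, "eff": {"on(A,table)"}},
]
print(backchain({"on(A,table)"}, {"clear(A)"}, ops))   # ['pick(A)', 'place(A)']
```

    The hierarchical decomposition described above corresponds to deferring some preconditions during regression so that each subproblem stays short and involves few objects.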

    Every Local Minimum Value is the Global Minimum Value of Induced Model in Non-convex Machine Learning

    For nonconvex optimization in machine learning, this article proves that every local minimum achieves the globally optimal value of the perturbable gradient basis model at any differentiable point. As a result, nonconvex machine learning is theoretically as well supported as convex machine learning with a handcrafted basis, in terms of the loss at differentiable local minima, except when a preference is given to the handcrafted basis over the perturbable gradient basis. The proofs of these results are derived under mild assumptions. Accordingly, the proven results are directly applicable to many machine learning models, including practical deep neural networks, without any modification of practical methods. Furthermore, as special cases of our general results, this article improves or complements several state-of-the-art theoretical results on deep neural networks, deep residual networks, and overparameterized deep neural networks with a unified proof technique and novel geometric insights. A special case of our results also contributes to the theoretical foundation of representation learning.

    Comment: Neural Computation, MIT Press
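
    The flavor of the result can be checked numerically. The NumPy sketch below trains a genuinely nonconvex two-layer ReLU model to a (near-)stationary point and compares its squared loss with the global minimum of the linear model induced by the gradient basis at that point; the data, width, and step size are arbitrary choices, and the plain gradient basis stands in for the perturbable gradient basis of the article.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=60)
y = np.sin(2.0 * X)
H = 8                                       # hidden width

w, a = rng.normal(size=H), rng.normal(size=H)
for _ in range(50_000):                     # plain gradient descent on MSE
    h = np.maximum(np.outer(X, w), 0.0)     # (60, H) hidden activations
    r = (h @ a - y) / len(X)                # scaled residual
    g = (h > 0) * np.outer(X, a)            # (60, H): d f_i / d w_j
    a = a - 0.2 * h.T @ r
    w = w - 0.2 * g.T @ r

h = np.maximum(np.outer(X, w), 0.0)
Phi = np.hstack([h, (h > 0) * np.outer(X, a)])   # gradient basis at the solution

alpha, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(0.5 * np.mean((h @ a - y) ** 2))           # loss at the point found
print(0.5 * np.mean((Phi @ alpha - y) ** 2))     # global min of induced model
```

    The two printed values should approximately agree: at a stationary point the residual is orthogonal to the gradient basis, and the network's output lies in the span of that basis by positive homogeneity.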

    Regret bounds for meta Bayesian optimization with an unknown Gaussian process prior

    Bayesian optimization usually assumes that a Bayesian prior is given. However, the strong theoretical guarantees in Bayesian optimization are often regrettably compromised in practice because of unknown parameters in the prior. In this paper, we adopt a variant of empirical Bayes and show that, by estimating the Gaussian process prior from offline data sampled from the same prior and constructing unbiased estimators of the posterior, variants of both GP-UCB and probability of improvement achieve a near-zero regret bound, which decreases to a constant proportional to the observational noise as the number of offline data points and the number of online evaluations increase. Empirically, we have verified our approach on challenging simulated robotic problems featuring task and motion planning.

    Comment: Proceedings of the Thirty-second Conference on Neural Information Processing Systems, 2018
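
    On a fixed grid of candidate points, the empirical-Bayes idea reduces to estimating the prior mean and covariance from offline functions and running GP-UCB with those estimates; the NumPy sketch below does exactly that with a synthetic prior, where the grid, noise level, and `beta` are illustrative choices rather than the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(1)
grid = np.linspace(-2.0, 2.0, 40)
K_true = np.exp(-0.5 * (grid[:, None] - grid[None, :]) ** 2)   # "unknown" prior

# Offline phase: estimate the prior from functions drawn from it.
F = rng.multivariate_normal(np.zeros(len(grid)), K_true, size=100)
mu_hat = F.mean(axis=0)                    # estimated prior mean
K_hat = np.cov(F, rowvar=False)            # estimated prior covariance

# Online phase: GP-UCB on a fresh function from the same prior, noisy observations.
f = rng.multivariate_normal(np.zeros(len(grid)), K_true)
noise, beta = 0.01, 2.0
idx, obs = [], []
for t in range(15):
    if idx:
        Kxx = K_hat[np.ix_(idx, idx)] + noise * np.eye(len(idx))
        Ksx = K_hat[:, idx]
        mu = mu_hat + Ksx @ np.linalg.solve(Kxx, np.array(obs) - mu_hat[idx])
        var = np.diag(K_hat) - np.einsum('ij,ji->i', Ksx, np.linalg.solve(Kxx, Ksx.T))
    else:
        mu, var = mu_hat, np.diag(K_hat)
    x = int(np.argmax(mu + beta * np.sqrt(np.maximum(var, 0.0))))  # UCB acquisition
    idx.append(x)
    obs.append(f[x] + rng.normal(scale=np.sqrt(noise)))

print("simple regret:", f.max() - f[idx].max())
```

    As the number of offline functions grows, `mu_hat` and `K_hat` approach the true prior, which is the mechanism behind the near-zero regret bound stated above.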

    Effect of Depth and Width on Local Minima in Deep Learning

    In this paper, we analyze the effects of depth and width on the quality of local minima, without the strong over-parameterization and simplification assumptions common in the literature. Without any simplification assumption, for deep nonlinear neural networks with the squared loss, we theoretically show that the quality of local minima tends to improve towards the global minimum value as depth and width increase. Furthermore, with a locally induced structure on deep nonlinear neural networks, the values of local minima of neural networks are theoretically proven to be no worse than the globally optimal values of corresponding classical machine learning models. We empirically support our theoretical observations with a synthetic dataset as well as the MNIST, CIFAR-10, and SVHN datasets. When compared to previous studies with strong over-parameterization assumptions, the results in this paper do not require over-parameterization and instead show the gradual effects of over-parameterization as consequences of general results.
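
    An experiment in the spirit of the paper's empirical section can be run in a few lines of PyTorch: train fully connected ReLU networks of varying depth and width on a small synthetic regression task and compare the training losses reached; the architectures, optimizer, and dataset below are arbitrary choices rather than the paper's exact setup.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(256, 10)
y = torch.sin(X.sum(dim=1, keepdim=True))

def train(depth, width, steps=3000):
    """Build a ReLU MLP with `depth` hidden layers of size `width`, run SGD,
    and return the final training loss (a proxy for local-minimum quality)."""
    layers, d = [], 10
    for _ in range(depth):
        layers += [nn.Linear(d, width), nn.ReLU()]
        d = width
    net = nn.Sequential(*layers, nn.Linear(d, 1))
    opt = torch.optim.SGD(net.parameters(), lr=1e-2)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(net(X), y)
        loss.backward()
        opt.step()
    return loss.item()

for depth in (1, 2, 4):
    for width in (8, 32, 128):
        # Losses reached tend to decrease as depth and width grow.
        print(depth, width, train(depth, width))
```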

    Generalization in Deep Learning

    This paper provides theoretical insights into why and how deep learning can generalize well, despite its large capacity, complexity, possible algorithmic instability, nonrobustness, and sharp minima, responding to an open question in the literature. We also discuss approaches to providing non-vacuous generalization guarantees for deep learning. Based on theoretical observations, we propose new open problems and discuss the limitations of our results.

    Comment: To appear in Mathematics of Deep Learning, Cambridge University Press. All previous results remain unchanged.