1,343 research outputs found

    Identifying and attacking the saddle point problem in high-dimensional non-convex optimization

    A central challenge to many fields of science and engineering involves minimizing non-convex error functions over continuous, high dimensional spaces. Gradient descent or quasi-Newton methods are almost ubiquitously used to perform such minimizations, and it is often thought that a main source of difficulty for these local methods to find the global minimum is the proliferation of local minima with much higher error than the global minimum. Here we argue, based on results from statistical physics, random matrix theory, neural network theory, and empirical evidence, that a deeper and more profound difficulty originates from the proliferation of saddle points, not local minima, especially in high dimensional problems of practical interest. Such saddle points are surrounded by high error plateaus that can dramatically slow down learning, and give the illusory impression of the existence of a local minimum. Motivated by these arguments, we propose a new approach to second-order optimization, the saddle-free Newton method, that can rapidly escape high dimensional saddle points, unlike gradient descent and quasi-Newton methods. We apply this algorithm to deep or recurrent neural network training, and provide numerical evidence for its superior optimization performance.
    Comment: The theoretical review and analysis in this article draw heavily from arXiv:1405.4604 [cs.LG]
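The abstract above does not spell out the update rule. As a rough illustration only, the sketch below assumes the commonly described core idea of a saddle-free Newton step: rescale the gradient by the inverse of the Hessian with its eigenvalues replaced by their absolute values. The function names, the `damping` parameter, and the toy objective f(x, y) = x^2 - y^2 are all assumptions made for this example, and the exact 2x2 eigendecomposition stands in for whatever approximation a large-scale implementation would need.

```python
import math


def eig_sym_2x2(a, b, d):
    """Eigenvalues and orthonormal eigenvectors of the symmetric matrix [[a, b], [b, d]]."""
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    lam1, lam2 = mean + radius, mean - radius
    if radius == 0.0:
        # Multiple of the identity: any orthonormal basis diagonalizes it.
        return (lam1, lam2), ((1.0, 0.0), (0.0, 1.0))
    # Rotation angle of the eigenvector belonging to the larger eigenvalue.
    theta = 0.5 * math.atan2(2.0 * b, a - d)
    c, s = math.cos(theta), math.sin(theta)
    return (lam1, lam2), ((c, s), (-s, c))


def saddle_free_newton_step(grad, hess, damping=1e-8):
    """Return the step -|H|^{-1} g for a 2-D problem (hypothetical helper).

    |H| is the Hessian with every eigenvalue replaced by its absolute value,
    so directions of negative curvature are descended instead of ascended.
    `damping` is an assumed small constant that avoids division by zero.
    """
    gx, gy = grad
    (a, b), (_, d) = hess
    lams, vecs = eig_sym_2x2(a, b, d)
    dx = dy = 0.0
    for lam, (vx, vy) in zip(lams, vecs):
        # Project the gradient on the eigenvector and rescale by 1 / |lambda|.
        coeff = (gx * vx + gy * vy) / (abs(lam) + damping)
        dx -= coeff * vx
        dy -= coeff * vy
    return dx, dy


# Toy saddle: f(x, y) = x^2 - y^2 has a saddle point at the origin.
x, y = 1.0, 0.1
grad = (2.0 * x, -2.0 * y)
hess = ((2.0, 0.0), (0.0, -2.0))
dx, dy = saddle_free_newton_step(grad, hess)
new_x, new_y = x + dx, y + dy
```

On this toy problem a plain Newton step would jump straight to the saddle at the origin, whereas the rescaled step shrinks `x` toward zero and pushes `y` away from it, which lowers f.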

    The differential geometric structure in supervised learning of classifiers

    In this thesis, we study the overfitting problem in supervised learning of classifiers from a geometric perspective. As with many inverse problems, learning a classification function from a given set of example-label pairs is an ill-posed problem, i.e., there exist infinitely many classification functions that can correctly predict the class labels for all training examples. Among them, according to Occam's razor, simpler functions are favored since they are less overfitted to training examples and are therefore expected to perform better on unseen examples. The standard technique to enforce Occam's razor is to introduce a regularization scheme, which penalizes some type of complexity of the learned classification function. Some widely used regularization techniques are functional norm-based (Tikhonov) techniques, ensemble-based techniques, early stopping techniques, etc. However, there is important geometric information in the learned classification function that is closely related to overfitting, and has been overlooked by previous methods. In this thesis, we study the complexity of a classification function from a new geometric perspective. In particular, we investigate the differential geometric structure in the submanifold corresponding to the estimator of the class probability P(y|x), based on the observation that overfitting produces rapid local oscillations and hence large mean curvature of this submanifold. We also show that our geometric perspective of supervised learning is naturally related to an elastic model in physics, where our complexity measure is a high dimensional extension of the surface energy in physics. This study leads to a new geometric regularization approach for supervised learning of classifiers. In our approach, the learning process can be viewed as a submanifold fitting problem that is solved by a mean curvature flow method. 
In particular, our approach finds the submanifold by iteratively fitting the training examples in a curvature or volume decreasing manner. Our technique is unified for both binary and multiclass classification, and can be applied to regularize any classification function that satisfies two requirements: firstly, an estimator of the class probability can be obtained; secondly, first and second derivatives of the class probability estimator can be calculated. For applications, where we apply our regularization technique to standard loss functions for classification, our RBF-based implementation compares favorably to widely used regularization methods for both binary and multiclass classification. We also design a specific algorithm to incorporate our regularization technique into the standard forward-backward training of deep neural networks. For theoretical analysis, we establish Bayes consistency for a specific loss function under some mild initialization assumptions. We also discuss the extension of our approach to situations where the input space is a submanifold, rather than a Euclidean space.
    2018-11-30

    Delta rhythms as a substrate for holographic processing in sleep and wakefulness

    PhD Thesis. We initially considered the theoretical properties and benefits of so-called holographic processing in a specific type of computational problem implied by the theories of synaptic rescaling processes in the biological wake-sleep cycle. This raised two fundamental questions that we attempted to answer by an experimental in vitro electrophysiological approach. We developed a comprehensive experimental paradigm based on a pharmacological model of the wake-sleep-associated delta rhythm measured with a Utah micro-electrode array at the interface between primary and associational areas in the rodent neocortex. We first verified that our in vitro delta rhythm model possessed two key features found in both in vivo rodent and human studies of synaptic rescaling processes in sleep. The first property is that prior local synaptic potentiation in wake leads to increased local delta power in subsequent sleep. The second property is the reactivation in sleep of neural firing patterns observed prior to sleep. By reproducing these findings we confirmed that our model is arguably an adequate medium for further study of the putative sleep-related synaptic rescaling process. In addition, we found important differences between neural units that reactivated or deactivated during delta; these were differences in cell types based on unit spike shapes, in prior firing rates and in prior spike-train-to-local-field-potential coherence. Taken together, these results suggested a mechanistic chain of explanation of the two observed properties, and set the neurobiological framework for further, more computationally driven analysis. Using the above experimental and theoretical substrate we developed a new method of analysis of micro-electrode array data. The method is a generalization to the electromagnetic case of a well-known technique for processing acoustic microphone array data. 
This allowed calculation of the instantaneous spatial energy flow and dissipation in the neocortical areas under the array, and of the spatial energy source density, in analogy to well-known current source density analysis. We then refocused our investigation on the two theoretical questions for which we hoped to obtain experimental answers: first, whether the state of the neocortex during a delta rhythm could be described by ergodic statistics, which we determined by analyzing the spectral properties of energy dissipation as a signature of the state of the dynamical system; second, a more explorative investigation of the spatiotemporal interactions across and along neocortical layers and areas during a delta rhythm, as implied by energy flow patterns. We found that the in vitro rodent neocortex does not conform to ergodic statistics during a pharmacologically driven delta or gamma rhythm. We also found a delta-period-locked pattern of energy flow across and along layers and areas, which doubled the processing cycle relative to the fundamental delta rhythm, tentatively suggesting a reciprocal, two-stage information processing hierarchy similar to a stochastic Helmholtz machine with a wake-sleep training algorithm. Further, the complex-valued energy flow might suggest an improvement to the Helmholtz machine concept by generalizing the complex-valued weights of the stochastic network to higher dimensional multi-vectors of a geometric algebra with a metric particularly suited for holographic processes. Finally, preliminary attempts were made to implement and characterize the above network dynamics in silico. We found that a qubit-valued network does not allow fully holographic processes, but tentatively suggest that an ebit-valued network may display two key properties of general holographic processing.

    State of the Art in Face Recognition

    Notwithstanding the tremendous effort to solve the face recognition problem, it is not yet possible to design a face recognition system with a potential close to human performance. New computer vision and pattern recognition approaches need to be investigated. New knowledge and perspectives from different fields, such as psychology and neuroscience, must also be incorporated into the current field of face recognition to design a robust face recognition system. Indeed, many more efforts are required to arrive at a human-like face recognition system. This book attempts to reduce the gap between the previous state of face recognition research and its future state.

    Deep Reinforcement Learning for Inverse Kinematics and Path Following for Concentric Tube Robots

    Concentric tube robots (CTRs) are continuum robots that allow for bending and twisting motions unattainable by traditional rigid link robots. The curvilinear backbones can benefit surgical applications by improving dexterity, enlarging the workspace, and reducing trauma at the entry point of the instrument. The curvilinear backbone results from pre-curved, super-elastic tubes arranged concentrically. Each tube has a straight and a pre-curved section and is actuated in rotation and translation from the tube base, with the neighboring tube interactions producing the curvilinear backbone. The modeling of the neighboring tube interactions is non-trivial, and an explored topic in CTR literature. However, model-based kinematics and control can be inaccurate due to inherent manufacturing errors of the tubes, permanent deformation over time, and unmodelled physical interactions. This thesis proposes a model-free control method using deep reinforcement learning (DRL). The DRL framework aims to control the end-effector of the CTR with limited modeling information by leveraging simulation data, which is much less costly than hardware data. To develop a DRL framework, a Markov Decision Process (MDP) with states, actions, and rewards needs to be defined for the inverse kinematics task. First, action exploration was investigated with this MDP in a simpler simulation, as CTRs have a unique extension degree of freedom per tube. Next, state representation, curriculum reward, and adaptation methods over multiple CTR systems were developed in a more accurate simulation. To validate the work in simulation, a noise-induced simulation environment was utilized to demonstrate the initial robustness of the learned policy. Finally, a hardware system was developed where a workspace characterization was performed to determine simulation to hardware differences. By using Sim2Real domain transfer, a simulation policy was successfully transferred to hardware for inverse kinematics and path following, validating the approach.