12 research outputs found

    Efficient approaches for escaping higher order saddle points in non-convex optimization

    Local search heuristics for non-convex optimization are popular in applied machine learning. In general, however, it is hard to guarantee that such algorithms even converge to a local minimum, because of the complicated saddle point structures that exist in high dimensions. Many functions have degenerate saddle points that the first and second order derivatives cannot distinguish from local optima. In this paper we use higher order derivatives to escape these saddle points: we design the first efficient algorithm guaranteed to converge to a third order local optimum (existing techniques are at most second order). We also show that it is NP-hard to extend this further to finding fourth order local optima.
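
To make the notion of a degenerate saddle point concrete, here is a minimal sketch. It is not the paper's algorithm; the test function f(x, y) = x^3 + y^2 and all helper names are assumptions chosen for illustration. At the origin the gradient vanishes and the Hessian is positive semidefinite, so first and second order tests cannot rule out a local minimum, yet the third derivative along the Hessian's null direction reveals a descent direction.

```python
import numpy as np

# Hypothetical example function: f(x, y) = x^3 + y^2.
def f(p):
    x, y = p
    return x**3 + y**2

def grad(p):
    x, y = p
    return np.array([3 * x**2, 2 * y])

def hessian(p):
    x, y = p
    return np.array([[6 * x, 0.0], [0.0, 2.0]])

def third_directional(u):
    # The only nonzero third partial derivative of f is d^3 f / dx^3 = 6,
    # so the third derivative along direction u is 6 * u_x^3.
    return 6.0 * u[0] ** 3

origin = np.zeros(2)
g = grad(origin)                                   # zero gradient
eigvals, eigvecs = np.linalg.eigh(hessian(origin)) # eigenvalues in ascending order
u = eigvecs[:, 0]                                  # null direction of the Hessian
t3 = third_directional(u)                          # nonzero: origin is not a third order optimum

# Move against the sign of the third derivative along the null direction.
step = -np.sign(t3) * 0.1 * u
print("gradient:", g)
print("Hessian eigenvalues:", eigvals)
print("f decreases along the step:", f(origin + step) < f(origin))
```

The step size 0.1 is arbitrary; the point is only that a third order test finds a descent direction where the first two orders find none.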

    On the Principle of Least Symmetry Breaking in Shallow ReLU Models

    We consider the optimization problem associated with fitting two-layer ReLU networks with respect to the squared loss, where labels are assumed to be generated by a target network. Focusing first on standard Gaussian inputs, we show that the structure of spurious local minima detected by stochastic gradient descent (SGD) is, in a well-defined sense, the least loss of symmetry with respect to the target weights. A closer look at the analysis indicates that this principle of least symmetry breaking may apply to a broader range of settings. Motivated by this, we conduct a series of experiments which corroborate this hypothesis for different classes of non-isotropic non-product distributions, smooth activation functions and networks with a few layers.
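
The objective described in this abstract can be sketched numerically. The following is a rough illustration under stated assumptions, not the paper's setup: the dimensions, the identity target weights `V`, the unit second-layer weights and the Monte Carlo estimate of the expectation are all choices made here. It checks one simple symmetry of such an objective, namely that reordering the hidden units leaves the loss unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 5, 5, 1000               # input dim, hidden units, sample size (assumed)
V = np.eye(k, d)                   # assumed target weights
X = rng.standard_normal((n, d))    # standard Gaussian inputs

def network(W, X):
    # Two-layer ReLU network with all second-layer weights fixed to 1.
    return np.maximum(X @ W.T, 0.0).sum(axis=1)

def loss(W, X, V):
    # Squared loss against labels generated by the target network V.
    return 0.5 * np.mean((network(W, X) - network(V, X)) ** 2)

W = rng.standard_normal((k, d))
perm = rng.permutation(k)

loss_original = loss(W, X, V)
loss_permuted = loss(W[perm], X, V)   # same hidden units, different order
print(np.isclose(loss_original, loss_permuted))
```

Permuting hidden units is only the most basic symmetry of this objective; the notion of symmetry relative to the target weights used in the paper is richer than what this check covers.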

    Levenberg-Marquardt Algorithm for Mackey-Glass Chaotic Time Series Prediction

    For decades, Mackey-Glass chaotic time series prediction has attracted growing attention. When a multilayer perceptron is used to predict the Mackey-Glass chaotic time series, the task is to minimize the loss function. As is well known, the loss decreases rapidly at the beginning of the learning process but very slowly once the parameters are near the minimum point. To overcome this problem, we introduce the Levenberg-Marquardt algorithm (LMA). First, a brief introduction is given to the multilayer perceptron, including its structure and the model approximation method. Second, we introduce the LMA and discuss how to implement it. Last, an illustrative example is carried out to show the prediction efficiency of the LMA. Simulations show that the LMA can give more accurate predictions than the gradient descent method.
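
The Levenberg-Marquardt update mentioned in this abstract can be sketched as follows. This is a minimal sketch under assumptions, not the paper's implementation: it fits a toy exponential model rather than a multilayer perceptron on Mackey-Glass data, and the damping schedule (divide by 10 on success, multiply by 10 on failure), the tolerance and all function names are choices made here for illustration.

```python
import numpy as np

def levenberg_marquardt(residual, jacobian, theta0, lam=1e-2, n_iter=200, tol=1e-16):
    """Minimize 0.5 * ||residual(theta)||^2 with damped Gauss-Newton steps."""
    theta = np.asarray(theta0, dtype=float)
    r = residual(theta)
    loss = 0.5 * r @ r
    for _ in range(n_iter):
        J = jacobian(theta)
        # Damped normal equations: (J^T J + lam * I) step = -J^T r
        A = J.T @ J + lam * np.eye(theta.size)
        step = np.linalg.solve(A, -J.T @ r)
        r_new = residual(theta + step)
        loss_new = 0.5 * r_new @ r_new
        if loss_new < loss:
            # Accept the step and trust the Gauss-Newton model more.
            improvement = loss - loss_new
            theta, r, loss = theta + step, r_new, loss_new
            lam *= 0.1
            if improvement < tol:
                break
        else:
            # Reject the step and move toward gradient descent.
            lam *= 10.0
    return theta, loss

# Toy problem (assumed): recover a and b in y = a * exp(b * t) from noise-free data.
t = np.linspace(0.0, 1.0, 20)
true_theta = np.array([2.0, -1.5])
y = true_theta[0] * np.exp(true_theta[1] * t)

def residual(theta):
    a, b = theta
    return a * np.exp(b * t) - y

def jacobian(theta):
    a, b = theta
    e = np.exp(b * t)
    return np.column_stack([e, a * t * e])  # d r / d a, d r / d b

theta_hat, final_loss = levenberg_marquardt(residual, jacobian, [1.0, 0.0])
print(theta_hat, final_loss)
```

Small damping makes the step close to Gauss-Newton, which converges quickly near a minimum; large damping shrinks it toward a short gradient descent step, which is the safer choice far from one.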

    Non-attracting Regions of Local Minima in Deep and Wide Neural Networks

    Understanding the loss surface of neural networks is essential for the design of models with predictable performance and their success in applications. Experimental results suggest that sufficiently deep and wide neural networks are not negatively impacted by suboptimal local minima. Despite recent progress, the reason for this outcome is not fully understood. Could deep networks have very few, if any, suboptimal local optima? Or could all of them be equally good? We provide a construction to show that suboptimal (i.e., non-global) local minima, even though degenerate, exist for fully connected neural networks with sigmoid activation functions. The local minima obtained by our construction belong to a connected set of local solutions that can be escaped from via a non-increasing path on the loss curve. For extremely wide neural networks of decreasing width after the wide layer, we prove that every suboptimal local minimum belongs to such a connected set. This provides a partial explanation for the successful application of deep neural networks. In addition, we characterize the conditions under which the same construction leads to saddle points instead of local minima for deep neural networks.

    Object Detection and Tracking Using Machine Learning Methods

    Bachelor thesis: 158 pages, 83 figures, 6 tables, 2 appendices, 42 sources. Keywords: neural networks, object detection and tracking, convolutional neural networks, tracking software. The object of research is the detection and tracking of objects using peripheral devices (video cameras of various kinds). People who work with video material often face the problem of detecting and classifying the objects present in the current frame. This is necessary in many areas of human activity. One example is an autonomous car control system: before artificial intelligence can make driving decisions, it needs a clear understanding of which objects it is encountering at the current moment. A further task may then arise: tracking the entire history of one or more objects in the video, or more precisely, the trajectories of the targets that appeared in it. Object tracking is thus a logical continuation of the detection task. The goal of the work is to develop a program that uses existing models for object detection and tracking while making optimal use of resources. Ideally the program should perform the task in real time, and the model should not be too costly in image processing time, so that no additional resources have to be spent on setting up servers; that is, essentially all of the work should be performed by the peripheral devices alone. The software product is developed in the Python programming language. A YOLO model was implemented in combination with the DeepSORT algorithm. The practical result of the work is a system for detecting and tracking objects.