12 research outputs found

Efficient approaches for escaping higher order saddle points in non-convex optimization

Authors: Anandkumar Anima; Ge Rong
Publication date: 18/02/2016

Local search heuristics for non-convex optimization are popular in applied machine learning. However, in general it is hard to guarantee that such algorithms even converge to a local minimum, due to the existence of complicated saddle point structures in high dimensions. Many functions have degenerate saddle points such that the first and second order derivatives cannot distinguish them from local optima. In this paper we use higher order derivatives to escape these saddle points: we design the first efficient algorithm guaranteed to converge to a third order local optimum (while existing techniques are at most second order). We also show that it is NP-hard to extend this further to finding fourth order local optima.
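
As a small worked illustration of such a degenerate saddle point, consider the function below at the origin. It is a hypothetical example chosen here, not one taken from the paper, and the last line uses the third-order condition as it is commonly stated, which is an assumption about the paper's exact definition.

```latex
\begin{align*}
f(x, y) &= x^2 + y^3, \\
\nabla f(x, y) &= \begin{pmatrix} 2x \\ 3y^2 \end{pmatrix}
  \;\Rightarrow\; \nabla f(0, 0) = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \\
\nabla^2 f(x, y) &= \begin{pmatrix} 2 & 0 \\ 0 & 6y \end{pmatrix}
  \;\Rightarrow\; \nabla^2 f(0, 0) = \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix} \succeq 0, \\
f(0, -t) &= -t^3 < 0 = f(0, 0) \quad \text{for all } t > 0, \\
\nabla^3 f(0, 0)[u, u, u] &= 6 \neq 0
  \quad \text{for } u = (0, 1)^\top \in \ker \nabla^2 f(0, 0).
\end{align*}
```

The gradient vanishes and the Hessian is positive semidefinite, so the first and second order tests are both passed, yet the origin is not a local minimum. Only the nonzero third derivative along the null direction of the Hessian reveals the descent direction.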

arXiv.org e-Print Archive

eScholarship - University of California

On the Principle of Least Symmetry Breaking in Shallow ReLU Models

Authors: Arjevani Yossi; Field Michael
Publication date: 03/10/2020

We consider the optimization problem associated with fitting two-layer ReLU networks with respect to the squared loss, where labels are assumed to be generated by a target network. Focusing first on standard Gaussian inputs, we show that the structure of spurious local minima detected by stochastic gradient descent (SGD) is, in a well-defined sense, the least loss of symmetry with respect to the target weights. A closer look at the analysis indicates that this principle of least symmetry breaking may apply to a broader range of settings. Motivated by this, we conduct a series of experiments which corroborate this hypothesis for different classes of non-isotropic non-product distributions, smooth activation functions and networks with a few layers.

arXiv.org e-Print Archive

Levenberg-Marquardt Algorithm for Mackey-Glass Chaotic Time Series Prediction

Authors: Junsheng Zhao; Xingfang Zhang; Xingjiang Yu; Yongmin Li
Publication venue: Hindawi Limited
Publication date: 01/01/2014

Mackey-Glass chaotic time series prediction has attracted growing attention for decades. When a multilayer perceptron is used to predict the Mackey-Glass chaotic time series, training amounts to minimizing the loss function. It is well known that the loss decreases rapidly at the beginning of the learning process but very slowly once the parameters are near the minimum point. To overcome this problem, we introduce the Levenberg-Marquardt algorithm (LMA). First, we briefly introduce the multilayer perceptron, including its structure and the model approximation method. Second, we introduce the LMA and discuss how to implement it. Finally, an illustrative example demonstrates the prediction efficiency of the LMA. Simulations show that the LMA gives more accurate predictions than the gradient descent method.
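
To make the damped update behind the Levenberg-Marquardt algorithm concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: instead of a multilayer perceptron on Mackey-Glass data, it fits a hypothetical two-parameter model `y = a * exp(b * x)` to synthetic, noise-free points. The function names, the starting guess and the damping schedule (divide or multiply by 10) are all assumptions made for illustration.

```python
import math


def residuals(params, xs, ys):
    """Model output minus target for the toy model y = a * exp(b * x)."""
    a, b = params
    return [a * math.exp(b * x) - y for x, y in zip(xs, ys)]


def jacobian(params, xs):
    """Rows of partial derivatives (d/da, d/db) of each residual."""
    a, b = params
    return [(math.exp(b * x), a * x * math.exp(b * x)) for x in xs]


def sum_of_squares(values):
    return sum(v * v for v in values)


def levenberg_marquardt(xs, ys, start, damping=1e-2, max_iter=200, tol=1e-12):
    """Minimise the sum of squared residuals with a damped Gauss-Newton step."""
    a, b = start
    r = residuals((a, b), xs, ys)
    loss = sum_of_squares(r)
    for _ in range(max_iter):
        jac = jacobian((a, b), xs)
        # Gauss-Newton pieces: H = J^T J (2x2) and g = J^T r (2-vector).
        h11 = sum(j[0] * j[0] for j in jac)
        h12 = sum(j[0] * j[1] for j in jac)
        h22 = sum(j[1] * j[1] for j in jac)
        g1 = sum(j[0] * ri for j, ri in zip(jac, r))
        g2 = sum(j[1] * ri for j, ri in zip(jac, r))
        # Solve (H + damping * I) * step = -g with Cramer's rule.
        d11 = h11 + damping
        d22 = h22 + damping
        det = d11 * d22 - h12 * h12
        da = (-g1 * d22 + g2 * h12) / det
        db = (-g2 * d11 + g1 * h12) / det
        new_r = residuals((a + da, b + db), xs, ys)
        new_loss = sum_of_squares(new_r)
        if new_loss < loss:
            # Accept the step and trust the quadratic model more next time.
            converged = loss - new_loss < tol
            a, b, r, loss = a + da, b + db, new_r, new_loss
            damping = max(damping / 10.0, 1e-12)
            if converged:
                break
        else:
            # Reject the step and move closer to small gradient-descent steps.
            damping *= 10.0
    return (a, b), loss


# Synthetic data from the assumed ground truth a = 2.0, b = 1.5.
xs = [i / 10.0 for i in range(11)]
ys = [2.0 * math.exp(1.5 * x) for x in xs]
fit, final_loss = levenberg_marquardt(xs, ys, start=(1.0, 1.0))
print("fitted (a, b):", fit, "final loss:", final_loss)
```

With large damping the step approaches a short gradient-descent step, and with small damping it approaches a Gauss-Newton step. That second regime is what allows fast progress near the minimum, where plain gradient descent slows down.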

Crossref

Directory of Open Access Journals

Non-attracting Regions of Local Minima in Deep and Wide Neural Networks

Authors: Petzka Henning; Sminchisescu Cristian
Publication date: 31/08/2020

Understanding the loss surface of neural networks is essential for the design of models with predictable performance and their success in applications. Experimental results suggest that sufficiently deep and wide neural networks are not negatively impacted by suboptimal local minima. Despite recent progress, the reason for this outcome is not fully understood. Could deep networks have very few, if any, suboptimal local optima? Or could all of them be equally good? We provide a construction to show that suboptimal local minima (i.e., non-global ones), even though degenerate, exist for fully connected neural networks with sigmoid activation functions. The local minima obtained by our construction belong to a connected set of local solutions that can be escaped from via a non-increasing path on the loss curve. For extremely wide neural networks of decreasing width after the wide layer, we prove that every suboptimal local minimum belongs to such a connected set. This provides a partial explanation for the successful application of deep neural networks. In addition, we also characterize under what conditions the same construction leads to saddle points instead of local minima for deep neural networks.

arXiv.org e-Print Archive

Lund University Publications

Object Detection and Tracking Using Machine Learning Methods

Author: Сухомлин Гліб Ігорович
Publication venue: Kyiv
Publication date: 01/01/2023

Bachelor thesis: 158 p., 83 fig., 6 tables, 2 appendices, 42 sources. NEURAL NETWORKS, OBJECT DETECTION AND TRACKING, CONVOLUTIONAL NEURAL NETWORKS, TRACKING SOFTWARE. The object of research is the detection and tracking of objects using peripheral devices (video cameras of various kinds). When working with video material, people often face the problem of detecting and classifying the objects in the current frame. This is necessary in many areas of human activity. One example is an autonomous car control system: before artificial intelligence can make driving decisions, it must clearly understand which objects it is encountering at the current moment. A further task may be to track the entire history of an object or objects in the video, that is, the trajectories of the targets that appeared in it. Object tracking is therefore a logical continuation of the detection task. The goal of the work is to develop a program that uses existing models for object detection and tracking and makes optimal use of resources. The program should preferably perform the task in real time, and the model should not be too expensive in image processing time, so that no additional resources need to be spent on setting up servers and essentially all work is performed by the peripheral devices alone. The software is written in the Python programming language. The YOLO model was implemented in combination with the DeepSort algorithm. The practical result of the work is a system for detecting and tracking objects.

Electronic Archive of Kyiv Polytechnic Institute