Search CORE

Stochastic gradient descent performs variational inference, converges to limit cycles for deep networks

Author: Chaudhari Pratik
Soatto Stefano
Publication date: 16/01/2018

Stochastic gradient descent (SGD) is widely believed to perform implicit regularization when used to train deep neural networks, but the precise manner in which this occurs has thus far been elusive. We prove that SGD minimizes an average potential over the posterior distribution of weights along with an entropic regularization term. This potential, however, is in general not the original loss function. So SGD does perform variational inference, but for a different loss than the one used to compute the gradients. Even more surprisingly, SGD does not even converge in the classical sense: we show that the most likely trajectories of SGD for deep networks do not behave like Brownian motion around critical points. Instead, they resemble closed loops with deterministic components. We prove that such "out-of-equilibrium" behavior is a consequence of highly non-isotropic gradient noise in SGD; the covariance matrix of mini-batch gradients for deep networks has a rank as small as 1% of its dimension. We provide extensive empirical validation of these claims, which are proven in the appendix.
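The low-rank claim concerns the covariance of mini-batch gradients. The Python sketch below shows one way such a covariance can be estimated from per-sample gradients and its numerical rank counted. It is a toy, not the authors' experiment: it assumes a linear least-squares model instead of a deep network, the helper names (`per_sample_grads`, `gradient_noise_covariance`, `numerical_rank`) are hypothetical, and the inputs are deliberately confined to a 5-dimensional subspace, so the low rank here is built in by construction rather than observed. For mini-batches of size b drawn with replacement, the mini-batch gradient covariance is this matrix divided by b, which leaves its rank unchanged.

```python
import numpy as np


def per_sample_grads(w, X, y):
    """Per-sample gradients of the squared loss 0.5 * (x @ w - y)**2 with respect to w."""
    residuals = X @ w - y              # shape (N,)
    return residuals[:, None] * X      # shape (N, d); row i is r_i * x_i


def gradient_noise_covariance(w, X, y):
    """Covariance of per-sample gradients around the full-batch gradient.

    For mini-batches of size b sampled with replacement, the covariance of the
    mini-batch gradient is this matrix divided by b.
    """
    G = per_sample_grads(w, X, y)
    G_centered = G - G.mean(axis=0)    # subtract the full-batch gradient
    return G_centered.T @ G_centered / G.shape[0]


def numerical_rank(D, rel_tol=1e-8):
    """Count eigenvalues above rel_tol times the largest one (D is symmetric PSD)."""
    eigvals = np.linalg.eigvalsh(D)
    return int(np.sum(eigvals > rel_tol * eigvals.max()))


# Toy setup (hypothetical sizes): 100 parameters, inputs confined to a 5-dim subspace.
rng = np.random.default_rng(0)
d, k, N = 100, 5, 1000
A = rng.normal(size=(d, k))
X = rng.normal(size=(N, k)) @ A.T      # every input lies in the column space of A
y = rng.normal(size=N)
w = rng.normal(size=d)

D = gradient_noise_covariance(w, X, y)
rank = numerical_rank(D)
print(f"numerical rank {rank} of {d} ({100 * rank / d:.0f}% of the dimension)")
```

Because every per-sample gradient in this toy is a multiple of an input vector, the covariance can have rank at most 5 out of 100 dimensions; measuring the rank for a real deep network would require that network's own per-sample gradients.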

arXiv.org e-Print Archive

Crossref

A geometric interpretation of stochastic gradient descent using diffusion metrics

Author: Chaudhari P.
Fioresi R.
Soatto S.
Publication venue: 'MDPI AG'
Publication date: 27/10/2019

This paper is a step towards developing a geometric understanding of stochastic gradient descent (SGD), a popular algorithm for training deep neural networks. We build upon a recent result which observed that the noise in SGD while training typical networks is highly non-isotropic. This motivates a deterministic model in which the trajectories of our dynamical systems are described via geodesics of a family of metrics arising from a certain diffusion matrix, namely the covariance of the stochastic gradients in SGD. Our model is analogous to models in general relativity: the role of the electromagnetic field in the latter is played by the gradient of the loss function of a deep network in the former.

arXiv.org e-Print Archive

Multidisciplinary Digital Publishing Institute

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Machine Learning Technique Based Fake News Detection

Author: Noori Sheak Rashed Haider
Ria Nushrat Jahan
Sutradhar Biplob Kumar
Zonaid Md.
Publication date: 18/09/2023

False news has received attention from both the general public and the scholarly world. Such false information can affect public perception, giving nefarious groups the chance to influence the outcomes of public events such as elections. Anyone can share fake news or facts about anyone or anything for personal gain or to cause someone trouble. Information also varies depending on the part of the world in which it is shared. In this paper, we therefore trained a model to classify fake and true news using 1,876 news items from our collected dataset. We preprocessed the data with Natural Language Processing approaches to obtain clean, filtered texts. We evaluated three popular Machine Learning algorithms (Stochastic Gradient Descent, Naïve Bayes, Logistic Regression) and two Deep Learning algorithms (Long Short-Term Memory and ASGD Weight-Dropped LSTM, or AWD-LSTM). Our best model was a Naïve Bayes classifier, with 56% accuracy and a macro-averaged F1 score of 32%.
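The abstract does not state which Naïve Bayes variant, features, or preprocessing the authors used. As an illustration only, the Python sketch below assumes a multinomial Naïve Bayes over whitespace-separated words with Laplace smoothing, plus a macro-averaged F1 score, which is the unweighted mean of the per-class F1 values and therefore penalizes a classifier that favors one class even when its accuracy looks acceptable. The class and function names and the four toy headlines are hypothetical and unrelated to the authors' dataset.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real NLP preprocessing)."""
    return text.lower().split()


class MultinomialNaiveBayes:
    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing strength

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        label_counts = Counter(labels)
        self.log_prior = {c: math.log(label_counts[c] / len(labels)) for c in self.classes}
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def _log_likelihood(self, token, c):
        # Smoothed P(token | class c), in log space.
        numerator = self.word_counts[c][token] + self.alpha
        denominator = self.total_words[c] + self.alpha * len(self.vocab)
        return math.log(numerator / denominator)

    def predict(self, doc):
        # Words never seen during training are ignored.
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        scores = {
            c: self.log_prior[c] + sum(self._log_likelihood(t, c) for t in tokens)
            for c in self.classes
        }
        return max(scores, key=scores.get)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    pairs = list(zip(y_true, y_pred))
    f1_scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1_scores.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical toy headlines, for illustration only.
docs = [
    "shocking secret cure doctors hate",
    "you will not believe this shocking trick",
    "parliament passes budget after debate",
    "central bank holds interest rate steady",
]
labels = ["fake", "fake", "true", "true"]
model = MultinomialNaiveBayes().fit(docs, labels)
print(model.predict("shocking cure revealed"))     # fake
print(model.predict("bank raises interest rate"))  # true

# Always predicting "fake" on a balanced set: accuracy 0.5, but macro F1 only 1/3.
print(macro_f1(["fake", "fake", "true", "true"], ["fake"] * 4))
```

The last line shows why macro F1 can sit well below accuracy: the class that is never predicted contributes an F1 of 0 to the average.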

arXiv.org e-Print Archive

Forecasting Missing Data Using Different Methods for Road Maintainers

Author: Pekša Jānis
Publication venue: 'Rezekne Academy of Technologies'
Publication date: 20/06/2019

This paper uses, for experimental purposes, observations collected from meteorological stations that are available to road maintainers. Unfortunately, these observations are insufficient for the accurate forecasting that road maintainers need. The meteorological stations are located next to the road surface in the territory of the Republic of Latvia. Road maintainers can use these data to make the forecasts they need for the winter months, when it is up to them to decide whether to treat the road surface with anti-slip chemical materials. Each meteorological station has gaps in its data from time to time. The paper presents several approaches to filling in these missing data, a step needed to predict specific parameters aggregated from the meteorological stations more accurately. The approaches are compared across the three closest meteorological stations available in the Republic of Latvia, using data for the winter months of 2017-2018 from the VAS "Latvijas valsts celi" data set, to determine which approach is most accurate.
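The abstract does not name the gap-filling approaches that were compared. The Python sketch below assumes two simple ones for illustration: linear interpolation in time at the station itself, and the mean of neighboring stations at the same time step. It scores each by hiding known values and measuring the mean absolute error on them. The function names, the three neighbor stations, and the synthetic temperature-like series are all hypothetical; none of it comes from the VAS "Latvijas valsts celi" data set.

```python
import numpy as np


def interpolate_in_time(series):
    """Fill NaN gaps by linear interpolation between the nearest observed time steps.

    Gaps at the very start or end are filled with the nearest observed value
    (np.interp clamps outside the observed range).
    """
    t = np.arange(len(series))
    observed = ~np.isnan(series)
    filled = series.copy()
    filled[~observed] = np.interp(t[~observed], t[observed], series[observed])
    return filled


def neighbor_mean(series, neighbors):
    """Fill NaN gaps with the mean of neighboring stations at the same time step.

    neighbors has shape (n_stations, n_times); missing neighbor values are ignored.
    """
    filled = series.copy()
    gaps = np.isnan(series)
    filled[gaps] = np.nanmean(neighbors[:, gaps], axis=0)
    return filled


def masked_mae(truth, filled, mask):
    """Mean absolute error over the artificially hidden positions only."""
    return float(np.mean(np.abs(truth[mask] - filled[mask])))


# Synthetic hourly series over three days (hypothetical values, not real station data).
rng = np.random.default_rng(1)
hours = np.arange(72)
daily_cycle = -3 + 4 * np.sin(2 * np.pi * hours / 24)
truth = daily_cycle + rng.normal(0, 0.3, size=hours.size)
neighbors = np.stack(
    [daily_cycle + offset + rng.normal(0, 0.3, size=hours.size) for offset in (0.5, -0.4, 0.2)]
)

# Hide some known values so each method can be scored against the truth.
mask = np.zeros(hours.size, dtype=bool)
mask[[10, 11, 12, 40, 41]] = True
damaged = truth.copy()
damaged[mask] = np.nan

for name, filled in [
    ("interpolation in time", interpolate_in_time(damaged)),
    ("neighbor mean", neighbor_mean(damaged, neighbors)),
]:
    print(f"{name}: MAE = {masked_mae(truth, filled, mask):.3f}")
```

Which method wins depends on the data: interpolation in time tends to degrade over long gaps, while a neighbor mean inherits any systematic offset between stations, which is why scoring on deliberately hidden values is a useful way to compare them.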

Journals of Rezekne Academy of Technologies

The Scientific Journal of Rezeknes Augstskola