
    Good regularity creates large learning rate implicit biases: edge of stability, balancing, and catapult

    Large learning rates, when applied to gradient descent for nonconvex optimization, yield various implicit biases, including the edge of stability (Cohen et al., 2021), balancing (Wang et al., 2022), and catapult (Lewkowycz et al., 2020). These phenomena cannot be well explained by classical optimization theory. Although significant theoretical progress has been made in understanding these implicit biases, it remains unclear for which objective functions they are more likely to arise. This paper provides an initial step toward answering this question and also shows that these implicit biases are in fact various tips of the same iceberg. To establish these results, we develop a global convergence theory under large learning rates for a family of nonconvex functions without a globally Lipschitz continuous gradient, an assumption typically made in existing convergence analyses. Specifically, these phenomena are more likely to occur when the optimization objective has good regularity. This regularity, together with gradient descent using a large learning rate that favors flatter regions, produces these nontrivial dynamical behaviors. Another corollary is the first non-asymptotic convergence rate bound for large-learning-rate gradient descent on nonconvex functions. Although our theory so far applies only to specific functions, the possibility of extrapolating it to neural networks is also validated experimentally: different choices of loss, activation function, and other techniques such as batch normalization can all affect regularity significantly and lead to very different training dynamics.
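    As an illustration of how these phenomena can coexist in one toy problem, the sketch below (not from the paper; the initialization and learning rates are illustrative) runs plain gradient descent on the two-parameter model f(u, v) = ½(uv − 1)², a setting commonly used in the catapult and balancing literature, and tracks the loss together with the sharpness u² + v² (the top Hessian eigenvalue on the solution set uv = 1).

```python
import numpy as np

def run_gd(lr, u=4.0, v=0.3, steps=200):
    """Gradient descent on f(u, v) = 0.5 * (u*v - 1)**2, recording loss and sharpness."""
    losses, sharpness = [], []
    for _ in range(steps):
        r = u * v - 1.0                    # residual
        losses.append(0.5 * r ** 2)
        sharpness.append(u ** 2 + v ** 2)  # top Hessian eigenvalue at a global minimum
        u, v = u - lr * r * v, v - lr * r * u
    return np.array(losses), np.array(sharpness)

for lr in (0.01, 0.2):  # small vs. large learning rate
    loss, sharp = run_gd(lr)
    print(f"lr={lr}: peak loss {loss.max():.2f}, final loss {loss[-1]:.2e}, "
          f"final sharpness {sharp[-1]:.2f}")
```

    On this toy run, the small learning rate keeps the iterates near the unbalanced initialization and settles in a sharp minimum (sharpness near 16), whereas the large learning rate makes the loss spike temporarily (catapult) before the iterates balance and converge to a much flatter minimum whose sharpness sits below the stability threshold 2/lr, which is the edge-of-stability picture in miniature.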

    Score Matching-based Pseudolikelihood Estimation of Neural Marked Spatio-Temporal Point Process with Uncertainty Quantification

    Spatio-temporal point processes (STPPs) are potent mathematical tools for modeling and predicting events with both temporal and spatial features. Despite their versatility, most existing methods for learning STPPs either assume a restricted form of the spatio-temporal distribution or suffer from inaccurate approximations of the intractable integral in the likelihood training objective. These issues typically arise from the normalization term of the probability density function. Moreover, current techniques fail to provide uncertainty quantification for model predictions, such as confidence intervals for the predicted event's arrival time and confidence regions for the event's location, which is crucial given the considerable randomness of the data. To tackle these challenges, we introduce SMASH: a Score MAtching-based pSeudolikeliHood estimator for learning marked STPPs with uncertainty quantification. Specifically, our framework adopts a normalization-free objective by estimating the pseudolikelihood of marked STPPs through score matching, and it offers uncertainty quantification for the predicted event time, location, and mark by computing confidence regions over the generated samples. The superior performance of our proposed framework is demonstrated through extensive experiments in both event prediction and uncertainty quantification.
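    To make the normalization-free idea concrete, here is a minimal, hypothetical sketch (not the SMASH architecture) of denoising score matching on synthetic 1-D arrival times: the training objective involves only the score, so the intractable normalizing term never appears, and a crude confidence interval can be read off as quantiles of samples generated with the learned score.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
times = torch.distributions.Gamma(2.0, 1.0).sample((4096, 1))  # synthetic arrival times

score_net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(score_net.parameters(), lr=1e-3)
sigma = 0.1  # noise scale for denoising score matching

for step in range(2000):
    noise = torch.randn_like(times)
    noisy = times + sigma * noise
    # Denoising score matching: regress the model score onto -noise / sigma,
    # the score of the Gaussian perturbation; no normalizing constant is needed.
    loss = ((score_net(noisy) + noise / sigma) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# Sample with unadjusted Langevin dynamics and form a rough 90% interval
# for the next arrival time from the sample quantiles.
x, eps = torch.rand(2000, 1) * 5.0, 1e-2
for _ in range(500):
    x = x + 0.5 * eps * score_net(x).detach() + (eps ** 0.5) * torch.randn_like(x)
print(torch.quantile(x, torch.tensor([0.05, 0.95])))
```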

    Sample Complexity of Neural Policy Mirror Descent for Policy Optimization on Low-Dimensional Manifolds

    Policy-based algorithms equipped with deep neural networks have achieved great success in solving high-dimensional policy optimization problems in reinforcement learning. However, current analyses cannot explain why they are resistant to the curse of dimensionality. In this work, we study the sample complexity of the neural policy mirror descent (NPMD) algorithm with convolutional neural networks (CNNs) as function approximators. Motivated by the empirical observation that many high-dimensional environments have state spaces possessing low-dimensional structures, such as those taking images as states, we consider the state space to be a $d$-dimensional manifold embedded in the $D$-dimensional Euclidean space with intrinsic dimension $d \ll D$. We show that in each iteration of NPMD, both the value function and the policy can be well approximated by CNNs. The approximation errors are controlled by the size of the networks, and the smoothness of the previous networks can be inherited. As a result, by properly choosing the network size and hyperparameters, NPMD can find an $\epsilon$-optimal policy with $\widetilde{O}(\epsilon^{-\frac{d}{\alpha}-2})$ samples in expectation, where $\alpha \in (0,1]$ indicates the smoothness of the environment. Compared to previous work, our result shows that NPMD can leverage the low-dimensional structure of the state space to escape the curse of dimensionality, providing an explanation for the efficacy of deep policy-based algorithms.
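    The point of the bound is that the exponent depends on the intrinsic dimension $d$ rather than the ambient dimension $D$. With purely illustrative numbers (not from the paper), say image states with $D = 3072$, intrinsic dimension $d = 10$, and smoothness $\alpha = 1$, the contrast with a hypothetical rate that scaled in $D$ would be:

```latex
% Hypothetical numbers for illustration: D = 3072, d = 10, alpha = 1.
\[
  \widetilde{O}\!\left(\epsilon^{-\frac{d}{\alpha}-2}\right)
  = \widetilde{O}\!\left(\epsilon^{-12}\right)
  \qquad \text{vs.} \qquad
  \widetilde{O}\!\left(\epsilon^{-\frac{D}{\alpha}-2}\right)
  = \widetilde{O}\!\left(\epsilon^{-3074}\right).
\]
```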

    Subband Independent Component Analysis for Coherence Enhancement

    Objective: Cortico-muscular coherence (CMC) is becoming a common technique for detecting and characterizing functional coupling between the motor cortex and muscle activity. It is typically evaluated between surface electromyogram (sEMG) and electroencephalogram (EEG) signals collected synchronously during controlled movement tasks. However, the presence of noise and of activity unrelated to the observed motor task in sEMG and EEG results in low CMC levels, which often makes functional coupling difficult to detect. Methods: In this paper, we introduce Coherent Subband Independent Component Analysis (CoSICA) to enhance synchronous cortico-muscular components in the mixtures captured by sEMG and EEG. The methodology relies on filter bank processing to decompose sEMG and EEG signals into frequency bands. It then applies independent component analysis, along with a component selection algorithm, to re-synthesize sEMG and EEG so as to maximize CMC levels. Results: We first demonstrate the effectiveness of the proposed method in increasing CMC levels across different signal-to-noise ratios using simulated data. Using neurophysiological data, we then illustrate that CoSICA processing achieves a pronounced enhancement of the original CMC. Conclusion: Our findings suggest that the proposed technique provides an effective framework for improving coherence detection. Significance: The proposed methodology will eventually contribute to the understanding of movement control and has high potential for translation into clinical practice.
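    The following is a hypothetical, single-subband sketch of the kind of pipeline the abstract describes, not the authors' implementation: the band limits, filter order, sampling rate, and synthetic signals are all assumptions, and the full method would repeat this per subband and re-synthesize sEMG and EEG from the selected components rather than just report one component.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, coherence
from sklearn.decomposition import FastICA

fs = 1000                                         # sampling rate (Hz), assumed
rng = np.random.default_rng(0)
t = np.arange(20 * fs) / fs
common = np.sin(2 * np.pi * 20 * t)               # shared 20 Hz "cortico-muscular" drive
eeg = 0.3 * common + rng.standard_normal((8, t.size))      # 8 noisy EEG channels
semg = 0.5 * common + 0.5 * rng.standard_normal(t.size)    # 1 noisy sEMG channel

# 1) Band-limit both signals to one subband (here the beta band, 13-30 Hz).
sos = butter(4, [13, 30], btype="bandpass", fs=fs, output="sos")
eeg_band, semg_band = sosfiltfilt(sos, eeg), sosfiltfilt(sos, semg)

# 2) Unmix the subband EEG with ICA.
ica = FastICA(n_components=8, random_state=0)
sources = ica.fit_transform(eeg_band.T).T         # (components, samples)

# 3) Keep the component with the highest peak coherence against sEMG in the band.
def band_cmc(x):
    f, cxy = coherence(x, semg_band, fs=fs, nperseg=2048)
    return cxy[(f >= 13) & (f <= 30)].max()

best = max(range(sources.shape[0]), key=lambda k: band_cmc(sources[k]))
print("selected component:", best, "peak CMC:", round(band_cmc(sources[best]), 3))
```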
