
5 research outputs found

Optimal Hyperparameter $\epsilon$ for Adaptive Stochastic Optimizers through Gradient Histograms

Authors: Rodriguez Paul, Silva Gustavo
Publication date: 19/11/2023

Optimizers are essential components for successfully training deep neural network models. To get the best performance from such models, designers need to choose the optimizer hyperparameters carefully, which can be a computationally expensive and time-consuming process. Although it is known that all optimizer hyperparameters must be tuned for maximum performance, the individual influence of lower-priority hyperparameters, including the safeguard factor $\epsilon$ and the momentum factor $\beta$, remains unclear in leading adaptive optimizers (specifically, those based on the Adam optimizer). In this manuscript, we introduce a new framework based on gradient histograms to analyze and justify important attributes of adaptive optimizers, such as their optimal performance and the relationships and dependencies among hyperparameters. Furthermore, we propose a novel gradient histogram-based algorithm that automatically estimates a reduced and accurate search space for the safeguard hyperparameter $\epsilon$, where the optimal value can be easily found.
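To make the role of $\epsilon$ concrete, the following is a minimal NumPy sketch of one textbook Adam update together with a histogram of gradient magnitudes on a log scale. It is not the histogram-based algorithm proposed in the manuscript, whose details are not given in this abstract. The function names `adam_step`, `first_step_sizes`, and `log_gradient_histogram`, the toy gradient values, and the default hyperparameters are all illustrative assumptions.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One textbook Adam update; returns (new_param, new_m, new_v)."""
    m = beta1 * m + (1.0 - beta1) * grad        # first-moment (momentum) estimate
    v = beta2 * v + (1.0 - beta2) * grad ** 2   # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)              # bias corrections
    v_hat = v / (1.0 - beta2 ** t)
    # eps keeps the denominator away from zero and damps entries with |grad| << eps.
    return param - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

def first_step_sizes(grad, eps):
    """Magnitude of the very first Adam step (t=1) for each gradient entry."""
    zeros = np.zeros_like(grad)
    new_param, _, _ = adam_step(zeros, grad, zeros, zeros, t=1, eps=eps)
    return np.abs(new_param)

def log_gradient_histogram(grad, bins=10):
    """Histogram of log10(|grad|) over nonzero entries (illustrative helper)."""
    nonzero = np.abs(grad[grad != 0])
    return np.histogram(np.log10(nonzero), bins=bins)

# Hypothetical gradient entries spanning many orders of magnitude.
grads = np.array([1e-10, 1e-4, 1.0])
steps_small_eps = first_step_sizes(grads, eps=1e-8)
steps_large_eps = first_step_sizes(grads, eps=1e-2)
counts, edges = log_gradient_histogram(grads, bins=5)

print("step sizes with eps=1e-8:", steps_small_eps)
print("step sizes with eps=1e-2:", steps_large_eps)
print("log10|grad| histogram counts:", counts)
```

In this toy setting, entries whose magnitude is well above $\epsilon$ take a first step close to the learning rate regardless of scale, while entries well below $\epsilon$ are damped. Comparing $\epsilon$ against where the gradient magnitudes fall in the histogram is one plausible way to see why the useful range of $\epsilon$ depends on the gradient distribution.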

arXiv.org e-Print Archive

Machine Learning Methods to Estimate Whole-Brain Effective Connectome for ASD Identification

Author: Zhuang Juntang
Publication venue: EliScholar – A Digital Platform for Scholarly Publishing at Yale
Publication date: 01/04/2022

Functional Magnetic Resonance Imaging (fMRI) is widely used to study neurodevelopmental disorders such as Autism Spectrum Disorder (ASD). Two main types of connectome are used to analyze fMRI: the Functional Connectome (FC) and the Effective Connectome (EC). FC is typically derived as the correlation between fMRI time-series from different brain regions, while EC is derived by fitting the measured time-series to the Dynamical Causal Model (DCM), which is described by a system of Ordinary Differential Equations (ODEs). FC is typically easier to compute yet cannot reveal the causal relations among brain regions; EC reveals the causal relations yet is much harder to compute and more sensitive to observation noise. This dissertation therefore proposes a generic framework for estimating EC and identifies ASD from fMRI based on EC. First, we propose the Model Driven Learning Framework (MDL) for parameter estimation in continuous models. MDL iteratively performs three steps: 1) forward simulation according to prior knowledge of the model, 2) a backward pass to derive the gradient of the parameters, and 3) an update of the parameters based on gradient information. We derive various methods to solve each step in MDL. Specifically, for step 2), we identify the inaccuracy of existing gradient estimation methods for continuous-time models (e.g. ODEs): the adjoint method has numerical errors in reverse-mode integration, and the naive method suffers from a redundantly deep computation graph. We propose a series of new methods that guarantee numerical accuracy at a low memory cost. For step 3), we propose the AdaBelief optimizer, a generic first-order adaptive optimizer that simultaneously achieves fast convergence, good generalization, and training stability. Furthermore, we show that an asynchronous version of AdaBelief provably achieves a weaker convergence condition and a faster convergence rate.
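The three-step loop described above can be sketched on a toy problem: recovering the decay rate of the scalar ODE dx/dt = -a·x from a simulated trajectory. This is only an illustration under stated assumptions, not the dissertation's method. The gradient is obtained from forward sensitivities of an Euler discretization rather than from the gradient estimators the dissertation proposes, step 3 is simplified to plain gradient descent, and all names (`simulate`, `loss_and_grad`, `adabelief_step`) and constants are hypothetical. The `adabelief_step` function follows the commonly cited form of the AdaBelief update, which replaces Adam's second moment of the gradient with a moving average of (gradient - momentum)²; the default values shown are illustrative.

```python
import numpy as np

def simulate(a, x0=1.0, dt=0.01, n_steps=100):
    """Step 1: Euler-integrate dx/dt = -a*x, tracking the sensitivity s = dx/da."""
    xs = np.empty(n_steps + 1)
    ss = np.empty(n_steps + 1)
    xs[0], ss[0] = x0, 0.0
    for k in range(n_steps):
        xs[k + 1] = xs[k] * (1.0 - a * dt)
        # Derivative of the Euler update above with respect to a.
        ss[k + 1] = ss[k] * (1.0 - a * dt) - xs[k] * dt
    return xs, ss

def loss_and_grad(a, target):
    """Step 2: mean squared error against the target and its gradient in a."""
    xs, ss = simulate(a)
    residual = xs - target
    return np.mean(residual ** 2), np.mean(2.0 * residual * ss)

def adabelief_step(param, grad, m, s, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One AdaBelief-style update (assumed form); returns (new_param, new_m, new_s)."""
    m = beta1 * m + (1.0 - beta1) * grad
    # Track how far the gradient deviates from its running mean ("belief").
    s = beta2 * s + (1.0 - beta2) * (grad - m) ** 2 + eps
    m_hat = m / (1.0 - beta1 ** t)
    s_hat = s / (1.0 - beta2 ** t)
    return param - lr * m_hat / (np.sqrt(s_hat) + eps), m, s

# Synthetic "measurements" generated from a known decay rate of 2.0.
target, _ = simulate(2.0)

a = 0.5  # initial guess
initial_loss, _ = loss_and_grad(a, target)
for _ in range(500):
    _, g = loss_and_grad(a, target)  # steps 1 and 2
    a -= 1.0 * g                     # step 3, simplified to gradient descent
final_loss, _ = loss_and_grad(a, target)
print("estimated decay rate:", a)

# A single AdaBelief-style step on a scalar, shown separately for comparison.
p1, m1, s1 = adabelief_step(1.0, 0.5, 0.0, 0.0, t=1)
print("parameter after one AdaBelief-style step:", p1)
```

The fitting loop uses gradient descent so that its behavior on this toy problem is easy to reason about; `adabelief_step` could be substituted for the update line, with the learning rate retuned.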
We show that MDL significantly accelerates the fitting of DCM and the estimation of EC. To deal with limited data and improve the generalization of the classifier, we propose Surrogate Gap Guided Sharpness-Aware Minimization (GSAM). GSAM is based on the observation that poor generalization often comes with a sharp loss surface, and it improves generalization by jointly minimizing the training loss and the curvature of the loss surface. We then apply the proposed MDL to estimate whole-brain EC from fMRI and perform group comparisons to identify FC and EC edges that are related to ASD. Next, we use the estimated EC to identify ASD. Specifically, we conduct experiments with both resting-state fMRI and task fMRI data, and compare the predictive power of FC and EC in both cases. Furthermore, applying GSAM significantly improves classification performance and reduces the dominant eigenvalue of the Hessian of the network. In summary, we apply the proposed framework to effective connectome analysis and improve the identification of ASD from fMRI data.
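As a rough illustration of the sharpness-aware idea that GSAM builds on, the sketch below performs one step of plain Sharpness-Aware Minimization (SAM) on a toy quadratic loss. It is not GSAM itself: the surrogate-gap-guided part of the method is not described in this abstract and is omitted. The matrix `A`, the function names, and the values of `lr` and `rho` are hypothetical, and reading the returned `gap` (perturbed loss minus original loss) as the quantity behind the name "surrogate gap" is an assumption.

```python
import numpy as np

# Toy quadratic loss with one flat and one sharp direction (hypothetical).
A = np.diag([1.0, 10.0])

def loss(w):
    return 0.5 * w @ A @ w

def grad(w):
    return A @ w

def sam_step(w, lr=0.05, rho=0.05):
    """One plain SAM step; returns (new_w, gap)."""
    g = grad(w)
    # Move to an approximate worst-case point within a ball of radius rho.
    perturbation = rho * g / (np.linalg.norm(g) + 1e-12)
    w_adv = w + perturbation
    # Difference between the perturbed and the original loss.
    gap = loss(w_adv) - loss(w)
    # Update the original weights using the gradient at the perturbed point.
    w_new = w - lr * grad(w_adv)
    return w_new, gap

w0 = np.array([1.0, 1.0])
w1, gap0 = sam_step(w0)
print("loss before:", loss(w0), "loss after:", loss(w1), "gap:", gap0)
```

On this convex quadratic the gap is always positive and grows with the curvature along the gradient direction, which is why driving it down favors flatter regions of the loss surface.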

Yale University