4 research outputs found
Stochastic Gradient Descent with Hyperbolic-Tangent Decay on Classification
The learning rate schedule is a critical component of deep neural network training. Several schedulers and methods have been proposed, including step decay, adaptive methods, cosine schedulers, and cyclical schedulers. This paper proposes a new scheduling method, named hyperbolic-tangent decay (HTD). We run experiments on several benchmarks: ResNet, Wide ResNet, and DenseNet on the CIFAR-10 and CIFAR-100 datasets; LSTM on the PAMAP2 dataset; and ResNet on the ImageNet and Fashion-MNIST datasets. In our experiments, HTD outperforms the step decay and cosine schedulers in nearly all cases, while requiring fewer hyperparameters than step decay and being more flexible than the cosine scheduler. Code is available at https://github.com/BIGBALLON/HTD.
Comment: WACV201
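The HTD schedule can be sketched as follows; this is a minimal illustration assuming the commonly cited form of the schedule, with lower/upper bound defaults (-6 and 3) that are assumptions, not necessarily the paper's exact settings:

```python
import math

def htd_lr(t, T, lr0, lower=-6.0, upper=3.0):
    """Hyperbolic-tangent decay sketch: anneal the learning rate from
    roughly lr0 at step t = 0 toward ~0 at step t = T. Since
    tanh(lower) is close to -1 and tanh(upper) is close to +1, the
    curve stays flat early in training and decays smoothly near the end.
    `lower` and `upper` are assumed defaults controlling that shape."""
    return lr0 / 2.0 * (1.0 - math.tanh(lower + (upper - lower) * t / T))
```

Unlike step decay, this requires only the two shape bounds rather than a list of milestone epochs and decay factors.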
Implementation of Takagi Sugeno Kang Fuzzy with Rough Set Theory and Mini-Batch Gradient Descent Uniform Regularization
The Takagi-Sugeno-Kang (TSK) fuzzy approach is popular because its output is either a constant or a function. Parameter identification and structure identification are the two key requirements for building a TSK fuzzy system. The inputs used in a TSK fuzzy system affect the number of rules produced: using more data dimensions typically yields more rules, which increases rule complexity. This issue can be addressed with a dimension-reduction technique that reduces the number of dimensions in the data. The resulting rules are then optimized with mini-batch gradient descent (MBGD), which is in turn modified with uniform regularization (UR). UR can enhance the generalization performance of the TSK fuzzy classifier. This study examines how the rough set method can be used to reduce data dimensions and how mini-batch gradient descent with uniform regularization (MBGD-UR) can optimize the rules produced by TSK. Body-fat data from 252 respondents were used as input, and the mean absolute percentage error (MAPE) was used to evaluate the results. Jupyter Notebook and the Python programming language were used for data processing. The analysis yielded a MAPE of 37%, which falls into the moderate category.
Doi: 10.28991/ESJ-2023-07-03-09
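The MAPE metric used to evaluate the model above is a standard error measure; a minimal sketch:

```python
def mape(actual, predicted):
    """Mean absolute percentage error, expressed as a percentage.
    Averages |actual - predicted| / |actual| over all samples;
    assumes no actual value is zero."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)
```

By common rules of thumb, a MAPE between roughly 20% and 50% (such as the 37% reported here) is considered a moderate, reasonable forecast.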
k-decay: A New Method For Learning Rate Schedule
Recent work has shown that optimizing the learning rate (LR) schedule can be an accurate and efficient way to train deep neural networks. In this paper, we propose the k-decay method, in which the rate of change (ROC) of the LR is modified via its k-th order derivative to obtain a new LR schedule. In the new schedule, a new hyper-parameter k controls the degree of change of the LR, whereas the original schedule corresponds to k = 1. By repeatedly applying the k-decay method, one can identify the best LR schedule. We evaluate the k-decay method on the CIFAR and ImageNet datasets with different neural networks (ResNet, Wide ResNet, and DenseNet). Our experiments show that the k-decay method achieves improvements over state-of-the-art results on most of them. Accuracy improved by 1.08% on the CIFAR-10 dataset and by 2.07% on the CIFAR-100 dataset. On ImageNet, accuracy improved by 1.25%. Our method is not only efficient but also easy to use.
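A hedged sketch of the idea applied to a polynomial-family schedule; the function name and parameters here are illustrative, not the paper's exact formulation:

```python
def k_decay_lr(t, T, lr0, lr_end=0.0, k=2.0, N=1.0):
    """Polynomial-family k-decay sketch: the exponent k reshapes how
    quickly the LR falls over the training budget T. With k = 1 and
    N = 1 this reduces to plain linear decay; larger k keeps the LR
    high for longer before decaying near the end of training."""
    return lr_end + (lr0 - lr_end) * (1.0 - (t / T) ** k) ** N
```

Sweeping k then amounts to searching over a one-parameter family of schedules rather than hand-designing each one.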
Budgeted Training: Rethinking Deep Neural Network Training Under Resource Constraints
In most practical settings and theoretical analyses, one assumes that a model
can be trained until convergence. However, the growing complexity of machine
learning datasets and models may violate such assumptions. Indeed, current
approaches for hyper-parameter tuning and neural architecture search tend to be
limited by practical resource constraints. Therefore, we introduce a formal
setting for studying training under the non-asymptotic, resource-constrained
regime, i.e., budgeted training. We analyze the following problem: "given a
dataset, algorithm, and fixed resource budget, what is the best achievable
performance?" We focus on the number of optimization iterations as the
representative resource. Under such a setting, we show that it is critical to
adjust the learning rate schedule according to the given budget. Among
budget-aware learning schedules, we find simple linear decay to be both robust
and high-performing. We support our claim through extensive experiments with
state-of-the-art models on ImageNet (image classification), Kinetics (video
classification), MS COCO (object detection and instance segmentation), and
Cityscapes (semantic segmentation). We also analyze our results and find that
the key to a good schedule is budgeted convergence, a phenomenon whereby the
gradient vanishes at the end of each allowed budget. We also revisit existing
approaches for fast convergence and show that budget-aware learning schedules
readily outperform such approaches under (the practical but under-explored)
budgeted training setting.
Comment: ICLR 2020. Project page with code is at http://www.cs.cmu.edu/~mengtial/proj/budgetnn