4 research outputs found

    Stochastic Gradient Descent with Hyperbolic-Tangent Decay on Classification

    The learning rate schedule has been a critical issue in deep neural network training. Several schedules and methods have been proposed, including step decay, adaptive methods, cosine schedules, and cyclical schedules. This paper proposes a new scheduling method, named hyperbolic-tangent decay (HTD). We run experiments on several benchmarks: ResNet, Wide ResNet, and DenseNet on the CIFAR-10 and CIFAR-100 datasets, an LSTM on the PAMAP2 dataset, and ResNet on the ImageNet and Fashion-MNIST datasets. In our experiments, HTD outperforms step decay and the cosine schedule in nearly all cases, while requiring fewer hyperparameters than step decay and being more flexible than the cosine schedule. Code is available at https://github.com/BIGBALLON/HTD.
    Comment: WACV 2019
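    The abstract does not reproduce the HTD formula, so the sketch below assumes the commonly cited form lr(t) = lr0/2 * (1 - tanh(L + (U - L) * t/T)), where L and U bound the tanh argument; the bounds and defaults here are illustrative, not necessarily the paper's recommended values.

        import math

        def htd_lr(t, T, lr0=0.1, L=-6.0, U=3.0):
            # Hyperbolic-tangent decay: the LR falls smoothly from roughly lr0
            # (tanh(L) is near -1) to roughly 0 (tanh(U) is near +1) over T steps.
            return lr0 / 2.0 * (1.0 - math.tanh(L + (U - L) * t / T))

    In a framework like PyTorch, a function of this shape could be wrapped in torch.optim.lr_scheduler.LambdaLR by returning the multiplier (1 - tanh(...))/2 relative to the base LR.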

    Implementation of Takagi Sugeno Kang Fuzzy with Rough Set Theory and Mini-Batch Gradient Descent Uniform Regularization

    The Takagi Sugeno Kang (TSK) fuzzy approach is popular because its output is either a constant or a function. Parameter identification and structure identification are the two key requirements for building a TSK fuzzy system. The input used in a TSK fuzzy system affects the number of rules produced: more data dimensions typically yield more rules, which increases rule complexity. This issue can be addressed with a dimensionality reduction technique. The resulting rules are then optimized with mini-batch gradient descent (MBGD), modified with uniform regularization (UR). UR can enhance the generalization performance of the TSK fuzzy classifier. This study examines how rough set theory can be used to reduce data dimensionality and how MBGD with uniform regularization (MBGD-UR) can optimize the rules produced by TSK. Body fat data from 252 respondents were used as input, and the results were evaluated with the mean absolute percentage error (MAPE). Data processing was done in Python using Jupyter Notebook. The analysis showed a MAPE of 37%, which falls into the moderate range.
    DOI: 10.28991/ESJ-2023-07-03-09
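    The paper's MBGD-UR acts on TSK rule parameters, with the uniform-regularization term defined over rule firing levels; the abstract gives no formulas, so the sketch below shows only generic mini-batch gradient descent on a linear model, with a plain L2 penalty standing in for UR. All names and defaults are illustrative.

        import numpy as np

        def minibatch_gd(X, y, lr=0.01, batch_size=32, epochs=100, reg=1e-3, seed=0):
            # Mini-batch gradient descent on squared error for a linear model;
            # `reg` adds an L2 penalty (a stand-in, not the paper's UR term).
            rng = np.random.default_rng(seed)
            n, d = X.shape
            w = np.zeros(d)
            for _ in range(epochs):
                order = rng.permutation(n)
                for start in range(0, n, batch_size):
                    batch = order[start:start + batch_size]
                    residual = X[batch] @ w - y[batch]
                    grad = X[batch].T @ residual / len(batch) + reg * w
                    w -= lr * grad
            return w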

    k-decay: A New Method For Learning Rate Schedule

    Recent work has shown that optimizing the learning rate (LR) schedule can be an accurate and efficient way to train deep neural networks. In this paper, we propose the k-decay method, in which the rate of change (ROC) of the LR is modified through its k-th order derivative to obtain a new LR schedule. In the new LR schedule, a new hyper-parameter k controls the degree to which the LR changes, with the original schedule recovered at k = 1. By varying k, one can search for the best LR schedule. We evaluate the k-decay method on the CIFAR and ImageNet datasets with different neural networks (ResNet, Wide ResNet, and DenseNet). Our experiments show that the k-decay method achieves improvements over state-of-the-art results on most of them: accuracy improves by 1.08% on CIFAR-10, by 2.07% on CIFAR-100, and by 1.25% on ImageNet. Our method is not only efficient but also easy to use.
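    The abstract does not spell out the schedule; one common way to read k-decay is to raise the progress ratio t/T to the k-th power inside an existing schedule, so the sketch below applies it to cosine annealing under that assumption. Here k = 1 recovers the ordinary cosine schedule, and all values are illustrative.

        import math

        def k_decay_cosine_lr(t, T, k=2.0, lr0=0.1, lr_end=0.0):
            # Cosine annealing with the progress ratio raised to the k-th power;
            # larger k holds the LR high for longer before decaying.
            return lr_end + (lr0 - lr_end) / 2.0 * (1.0 + math.cos(math.pi * (t / T) ** k))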

    Budgeted Training: Rethinking Deep Neural Network Training Under Resource Constraints

    In most practical settings and theoretical analyses, one assumes that a model can be trained until convergence. However, the growing complexity of machine learning datasets and models may violate such assumptions. Indeed, current approaches for hyper-parameter tuning and neural architecture search tend to be limited by practical resource constraints. Therefore, we introduce a formal setting for studying training under the non-asymptotic, resource-constrained regime, i.e., budgeted training. We analyze the following problem: "given a dataset, algorithm, and fixed resource budget, what is the best achievable performance?" We focus on the number of optimization iterations as the representative resource. Under such a setting, we show that it is critical to adjust the learning rate schedule according to the given budget. Among budget-aware learning schedules, we find simple linear decay to be both robust and high-performing. We support our claim through extensive experiments with state-of-the-art models on ImageNet (image classification), Kinetics (video classification), MS COCO (object detection and instance segmentation), and Cityscapes (semantic segmentation). We also analyze our results and find that the key to a good schedule is budgeted convergence, a phenomenon whereby the gradient vanishes at the end of each allowed budget. We also revisit existing approaches for fast convergence and show that budget-aware learning schedules readily outperform such approaches under (the practical but under-explored) budgeted training setting.
    Comment: ICLR 2020. Project page with code is at http://www.cs.cmu.edu/~mengtial/proj/budgetnn
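    The budget-aware linear decay the authors find robust is a one-liner: the LR reaches zero exactly when the iteration budget runs out, which produces the vanishing gradient at the deadline that they call budgeted convergence. A minimal sketch (names and defaults are illustrative):

        def linear_decay_lr(t, budget, lr0=0.1):
            # Budget-aware linear decay: the LR hits 0 exactly at the end of
            # the iteration budget, so updates shrink to nothing by the deadline.
            return lr0 * max(0.0, 1.0 - t / budget)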