2,482 research outputs found

    Guided Stochastic Gradient Descent Algorithm for inconsistent datasets

    Stochastic Gradient Descent (SGD), despite its simplicity, is an effective and widely used default optimization algorithm for machine learning classification models such as neural networks and logistic regression. However, SGD's descent direction is biased by the random selection of data instances; in this paper, this effect is termed data inconsistency. The proposed variation of SGD, the Guided Stochastic Gradient Descent (GSGD) algorithm, seeks to overcome this inconsistency in a given dataset through greedy selection of consistent data instances for gradient descent. Empirical test results show the efficacy of the method. Moreover, GSGD has also been incorporated into and tested with other popular variations of SGD, such as Adam, Adagrad, and Momentum. Within a limited time budget, the guided search of GSGD achieves better convergence and classification accuracy than canonical SGD and its other variations. Additionally, it maintains the same efficiency when tested on medical benchmark datasets with logistic regression for classification.
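    The abstract does not spell out the greedy consistency criterion, so the following is only a minimal sketch of the idea for logistic regression: at each step, score a small candidate batch by its current loss and update only on the lowest-loss ("most consistent") instances. The `keep_frac` and `rho` parameters are illustrative assumptions, not values from the paper.

    ```python
    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def guided_sgd(X, y, lr=0.1, epochs=50, keep_frac=0.7, rho=8, seed=0):
        """Sketch of a guided SGD step for logistic regression: draw a
        candidate batch, keep only the lowest-loss (most 'consistent')
        instances, and take the gradient step on those alone."""
        rng = np.random.default_rng(seed)
        w = np.zeros(X.shape[1])
        n = len(y)
        for _ in range(epochs):
            for _ in range(n // rho):
                idx = rng.choice(n, size=rho, replace=False)
                p = sigmoid(X[idx] @ w)
                # per-instance cross-entropy loss as a consistency score
                losses = -(y[idx] * np.log(p + 1e-12)
                           + (1 - y[idx]) * np.log(1 - p + 1e-12))
                keep = idx[np.argsort(losses)[: max(1, int(keep_frac * rho))]]
                # gradient step on the retained (consistent) instances only
                g = X[keep].T @ (sigmoid(X[keep] @ w) - y[keep]) / len(keep)
                w -= lr * g
        return w
    ```

    On a cleanly separable toy dataset this behaves like ordinary SGD; the selection step only matters when some instances pull the gradient in inconsistent directions.
    
    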

    Machine learning to analyze single-case data: a proof of concept

    Visual analysis is the most commonly used method for interpreting data from single-case designs, but levels of interrater agreement remain a concern. Although structured aids to visual analysis such as the dual-criteria (DC) method may increase interrater agreement, the accuracy of the analyses may still benefit from improvements. Thus, the purpose of our study was to (a) examine correspondence between visual analysis and models derived from different machine learning algorithms, and (b) compare the accuracy, Type I error rate, and power of each of our models with those produced by the DC method. We trained our models on a previously published dataset and then conducted analyses on both nonsimulated and simulated graphs. All our models derived from machine learning algorithms matched the interpretation of the visual analysts more frequently than the DC method did. Furthermore, the machine learning algorithms outperformed the DC method on accuracy, Type I error rate, and power. Our results support the somewhat unorthodox proposition that behavior analysts may use machine learning algorithms to supplement their visual analysis of single-case data, but more research is needed to examine the potential benefits and drawbacks of such an approach.

    An AI-based Intelligent System for Healthcare Analysis Using Ridge–Adaline Stochastic Gradient Descent Classifier

    Recent technological advancements in information and communication technologies have introduced smart ways of handling various aspects of life. Smart devices and applications are now an integral part of our daily lives; however, their use has also introduced various physical and psychological health issues in modern societies. One of the most common health care issues, prevalent among almost all age groups, is diabetes mellitus. This work proposes an Artificial Intelligence (AI)-based intelligent system for earlier prediction of the disease using a Ridge–Adaline Stochastic Gradient Descent classifier (RASGD). The proposed RASGD scheme improves the regularization of the classification model by using weight decay methods, namely the Least Absolute Shrinkage and Selection Operator (LASSO) and Ridge Regression. To minimize the cost function of the classifier, RASGD adopts an unconstrained optimization model. Further, to increase the convergence speed of the classifier, the Adaline Stochastic Gradient Descent classifier is integrated with Ridge Regression. Finally, to validate the effectiveness of the intelligent system, the results of the proposed scheme are compared with state-of-the-art machine learning algorithms such as Support Vector Machines and Logistic Regression. The RASGD intelligent system attains an accuracy of 92%, which is better than the other selected classifiers.
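    The core ingredients named in the abstract, an Adaline classifier (linear activation, squared error) trained by SGD with a ridge (L2) weight-decay term, can be sketched as below. This is not the authors' implementation; the learning rate, decay strength `alpha`, and 0/1 target encoding are illustrative assumptions.

    ```python
    import numpy as np

    def adaline_ridge_sgd(X, y, lr=0.01, alpha=0.1, epochs=100, seed=0):
        """Sketch of an Adaline classifier (linear activation, squared
        error) trained by SGD, with an L2 (ridge) weight-decay term added
        to each per-instance gradient step."""
        rng = np.random.default_rng(seed)
        w = np.zeros(X.shape[1])
        n = len(y)
        for _ in range(epochs):
            for i in rng.permutation(n):
                err = X[i] @ w - y[i]               # linear activation error
                w -= lr * (err * X[i] + alpha * w)  # squared-loss grad + ridge decay
        return w

    def predict(X, w, threshold=0.5):
        # threshold the linear output midway between the 0/1 targets
        return (X @ w >= threshold).astype(int)
    ```

    The LASSO (L1) variant mentioned in the abstract would replace the `alpha * w` decay term with `alpha * np.sign(w)`.
    
    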

    CleanML: A Study for Evaluating the Impact of Data Cleaning on ML Classification Tasks

    Data quality affects machine learning (ML) model performance, and data scientists spend a considerable amount of time on data cleaning before model training. However, to date, there has been no rigorous study of how exactly cleaning affects ML -- the ML community usually focuses on developing ML algorithms that are robust to particular noise types of certain distributions, while the database (DB) community has mostly studied data cleaning in isolation, without considering how the data is consumed by downstream ML analytics. We propose CleanML, a study that systematically investigates the impact of data cleaning on ML classification tasks. The open-source and extensible CleanML study currently includes 14 real-world datasets with real errors, five common error types, seven different ML models, and multiple cleaning algorithms for each error type (including both algorithms commonly used in practice and state-of-the-art solutions from the academic literature). We control the randomness in ML experiments using statistical hypothesis testing, and we control the false discovery rate using the Benjamini-Yekutieli (BY) procedure. We analyze the results in a systematic way to derive many interesting and nontrivial observations, and we put forward multiple research directions for researchers. Comment: published in ICDE 202
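    The Benjamini-Yekutieli procedure used to control the false discovery rate across CleanML's many hypothesis tests is a standard step-up method, and can be implemented in a few lines. This sketch is generic (not CleanML's code): sort the p-values, compare the k-th smallest against k*q/(m*c(m)) with the harmonic correction c(m) = sum_{i=1}^{m} 1/i, and reject everything up to the largest k that passes.

    ```python
    import numpy as np

    def benjamini_yekutieli(pvals, q=0.05):
        """Benjamini-Yekutieli step-up procedure: controls the false
        discovery rate at level q under arbitrary dependence between
        tests. Returns a boolean mask of rejected hypotheses."""
        p = np.asarray(pvals, dtype=float)
        m = len(p)
        order = np.argsort(p)
        c_m = np.sum(1.0 / np.arange(1, m + 1))       # harmonic correction c(m)
        thresh = np.arange(1, m + 1) * q / (m * c_m)  # k * q / (m * c(m))
        below = p[order] <= thresh
        reject = np.zeros(m, dtype=bool)
        if below.any():
            k = np.max(np.nonzero(below)[0])          # largest k passing its threshold
            reject[order[: k + 1]] = True             # step-up: reject all smaller p-values
        return reject
    ```

    The harmonic factor c(m) makes BY more conservative than Benjamini-Hochberg, in exchange for validity under arbitrary dependence, which is the reason a study running many correlated tests would choose it.
    
    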