Guided Stochastic Gradient Descent Algorithm for inconsistent datasets
The Stochastic Gradient Descent (SGD) algorithm, despite its simplicity, is considered an effective, default optimization algorithm for machine learning classification models such as neural networks and logistic regression. However, SGD's descent direction is biased by the random selection of data instances, a problem this paper terms data inconsistency. The proposed variation of SGD, the Guided Stochastic Gradient Descent (GSGD) algorithm, tries to overcome this inconsistency in a given dataset through greedy selection of consistent data instances for gradient descent. The empirical test results show the efficacy of the method. Moreover, GSGD has also been incorporated and tested with other popular variations of SGD, such as Adam, Adagrad, and Momentum. The guided search with GSGD achieves better convergence and classification accuracy within a limited time budget than its canonical counterpart and the other SGD variations. Additionally, it maintains the same efficiency when tested on medical benchmark datasets with logistic regression for classification.
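The abstract does not spell out GSGD's exact selection rule, so the following is a minimal sketch under an assumption: "consistent" instances are taken to be the lowest-loss examples within a small random candidate pool, and each descent step greedily uses one of those. All names and hyperparameters here are illustrative, not from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gsgd_logistic(X, y, lr=0.1, epochs=30, k=5, seed=0):
    """Guided-SGD sketch for logistic regression: at each step, draw k
    random candidate instances, greedily keep the one with the lowest
    current loss (treated as the 'most consistent'), and descend on
    that instance's gradient."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for _ in range(len(y)):
            idx = rng.choice(len(y), size=k, replace=False)
            p = sigmoid(X[idx] @ w)
            losses = -(y[idx] * np.log(p + 1e-12)
                       + (1 - y[idx]) * np.log(1 - p + 1e-12))
            i = idx[np.argmin(losses)]  # greedy pick of consistent candidate
            grad = (sigmoid(X[i] @ w) - y[i]) * X[i]
            w -= lr * grad
    return w
```

With `k=1` this degenerates to canonical SGD, which is one way to compare the guided and unguided variants on the same data.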
Machine learning to analyze single-case data: a proof of concept
Visual analysis is the most commonly used method for interpreting data from single-case designs, but levels of interrater agreement remain a concern. Although structured
aids to visual analysis such as the dual-criteria (DC) method may increase interrater
agreement, the accuracy of the analyses may still benefit from improvements. Thus, the
purpose of our study was to (a) examine correspondence between visual analysis and
models derived from different machine learning algorithms, and (b) compare the
accuracy, Type I error rate, and power of each of our models with those produced by
the DC method. We trained our models on a previously published dataset and then
conducted analyses on both nonsimulated and simulated graphs. All our models
derived from machine learning algorithms matched the interpretation of the visual
analysts more frequently than the DC method. Furthermore, the machine learning
algorithms outperformed the DC method on accuracy, Type I error rate, and power.
Our results support the somewhat unorthodox proposition that behavior analysts may
use machine learning algorithms to supplement their visual analysis of single-case data,
but more research is needed to examine the potential benefits and drawbacks of such an
approach.
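The dual-criteria (DC) baseline the models are compared against can be sketched roughly as follows: project the baseline phase's mean line and OLS trend line into the treatment phase, then count treatment points falling beyond both lines. This is a simplified illustration, not the authors' implementation; the binomial significance criterion on the hit count is omitted.

```python
import numpy as np

def dual_criteria_hits(baseline, treatment, increase=True):
    """Count treatment-phase points beyond BOTH the baseline mean line
    and the baseline OLS trend line (above for an expected increase,
    below for an expected decrease)."""
    base = np.asarray(baseline, float)
    treat = np.asarray(treatment, float)
    mean_line = base.mean()
    t = np.arange(len(base))
    slope, intercept = np.polyfit(t, base, 1)
    t_treat = np.arange(len(base), len(base) + len(treat))
    trend_line = slope * t_treat + intercept
    if increase:
        hits = (treat > mean_line) & (treat > trend_line)
    else:
        hits = (treat < mean_line) & (treat < trend_line)
    return int(hits.sum())
```

For a stable baseline of [2, 3, 2, 3, 2] and a clearly elevated treatment phase of [6, 7, 6, 8, 7], all five treatment points exceed both criterion lines.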
An AI-based Intelligent System for Healthcare Analysis Using Ridge–Adaline Stochastic Gradient Descent Classifier
Recent technological advancements in information and communication technologies have introduced smart ways of handling various aspects of life. Smart devices and applications are now an integral part of our daily lives; however, their use has also introduced various physical and psychological health issues in modern societies. One of the most common health care issues, prevalent among almost all age groups, is diabetes mellitus. This work proposes an Artificial Intelligence (AI)-based intelligent system for early prediction of the disease using a Ridge Adaline Stochastic Gradient Descent Classifier (RASGD). The proposed RASGD scheme improves the regularization of the classification model by using weight decay methods, namely the Least Absolute Shrinkage and Selection Operator (LASSO) and Ridge Regression. To minimize the cost function of the classifier, RASGD adopts an unconstrained optimization model. Further, to increase the convergence speed of the classifier, the Adaline Stochastic Gradient Descent classifier is integrated with Ridge Regression. Finally, to validate the effectiveness of the intelligent system, the results of the proposed scheme have been compared with state-of-the-art machine learning algorithms such as Support Vector Machine and Logistic Regression. The RASGD intelligent system attains an accuracy of 92%, which is better than the other selected classifiers.
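The core ingredient described above, Adaline-style SGD with a ridge (L2) weight-decay penalty, can be sketched as follows. This is a generic illustration of the combination, assuming a linear activation with squared-error loss; the function name and hyperparameters are not from the paper.

```python
import numpy as np

def ridge_adaline_sgd(X, y, lr=0.01, alpha=0.001, epochs=50, seed=0):
    """Adaline-style SGD with ridge weight decay: linear activation,
    squared-error loss on each instance, plus an L2 penalty alpha*||w||^2
    folded into every gradient step."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            err = (w @ X[i] + b) - y[i]
            w -= lr * (err * X[i] + alpha * w)  # data gradient + ridge decay
            b -= lr * err
    return w, b
```

For 0/1 targets, thresholding the linear output at 0.5 yields class predictions; the `alpha * w` term is the weight-decay step that ridge regularization contributes.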
CleanML: A Study for Evaluating the Impact of Data Cleaning on ML Classification Tasks
Data quality affects machine learning (ML) model performances, and data
scientists spend a considerable amount of time on data cleaning before model
training. However, to date, there does not exist a rigorous study on how
exactly cleaning affects ML -- ML community usually focuses on developing ML
algorithms that are robust to some particular noise types of certain
distributions, while database (DB) community has been mostly studying the
problem of data cleaning alone without considering how data is consumed by
downstream ML analytics. We propose a CleanML study that systematically
investigates the impact of data cleaning on ML classification tasks. The
open-source and extensible CleanML study currently includes 14 real-world
datasets with real errors, five common error types, seven different ML models,
and multiple cleaning algorithms for each error type (including both commonly
used algorithms in practice as well as state-of-the-art solutions in academic
literature). We control the randomness in ML experiments using statistical
hypothesis testing, and we also control false discovery rate in our experiments
using the Benjamini-Yekutieli (BY) procedure. We analyze the results in a
systematic way to derive many interesting and nontrivial observations. We also
put forward multiple research directions for researchers.

Comment: published in ICDE 202
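The Benjamini-Yekutieli (BY) step-up procedure used above to control the false discovery rate under arbitrary dependence can be sketched directly from its definition: sort the m p-values, compare the i-th smallest against i*q / (m*c(m)) with the harmonic correction c(m) = sum_{j=1}^{m} 1/j, and reject all hypotheses up to the largest index that passes.

```python
import numpy as np

def benjamini_yekutieli(pvals, q=0.05):
    """BY step-up procedure: returns a boolean rejection mask, in the
    original order of the input p-values, controlling FDR at level q
    under arbitrary dependence."""
    p = np.asarray(pvals, float)
    m = len(p)
    c_m = np.sum(1.0 / np.arange(1, m + 1))        # harmonic correction
    order = np.argsort(p)
    thresholds = np.arange(1, m + 1) * q / (m * c_m)
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])           # largest passing index
        reject[order[: k + 1]] = True              # step-up: reject all below it
    return reject
```

Because c(m) > 1, BY is strictly more conservative than Benjamini-Hochberg, which is the price of making no assumption about dependence among the tests.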