Guided Stochastic Gradient Descent Algorithm for inconsistent datasets
The Stochastic Gradient Descent (SGD) algorithm, despite its simplicity, is considered an effective, default optimization algorithm for machine learning classification models such as neural networks and logistic regression. However, SGD's descent direction is biased by the random selection of data instances, a problem this paper terms data inconsistency. The proposed variation of SGD, the Guided Stochastic Gradient Descent (GSGD) algorithm, tries to overcome this inconsistency in a given dataset through greedy selection of consistent data instances for gradient descent. The empirical test results show the efficacy of the method. Moreover, GSGD has also been incorporated and tested with other popular variations of SGD, such as Adam, Adagrad, and Momentum. The guided search with GSGD achieves better convergence and classification accuracy within a limited time budget than its canonical counterpart and the other SGD variations. Additionally, it maintains the same efficiency when tested on medical benchmark datasets with logistic regression for classification.
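The abstract does not spell out GSGD's exact selection rule, so the following is a minimal sketch under an assumption: "consistent" instances are taken to be the lowest-loss examples within a small random candidate pool, and each descent step greedily uses one of those. All names and hyperparameters here are illustrative, not from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gsgd_logistic(X, y, lr=0.1, epochs=30, k=5, seed=0):
    """Guided-SGD sketch for logistic regression: at each step, draw k
    random candidate instances, greedily keep the one with the lowest
    current loss (treated as the 'most consistent'), and descend on
    that instance's gradient."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for _ in range(len(y)):
            idx = rng.choice(len(y), size=k, replace=False)
            p = sigmoid(X[idx] @ w)
            losses = -(y[idx] * np.log(p + 1e-12)
                       + (1 - y[idx]) * np.log(1 - p + 1e-12))
            i = idx[np.argmin(losses)]  # greedy pick of consistent candidate
            grad = (sigmoid(X[i] @ w) - y[i]) * X[i]
            w -= lr * grad
    return w
```

With `k=1` this degenerates to canonical SGD, which is one way to compare the guided and unguided variants on the same data.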
Machine learning to analyze single-case data: a proof of concept
Visual analysis is the most commonly used method for interpreting data from single-case designs, but levels of interrater agreement remain a concern. Although structured
aids to visual analysis such as the dual-criteria (DC) method may increase interrater
agreement, the accuracy of the analyses may still benefit from improvements. Thus, the
purpose of our study was to (a) examine correspondence between visual analysis and
models derived from different machine learning algorithms, and (b) compare the
accuracy, Type I error rate, and power of each of our models with those produced by
the DC method. We trained our models on a previously published dataset and then
conducted analyses on both nonsimulated and simulated graphs. All our models
derived from machine learning algorithms matched the interpretation of the visual
analysts more frequently than the DC method. Furthermore, the machine learning
algorithms outperformed the DC method on accuracy, Type I error rate, and power.
Our results support the somewhat unorthodox proposition that behavior analysts may
use machine learning algorithms to supplement their visual analysis of single-case data,
but more research is needed to examine the potential benefits and drawbacks of such an
approach.
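The dual-criteria (DC) baseline the models are compared against can be sketched roughly as follows: project the baseline phase's mean line and OLS trend line into the treatment phase, then count treatment points falling beyond both lines. This is a simplified illustration, not the authors' implementation; the binomial significance criterion on the hit count is omitted.

```python
import numpy as np

def dual_criteria_hits(baseline, treatment, increase=True):
    """Count treatment-phase points beyond BOTH the baseline mean line
    and the baseline OLS trend line (above for an expected increase,
    below for an expected decrease)."""
    base = np.asarray(baseline, float)
    treat = np.asarray(treatment, float)
    mean_line = base.mean()
    t = np.arange(len(base))
    slope, intercept = np.polyfit(t, base, 1)
    t_treat = np.arange(len(base), len(base) + len(treat))
    trend_line = slope * t_treat + intercept
    if increase:
        hits = (treat > mean_line) & (treat > trend_line)
    else:
        hits = (treat < mean_line) & (treat < trend_line)
    return int(hits.sum())
```

For a stable baseline of [2, 3, 2, 3, 2] and a clearly elevated treatment phase of [6, 7, 6, 8, 7], all five treatment points exceed both criterion lines.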
An AI-based Intelligent System for Healthcare Analysis Using Ridge–Adaline Stochastic Gradient Descent Classifier
Recent technological advancements in information and communication technologies have introduced smart ways of handling various aspects of life. Smart devices and applications are now an integral part of our daily lives; however, their use has also introduced various physical and psychological health issues in modern societies. One of the most common health care issues, prevalent among almost all age groups, is diabetes mellitus. This work proposes an Artificial Intelligence (AI)-based intelligent system for early prediction of the disease using a Ridge Adaline Stochastic Gradient Descent Classifier (RASGD). The proposed RASGD scheme improves the regularization of the classification model by using weight decay methods, namely the Least Absolute Shrinkage and Selection Operator (LASSO) and Ridge Regression. To minimize the cost function of the classifier, RASGD adopts an unconstrained optimization model. Further, to increase the convergence speed of the classifier, the Adaline Stochastic Gradient Descent classifier is integrated with Ridge Regression. Finally, to validate the effectiveness of the intelligent system, the results of the proposed scheme have been compared with state-of-the-art machine learning algorithms such as Support Vector Machine and Logistic Regression. The RASGD intelligent system attains an accuracy of 92%, which is better than the other selected classifiers.
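The core ingredient described above, Adaline-style SGD with a ridge (L2) weight-decay penalty, can be sketched as follows. This is a generic illustration of the combination, assuming a linear activation with squared-error loss; the function name and hyperparameters are not from the paper.

```python
import numpy as np

def ridge_adaline_sgd(X, y, lr=0.01, alpha=0.001, epochs=50, seed=0):
    """Adaline-style SGD with ridge weight decay: linear activation,
    squared-error loss on each instance, plus an L2 penalty alpha*||w||^2
    folded into every gradient step."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            err = (w @ X[i] + b) - y[i]
            w -= lr * (err * X[i] + alpha * w)  # data gradient + ridge decay
            b -= lr * err
    return w, b
```

For 0/1 targets, thresholding the linear output at 0.5 yields class predictions; the `alpha * w` term is the weight-decay step that ridge regularization contributes.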
CleanML: A Study for Evaluating the Impact of Data Cleaning on ML Classification Tasks
Data quality affects machine learning (ML) model performances, and data
scientists spend a considerable amount of time on data cleaning before model
training. However, to date, there does not exist a rigorous study on how
exactly cleaning affects ML -- ML community usually focuses on developing ML
algorithms that are robust to some particular noise types of certain
distributions, while database (DB) community has been mostly studying the
problem of data cleaning alone without considering how data is consumed by
downstream ML analytics. We propose a CleanML study that systematically
investigates the impact of data cleaning on ML classification tasks. The
open-source and extensible CleanML study currently includes 14 real-world
datasets with real errors, five common error types, seven different ML models,
and multiple cleaning algorithms for each error type (including both commonly
used algorithms in practice as well as state-of-the-art solutions in academic
literature). We control the randomness in ML experiments using statistical
hypothesis testing, and we also control false discovery rate in our experiments
using the Benjamini-Yekutieli (BY) procedure. We analyze the results in a
systematic way to derive many interesting and nontrivial observations. We also
put forward multiple research directions for researchers.

Comment: published in ICDE 202
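The Benjamini-Yekutieli (BY) step-up procedure used above to control the false discovery rate under arbitrary dependence can be sketched directly from its definition: sort the m p-values, compare the i-th smallest against i*q / (m*c(m)) with the harmonic correction c(m) = sum_{j=1}^{m} 1/j, and reject all hypotheses up to the largest index that passes.

```python
import numpy as np

def benjamini_yekutieli(pvals, q=0.05):
    """BY step-up procedure: returns a boolean rejection mask, in the
    original order of the input p-values, controlling FDR at level q
    under arbitrary dependence."""
    p = np.asarray(pvals, float)
    m = len(p)
    c_m = np.sum(1.0 / np.arange(1, m + 1))        # harmonic correction
    order = np.argsort(p)
    thresholds = np.arange(1, m + 1) * q / (m * c_m)
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])           # largest passing index
        reject[order[: k + 1]] = True              # step-up: reject all below it
    return reject
```

Because c(m) > 1, BY is strictly more conservative than Benjamini-Hochberg, which is the price of making no assumption about dependence among the tests.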