A Multi-Gene Genetic Programming Application for Predicting Students Failure at School

Author: Eke B. O.
Orove J. O.
Osegi N. E.
Publication date: 11/03/2015

Despite several efforts, accurately predicting the student failure rate (SFR) at school remains a core problem for many in the educational sector. The procedures for forecasting SFR are rigid and often require data scaling or conversion into binary form, as in the logistic model, which may lead to loss of information and effect size attenuation. Also, the high number of factors, incomplete and unbalanced datasets, and black-boxing issues, as in Artificial Neural Networks and fuzzy logic systems, expose the need for more efficient tools. Currently, the application of Genetic Programming (GP) holds great promise and has produced tremendous positive results in different sectors. In this regard, this study developed GPSFARPS, a software application that provides a robust solution to the prediction of SFR using an evolutionary algorithm known as multi-gene genetic programming. The approach is validated by feeding a testing data set to the evolved GP models. Results obtained from GPSFARPS simulations show its unique ability to evolve a suitable failure rate expression, with fast convergence at 30 generations from a specified maximum of 500 generations. The multi-gene system was also able to minimize the evolved model expression and accurately predict student failure rate using a subset of the original expression.

Comment: 14 pages, 9 figures, Journal paper. arXiv admin note: text overlap with arXiv:1403.0623 by other author

arXiv.org e-Print Archive

CiteSeerX

Mining developer communication data streams

Author: Connor Andy M.
Finlay Jacqui
Pears Russel
Publication date: 22/07/2014

This paper explores the concept of modelling a software development project as a process that results in the creation of a continuous stream of data. In terms of the Jazz repository used in this research, one aspect of that stream of data would be developer communication. Such data can be used to create an evolving social network characterized by a range of metrics. This paper presents the application of data stream mining techniques to identify the most useful metrics for predicting build outcomes. Results are presented from applying the Hoeffding Tree classification method in conjunction with the Adaptive Sliding Window (ADWIN) method for detecting concept drift. The results indicate that only a small number of the available metrics considered have any significance for predicting the outcome of a build.
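For readers unfamiliar with the Hoeffding Tree method named in this abstract, the following is a minimal sketch of the split test such trees are generally built on. It is not taken from the paper: the function names and the example numbers are hypothetical. The bound states that after n observations of a quantity with range R, the observed mean is within epsilon = sqrt(R^2 ln(1/delta) / (2n)) of the true mean with probability 1 - delta, so a leaf can be split once the gap between the two best attributes exceeds epsilon.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon such that the observed mean of n samples lies within
    epsilon of the true mean with probability at least 1 - delta."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float,
                 value_range: float, delta: float, n: int) -> bool:
    """Split when the best attribute beats the runner-up by more than epsilon."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


# Hypothetical gains for two candidate metrics at one leaf.
few_examples = should_split(0.30, 0.05, value_range=1.0, delta=1e-7, n=100)
more_examples = should_split(0.30, 0.05, value_range=1.0, delta=1e-7, n=200)
```

With these assumed numbers the same gain gap is not yet enough to split at 100 examples but is at 200, which is how the tree grows incrementally from a stream without storing it.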

arXiv.org e-Print Archive

Crossref

Can the US Minimum Data Set Be Used for Predicting Admissions to Acute Care Facilities?

Author: Abbott Patricia A.
Quirolgico Stephen
Manchand Roopak
Canfield Kip
Adya Monica
Publication venue: e-Publications@Marquette
Publication date: 01/01/1998

This paper is intended to give an overview of Knowledge Discovery in Large Datasets (KDD) and data mining applications in healthcare, particularly as related to the Minimum Data Set (MDS), a resident assessment tool which is used in US long-term care facilities. The US Health Care Financing Administration, which mandates the use of this tool, has accumulated massive warehouses of MDS data. The pressure in healthcare to increase efficiency and effectiveness while improving patient outcomes requires that we find new ways to harness these vast resources. The intent of this preliminary study design paper is to discuss the development of an approach which utilizes the MDS, in conjunction with KDD and classification algorithms, in an attempt to predict admission from a long-term care facility to an acute care facility. The use of acute care services by long-term care residents is a negative outcome, potentially avoidable, and expensive. The value of the MDS warehouse can be realized by using the stored data in ways that can improve patient outcomes and avoid the use of expensive acute care services. This study, when completed, will test whether the MDS warehouse can be used to describe patient outcomes and possibly be of predictive value.

epublications@Marquette

Saint Louis University Libraries Digital Collections

A critical assessment of imbalanced class distribution problem: the case of predicting freshmen student attrition

Author: Delen Dursun
Kasap Nihat
Meesad Phayung
Thammasiri Dech
Publication venue: Elsevier BV
Publication date: 01/08/2013

Predicting student attrition is an intriguing yet challenging problem for any academic institution. Class-imbalanced data is common in the field of student retention, mainly because many students register but far fewer drop out. Classification techniques applied to imbalanced datasets can yield deceptively high prediction accuracy, because the overall predictive accuracy is usually driven by the majority class at the expense of very poor performance on the crucial minority class. In this study, we compared different data balancing techniques to improve the predictive accuracy for the minority class while maintaining satisfactory overall classification performance. Specifically, we tested three balancing techniques: oversampling, under-sampling and synthetic minority over-sampling (SMOTE). These were combined with four popular classification methods: logistic regression, decision trees, neural networks and support vector machines. We used a large, feature-rich institutional student data set (covering the years 2005 to 2011) to assess the efficacy of both the balancing techniques and the prediction methods. The results indicated that the support vector machine combined with the SMOTE data-balancing technique achieved the best classification performance, with 90.24% overall accuracy on the 10-fold holdout sample. All three data-balancing techniques improved the prediction accuracy for the minority class. Applying sensitivity analyses to the developed models, we also identified the most important variables for accurate prediction of student attrition. Application of these models has the potential to accurately predict at-risk students and help reduce student dropout rates.
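The point about accuracy being driven by the majority class can be illustrated with a small sketch. It is not the study's code or data: the 90/10 class split, the helper names, and the rows are hypothetical, and it uses plain random oversampling (duplicating minority rows) rather than SMOTE, which instead synthesizes new minority points.

```python
import random
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive):
    """Fraction of actual `positive` cases that were predicted as `positive`."""
    actual = sum(1 for t in y_true if t == positive)
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    return hits / actual if actual else 0.0


def random_oversample(rows, labels, minority, seed=0):
    """Duplicate randomly chosen minority rows until the minority class
    is as large as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    minority_rows = [r for r, l in zip(rows, labels) if l == minority]
    needed = max(counts.values()) - counts[minority]
    extra = [rng.choice(minority_rows) for _ in range(needed)]
    return rows + extra, labels + [minority] * len(extra)


# Hypothetical cohort: 90 retained students (0) and 10 dropouts (1).
labels = [0] * 90 + [1] * 10
rows = [[i] for i in range(100)]

# A "classifier" that always predicts the majority class.
always_retained = [0] * len(labels)
baseline_accuracy = accuracy(labels, always_retained)
baseline_recall = recall(labels, always_retained, positive=1)

balanced_rows, balanced_labels = random_oversample(rows, labels, minority=1)
balanced_counts = Counter(balanced_labels)
```

Here the trivial classifier reaches 90% accuracy while identifying none of the dropouts, which is why minority-class recall is worth reporting alongside overall accuracy. Balancing should be applied only to the training folds, never to the holdout data.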

Sabanci University Research Database

Using machine learning techniques to develop forecasting algorithms for postoperative complications: Protocol for a retrospective study

Author: Avidan Michael Simon
Ben Abdallah Arbi
Budelier Thaddeus
Chen Yixin
Fritz Bradley A
Gregory Stephen
Helsten Daniel L
Kronzer Alex
McKinnon Sherry Lynn
Murray-Torres Teresa M
Sharma Anshuman
Wildes Troy S
Publication venue: Digital Commons@Becker
Publication date: 01/01/2018

Digital Commons@Becker