69,763 research outputs found
A Multi-Gene Genetic Programming Application for Predicting Students Failure at School
Several efforts to predict student failure rate (SFR) at school accurately
still remains a core problem area faced by many in the educational sector. The
procedure for forecasting SFR are rigid and most often times require data
scaling or conversion into binary form such as is the case of the logistic
model which may lead to lose of information and effect size attenuation. Also,
the high number of factors, incomplete and unbalanced dataset, and black boxing
issues as in Artificial Neural Networks and Fuzzy logic systems exposes the
need for more efficient tools. Currently the application of Genetic Programming
(GP) holds great promises and has produced tremendous positive results in
different sectors. In this regard, this study developed GPSFARPS, a software
application to provide a robust solution to the prediction of SFR using an
evolutionary algorithm known as multi-gene genetic programming. The approach is
validated by feeding a testing data set to the evolved GP models. Result
obtained from GPSFARPS simulations show its unique ability to evolve a suitable
failure rate expression with a fast convergence at 30 generations from a
maximum specified generation of 500. The multi-gene system was also able to
minimize the evolved model expression and accurately predict student failure
rate using a subset of the original expressionComment: 14 pages, 9 figures, Journal paper. arXiv admin note: text overlap
with arXiv:1403.0623 by other author
Mining developer communication data streams
This paper explores the concepts of modelling a software development project
as a process that results in the creation of a continuous stream of data. In
terms of the Jazz repository used in this research, one aspect of that stream
of data would be developer communication. Such data can be used to create an
evolving social network characterized by a range of metrics. This paper
presents the application of data stream mining techniques to identify the most
useful metrics for predicting build outcomes. Results are presented from
applying the Hoeffding Tree classification method used in conjunction with the
Adaptive Sliding Window (ADWIN) method for detecting concept drift. The results
indicate that only a small number of the available metrics considered have any
significance for predicting the outcome of a build
Can the US Minimum Data Set Be Used for Predicting Admissions to Acute Care Facilities?
This paper is intended to give an overview of Knowledge Discovery in Large Datasets (KDD) and data mining applications in healthcare particularly as related to the Minimum Data Set, a resident assessment tool which is used in US long-term care facilities. The US Health Care Finance Administration, which mandates the use of this tool, has accumulated massive warehouses of MDS data. The pressure in healthcare to increase efficiency and effectiveness while improving patient outcomes requires that we find new ways to harness these vast resources. The intent of this preliminary study design paper is to discuss the development of an approach which utilizes the MDS, in conjunction with KDD and classification algorithms, in an attempt to predict admission from a long-term care facility to an acute care facility. The use of acute care services by long term care residents is a negative outcome, potentially avoidable, and expensive. The value of the MDS warehouse can be realized by the use of the stored data in ways that can improve patient outcomes and avoid the use of expensive acute care services. This study, when completed, will test whether the MDS warehouse can be used to describe patient outcomes and possibly be of predictive value
A critical assessment of imbalanced class distribution problem: the case of predicting freshmen student attrition
Predicting student attrition is an intriguing yet challenging problem for any academic institution. Class-imbalanced data is a common in the field of student retention, mainly because a lot of students register but fewer students drop out. Classification techniques for imbalanced dataset can yield deceivingly high
prediction accuracy where the overall predictive accuracy is usually driven by the majority class at the expense of having very poor performance on the crucial minority class. In this study, we compared different data balancing techniques to improve the predictive accuracy in minority class while maintaining satisfactory overall classification performance. Specifically, we tested three balancing techniquesāoversampling, under-sampling and synthetic minority over-sampling (SMOTE)āalong with four popular classification methodsālogistic regression, decision trees, neuron networks and support vector machines. We used a large and feature rich institutional student data (between the years 2005 and 2011) to assess the efficacy of both balancing techniques as well as prediction methods. The results indicated that the support vector machine combined with SMOTE data-balancing technique achieved the best classification performance with a 90.24% overall accuracy on the 10-fold holdout sample. All three data-balancing techniques improved the prediction accuracy for the minority class. Applying sensitivity analyses on developed models, we also identified the most important variables for accurate prediction of student attrition. Application of these models has the potential to accurately predict at-risk students and help reduce student dropout rates
- ā¦