    A critical assessment of imbalanced class distribution problem: the case of predicting freshmen student attrition

    Predicting student attrition is an intriguing yet challenging problem for any academic institution. Class-imbalanced data are common in the field of student retention, mainly because many students enroll but relatively few drop out. Classifiers trained on imbalanced datasets can yield deceptively high prediction accuracy: overall accuracy is driven by the majority class, at the expense of very poor performance on the crucial minority class. In this study, we compared different data-balancing techniques to improve predictive accuracy on the minority class while maintaining satisfactory overall classification performance. Specifically, we tested three balancing techniques (oversampling, under-sampling and the synthetic minority over-sampling technique, SMOTE) along with four popular classification methods (logistic regression, decision trees, neural networks and support vector machines). We used a large, feature-rich institutional student dataset (covering the years 2005 to 2011) to assess the efficacy of both the balancing techniques and the prediction methods. The results indicated that the support vector machine combined with the SMOTE data-balancing technique achieved the best classification performance, with 90.24% overall accuracy on the 10-fold holdout sample. All three data-balancing techniques improved prediction accuracy for the minority class. By applying sensitivity analyses to the developed models, we also identified the most important variables for accurate prediction of student attrition. These models have the potential to accurately identify at-risk students and help reduce student dropout rates.
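A minimal sketch of the three balancing techniques named above, written in plain Python for illustration only. It assumes numeric feature vectors, Euclidean distance for SMOTE's neighbour search, and balancing to a 1:1 class ratio; the function names `random_oversample`, `random_undersample` and `smote` are hypothetical, and this is not the study's implementation.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def random_oversample(minority: Sequence[Vector], target: int, rng: random.Random) -> List[Vector]:
    """Random oversampling: duplicate randomly chosen minority rows until `target` rows exist."""
    out = [list(x) for x in minority]
    while len(out) < target:
        out.append(list(rng.choice(minority)))
    return out


def random_undersample(majority: Sequence[Vector], target: int, rng: random.Random) -> List[Vector]:
    """Random under-sampling: keep `target` majority rows drawn without replacement."""
    return [list(x) for x in rng.sample(list(majority), target)]


def smote(minority: Sequence[Vector], n_new: int, k: int, rng: random.Random) -> List[Vector]:
    """SMOTE-style synthesis: each new row lies on the segment between a random
    minority row and one of its k nearest minority neighbours (Euclidean)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority rows")
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # The k closest *other* minority rows to `base`.
        neighbours = sorted(
            (minority[j] for j in range(len(minority)) if j != i),
            key=lambda x: math.dist(base, x),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


if __name__ == "__main__":
    rng = random.Random(0)
    majority = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(95)]
    minority = [[rng.gauss(3, 0.5), rng.gauss(3, 0.5)] for _ in range(5)]
    # Always predicting the majority class already gives 95/100 = 0.95 accuracy
    # while catching none of the minority cases.
    print(len(majority) / (len(majority) + len(minority)))
    over = random_oversample(minority, len(majority), rng)    # 95 minority rows
    under = random_undersample(majority, len(minority), rng)  # 5 majority rows
    smoted = minority + smote(minority, len(majority) - len(minority), k=3, rng=rng)  # 95 rows
    print(len(over), len(under), len(smoted))
```

Oversampling and SMOTE grow the minority class, whereas under-sampling discards majority rows. In a cross-validated study, resampling is normally applied only to the training folds so that the evaluation data keep their natural class ratio.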

    An insight into the experimental design for credit risk and corporate bankruptcy prediction systems

    In recent years, the finance and business communities have shown increasing interest in tools for predicting credit and bankruptcy risk, probably owing to the need for more robust decision-making systems capable of managing and analyzing complex data. As a result, numerous techniques have been developed with the aim of producing accurate prediction models for these problems. However, the design of experiments to assess and compare these models has so far attracted little attention, even though it plays an important role in validating and supporting the theoretical evidence of performance. The experimental design must be carried out carefully for the results to be meaningful; otherwise, it can become a source of misleading and contradictory conclusions about the benefits of a particular prediction system. In this work, we review more than 140 papers published in refereed journals between 2000 and 2013, emphasizing the fundamentals of experimental design in credit scoring and bankruptcy prediction applications. We provide caveats and guidelines on the use of databases, data-splitting methods, performance evaluation metrics and hypothesis-testing procedures, in order to converge on a systematic, consistent validation standard. This work has partially been supported by the Mexican Science and Technology Council (CONACYT-Mexico) through a Postdoctoral Fellowship [223351], the Spanish Ministry of Economy under grant TIN2013-46522-P and the Generalitat Valenciana under grant PROMETEOII/2014/062.
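Data splitting is one of the experimental-design choices the review covers. Because defaults and bankruptcies are usually the rare class, a stratified split keeps each test fold's class ratio close to that of the full dataset. The sketch below is a generic stratified k-fold split in plain Python; the function name `stratified_k_fold` is hypothetical, and this is an illustration rather than a procedure prescribed by the review.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def stratified_k_fold(
    labels: Sequence[Hashable], k: int, seed: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs whose test folds preserve
    the class proportions of `labels` as closely as possible."""
    rng = random.Random(seed)
    by_class: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    folds: List[List[int]] = [[] for _ in range(k)]
    offset = 0
    for y in sorted(by_class, key=str):  # deterministic class order
        idxs = by_class[y]
        rng.shuffle(idxs)
        # Deal each class round-robin; the running offset stops small classes
        # from always starting in fold 0.
        for j, idx in enumerate(idxs):
            folds[(offset + j) % k].append(idx)
        offset += len(idxs)

    splits = []
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j, f in enumerate(folds) if j != i for idx in f)
        splits.append((train, test))
    return splits
```

For example, with 90 "good" and 10 "bad" loans and `k=5`, every test fold receives 18 good and 2 bad cases, so no fold is left without defaults to evaluate.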

    Business failure research

    Despite a growing body of literature on business failures in China and the effects of government policy, the current state of knowledge remains unclear. The study advances research on the subject by developing the “four-parties” framework to review and synthesise the literature, laying the groundwork for an integrated understanding of the causes and consequences of business failure. In sharp contrast with the evolution of Western business failure research, much of the literature on China and Chinese firms has focused on business failure prediction models, bypassing the traditional progression from qualitative case-study and story-based approaches to quantitative approaches. The study outlines important implications and promising avenues for future research.

    AN INTEGRATED MULTIPLE STATISTICAL TECHNIQUE FOR PREDICTING POST-SECONDARY EDUCATIONAL DEGREE OUTCOMES BASED PRIMARILY ON VARIABLES AVAILABLE IN THE 8TH GRADE

    There is a class of complex problems that may be too complicated to solve with any single analytical technique. Such problems involve so many measurements of interconnected factors that analysis with a single-dimension technique may improve one aspect of the problem while achieving little or no overall improvement. This research examines the utility of modeling a complex problem with multiple statistical techniques integrated to analyze different types of data. The goal was to determine whether this integrated approach was feasible and provided significantly better results than a single statistical technique. An application in engineering education was chosen because of the availability and comprehensiveness of the NELS:88 longitudinal dataset, which provided a very large number of variables and 12,144 records of actual students progressing from 8th grade to their final educational outcomes 12 years later. The probability of earning a Science, Technology, Engineering, or Mathematics (STEM) degree is modeled using variables available in the 8th grade as well as standardized test scores. The variables include demographic, academic performance, and experiential measures. The NELS:88 dataset was extensively manipulated to identify the student outcomes, prepare the set of covariates for modeling, and determine when each student's final outcome status occurred. The integrated models combined logistic regression, survival analysis, and Receiver Operating Characteristic (ROC) curve analysis to predict obtaining a STEM degree versus other outcomes. The results of the integrated models were compared with actual outcomes and with the results of separate logistic regression models. Both sets of models provided good predictive accuracy, and the feasibility of integrated models for complex problems was confirmed. The integrated approach produced less variability in incorrect STEM predictions, but the improvement was not statistically significant.
    The main contribution of this research is the design of the integrated model approach and the confirmation of its feasibility. Additional contributions include designing a process to create large multivariate logistic regression models; developing methods for extensive manipulation of a large dataset to adapt it to new analytical purposes; extending the application of logistic regression, survival analysis, and ROC curve analysis within educational research; and creating a formal definition of STEM that can be statistically verified.
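One common way to combine ROC curve analysis with a logistic regression model is to read a probability cutoff off the ROC curve. The sketch below computes ROC points, the trapezoidal area under the curve, and the cutoff that maximizes Youden's J statistic (TPR minus FPR). The abstract does not say how its cutoffs were chosen, so Youden's criterion and the function names here are assumptions; `scores` stands in for a model's predicted probabilities of earning a STEM degree.

```python
from typing import List, Sequence, Tuple


def roc_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float, float]]:
    """Return (threshold, FPR, TPR) for each distinct score used as a cutoff,
    where a case is predicted positive when score >= threshold."""
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative case")
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((t, fp / neg, tp / pos))
    return points


def youden_cutoff(y_true: Sequence[int], scores: Sequence[float]) -> Tuple[float, float]:
    """Return (threshold, J) for the first threshold maximizing J = TPR - FPR."""
    best = max(roc_points(y_true, scores), key=lambda p: p[2] - p[1])
    return best[0], best[2] - best[1]


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Trapezoidal area under the ROC curve, starting from (0, 0)."""
    pts = [(0.0, 0.0)] + [(fpr, tpr) for _, fpr, tpr in roc_points(y_true, scores)]
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
```

With `y_true = [0, 0, 1, 1]` and `scores = [0.1, 0.4, 0.35, 0.8]`, `auc` returns 0.75. Tied scores are handled by iterating over distinct thresholds, so tied cases always move across the cutoff together.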

    A Multiagent System For Web-Based Risk Management in Small and Medium Business

    Business intelligence has gained relevance in recent years as a means of improving business decision making. However, there is still a growing need for innovative tools that can help small and medium-sized enterprises (SMEs) predict risky situations and manage inefficient activities. This article presents a multiagent system specially conceived to detect risky situations and provide recommendations to the internal auditors of SMEs. The core of the multiagent system is a type of agent with advanced reasoning capabilities that makes predictions based on previous experiences. This agent type is used to implement an evaluator agent specialized in detecting risky situations and an advisor agent aimed at providing decision-support facilities. Both agents incorporate innovative techniques in the stages of the case-based reasoning (CBR) system. An initial prototype was developed, and the results obtained with small and medium enterprises in a real scenario are presented.
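The agents described above predict from previous experiences through the stages of a case-based reasoning (CBR) system. A minimal sketch of three CBR stages (retrieve, reuse, retain) as a k-nearest-neighbour vote over stored cases follows. The feature names, outcome labels and Euclidean distance are hypothetical; the revise stage, where an auditor would correct a proposed label, is omitted; and this is not the design of the system in the article.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Case:
    features: Dict[str, float]  # hypothetical, e.g. normalised financial ratios
    outcome: str                # hypothetical labels, e.g. "risky" / "healthy"


def distance(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Euclidean distance over the features both cases share."""
    keys = a.keys() & b.keys()
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in keys))


class CaseBase:
    def __init__(self) -> None:
        self.cases: List[Case] = []

    def retrieve(self, query: Dict[str, float], k: int = 3) -> List[Case]:
        """Retrieve: the k stored cases most similar to the query."""
        return sorted(self.cases, key=lambda c: distance(query, c.features))[:k]

    def reuse(self, query: Dict[str, float], k: int = 3) -> str:
        """Reuse: propose the majority outcome among the retrieved cases."""
        votes = Counter(c.outcome for c in self.retrieve(query, k))
        return votes.most_common(1)[0][0]

    def retain(self, features: Dict[str, float], outcome: str) -> None:
        """Retain: store a confirmed case for future retrievals."""
        self.cases.append(Case(dict(features), outcome))
```

A usage example: after retaining a handful of labelled SME profiles, `CaseBase.reuse` proposes a label for a new profile, and the confirmed outcome is retained so that later predictions draw on the enlarged case base.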

    A multi-agent system for web-based risk management in small and medium business

    Business intelligence has gained relevance in recent years as a means of improving business decision making. However, there is still a growing need for innovative tools that can help small and medium-sized enterprises (SMEs) predict risky situations and manage inefficient activities. This article presents a multi-agent system specially created to detect risky situations and provide recommendations to the internal auditors of SMEs. The core of the multi-agent system is a type of agent with advanced reasoning capabilities that makes predictions based on previous experiences. This agent type is used to implement an evaluator agent specialized in detecting risky situations and an advisor agent aimed at providing decision-support facilities. Both agents incorporate innovative techniques in the stages of the case-based reasoning (CBR) system. An initial prototype was developed, and the results obtained with small and medium enterprises in a real scenario are presented.