
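    ANALISIS RANDOM FOREST PADA KLASIFIKASI CART KETIDAKTEPATAN WAKTU KELULUSAN MAHASISWA UNIVERSITAS TERBUKA [Random Forest Analysis of the CART Classification of Untimely Graduation among Universitas Terbuka Students]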

    Classification and Regression Tree (CART) is a classification method widely used across many fields and is considered capable of handling a wide range of data conditions. However, CART classification trees are unstable: small changes in the training data can cause large changes in the predictions. To address this, the random forest ensemble method combines many classification trees to stabilize and determine the class prediction. This study aims to improve the stability and accuracy of CART predictions using random forest. The case studied is the classification of untimely graduation among Universitas Terbuka (Open University) students. The results show that random forest improves classification accuracy for untimely graduation, with accuracy converging at 93.23%.
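
The idea of combining many trees, as described in the abstract above, can be sketched in a few lines. This is only an illustration under stated assumptions, not the study's method or data: it uses one-split trees (stumps) rather than full CART trees, synthetic two-feature data, and hypothetical helper names (`fit_stump`, `fit_forest`, `predict_forest`, `make_data`). Each tree is fit on a bootstrap resample with one randomly chosen feature, and the forest predicts by majority vote.

```python
import random
from collections import Counter

def fit_stump(rows, feature):
    """Find the best single-threshold split on one feature.

    rows holds (features, label) pairs with 0/1 labels.
    """
    best = None
    for threshold in sorted({f[feature] for f, _ in rows}):
        for left in (0, 1):  # label predicted when value <= threshold
            errors = sum(
                (left if f[feature] <= threshold else 1 - left) != y
                for f, y in rows
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, left)
    return feature, best[1], best[2]

def predict_stump(stump, features):
    feature, threshold, left = stump
    return left if features[feature] <= threshold else 1 - left

def fit_forest(rows, n_trees, rng):
    """Fit n_trees stumps, each on a bootstrap resample and a random feature."""
    n_features = len(rows[0][0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # draw with replacement
        feature = rng.randrange(n_features)        # random feature choice
        forest.append(fit_stump(sample, feature))
    return forest

def predict_forest(forest, features):
    """Majority vote over all stumps in the forest."""
    votes = Counter(predict_stump(s, features) for s in forest)
    return votes.most_common(1)[0][0]

def make_data(n, rng):
    """Synthetic data (an assumption): label is 1 when x0 + x1 > 1."""
    rows = []
    for _ in range(n):
        x = (rng.random(), rng.random())
        rows.append((x, 1 if x[0] + x[1] > 1.0 else 0))
    return rows

rng = random.Random(0)
train = make_data(200, rng)
test = make_data(200, rng)
forest = fit_forest(train, 25, rng)
accuracy = sum(predict_forest(forest, f) == y for f, y in test) / len(test)
print(f"forest accuracy: {accuracy:.2f}")
```

Because every tree sees a different resample, a change in a few training rows shifts only some of the votes, which is the stabilizing effect the abstract attributes to the ensemble. An odd number of trees avoids tied votes for two classes.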

    Methods to Improve the Prediction Accuracy and Performance of Ensemble Models

    The application of ensemble predictive models has been an important research area in medical diagnostics, engineering diagnostics, and related smart devices and technologies. Most current predictive models are complex and unreliable despite numerous past efforts by the research community. The expected accuracy of predictive models has not always been realised, owing to factors such as complexity and class imbalance. There is therefore a need to improve the predictive accuracy of current ensemble models and to enhance their applications and reliability, along with non-visual predictive tools. The research presented in this thesis adopts a pragmatic phased approach to propose and develop new ensemble models using multiple methods, validating them through rigorous testing and implementation in each phase. The first phase comprises empirical investigations of standalone and ensemble algorithms, carried out to ascertain how classifier complexity and simplicity affect performance. The second phase comprises an improved ensemble model based on the integration of the Extended Kalman Filter (EKF), Radial Basis Function Network (RBFN), and AdaBoost algorithms. The third phase comprises an extended model based on early-stopping concepts, the AdaBoost algorithm, and the statistical performance of the training samples, designed to minimize overfitting of the proposed model. The fourth phase comprises an enhanced analytical multivariate logistic regression predictive model developed to minimize the complexity and improve the prediction accuracy of the logistic regression model. To facilitate the practical application of the proposed models, an ensemble non-invasive analytical tool is proposed and developed. The tool bridges the gap between theoretical concepts and their practical application to predict breast cancer survivability. 
The empirical findings suggested that: (1) increasing the complexity and topology of algorithms does not necessarily lead to better algorithmic performance; (2) boosting by resampling performs slightly better than boosting by reweighting; (3) the proposed ensemble EKF-RBFN-AdaBoost model achieved better prediction accuracy than several established ensemble models; (4) the proposed early-stopped model converges faster and minimizes overfitting better than other models; (5) the proposed multivariate logistic regression concept minimizes model complexity; and (6) the proposed analytical non-invasive tool performed comparatively better than many of the benchmark analytical tools used in predicting breast cancer and diabetes. The research contributions to ensemble practice are: (1) the integration and development of the EKF, RBFN, and AdaBoost algorithms as an ensemble model; (2) the development and validation of an ensemble model based on early-stopping concepts, AdaBoost, and statistical concepts of the training samples; (3) the development and validation of a predictive logistic regression model based on breast cancer; and (4) the development and validation of non-invasive breast cancer analytic tools based on the predictive models proposed and developed in this thesis. To validate the prediction accuracy of the ensemble models, the proposed models were applied to modelling breast cancer survivability and diabetes diagnostic tasks. In comparison with other established models, the simulation results showed improved predictive accuracy. The research outlines the benefits of the proposed models and proposes new directions for future work that could further extend and improve them.

    Defining Problematic School Absenteeism: Identifying Youth at Risk

    Study 1: School attendance is an important foundational competency for children and adolescents, and school absenteeism has been linked to myriad short- and long-term negative consequences, even into adulthood. Many efforts have been made to conceptualize and address this population across various categories and dimensions of functioning and across multiple disciplines, resulting in both a rich literature base and a splintered view regarding this population. This article (Part 1 of 2) reviews and critiques key categorical and dimensional approaches to conceptualizing school attendance and school absenteeism, with an eye toward reconciling these approaches (Part 2 of 2) to develop a roadmap for preventative and intervention strategies, early warning systems and nimble response, global policy review, dissemination and implementation, and adaptations to future changes in education and technology. This article sets the stage for a discussion of a multidimensional, multi-tiered system of supports pyramid model as a heuristic framework for conceptualizing the manifold aspects of school attendance and school absenteeism. Study 2: School attendance problems, including school absenteeism, are common to many students worldwide, and frameworks to better understand these heterogeneous students include multiple classes or tiers of intertwined risk factors as well as interventions. Recent studies have thus examined risk factors at varying levels of absenteeism severity to demarcate distinctions among these tiers. Prior studies in this regard have focused more on demographic and academic variables and less on family environment risk factors that are endemic to this population. The present study utilized ensemble and classification and regression tree analysis to identify potential family environment risk factors among youth (i.e., children and adolescents) at different levels of school absenteeism severity (i.e., 1+%, 3+%, 5+%, 10+%). 
Higher levels of absenteeism were also examined on an exploratory basis. Participants included 341 youth aged 5–17 years (M = 12.2; SD = 3.3) and their families from an outpatient therapy clinic (68.3%) and community (31.7%) setting, the latter from a family court and truancy diversion program cohort. Family environment risk factors tended to be more circumscribed and informative at higher levels of absenteeism, with greater diversity at lower levels. Higher levels of absenteeism appear more closely related to lower achievement orientation, active-recreational orientation, cohesion, and expressiveness, though several nuanced results were found as well. Absenteeism severity levels of 10–15% may be associated more with qualitative changes in family functioning. These data may support a Tier 2-Tier 3 distinction in this regard and may indicate the need for specific family-based intervention goals at higher levels of absenteeism severity. Study 3: School attendance problems are highly prevalent worldwide, leading researchers to investigate many different risk factors for this population. Of considerable controversy is how internalizing behavior problems might help to distinguish different types of youth with school attendance problems. In addition, efforts are ongoing to identify the point at which children and adolescents move from appropriate school attendance to problematic school absenteeism. The present study utilized ensemble and classification and regression tree analysis to identify potential internalizing behavior risk factors among youth at different levels of school absenteeism severity (i.e., 1+%, 3+%, 5+%, 10+%). Higher levels of absenteeism were also examined on an exploratory basis. Participants included 160 youth aged 6–19 years (M = 13.7; SD = 2.9) and their families from an outpatient therapy clinic (39.4%) and community (60.6%) setting, the latter from a family court and truancy diversion program cohort. 
One particular item relating to lack of enjoyment was most predictive of absenteeism severity at different levels, though not among the highest levels. Other internalizing items were also predictive of various levels of absenteeism severity, but only in a negatively endorsed fashion. Internalizing symptoms of worry and fatigue tended to be endorsed more strongly across both less severe and more severe levels of absenteeism. The expectation that predictors would be more homogeneous at higher than at lower levels of absenteeism severity was not generally supported. The results help confirm the difficulty of conceptualizing this population based on forms of behavior but may support the need for early warning sign screening for youth at risk for school attendance problems.

    Ensemble methods of classification for power systems security assessment

    One of the most promising approaches to the analysis of complex technical systems employs ensemble methods of classification. Ensemble methods enable the construction of reliable decision rules for feature space classification in the presence of many possible system states. In this paper, novel techniques based on decision trees are used to evaluate power system reliability, and a hybrid approach based on random forest and boosting models is proposed. Such techniques can be applied to predict the interaction of increasing renewable power, storage devices, and the intelligent switching of smart loads (intelligent domestic appliances, storage heaters, air-conditioning units, and electric vehicles) with the grid to enhance decision making. The ensemble classification method was tested on the modified 118-bus IEEE power system to examine whether the system is secure under steady-state operating conditions. Keywords: Power system, Ensemble methods, Boosting, Classification, Heuristics, Random forests, Security assessment. 2010 MSC: 90C59, 68T0