    A critical assessment of imbalanced class distribution problem: the case of predicting freshmen student attrition

    Predicting student attrition is an intriguing yet challenging problem for any academic institution. Class-imbalanced data is common in the field of student retention, mainly because many students enroll but relatively few drop out. Classifiers trained on imbalanced datasets can yield deceptively high prediction accuracy, because overall accuracy is usually driven by the majority class at the expense of very poor performance on the crucial minority class. In this study, we compared different data-balancing techniques to improve predictive accuracy on the minority class while maintaining satisfactory overall classification performance. Specifically, we tested three balancing techniques (oversampling, under-sampling and synthetic minority over-sampling, SMOTE) along with four popular classification methods (logistic regression, decision trees, neural networks and support vector machines). We used a large, feature-rich institutional student dataset (covering 2005 to 2011) to assess the efficacy of both the balancing techniques and the prediction methods. The results indicated that the support vector machine combined with SMOTE achieved the best classification performance, with 90.24% overall accuracy on the 10-fold holdout sample. All three data-balancing techniques improved prediction accuracy for the minority class. By applying sensitivity analyses to the developed models, we also identified the most important variables for accurate prediction of student attrition. Applying these models could help institutions accurately identify at-risk students and reduce student dropout rates.
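
SMOTE differs from plain oversampling in that it synthesizes new minority rows instead of duplicating existing ones: each synthetic row is placed at a random point on the segment between a minority sample and one of its k nearest minority neighbours. The following is a minimal, dependency-free sketch of that idea, not the study's implementation; k=5, the seed and the toy data are assumptions. For real work, the imbalanced-learn library provides a `SMOTE` class.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic minority-class points by SMOTE-style interpolation.

    minority: list of feature vectors (lists of floats) from the minority class.
    k: how many nearest minority neighbours to sample from (assumed default).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # k nearest minority neighbours of every minority sample (brute force).
    neighbours = []
    for i, a in enumerate(minority):
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: sq_dist(a, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base, nb = minority[i], minority[rng.choice(neighbours[i])]
        gap = rng.random()  # uniform in [0, 1)
        # The new point lies on the segment between the sample and its neighbour.
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


# Hypothetical toy data: 20 "persist" rows and 3 "dropout" rows.
majority = [[float(x), 0.0] for x in range(20)]
minority = [[0.0, 5.0], [1.0, 6.0], [2.0, 5.5]]
new_points = smote(minority, n_synthetic=len(majority) - len(minority))
balanced_minority = minority + new_points  # now 20 rows, matching the majority
```

When evaluating with k-fold cross-validation, balancing should be applied only to the training folds; otherwise synthetic points derived from test rows leak into the evaluation.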

    Network Data Mining: Methods and techniques for discovering deep linkage between attributes

    Network Data Mining identifies emergent networks between myriads of individual data items and utilises special algorithms that aid visualisation of ‘emergent’ patterns and trends in the linkage. It complements conventional data mining methods, which assume independence between attributes and between the values of those attributes. These conventional techniques typically flag, alert or alarm on instances or events that could represent anomalous behaviour or irregularities because they match pre-defined patterns or rules. They serve as ‘exception detection’ methods, where the rules or definitions of what might constitute an exception can be known and specified ahead of time. Many problems are suited to this approach. Many problems, however, especially those of a more complex nature, are not: the rules or definitions simply cannot be specified. For example, in the analysis of transaction data, there are no known suspicious transactions. This chapter presents a human-centred network data mining methodology that addresses the issue of depicting implicit relationships between data attributes and/or specific values of these attributes. A case study from the area of security illustrates the application of the methodology and the corresponding data mining techniques. The chapter argues that for many problems, a ‘discovery’ phase in the investigative process, based on visualisation and human cognition, is a logical precedent to, and complement of, more automated ‘exception detection’ phases.
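
The chapter's own algorithms are not described in this abstract. As a hedged illustration of depicting implicit relationships through specific attribute values, the sketch below links any two records that share a value of a chosen attribute and reports the resulting connected groups using union-find. The attribute names, record IDs and data are hypothetical.

```python
from collections import defaultdict


def link_records(records, attributes):
    """Group records that are connected through shared attribute values.

    records: dict of record id -> dict of attribute -> value.
    attributes: attribute names treated as linking keys (an assumption).
    Returns a list of groups (sets of record ids), largest first.
    """
    parent = {rid: rid for rid in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Index each (attribute, value) pair to the records that contain it.
    by_value = defaultdict(list)
    for rid, fields in records.items():
        for attr in attributes:
            value = fields.get(attr)
            if value is not None:
                by_value[(attr, value)].append(rid)

    # Records sharing any attribute value become linked.
    for rids in by_value.values():
        for other in rids[1:]:
            union(rids[0], other)

    groups = defaultdict(set)
    for rid in records:
        groups[find(rid)].add(rid)
    return sorted(groups.values(), key=len, reverse=True)


# Hypothetical transactions: t1-t2 share a phone, t2-t3 share an account.
transactions = {
    "t1": {"account": "A1", "phone": "555-0101"},
    "t2": {"account": "A2", "phone": "555-0101"},
    "t3": {"account": "A2", "phone": "555-0199"},
    "t4": {"account": "A9", "phone": "555-0777"},
}
groups = link_records(transactions, ["account", "phone"])
# groups[0] == {"t1", "t2", "t3"}; t4 stays on its own
```

No rule about what is suspicious is needed to build these groups; in a human-centred discovery phase, an analyst would visualise and inspect them rather than rely on a pre-specified exception rule.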

    Using Big Data for Predicting Freshmen Retention

    Traditional research in student retention is survey-based, relying on data collected from questionnaires, which is not optimal for proactive prediction and real-time decision (student intervention) support. Machine learning approaches have their own limitations. Therefore, in this research, we propose a big data approach to formulating a predictive model. We used data commonly available in academic institutions (student demographic and academic data), augmented with implicit social networks derived from students’ university smart card transactions. Furthermore, we applied a sequence learning method to infer students’ campus integration from their purchasing behaviors. Since student retention data are highly imbalanced, we built a new ensemble classifier to predict students at risk of dropping out. For model evaluation, we used a real-world dataset of smart card transactions from a large educational institution. The experimental results show that adding campus integration and social behavior features, refined using the ensemble method, significantly improves prediction accuracy and recall.
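
The abstract does not say how the new ensemble classifier is built. One common way to make an ensemble robust to imbalance is undersampling-based bagging (often called balanced bagging), where each member is trained on all minority rows plus an equal-sized random sample of majority rows and predictions are combined by majority vote. The sketch below shows that swapped-in technique, not the authors' classifier; the nearest-centroid base learner, `n_estimators=11` and the toy data are assumptions chosen only for brevity.

```python
import random
from collections import Counter


def nearest_centroid_fit(X, y):
    """Base learner kept deliberately simple: one mean vector per class."""
    sums, counts = {}, Counter(y)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def nearest_centroid_predict(centroids, x):
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, centroids[c]))
    return min(centroids, key=sq_dist)


def balanced_bagging_fit(X, y, minority_label, n_estimators=11, seed=0):
    """Each member sees every minority row plus an equal-sized random
    sample (without replacement) of the remaining rows."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == minority_label]
    majority = [i for i, label in enumerate(y) if label != minority_label]
    models = []
    for _ in range(n_estimators):
        idx = minority + rng.sample(majority, min(len(minority), len(majority)))
        models.append(nearest_centroid_fit([X[i] for i in idx], [y[i] for i in idx]))
    return models


def balanced_bagging_predict(models, x):
    """Plain majority vote across ensemble members."""
    votes = Counter(nearest_centroid_predict(m, x) for m in models)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: 40 students who persisted, 3 who dropped out.
X = [[float(i), 0.0] for i in range(40)] + [[50.0, 10.0], [52.0, 11.0], [51.0, 9.0]]
y = ["persist"] * 40 + ["dropout"] * 3
models = balanced_bagging_fit(X, y, minority_label="dropout")
```

Because every member sees a balanced sample, no single model can reach high accuracy simply by always predicting the majority class, while the vote across differently sampled members reduces the variance that heavy undersampling would otherwise introduce.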

    Discovering Big Data Modelling for Educational World

    With advances in internet technology around the world, the demand for online education is growing. Many educational institutions offer various types of online courses and e-content. Analytical models from data mining and computer science heuristics help in analyzing and visualizing data, predicting student performance, generating recommendations for students as well as teachers, providing feedback to students, identifying related courses, e-content and books, detecting undesirable student behaviours, developing course content and planning various other educational activities. Today many educational institutions use data analytics to improve the services they provide. Student data access patterns, logged and collected by online learning systems, could be explored to find informative relationships in the educational world. A major concern, however, is that the data are exploding, as the numbers of students and courses increase day by day all over the world. Big Data platforms and parallel programming models such as MapReduce may accelerate the analysis of this rapidly growing educational data and improve computational pattern-finding capability. The paper focuses on a trial of educational modelling based on Big Data techniques.
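
The MapReduce model splits a job into a map phase that emits key/value pairs, a shuffle that groups values by key, and a reduce phase that aggregates each group. The single-process sketch below counts learning-system access events per course in that style. The `student,course,resource` log format is hypothetical, and on a real Big Data platform such as Hadoop the map and reduce tasks would run in parallel across many nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(log_line):
    """Emit (course_id, 1) for each access event in a hypothetical
    'student,course,resource' log line."""
    _student, course, _resource = log_line.split(",")
    yield course.strip(), 1


def shuffle(pairs):
    """Group intermediate values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Aggregate all values emitted for one key."""
    return key, sum(values)


def run_job(log_lines):
    intermediate = chain.from_iterable(map_phase(line) for line in log_lines)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(intermediate).items())


logs = ["s1,CS101,video1", "s2,CS101,quiz1", "s1,MA201,notes3"]
counts = run_job(logs)  # {"CS101": 2, "MA201": 1}
```

Because each map call looks at one line and each reduce call at one key, both phases can be distributed without coordination, which is what lets the approach scale with growing numbers of students and courses.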

    BUILDING DSS USING KNOWLEDGE DISCOVERY IN DATABASE APPLIED TO ADMISSION & REGISTRATION FUNCTIONS

    This research investigates the practical issues surrounding the development and implementation of Decision Support Systems (DSS). It describes traditional development approaches, analyzes their drawbacks and introduces a new DSS development methodology. The proposed methodology is based on four modules: needs analysis, a data warehouse (DW), knowledge discovery in databases (KDD), and a DSS module. It is applied to and evaluated on the admission and registration functions of Egyptian Universities. The research investigates the organizational requirements needed to underpin these functions in Egyptian Universities. These requirements were identified through an in-depth survey of the recruitment process in Egyptian Universities. The survey employed a multi-part admission and registration DSS questionnaire (ARDSSQ) to identify the required data sources together with the likely users and their information needs. The questionnaire was sent to senior managers within Egyptian Universities (both private and government) with responsibility for student recruitment, in particular admission and registration. Further, access to a large database allowed evaluation of the practical suitability of using a data warehouse structure and knowledge management tools within the decision-making framework. A total of 1,600 student records were analyzed to explore the KDD process, and another 2,000 records were used to build and test the data mining techniques within the KDD process. The research also analyzed the key characteristics of data warehouses and explored the advantages and disadvantages of such data structures. This evaluation was used to build a data warehouse that handles the admission and registration archival data of Egyptian Universities. The potential benefits of the data warehouse to decision makers in the student recruitment process will also be explored.
The design of the proposed admission and registration DSS (ARDSS) will be developed and tested using the COOL:Gen (5.0) CASE tool by Computer Associates (CA), connected to MS SQL Server (6.5), in a Windows NT (4.0) environment. Crystal Reports (4.6) by Seagate will be used as the report generation tool, and CLUSTAN Graphics (5.0) by CLUSTAN as the clustering package. Finally, the contributions of this research lie in the following areas: a new DSS development methodology; the development and validation of a new research questionnaire (ARDSSQ); the development of the admission and registration data warehouse; the evaluation and use of cluster analysis proximities and techniques in the KDD process to find knowledge in student records; and the development of the ARDSS software, which combines the advantages of KDD and the DW and delivers them to senior admission and registration managers in Egyptian Universities. The ARDSS software could be adapted for the same purpose in other countries; it is also scalable to handle new decision situations and can be integrated with other systems.
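
In hierarchical cluster analysis, a proximity matrix of pairwise distances between records drives which clusters are merged at each step. The sketch below computes Euclidean proximities and applies single-linkage agglomerative clustering until a requested number of clusters remains. It is an illustration only: the distance measure, the linkage rule and the (age, GPA) records are assumptions, not the procedure run in the study's clustering package.

```python
import math


def proximity_matrix(rows):
    """Pairwise Euclidean distances between student records (numeric features)."""
    n = len(rows)
    return [[math.dist(rows[i], rows[j]) for j in range(n)] for i in range(n)]


def single_linkage(rows, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters whose closest
    members are nearest, until n_clusters remain."""
    d = proximity_matrix(rows)
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                gap = min(d[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or gap < best[0]:
                    best = (gap, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    return [sorted(c) for c in clusters]


# Hypothetical (age, GPA) records for five applicants.
records = [[18, 3.6], [19, 3.5], [18, 3.7], [27, 2.1], [28, 2.3]]
result = single_linkage(records, 2)  # [[0, 1, 2], [3, 4]]
```

With real student records, features on different scales (such as age and GPA here) would normally be standardised first, since otherwise the largest-scale attribute dominates the proximities.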

    A Bibliometric Study on Learning Analytics

    Learning analytics tools and techniques are continually being developed and published in the scholarly literature. This study aims to examine the intellectual structure of the Learning Analytics domain by collecting and analyzing empirical articles on Learning Analytics published from 2004 to 2018. First, bibliometric and citation analyses of 2,730 documents from Scopus identified the top authors, key research affiliations, leading publication sources (journals and conferences), and research themes of the learning analytics domain. Second, Domain Analysis (DA) techniques were used to investigate the intellectual structures of learning analytics research, publication, organization, and communication (Hjørland & Bourdieu 2014). The VOSviewer software was used to analyze historical and institutional publication relationships, author and institutional relationships, and the dissemination of Learning Analytics knowledge. The results showed that Learning Analytics has captured the attention of the global community. The United States, Spain, and the United Kingdom are among the leading countries contributing to the dissemination of learning analytics knowledge. The leading publication sources are the ACM International Conference Proceeding Series and Lecture Notes in Computer Science. This study presents the intellectual structures of the learning analytics domain, and the resulting LA research taxonomy can be reused by teachers, administrators, and other stakeholders to support teaching and learning environments in higher education institutions.
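
Two building blocks of this kind of bibliometric study are ranking authors by document count and normalising keyword co-occurrence counts so that very frequent keywords do not dominate the map. The sketch below does both on made-up records; the normalisation shown, c_ij / (s_i * s_j) with s_i the number of documents containing keyword i, is a simplified association-strength measure in the spirit of what VOSviewer applies, and VOSviewer's exact weighting may differ.

```python
from collections import Counter
from itertools import combinations


def top_authors(records, n=3):
    """Rank authors by the number of documents they appear on."""
    counts = Counter(author for rec in records for author in set(rec["authors"]))
    return counts.most_common(n)


def keyword_association_strength(records):
    """Co-occurrence counts c_ij normalised as c_ij / (s_i * s_j), where s_i is
    the number of documents containing keyword i (a simplified measure)."""
    occurrences = Counter()
    cooccurrences = Counter()
    for rec in records:
        keywords = sorted(set(rec["keywords"]))
        occurrences.update(keywords)
        cooccurrences.update(combinations(keywords, 2))
    return {
        pair: c / (occurrences[pair[0]] * occurrences[pair[1]])
        for pair, c in cooccurrences.items()
    }


# Made-up records, not taken from the Scopus dataset.
docs = [
    {"authors": ["Lee", "Kim"], "keywords": ["dashboards", "retention"]},
    {"authors": ["Lee"], "keywords": ["retention", "mooc"]},
    {"authors": ["Ortiz"], "keywords": ["dashboards", "retention"]},
]
leaders = top_authors(docs, 1)  # [("Lee", 2)]
strength = keyword_association_strength(docs)[("dashboards", "retention")]  # 2 / (2 * 3)
```

Keywords are sorted before pairing so that each unordered pair is counted under a single key.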

    Longitudinal study of first-time freshmen using data mining

    In the modern world, higher education is transitioning from enrollment mode to recruitment mode. This shift has paved the way for institutional research and policy making from a historical data perspective. More and more universities in the U.S. are implementing and using enterprise resource planning (ERP) systems, which collect vast amounts of data. Although a few researchers have used data mining to predict performance, graduation rates, and persistence, research in this area is sparse and lacks rigorous development and evaluation of data mining models. The primary objective of this research was to build and analyze data mining models using historical data to find patterns and rules that classify students who are likely to drop out and students who are likely to persist. Student retention is a major problem for higher education institutions, and predictive models developed using traditional quantitative methods do not produce highly accurate results because of massive amounts of data, correlation between attributes, missing values, and non-linearity of variables; data mining techniques, however, work well under these conditions. In this study, various data mining models were used along with discretization, feature subset selection, and cross-validation; the results were analyzed not only with the probability of detection and the probability of false alarm but also with the variance of these performance measures. Attributes were grouped together based on current hypotheses in the literature. Using the results of feature subset selectors and treatment learners, the attributes that contributed most to a student's decision to drop out or stay were identified, and specific rules characterizing a successful student were found. The performance measures obtained in this study were significantly better than those previously reported in the literature.
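
Probability of detection (pd) is the share of actual dropouts that a model flags, TP / (TP + FN), and probability of false alarm (pf) is the share of persisting students wrongly flagged, FP / (FP + TN). The sketch below computes both per cross-validation fold and then their mean and variance across folds. The label names, the toy folds and the use of population variance (`statistics.pvariance`; `statistics.variance` would give the sample variance) are assumptions, not details from the study.

```python
from statistics import mean, pvariance


def pd_pf(actual, predicted, positive="dropout"):
    """Probability of detection (recall on the positive class) and probability
    of false alarm (share of negatives wrongly flagged as positive)."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    pd = tp / (tp + fn) if tp + fn else 0.0
    pf = fp / (fp + tn) if fp + tn else 0.0
    return pd, pf


def summarise_folds(fold_results, positive="dropout"):
    """Mean and variance of pd and pf over cross-validation folds.
    fold_results: list of (actual, predicted) label lists, one pair per fold."""
    pds, pfs = zip(*(pd_pf(a, p, positive) for a, p in fold_results))
    return {
        "pd_mean": mean(pds), "pd_var": pvariance(pds),
        "pf_mean": mean(pfs), "pf_var": pvariance(pfs),
    }


# Two hypothetical folds of four students each.
folds = [
    (["dropout", "stay", "stay", "dropout"], ["dropout", "stay", "dropout", "stay"]),
    (["dropout", "stay", "stay", "dropout"], ["dropout", "stay", "stay", "dropout"]),
]
summary = summarise_folds(folds)
```

A model with a good mean pd but a high variance across folds is less trustworthy than one with a slightly lower but stable pd, which is why the variance is reported alongside the means.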

    Emerging technologies for learning report (volume 3)