4,022 research outputs found

    Graduate Catalog Center for Computer and Information Sciences

    Get PDF

    Exploring and Evaluating the Scalability and Efficiency of Apache Spark using Educational Datasets

    Get PDF
    Research into the combination of data mining and machine learning technology with web-based education systems (known as education data mining, or EDM) is becoming imperative in order to enhance the quality of education by moving beyond traditional methods. With the worldwide growth of the Information Communication Technology (ICT), data are becoming available at a significantly large volume, with high velocity and extensive variety. In this thesis, four popular data mining methods are applied to Apache Spark, using large volumes of datasets from Online Cognitive Learning Systems to explore the scalability and efficiency of Spark. Various volumes of datasets are tested on Spark MLlib with different running configurations and parameter tunings. The thesis convincingly presents useful strategies for allocating computing resources and tuning to take full advantage of the in-memory system of Apache Spark to conduct the tasks of data mining and machine learning. Moreover, it offers insights that education experts and data scientists can use to manage and improve the quality of education, as well as to analyze and discover hidden knowledge in the era of big data

    A Predictive Model using Machine Learning Algorithm in Identifying Student's Probability on Passing Semestral Course

    Get PDF
    Purpose: The used of an integrated academic information system in higher education has been proven in improving quality education which results to generates enormous data that can be used to discover new knowledge through data mining concepts, techniques, and machine learning algorithm. This study aims to determine a predictive model to learn students' probability to pass their courses taken at the earliest stage of the semester. Method: To successfully discover a good predictive model with high acceptability, accurate, and precision rate which delivers a useful outcome for decision making in education systems, in improving the processes of conveying knowledge and uplifting student's academic performance, the proponent applies and strictly followed the CRISP-DM (Cross-Industry Standard Process for Data Mining) methodology. This study employs classification for data mining techniques, and decision tree for algorithm. Results: With the utilization of the newly discovered predictive model, the prediction of students' probabilities to pass the current courses they take gives 0.7619 accuracy, 0.8333 precision, 0.8823 recall, and 0.8571 f1 score, which shows that the model used in the prediction is reliable, accurate, and recommendable. Conclusion: Considering the indicators and the results, it can be noted that the prediction model used in this study is highly acceptable. The data mining techniques provides effective and efficient innovative tools in analyzing and predicting student performances. The model used in this study will greatly affect the way educators understand and identify the weakness of their students in the class, the way they improved the effectiveness of their learning processes gearing to their students, bring down academic failure rates, and help institution administrators modify their learning system outcomes. Recommendations: Full automation of prediction results accessible by the students, faculty, and institution administrators for fast management decision making should take place. Further study for the inclusion of some student`s demographic information, vast amount of data within the dataset, automated and manual process of predictive criteria indicators where the students can regulate to which criteria, they must improve more for them to pass their courses taken at the end of the semester as early as midterm period are highly needed

    Master of Science Degree Programs 1998

    Get PDF
    • …
    corecore