48 research outputs found
Data Mining Module
This thesis concerns knowledge discovery in databases (KDD), especially classification with Support Vector Machines (SVM). A KDD system with a modular structure is being developed at FIT BUT. The DMSL language is used to describe the mining process. The goal of the thesis was to extend DMSL to cover the needs of an SVM classifier, and to design, implement and test an SVM module for this system.
Are we meeting a deadline? classification goal achievement in time in the presence of imbalanced data
This paper addresses the problem of a finite set of entities that are required to achieve a goal within a predefined deadline. For example, a group of students is supposed to submit a homework by a specified cutoff. We are interested in predicting which entities will achieve the goal within the deadline. The predictive models are built only from data on that population. The predictions are computed at various time instants, taking into account updated data about the entities. The first contribution of the paper is a formal description of the problem. The important characteristic of the proposed method for model building is the use of the properties of entities that have already achieved the goal. We call such an approach “Self-Learning”. Since typically only a few entities have achieved the goal at the beginning and their number gradually grows, the problem is inherently imbalanced. To mitigate the curse of imbalance, we improved the Self-Learning method by tackling information loss and by applying several sampling techniques. The original Self-Learning method and its modifications have been evaluated in a case study on predicting submission of the first assessment in distance higher education courses. The results show that the proposed improvements outperform the two specified baseline models and the original Self-Learner, and that the best results are achieved when domain-driven techniques are used to tackle the imbalance problem. We also show that these improvements are statistically significant using the Wilcoxon signed-rank test.
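The abstract mentions sampling techniques for the imbalance problem without naming them. As a rough illustration only, the sketch below shows plain random oversampling, a generic technique that is not necessarily one the authors used. The function name `random_oversample`, the single made-up feature and the 2-of-10 split are all assumptions chosen for the example.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of the rarer classes until every
    class is as frequent as the most common one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows belonging to this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical snapshot early in a course: 2 of 10 students have submitted.
rows = [[day] for day in range(10)]       # one made-up feature per student
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]   # 1 = already achieved the goal
balanced_rows, balanced_labels = random_oversample(rows, labels)
```

After the call, both classes contain the same number of examples; the duplicated rows carry no new information, which is one reason domain-driven alternatives may work better.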
Ouroboros: early identification of at-risk students without models based on legacy data
This paper focuses on the problem of identifying students who are at risk of failing their course. The presented method offers a solution for cases where no data from previous courses, which are usually used for training machine learning models, are available. This situation typically occurs in new courses. We present the concept of a "self-learner" that builds the machine learning models from the data generated during the current course. The approach utilises information about already submitted assessments, which introduces the problem of imbalanced data for training and testing the classification models.
There are three main contributions of this paper: (1) the concept of training the models for identifying at-risk students using data from the current course, (2) specifying the problem as a classification task, and (3) tackling the challenge of imbalanced data, which appears both in training and testing data.
The results compare the method with the traditional approach of learning the models from legacy course data and validate the proposed concept.
Cluster Analysis Module of a Data Mining System
This thesis deals with the design and implementation of a cluster analysis module for DataMiner, a data mining system under development at FIT BUT. Until now, the system lacked a cluster analysis module, so the main objective of the thesis was to extend the system with cluster analysis algorithms. I worked on the module together with Pavel Riedl. Together we created a common part shared by all the algorithms, so that the system can easily be extended with further clustering algorithms. I then extended the module with three density-based clustering algorithms: DBSCAN, OPTICS and DENCLUE. The algorithms were implemented and their functionality was verified on a suitable data sample.
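For readers unfamiliar with density-based clustering, the following is a minimal sketch of the textbook DBSCAN procedure, one of the three algorithms named above. It is not the thesis's DataMiner implementation; the function names, the brute-force neighbour search and the sample points are assumptions made for illustration.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one cluster id per point; NOISE (-1) marks unclustered points."""
    labels = [None] * len(points)          # None = not yet visited
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE              # may later become a border point
            continue
        cluster += 1                       # i is a core point: start a cluster
        labels[i] = cluster
        queue = list(neighbours)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster        # border point: reachable, not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is core: keep expanding
    return labels


# Made-up data: two dense squares and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

With these parameters the two squares form separate clusters and the isolated point is labelled as noise. The brute-force `region_query` makes the sketch quadratic in the number of points; practical implementations typically use a spatial index instead.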
Analysing performance of first year engineering students
Many students in the engineering disciplines do not complete their higher education degree and drop out. This problem is serious, especially for first-year university students. In this paper, we analyse how students earn the credits required for successful completion of the first study year. Using the example of a European technical university with traditional classroom-based education, we identify three groups of students: those who pass, those who earn only enough credits to stay in the program, and those who fail. Important patterns can be found at the end of the first semester. We present a simple algorithm that identifies students who may benefit from early additional support, which would increase their chances of progressing to the second year and improve retention for the university. The results are evaluated over four consecutive academic years. The data from years 2013/14 and 2014/15 were used to develop and verify the prediction model. In study years 2015/16 and 2016/17 the model was applied to predict at-risk students; the university tutors intervened and provided additional support, and a significant improvement was achieved.
Investigating Influence of Demographic Factors on Study Recommenders
Recommender systems in e-learning platforms can utilise various data about learners in order to provide them with the next best material to study. We build on our previous work, which defines the recommendations in terms of two measures (i.e. relevance and effort) calculated from data of successful students in the previous runs of the courses. In this paper we investigate the impact of students’ socio-demographic factors and analyse how these factors improved the recommendation. Education and age were found to have a significant impact on engagement with materials.
Evaluating Weekly Predictions of At-Risk Students at The Open University: Results and Issues
Improving student retention rates is a critical task not only for traditional universities but particularly for distance learning courses, which have been rapidly gaining popularity in recent years. Early indications of potential student failure enable the tutor to provide the student with appropriate assistance, which might improve the student’s chances of passing the course. Collated results for a course cohort can also help course teams to identify problem areas in the educational materials and make improvements for future course presentations.
Recent work at the Open University (OU) has focused on improving student retention by predicting which students are at risk of failing. In this paper we present the models implemented at the OU, evaluate these models on a selected course and discuss the issues of creating predictive models based on historical data, particularly mapping the content of the current presentation to the previous one. These models were initially tested on two courses and later extended to ten courses.
Measures for recommendations based on past students' activity
This paper introduces two measures for the recommendation of study materials based on students' past study activity. We use records from the Virtual Learning Environment (VLE) and analyse the activity of previous students. We assume that the activity of past students represents patterns which can be used as a basis for recommendations to current students. The measures we define are Relevance, which describes the expected VLE activity derived from previous students of the course, and Effort, which represents the actual effort of individual current students. Based on these measures, we propose a composite measure, which we call Importance. We use data from the previous course presentations to evaluate the consistency of students' behaviour. We use the correlation of the defined measures Relevance and Average Effort to evaluate the behaviour of two different student cohorts, and the Root Mean Square Error to measure the deviation between Average Effort and individual student Effort.
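The two evaluation statistics named in the abstract can be sketched as follows. The abstract does not give the formulas for Relevance or Effort, so the per-material values below are invented numbers, and the helper names `pearson` and `rmse` are assumptions; only the statistics themselves (Pearson correlation and Root Mean Square Error) are standard.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def rmse(xs, ys):
    """Root Mean Square Error between two equally long sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


# Hypothetical values, one entry per study material.
relevance = [0.9, 0.7, 0.4, 0.2]        # derived from a previous cohort
average_effort = [0.8, 0.6, 0.5, 0.1]   # mean effort of the current cohort
student_effort = [1.0, 0.6, 0.2, 0.1]   # effort of one current student

cohort_agreement = pearson(relevance, average_effort)
student_deviation = rmse(average_effort, student_effort)
```

A correlation close to 1 would suggest that the two cohorts engage with the materials in a similar pattern, while a larger RMSE indicates a student whose effort departs further from the cohort average.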
What do distance learning students seek from student analytics?
This study explores the perspectives of distance learners about student-facing learning analytics. Nineteen middle-aged, white online students answered eight forum questions about a hypothetical scenario of a student who struggles to balance work and study and who was given access to a learning analytics dashboard. The dashboard presented comparative performance and engagement information and personalised study recommendations. Findings showed that study recommendations were highly favoured by students whereas peer comparisons were mostly viewed as not useful and demotivating.
Visualisation of key splitting milestones to support interventions
The paper presents an approach to help staff responsible for running courses by identifying key milestones in the educational process where the paths of successful and unsuccessful students start to split. Once these milestones have been identified in already finished courses, the information can be used to plan interventions in the next runs. This is achieved by finding the earliest time when the differences in behaviour or key performance metrics of unsuccessful students start to become significant. We demonstrate this approach in two case studies, one focused on a course-level analysis and the other on a whole academic year. This suggests its generic nature and possible applicability in various Learning Analytics scenarios.