3 research outputs found

    Item to Skills Mapping: Deriving a Conjunctive Q-matrix from Data


    Empirical Means to Validate Skills Models and Assess the Fit of a Student Model

    ABSTRACT In educational data mining, and in data mining in general, analysts who wish to build a classification or regression model over new and unknown data face a very wide range of choices. Machine learning techniques now make it possible to learn and train a large and ever-growing variety of models from data. With this growing range of models comes the question addressed in this thesis: how to decide which models are most representative of the underlying ground truth? The standard practice is to train different models and consider the one with the highest predictive performance to be the best fit. However, model performance typically varies with factors such as sample size, the distribution of the target variable, predictor entropy, noise, and missing values. For example, a model's resilience to noise and its ability to cope with a small sample size may yield better performance than the ground truth model on a given data set. Therefore, the best performer may not be the model that is most representative of the ground truth; it may instead be the result of contextual factors that make this model outperform the ground truth one.
    We investigate the question of assessing different model fits using synthetic data by defining a vector space of model performances, and we use a nearest neighbor approach with a correlation distance to identify the ground truth model. This approach is based on the following definitions and procedure. Consider a set of models, ℳ, and a vector p of length |ℳ| that contains the performance of each model over a given data set. This vector represents a point that characterizes the data set in the performance space. For each model M_i ∈ ℳ, we determine a point p_i in the performance space that corresponds to synthetic data generated with model M_i. Then, for a given data set, we find the nearest synthetic point p_i, using correlation as a distance, and consider the model M_i that generated it to be the ground truth. The results show that, for synthetic data sets, the underlying models are correctly identified more often with the proposed approach than with the best performer approach. They also show that semantically similar models lie closer together in the performance space than models based on highly different concepts.
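The nearest neighbor procedure described in this abstract can be illustrated with a minimal sketch. It assumes NumPy and takes "correlation as a distance" to mean 1 minus the Pearson correlation, which is an assumption, not a detail confirmed by the abstract. The model names and all performance values are hypothetical and chosen only to show a case where the best performer and the nearest synthetic point disagree.

```python
import numpy as np


def correlation_distance(u, v):
    """Return 1 - Pearson correlation between two performance vectors.

    Assumption: the abstract only says correlation is used as a distance;
    this particular form is one common choice.
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return 1.0 - np.corrcoef(u, v)[0, 1]


def identify_ground_truth(p, synthetic_points):
    """Return the model whose synthetic-data point p_i is nearest to p."""
    distances = {
        name: correlation_distance(p, p_i)
        for name, p_i in synthetic_points.items()
    }
    nearest = min(distances, key=distances.get)
    return nearest, distances


# Hypothetical performance vectors. Entry j of each vector is the
# performance of model j (A, B, C) on data generated by the named model.
synthetic_points = {
    "model_A": [0.80, 0.85, 0.60],
    "model_B": [0.65, 0.88, 0.70],
    "model_C": [0.60, 0.72, 0.85],
}

# Hypothetical performance vector p measured on the data set of interest.
p_observed = [0.78, 0.82, 0.58]

models = list(synthetic_points)
best_performer = models[int(np.argmax(p_observed))]
nearest, distances = identify_ground_truth(p_observed, synthetic_points)

print("best performer:", best_performer)
print("nearest synthetic point:", nearest)
```

With these made-up numbers the highest-scoring model is model_B, while the observed vector correlates most strongly with the point generated from model_A, so the two selection rules pick different models.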

    Learning Path Construction in e-Learning – What to Learn and How to Learn?

    Whether in traditional learning or e-learning, it is important to consider what to learn, how to learn, and how well students have learned. Since students differ in learning preferences, learning styles, and learning abilities, it is not easy to provide the best learning approach for a specific student. Designing learning contents for different students is very time-consuming and tedious for teachers. No matter how the learning process is carried out, both teachers and students must be satisfied with students' learning performance. Therefore, it is important to provide helpful teaching and learning guidance for teachers and students. To achieve this, we proposed a fine-grained outcome-based learning path model, which allows teachers to explicitly formulate learning activities as the learning units of a learning path. This allows teachers to formulate the assessment criteria related to subject-specific knowledge and skills as well as generic skills, so that the pedagogy can be defined and properly incorporated. Apart from defining the pedagogical approaches, we also need to provide tailored learning contents for the courses, so that different types of students can better learn the knowledge according to their own learning abilities, knowledge backgrounds, etc. At the same time, those learning contents should be well structured, so that students can understand them. To achieve this, we proposed a learning path generation method based on the Association Link Network to automatically identify the relationships among different Web resources. This method uses resources that can be freely obtained from the Web to form well-structured learning resources with proper sequences for delivery. Although the learning path defines what to learn and how to learn, we still need to monitor student learning progress in order to determine proper learning contents and learning activities in an e-learning system.
To address this problem, we proposed the use of student progress indicators based on a Fuzzy Cognitive Map to analyze both performance and non-performance attributes and their causal relationships. The aim is to help teachers improve their teaching approaches and help students reflect on their strengths and weaknesses in learning. This research focuses on the intelligent tutoring e-learning system, which provides an intelligent approach to designing and delivering learning activities in a learning path. Many experiments and comparative studies involving both teachers and students have been carried out to evaluate the research of this PhD thesis. The results show that our research can effectively help teachers generate high-quality learning paths, help students improve their learning performance, and offer both teachers and students a better understanding of student learning progress.