
    RiPLE: Recommendation in Peer-Learning Environments Based on Knowledge Gaps and Interests

    Various forms of Peer-Learning Environments are increasingly being used in post-secondary education, often to help build repositories of student-generated learning objects. However, large classes can result in an extensive repository, which can make it more challenging for students to search for suitable objects that both reflect their interests and address their knowledge gaps. Recommender Systems for Technology Enhanced Learning (RecSysTEL) offer a potential solution to this problem by providing sophisticated filtering techniques to help students find the resources that they need in a timely manner. Here, a new RecSysTEL for Recommendation in Peer-Learning Environments (RiPLE) is presented. The approach uses a collaborative filtering algorithm based upon matrix factorization to create personalized recommendations for individual students that address their interests and their current knowledge gaps. The approach is validated using both synthetic and real data sets. The results are promising, indicating RiPLE is able to provide sensible personalized recommendations for both regular and cold-start users under reasonable assumptions about parameters and user behavior. Comment: 25 pages, 7 figures. The paper is accepted for publication in the Journal of Educational Data Mining.
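The matrix-factorization collaborative filtering at the core of RiPLE can be illustrated with a minimal sketch. This is not the RiPLE code; the data, dimensions, and hyperparameters are invented for illustration. A sparse student-by-resource rating matrix is factored into low-rank latent vectors by stochastic gradient descent, and the learned factors predict ratings for unseen (student, resource) pairs.

```python
# Minimal matrix-factorization sketch (hypothetical toy data, not RiPLE itself):
# learn latent vectors for students and resources from observed ratings,
# then predict an unobserved rating as a dot product of the two vectors.
import random

random.seed(0)

# Observed (student, resource, rating) triples from a toy 3x3 matrix.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 4.0), (2, 2, 2.0)]
n_students, n_resources, k = 3, 3, 2  # k = number of latent factors

# Small random initialization of the latent factor matrices.
P = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(n_students)]
Q = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(n_resources)]

def predict(u, i):
    """Predicted rating: dot product of student and resource factors."""
    return sum(P[u][f] * Q[i][f] for f in range(k))

# Stochastic gradient descent on squared error with L2 regularization.
lr, reg = 0.05, 0.02
for _ in range(2000):
    u, i, r = random.choice(ratings)
    err = r - predict(u, i)
    for f in range(k):
        pu, qi = P[u][f], Q[i][f]
        P[u][f] += lr * (err * qi - reg * pu)
        Q[i][f] += lr * (err * pu - reg * qi)

# Predicted score for a (student, resource) pair that was never observed.
print(round(predict(1, 1), 2))
```

In a recommender, the unobserved cells with the highest predicted scores become the recommendations; RiPLE additionally biases this toward items matching each student's knowledge gaps.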

    A Matrix Factorization Method for Mapping Items to Skills and for Enhancing Expert-Based Q-matrices

    Uncovering the right skills behind question items is a difficult task. It requires a thorough understanding of the subject matter and of the cognitive factors that determine student performance. The skills definition, and the mapping of items to skills, require the involvement of experts. We investigate means to assist experts in this task by using a data-driven, matrix factorization approach. The two mappings of items to skills, the expert's on one side and the matrix factorization's on the other, are compared in terms of discrepancies, and in terms of their performance when used in a linear model of skills assessment and item outcome prediction. Visual analysis shows a relatively similar pattern between the expert and the factorized mappings, although differences arise. The prediction comparison shows the factorization approach performs slightly better than the original expert Q-matrix, giving supporting evidence to the belief that the factorization mapping is valid. Implications for the use of the factorization to design better item-to-skills mappings are discussed.
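The idea of a data-driven item-to-skill mapping can be sketched as follows. This is an illustrative toy, not the authors' method or data: a binary student-by-item outcome matrix R is factored into non-negative matrices S (student skill mastery) and Q (skill-to-item loading) with plain projected stochastic gradient descent, and a candidate Q-matrix is read off by thresholding.

```python
# Hedged sketch: factor a binary student-by-item outcome matrix R into
# non-negative S (students x skills) and Q (skills x items), R ~ S @ Q,
# then threshold Q to propose an item-to-skill mapping. Toy data only.
import random

random.seed(1)

# 4 students x 4 items; by construction items 0-1 tap one skill, items 2-3 another.
R = [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 1],
     [1, 1, 1, 1]]
n, m, k = 4, 4, 2  # k = assumed number of skills

S = [[random.random() for _ in range(k)] for _ in range(n)]
Q = [[random.random() for _ in range(m)] for _ in range(k)]

def approx(i, j):
    """Reconstructed outcome for student i on item j."""
    return sum(S[i][f] * Q[f][j] for f in range(k))

lr = 0.05
for _ in range(3000):
    i, j = random.randrange(n), random.randrange(m)
    err = R[i][j] - approx(i, j)
    for f in range(k):
        s, q = S[i][f], Q[f][j]
        S[i][f] = max(0.0, s + lr * err * q)  # project onto non-negatives
        Q[f][j] = max(0.0, q + lr * err * s)

# Threshold Q to read off a candidate Q-matrix. Note that factor scaling is
# arbitrary in plain factorization, so the 0.5 cutoff is only a heuristic.
q_matrix = [[1 if Q[f][j] > 0.5 else 0 for j in range(m)] for f in range(k)]
print(q_matrix)
```

The expert comparison described in the abstract would then align the recovered skills with the expert-defined ones (the skill ordering out of the factorization is arbitrary) before measuring discrepancies.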

    Empirical Means to Validate Skills Models and Assess the Fit of a Student Model

    In educational data mining, or in data mining in general, analysts who wish to build a classification or regression model over new and unknown data are faced with a very wide span of choices. Machine learning techniques nowadays offer the possibility to learn and train a large and ever-growing variety of models from data. Along with this increased range of models that can be defined and trained from data comes the question addressed in this thesis: how to decide which are the most representative of the underlying ground truth? The standard practice is to train different models and consider the one with the highest predictive performance the best fit. However, model performance typically varies with factors such as sample size, target-variable and predictor entropy, noise, missing values, etc. For example, a model's resilience to noise and its ability to deal with small sample sizes may yield better performance than the ground-truth model for a given data set. Therefore, the best performer may not be the model most representative of the ground truth; it may instead be the result of contextual factors that make this model outperform the ground-truth one. We investigate the question of assessing different model fits using synthetic data by defining a vector space of model performances, and use a nearest-neighbor approach with a correlation distance to identify the ground-truth model. This approach is based on the following definitions and procedure. Consider a set of models, M, and a vector p of length |M| that contains the performance of each model over a given data set. This vector represents a point that characterizes the data set in the performance space. For each model M ∈ M, we determine a new point in the performance space that corresponds to synthetic data generated with model M. Then, for a given data set, we find the nearest synthetic data set point, using correlation as a distance, and consider the model behind it to be the ground truth. The results show that, for synthetic data sets, the underlying models are generally identified correctly more often with the proposed approach than with the best-performer approach. They also show that semantically similar models are closer together in the performance space than models that are based on highly different concepts. (The thesis also includes a French résumé with the same content, omitted here.)
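The nearest-neighbor step of this procedure is concrete enough to sketch directly. All numbers below are hypothetical: each data set is mapped to a vector of per-model performances, and the ground-truth model is taken to be the one whose synthetic data set lands closest in that space under correlation distance, rather than the model with the single highest score.

```python
# Sketch of the model-identification procedure described above, with
# invented performance numbers for three hypothetical models A, B, C.
from math import sqrt

def correlation_distance(p, q):
    """1 - Pearson correlation between two performance vectors."""
    n = len(p)
    mp, mq = sum(p) / n, sum(q) / n
    cov = sum((a - mp) * (b - mq) for a, b in zip(p, q))
    sp = sqrt(sum((a - mp) ** 2 for a in p))
    sq = sqrt(sum((b - mq) ** 2 for b in q))
    return 1.0 - cov / (sp * sq)

# Reference points: performance of models [A, B, C] on the synthetic data
# set generated by each model (row = that synthetic set's performance vector).
reference = {
    "A": [0.85, 0.70, 0.62],
    "B": [0.72, 0.88, 0.65],
    "C": [0.60, 0.64, 0.90],
}

# Performance vector of the observed data set under the same three models.
observed = [0.78, 0.66, 0.58]

# Standard practice picks the best performer; the proposed approach picks
# the nearest reference point in performance space instead.
best_performer = "ABC"[observed.index(max(observed))]
nearest = min(reference,
              key=lambda m: correlation_distance(observed, reference[m]))
print(best_performer, nearest)
```

Here both criteria happen to agree; the thesis's point is that when contextual factors inflate one model's score, the correlation pattern across all models remains a more reliable fingerprint of the generating model.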

    Modeling Multiple Problem-Solving Strategies and Strategy Shift in Cognitive Diagnosis for Growth

    Problem-solving strategies, defined as actions people select intentionally to achieve desired objectives, are distinguished from skills that are implemented unintentionally. In education, strategy-oriented instructions that guide students to form problem-solving strategies are found to be more effective for low-achievement students than the skill-oriented instructions designed to enhance skill implementation ability. However, conventional cognitive diagnosis models (CDMs) seldom distinguish the concept of skills from strategies. While existing longitudinal CDMs can model students' dynamic skill mastery status change over time, they are not intended to model the shift in students' problem-solving strategies. Thus, it is hard to use conventional CDMs to identify students who need strategy-oriented instructions or to evaluate the effectiveness of educational intervention programs that aim to train students' problem-solving strategies. This study proposes a longitudinal CDM that takes into account both between-person multiple strategies and within-person strategy shift. The model, separating the strategy choice process from the skill implementation process, is intended to provide diagnostic information on strategy choice as well as skill mastery status. A simulation study is conducted to evaluate the parameter recovery of the proposed model and to investigate the consequences of ignoring the presence of multiple strategies or strategy shift. Further, an empirical data analysis is conducted to demonstrate the use of the proposed model to measure strategy shift, growth in skill implementation ability, and skill mastery status.
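The separation of strategy choice from skill implementation can be illustrated with a minimal sketch. This is not the paper's parameterization: it uses invented numbers and a basic DINA-style item response for the skill-implementation part, with the response probability marginalized over a latent strategy choice; a strategy shift would correspond to the choice probabilities changing between time points.

```python
# Hedged sketch: P(correct) as a mixture over problem-solving strategies,
# each strategy having its own required-skill vector (its own Q-matrix row),
# on top of a DINA-like slip/guess skill-implementation model. Toy numbers.

def dina_correct(mastery, q_row, slip, guess):
    """P(correct | chosen strategy): DINA response given skill mastery."""
    has_all = all(m >= q for m, q in zip(mastery, q_row))
    return (1 - slip) if has_all else guess

def p_correct(mastery, strategies, choice_probs, slip, guess):
    """Marginal P(correct): average over the latent strategy choice."""
    return sum(w * dina_correct(mastery, q_row, slip, guess)
               for w, q_row in zip(choice_probs, strategies))

# One item solvable two ways: strategy 1 needs skills 0 and 1;
# strategy 2 needs skill 2 alone.
strategies = [(1, 1, 0), (0, 0, 1)]
mastery = (0, 0, 1)          # this student masters only skill 2
choice_probs = (0.3, 0.7)    # current strategy preference (could shift over time)
print(p_correct(mastery, strategies, choice_probs, slip=0.1, guess=0.2))
# = 0.3 * 0.2 + 0.7 * 0.9 = 0.69
```

Conventional single-strategy CDMs fix one Q-matrix row per item, which would score this student as a guesser; the mixture credits the alternative strategy the student actually masters.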