2,395 research outputs found

    RiPLE: Recommendation in Peer-Learning Environments Based on Knowledge Gaps and Interests

    Full text link
    Various forms of Peer-Learning Environments are increasingly being used in post-secondary education, often to help build repositories of student-generated learning objects. However, large classes can result in an extensive repository, which can make it more challenging for students to search for suitable objects that both reflect their interests and address their knowledge gaps. Recommender Systems for Technology Enhanced Learning (RecSysTEL) offer a potential solution to this problem by providing sophisticated filtering techniques to help students find the resources that they need in a timely manner. Here, a new RecSysTEL for Recommendation in Peer-Learning Environments (RiPLE) is presented. The approach uses a collaborative filtering algorithm based upon matrix factorization to create personalized recommendations for individual students that address their interests and their current knowledge gaps. The approach is validated using both synthetic and real data sets. The results are promising, indicating RiPLE is able to provide sensible personalized recommendations for both regular and cold-start users under reasonable assumptions about parameters and user behavior. Comment: 25 pages, 7 figures. The paper has been accepted for publication in the Journal of Educational Data Mining.
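    The abstract does not spell out the exact factorization model, but a minimal sketch of matrix-factorization-based recommendation in the spirit described above might look as follows; the interaction matrix, hyperparameters and ranking step are illustrative assumptions, not the published RiPLE algorithm.
```python
# Minimal sketch of matrix-factorization recommendation (illustrative; not the
# published RiPLE algorithm). Assumes a student-by-learning-object matrix R
# where R[u, i] > 0 encodes an observed interaction and 0 means "unobserved".
import numpy as np

def factorize(R, k=8, steps=200, lr=0.01, reg=0.05, seed=0):
    """Learn latent factors P (students) and Q (objects) by SGD on observed entries."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = rng.normal(scale=0.1, size=(n_users, k))
    Q = rng.normal(scale=0.1, size=(n_items, k))
    users, items = np.nonzero(R)              # only observed entries are fitted
    for _ in range(steps):
        for u, i in zip(users, items):
            err = R[u, i] - P[u] @ Q[i]
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * P[u] - reg * Q[i])
    return P, Q

R = np.array([[5, 0, 3], [4, 0, 0], [0, 2, 5]], dtype=float)
P, Q = factorize(R)
scores = P @ Q.T                               # predicted affinity for every student-object pair
print(np.argsort(-scores[0]))                  # ranked object recommendations for student 0
```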

    Investigating microstructural variation in the human hippocampus using non-negative matrix factorization

    No full text
    In this work, we use non-negative matrix factorization to identify patterns of microstructural variance in the human hippocampus. We utilize high-resolution structural and diffusion magnetic resonance imaging data from the Human Connectome Project to query hippocampus microstructure on a multivariate, voxelwise basis. Application of non-negative matrix factorization identifies spatial components (clusters of voxels sharing similar covariance patterns), as well as subject weightings (individual variance across hippocampus microstructure). By assessing the stability of spatial components as well as the accuracy of factorization, we identified four distinct microstructural components. Furthermore, we quantified the benefit of using multiple microstructural metrics by demonstrating that using three microstructural metrics (T1-weighted/T2-weighted signal, mean diffusivity and fractional anisotropy) produced more stable spatial components than when assessing metrics individually. Finally, we related individual subject weightings to demographic and behavioural measures using a partial least squares analysis. Through this approach we identified interpretable relationships between hippocampus microstructure and demographic and behavioural measures. Taken together, our work suggests non-negative matrix factorization as a spatially specific analytical approach for neuroimaging studies and advocates for the use of multiple metrics for data-driven component analyses.
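    As a rough illustration of the decomposition described above, the following sketch applies scikit-learn's NMF to a stand-in non-negative voxels-by-(subjects × metrics) matrix; the matrix shape, the choice of four components and all data here are assumptions for demonstration, not the authors' pipeline.
```python
# Illustrative sketch of an NMF decomposition like the one described above (not the authors' code).
# Assumes a non-negative matrix X of shape (n_voxels, n_subjects * n_metrics), built by
# stacking each subject's T1w/T2w, MD and FA values per hippocampal voxel.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(42)
n_voxels, n_subjects, n_metrics = 500, 20, 3
X = rng.random((n_voxels, n_subjects * n_metrics))   # stand-in for real imaging data

k = 4                                                # number of spatial components to extract
model = NMF(n_components=k, init="nndsvda", max_iter=500, random_state=0)
W = model.fit_transform(X)    # (n_voxels, k): spatial components
H = model.components_         # (k, n_subjects * n_metrics): subject/metric weightings

labels = W.argmax(axis=1)     # assign each voxel to its dominant component
print("reconstruction error:", model.reconstruction_err_)
print("voxels per component:", np.bincount(labels, minlength=k))
```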

    Economic Complexity Unfolded: Interpretable Model for the Productive Structure of Economies

    Full text link
    Economic complexity reflects the amount of knowledge that is embedded in the productive structure of an economy. It rests on the premise of hidden capabilities - fundamental endowments underlying the productive structure. In general, measuring the capabilities behind economic complexity directly is difficult, and indirect measures have been suggested which exploit the fact that the presence of the capabilities is expressed in a country's mix of products. We complement these studies by introducing a probabilistic framework which leverages Bayesian non-parametric techniques to extract the dominant features behind the comparative advantage in exported products. Based on economic evidence and trade data, we place a restricted Indian Buffet Process on the distribution of countries' capability endowment, appealing to a culinary metaphor to model the process of capability acquisition. The approach comes with a unique level of interpretability, as it produces a concise and economically plausible description of the instantiated capabilities.
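    For readers unfamiliar with the culinary metaphor, the following is a minimal sketch of drawing a binary country-by-capability matrix from a plain (unrestricted) Indian Buffet Process prior; the paper's restricted variant and its economic constraints are not reproduced here, and the concentration parameter is an arbitrary choice.
```python
# Minimal sketch of sampling from a standard Indian Buffet Process prior
# (illustrative only; the paper uses a *restricted* variant informed by trade data).
import numpy as np

def sample_ibp(n_countries, alpha=2.0, seed=0):
    rng = np.random.default_rng(seed)
    counts = []                     # how many earlier countries hold each capability ("dish")
    Z = []
    for n in range(1, n_countries + 1):
        row = [rng.random() < c / n for c in counts]   # take an existing dish with prob. m_k / n
        for k, taken in enumerate(row):
            counts[k] += int(taken)
        n_new = rng.poisson(alpha / n)                 # sample brand-new capabilities
        row += [True] * n_new
        counts += [1] * n_new
        Z.append(row)
    K = len(counts)
    return np.array([r + [False] * (K - len(r)) for r in Z])  # pad rows to a common width

Z = sample_ibp(10)
print(Z.astype(int))                # rows: countries, columns: instantiated capabilities
```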

    Item to Skills Mapping: Deriving a Conjunctive Q-matrix from Data

    Full text link

    Empirical Means to Validate Skills Models and Assess the Fit of a Student Model

    Get PDF
    In educational data mining, or in data mining in general, analysts who wish to build a classification or a regression model over new and unknown data are faced with a very wide span of choices. Machine learning techniques nowadays offer the possibility to learn and train a large and ever-growing variety of models from data. Along with this increased range of models that can be defined and trained from data comes the question addressed in this thesis: how do we decide which are the most representative of the underlying ground truth? The standard practice is to train different models and consider the one with the highest predictive performance as the best fit.

    However, model performance typically varies along factors such as sample size, target variable and predictor entropy, noise, missing values, etc. For example, a model's resilience to noise and ability to deal with small sample sizes may yield better performance than the ground-truth model for a given data set. Therefore, the best performer may not be the model that is most representative of the ground truth; it may instead be the result of contextual factors that make this model outperform the ground-truth one. We investigate the question of assessing different model fits using synthetic data by defining a vector space of model performances, and use a nearest-neighbor approach with a correlation distance to identify the ground-truth model. This approach is based on the following definitions and procedure. Consider a set of models, M, and a vector p of length |M| that contains the performance of each model over a given data set. This vector represents a point that characterizes the data set in the performance space. For each model M ∈ M, we determine a new point in the performance space that corresponds to synthetic data generated with model M. Then, for a given data set, we find the nearest synthetic data set point, using correlation as a distance, and consider the model behind it to be the ground truth. The results show that, for synthetic data sets, their underlying models are generally identified correctly more often with the proposed approach than with the best-performer approach. They also show that semantically similar models are closer together in the performance space than models based on highly different concepts.
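    A compact sketch of the nearest-neighbor identification procedure described above, assuming accuracy as the performance measure and SciPy's correlation distance; the model names (IRT, DINA, NMF) and all numbers are hypothetical placeholders rather than results from the thesis.
```python
# Sketch of ground-truth model identification in a performance space
# (illustrative placeholders; not results or code from the thesis).
import numpy as np
from scipy.spatial.distance import correlation   # correlation distance = 1 - Pearson correlation

# One performance vector per candidate model, measured on synthetic data that model generated;
# each vector holds the accuracy of every model in the set on that synthetic data set.
synthetic_profiles = {
    "IRT":  np.array([0.81, 0.74, 0.70]),
    "DINA": np.array([0.72, 0.83, 0.69]),
    "NMF":  np.array([0.70, 0.71, 0.80]),
}

# Performance vector of the same models on the data set whose ground truth we want to identify.
observed = np.array([0.78, 0.72, 0.69])

distances = {name: correlation(observed, profile)
             for name, profile in synthetic_profiles.items()}
ground_truth = min(distances, key=distances.get)   # nearest synthetic point wins
print(distances)
print("inferred ground-truth model:", ground_truth)
```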

    EDM 2011: 4th international conference on educational data mining : Eindhoven, July 6-8, 2011 : proceedings

    Get PDF

    Sparse modeling of test scores for estimating skills acquired by students

    Get PDF
    Thesis (Master of Science in Informatics)--University of Tsukuba, no. 41270, 2019.3.2

    FATREC Workshop on Responsible Recommendation Proceedings

    Get PDF
    With this workshop, we sought to foster a discussion of various topics that fall under the general umbrella of responsible recommendation: ethical considerations in recommendation, bias and discrimination in recommender systems, transparency and accountability, social impact of recommenders, user privacy, and other related concerns. Our goal was to encourage the community to think about how we build and study recommender systems in a socially responsible manner. Recommender systems are increasingly impacting people's decisions in different walks of life, including commerce, employment, dating, health, education and governance. As the impact and scope of recommendations increase, developing systems that tackle issues of fairness, transparency and accountability becomes important. This workshop was held in the spirit of FATML (Fairness, Accountability, and Transparency in Machine Learning), DAT (Data and Algorithmic Transparency), and similar workshops in related communities. With Responsible Recommendation, we brought that conversation to RecSys.