2,395 research outputs found

    RiPLE: Recommendation in Peer-Learning Environments Based on Knowledge Gaps and Interests

    Full text link
    Various forms of Peer-Learning Environments are increasingly being used in post-secondary education, often to help build repositories of student-generated learning objects. However, large classes can result in an extensive repository, which can make it more challenging for students to search for suitable objects that both reflect their interests and address their knowledge gaps. Recommender Systems for Technology Enhanced Learning (RecSysTEL) offer a potential solution to this problem by providing sophisticated filtering techniques to help students find the resources that they need in a timely manner. Here, a new RecSysTEL for Recommendation in Peer-Learning Environments (RiPLE) is presented. The approach uses a collaborative filtering algorithm based upon matrix factorization to create personalized recommendations for individual students that address their interests and their current knowledge gaps. The approach is validated using both synthetic and real data sets. The results are promising, indicating RiPLE is able to provide sensible personalized recommendations for both regular and cold-start users under reasonable assumptions about parameters and user behavior. Comment: 25 pages, 7 figures. The paper has been accepted for publication in the Journal of Educational Data Mining.
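    The abstract does not spell out the exact factorization model, but a minimal sketch of matrix-factorization-based recommendation in the spirit described above might look as follows; the interaction matrix, hyperparameters and ranking step are illustrative assumptions, not the published RiPLE algorithm.
```python
# Minimal sketch of matrix-factorization recommendation (illustrative; not the
# published RiPLE algorithm). Assumes a student-by-learning-object matrix R
# where R[u, i] > 0 encodes an observed interaction and 0 means "unobserved".
import numpy as np

def factorize(R, k=8, steps=200, lr=0.01, reg=0.05, seed=0):
    """Learn latent factors P (students) and Q (objects) by SGD on observed entries."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = rng.normal(scale=0.1, size=(n_users, k))
    Q = rng.normal(scale=0.1, size=(n_items, k))
    users, items = np.nonzero(R)              # only observed entries are fitted
    for _ in range(steps):
        for u, i in zip(users, items):
            err = R[u, i] - P[u] @ Q[i]
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * P[u] - reg * Q[i])
    return P, Q

R = np.array([[5, 0, 3], [4, 0, 0], [0, 2, 5]], dtype=float)
P, Q = factorize(R)
scores = P @ Q.T                               # predicted affinity for every student-object pair
print(np.argsort(-scores[0]))                  # ranked object recommendations for student 0
```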

    Investigating microstructural variation in the human hippocampus using non-negative matrix factorization

    No full text
    In this work, we use non-negative matrix factorization to identify patterns of microstructural variance in the human hippocampus. We utilize high-resolution structural and diffusion magnetic resonance imaging data from the Human Connectome Project to query hippocampus microstructure on a multivariate, voxelwise basis. Application of non-negative matrix factorization identifies spatial components (clusters of voxels sharing similar covariance patterns), as well as subject weightings (individual variance across hippocampus microstructure). By assessing the stability of spatial components as well as the accuracy of factorization, we identified four distinct microstructural components. Furthermore, we quantified the benefit of using multiple microstructural metrics by demonstrating that using three microstructural metrics (T1-weighted/T2-weighted signal, mean diffusivity and fractional anisotropy) produced more stable spatial components than when assessing metrics individually. Finally, we related individual subject weightings to demographic and behavioural measures using a partial least squares analysis. Through this approach we identified interpretable relationships between hippocampus microstructure and demographic and behavioural measures. Taken together, our work suggests non-negative matrix factorization as a spatially specific analytical approach for neuroimaging studies and advocates for the use of multiple metrics for data-driven component analyses.
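    As a rough illustration of the decomposition described above, the following sketch applies scikit-learn's NMF to a stand-in non-negative voxels-by-(subjects × metrics) matrix; the matrix shape, the choice of four components and all data here are assumptions for demonstration, not the authors' pipeline.
```python
# Illustrative sketch of an NMF decomposition like the one described above (not the authors' code).
# Assumes a non-negative matrix X of shape (n_voxels, n_subjects * n_metrics), built by
# stacking each subject's T1w/T2w, MD and FA values per hippocampal voxel.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(42)
n_voxels, n_subjects, n_metrics = 500, 20, 3
X = rng.random((n_voxels, n_subjects * n_metrics))   # stand-in for real imaging data

k = 4                                                # number of spatial components to extract
model = NMF(n_components=k, init="nndsvda", max_iter=500, random_state=0)
W = model.fit_transform(X)    # (n_voxels, k): spatial components
H = model.components_         # (k, n_subjects * n_metrics): subject/metric weightings

labels = W.argmax(axis=1)     # assign each voxel to its dominant component
print("reconstruction error:", model.reconstruction_err_)
print("voxels per component:", np.bincount(labels, minlength=k))
```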

    Economic Complexity Unfolded: Interpretable Model for the Productive Structure of Economies

    Full text link
    Economic complexity reflects the amount of knowledge that is embedded in the productive structure of an economy. It rests on the premise of hidden capabilities - fundamental endowments underlying the productive structure. In general, measuring the capabilities behind economic complexity directly is difficult, and indirect measures have been suggested which exploit the fact that the presence of the capabilities is expressed in a country's mix of products. We complement these studies by introducing a probabilistic framework which leverages Bayesian non-parametric techniques to extract the dominant features behind the comparative advantage in exported products. Based on economic evidence and trade data, we place a restricted Indian Buffet Process on the distribution of countries' capability endowment, appealing to a culinary metaphor to model the process of capability acquisition. The approach comes with a unique level of interpretability, as it produces a concise and economically plausible description of the instantiated capabilities.
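    For readers unfamiliar with the culinary metaphor, the following is a minimal sketch of drawing a binary country-by-capability matrix from a plain (unrestricted) Indian Buffet Process prior; the paper's restricted variant and its economic constraints are not reproduced here, and the concentration parameter is an arbitrary choice.
```python
# Minimal sketch of sampling from a standard Indian Buffet Process prior
# (illustrative only; the paper uses a *restricted* variant informed by trade data).
import numpy as np

def sample_ibp(n_countries, alpha=2.0, seed=0):
    rng = np.random.default_rng(seed)
    counts = []                     # how many earlier countries hold each capability ("dish")
    Z = []
    for n in range(1, n_countries + 1):
        row = [rng.random() < c / n for c in counts]   # take an existing dish with prob. m_k / n
        for k, taken in enumerate(row):
            counts[k] += int(taken)
        n_new = rng.poisson(alpha / n)                 # sample brand-new capabilities
        row += [True] * n_new
        counts += [1] * n_new
        Z.append(row)
    K = len(counts)
    return np.array([r + [False] * (K - len(r)) for r in Z])  # pad rows to a common width

Z = sample_ibp(10)
print(Z.astype(int))                # rows: countries, columns: instantiated capabilities
```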

    Item to Skills Mapping: Deriving a Conjunctive Q-matrix from Data

    Full text link

    Empirical Means to Validate Skills Models and Assess the Fit of a Student Model

    Get PDF
    In educational data mining, or in data mining in general, analysts who wish to build a classification or a regression model over new and unknown data are faced with a very wide span of choices. Machine learning techniques nowadays offer the possibility to learn and train a large and ever-growing variety of models from data. Along with this increased range of models that can be defined and trained from data comes the question addressed in this thesis: how do we decide which are the most representative of the underlying ground truth? The standard practice is to train different models and consider the one with the highest predictive performance as the best fit.

    However, model performance typically varies along factors such as sample size, target variable and predictor entropy, noise, missing values, etc. For example, a model's resilience to noise and ability to deal with small sample sizes may yield better performance than the ground-truth model for a given data set. Therefore, the best performer may not be the model that is most representative of the ground truth; it may instead be the result of contextual factors that make this model outperform the ground-truth one. We investigate the question of assessing different model fits using synthetic data by defining a vector space of model performances, and use a nearest-neighbor approach with a correlation distance to identify the ground-truth model. This approach is based on the following definitions and procedure. Consider a set of models, M, and a vector p of length |M| that contains the performance of each model over a given data set. This vector represents a point that characterizes the data set in the performance space. For each model M ∈ M, we determine a new point in the performance space that corresponds to synthetic data generated with model M. Then, for a given data set, we find the nearest synthetic data set point, using correlation as a distance, and consider the model behind it to be the ground truth. The results show that, for synthetic data sets, their underlying models are generally identified correctly more often with the proposed approach than with the best-performer approach. They also show that semantically similar models are closer together in the performance space than models based on highly different concepts.
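    A compact sketch of the nearest-neighbor identification procedure described above, assuming accuracy as the performance measure and SciPy's correlation distance; the model names (IRT, DINA, NMF) and all numbers are hypothetical placeholders rather than results from the thesis.
```python
# Sketch of ground-truth model identification in a performance space
# (illustrative placeholders; not results or code from the thesis).
import numpy as np
from scipy.spatial.distance import correlation   # correlation distance = 1 - Pearson correlation

# One performance vector per candidate model, measured on synthetic data that model generated;
# each vector holds the accuracy of every model in the set on that synthetic data set.
synthetic_profiles = {
    "IRT":  np.array([0.81, 0.74, 0.70]),
    "DINA": np.array([0.72, 0.83, 0.69]),
    "NMF":  np.array([0.70, 0.71, 0.80]),
}

# Performance vector of the same models on the data set whose ground truth we want to identify.
observed = np.array([0.78, 0.72, 0.69])

distances = {name: correlation(observed, profile)
             for name, profile in synthetic_profiles.items()}
ground_truth = min(distances, key=distances.get)   # nearest synthetic point wins
print(distances)
print("inferred ground-truth model:", ground_truth)
```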

    EDM 2011: 4th international conference on educational data mining : Eindhoven, July 6-8, 2011 : proceedings

    Get PDF

    Sparse modeling of test scores for estimating skills acquired by students

    Get PDF
    Thesis (Master of Science in Informatics)--University of Tsukuba, no. 41270, 2019.3.2

    FATREC Workshop on Responsible Recommendation Proceedings

    Get PDF
    With this workshop, we sought to foster a discussion of various topics that fall under the general umbrella of responsible recommendation: ethical considerations in recommendation, bias and discrimination in recommender systems, transparency and accountability, social impact of recommenders, user privacy, and other related concerns. Our goal was to encourage the community to think about how we build and study recommender systems in a socially responsible manner. Recommender systems are increasingly impacting people's decisions in different walks of life, including commerce, employment, dating, health, education and governance. As the impact and scope of recommendations increase, developing systems that tackle issues of fairness, transparency and accountability becomes important. This workshop was held in the spirit of FATML (Fairness, Accountability, and Transparency in Machine Learning), DAT (Data and Algorithmic Transparency), and similar workshops in related communities. With Responsible Recommendation, we brought that conversation to RecSys.