2 research outputs found
Joint Topic Modeling and Factor Analysis of Textual Information and Graded Response Data
Modern machine learning methods are critical to the development of
large-scale personalized learning systems that cater directly to the needs of
individual learners. The recently developed SPARse Factor Analysis (SPARFA)
framework provides a new statistical model and algorithms for machine
learning-based learning analytics, which estimate a learner's knowledge of the
latent concepts underlying a domain, and content analytics, which estimate the
relationships among a collection of questions and the latent concepts. SPARFA
estimates these quantities given only the binary-valued graded responses to a
collection of questions. In order to better interpret the estimated latent
concepts, SPARFA relies on a post-processing step that utilizes user-defined
tags (e.g., topics or keywords) available for each question. In this paper, we
relax the need for user-defined tags by extending SPARFA to jointly process
both graded learner responses and the text of each question and its associated
answer(s) or other feedback. Our purely data-driven approach (i) enhances the
interpretability of the estimated latent concepts without the need of
explicitly generating a set of tags or performing a post-processing step, (ii)
improves the prediction performance of SPARFA, and (iii) scales to large
test/assessments where human annotation would prove burdensome. We demonstrate
the efficacy of the proposed approach on two real educational datasets
Joint Analysis of Time-Evolving Binary Matrices and Associated Documents 1
We consider problems for which one has incomplete binary matrices that evolve with time (e.g., the votes of legislators on particular legislation, with each year characterized by a different such matrix). An objective of such analysis is to infer structure and inter-relationships underlying the matrices, here defined by latent features associated with each axis of the matrix. In addition, it is assumed that documents are available for the entities associated with at least one of the matrix axes. By jointly analyzing the matrices and documents, one may be used to inform the other within the analysis, and the model offers the opportunity to predict matrix values (e.g., votes) based only on an associated document (e.g., legislation). The research presented here merges two areas of machine-learning that have previously been investigated separately: incomplete-matrix analysis and topic modeling. The analysis is performed from a Bayesian perspective, with efficient inference constituted via Gibbs sampling. The framework is demonstrated by considering all voting data and available documents (legislation) during the 220-year lifetime of the United States Senate and House of Representatives.