Dropout Model Evaluation in MOOCs
The field of learning analytics needs to adopt a more rigorous approach for
predictive model evaluation that matches the complex practice of
model-building. In this work, we present a procedure to statistically test
hypotheses about model performance which goes beyond the state-of-the-practice
in the community to analyze both algorithms and feature extraction methods from
raw data. We apply this method to a series of algorithms and feature sets
derived from a large sample of Massive Open Online Courses (MOOCs). While a
complete comparison of all potential modeling approaches is beyond the scope of
this paper, we show that this approach reveals a large gap in dropout
prediction performance between forum-, assignment-, and clickstream-based
feature extraction methods: clickstream-based features perform significantly
better than the other two, which are in turn indistinguishable from one another. This work has
methodological implications for evaluating predictive or AI-based models of
student success, and practical implications for the design and targeting of
at-risk student models and interventions.
Model-based Boosting in R: A Hands-on Tutorial Using the R Package mboost
We provide a detailed hands-on tutorial for the R add-on package mboost. The package implements boosting for optimizing general risk functions, using component-wise (penalized) least squares estimates as base-learners to fit various kinds of generalized linear and generalized additive models to potentially high-dimensional data. We give a theoretical background and demonstrate how mboost can be used to fit interpretable models of different complexity. As a running example throughout the tutorial, we use mboost to predict body fat from anthropometric measurements.
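To make the idea of component-wise least squares base-learners concrete, here is a minimal Python sketch. It is not the mboost API and not the authors' implementation: it assumes squared-error loss, one unpenalized linear base-learner per covariate, and a fixed step length. The function name `componentwise_l2_boost` and the parameters `n_steps` and `nu` are hypothetical names chosen for illustration.

```python
def componentwise_l2_boost(X, y, n_steps=500, nu=0.1):
    """Illustrative component-wise L2 boosting (assumed setup, not mboost).

    X is a list of rows (lists of floats), y a list of floats.
    Returns (intercept, coefs) of the fitted linear model.
    """
    n, p = len(X), len(X[0])
    # Center each covariate so the intercept can be handled separately.
    means = [sum(row[j] for row in X) / n for j in range(p)]
    cols = [[row[j] - means[j] for row in X] for j in range(p)]
    y_mean = sum(y) / n
    # Start from the constant model; residuals are the negative gradient
    # of the squared-error loss.
    resid = [yi - y_mean for yi in y]
    coefs = [0.0] * p

    for _ in range(n_steps):
        best_j, best_slope, best_gain = None, 0.0, 0.0
        for j, col in enumerate(cols):
            ss = sum(v * v for v in col)
            if ss == 0.0:
                continue  # a constant covariate cannot explain residuals
            # Least squares fit of the residuals on covariate j alone.
            slope = sum(v * r for v, r in zip(col, resid)) / ss
            gain = slope * slope * ss  # reduction in residual sum of squares
            if gain > best_gain:
                best_j, best_slope, best_gain = j, slope, gain
        if best_j is None:
            break  # no base-learner improves the fit
        # Update only the best-fitting component, shrunk by the step length.
        step = nu * best_slope
        coefs[best_j] += step
        resid = [r - step * v for r, v in zip(resid, cols[best_j])]

    intercept = y_mean - sum(c * m for c, m in zip(coefs, means))
    return intercept, coefs


# Toy data: y depends on the first covariate only (y = 2 * x0 + 1).
X = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0], [5.0, 0.0]]
y = [2.0 * row[0] + 1.0 for row in X]
intercept, coefs = componentwise_l2_boost(X, y)
```

Because only one coefficient is updated per iteration, covariates that never win the selection step keep a coefficient of zero. Stopping early therefore acts as a form of variable selection, which is one reason this style of boosting suits high-dimensional data.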
A Unified Framework of Constrained Regression
Generalized additive models (GAMs) play an important role in modeling and
understanding complex relationships in modern applied statistics. They allow
for flexible, data-driven estimation of covariate effects. Yet researchers
often have a priori knowledge of certain effects, which might be monotonic or
periodic (cyclic) or should fulfill boundary conditions. We propose a unified
framework to incorporate these constraints for both univariate and bivariate
effect estimates and for varying coefficients. As the framework is based on
component-wise boosting methods, variables can be selected intrinsically, and
effects can be estimated for a wide range of different distributional
assumptions. Bootstrap confidence intervals for the effect estimates are
derived to assess the models. We present three case studies from environmental
sciences to illustrate the proposed seamless modeling framework. All discussed
constrained effect estimates are implemented in the comprehensive R package
mboost for model-based boosting.
Comment: This is a preliminary version of the manuscript. The final
publication is available at
http://link.springer.com/article/10.1007/s11222-014-9520-
- …