    Autocorrelation and linkage cause bias in evaluation of relational learners

    Two common characteristics of relational data sets, concentrated linkage and relational autocorrelation, can cause traditional methods of evaluation to greatly overestimate the accuracy of induced models on test sets. We identify these characteristics, define quantitative measures of their severity, and explain how they produce this bias. We show how linkage and autocorrelation affect estimates of model accuracy by applying FOIL to synthetic data and to data drawn from the Internet Movie Database. We show how a modified sampling procedure can eliminate the bias.