Optimal Data Split Methodology for Model Validation
The decision to incorporate cross-validation into validation processes of
mathematical models raises an immediate question: how should one partition the
data into calibration and validation sets? We answer this question
systematically: we present an algorithm to find the optimal partition of the
data subject to certain constraints. While doing this, we address two critical
issues: 1) that the model be evaluated with respect to predictions of a given
quantity of interest and its ability to reproduce the data, and 2) that the
model be highly challenged by the validation set, assuming it is properly
informed by the calibration set. This framework also relies on the interaction
among the experimentalist and/or modeler, who understand the physical system
and the limitations of the model; the decision-maker, who understands and can
quantify the cost of model failure; and the computational scientist, who
strives to determine whether the model satisfies both the modeler's and the
decision-maker's requirements. We also note that our framework is quite general and may
be applied to a wide range of problems. Here, we illustrate it through a
specific example involving a data reduction model for an ICCD camera from a
shock-tube experiment located at the NASA Ames Research Center (ARC).

Comment: Submitted to the International Conference on Modeling, Simulation and
Control 2011 (ICMSC'11), San Francisco, USA, 19-21 October, 2011
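The partition-search idea in the abstract above can be illustrated with a toy sketch. This is not the authors' algorithm: the `optimal_split` function and its "challenge" score (mean distance from each validation point to its nearest calibration point) are hypothetical stand-ins for the paper's constrained optimization.

```python
import itertools

import numpy as np

def optimal_split(x, n_val):
    """Hypothetical sketch: exhaustively search for the validation subset
    that most challenges the model, scored here as the mean distance from
    each validation point to its nearest calibration point."""
    idx = range(len(x))
    best_score, best_val = -np.inf, None
    for val in itertools.combinations(idx, n_val):
        cal = [i for i in idx if i not in val]
        # challenge score: how far the validation points lie from the
        # calibration data that must inform the model
        score = np.mean([min(abs(x[i] - x[j]) for j in cal) for i in val])
        if score > best_score:
            best_score, best_val = score, val
    cal = [i for i in idx if i not in best_val]
    return sorted(cal), sorted(best_val)

x = np.array([0.0, 0.1, 0.2, 0.5, 0.9, 1.0])
cal, val = optimal_split(x, n_val=2)  # picks the extrapolative tail [0.9, 1.0]
```

Exhaustive search is exponential in the data size; it only serves to make the "highly challenged validation set" criterion concrete on a handful of points.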
Synthesizing Short-Circuiting Validation of Data Structure Invariants
This paper presents incremental verification-validation, a novel approach for
checking rich data structure invariants expressed as separation logic
assertions. Incremental verification-validation combines static verification of
separation properties with efficient, short-circuiting dynamic validation of
arbitrarily rich data constraints. A data structure invariant checker is an
inductive predicate in separation logic with an executable interpretation; a
short-circuiting checker is an invariant checker that stops checking whenever
it detects at run time that an assertion for some sub-structure has been fully
proven statically. At a high level, our approach does two things: it statically
proves the separation properties of data structure invariants using a static
shape analysis in a standard way but then leverages this proof in a novel
manner to synthesize short-circuiting dynamic validation of the data
properties. As a consequence, we enable dynamic validation to make up for
imprecision in sound static analysis while simultaneously leveraging the static
verification to make the remaining dynamic validation efficient. We show
empirically that short-circuiting can yield asymptotic improvements in dynamic
validation, with low overhead over no validation, even in cases where static
verification is incomplete.
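As a rough illustration of the short-circuiting idea (not the paper's separation-logic machinery), the sketch below checks sortedness of a linked list but stops as soon as it reaches a node whose sub-list carries a static proof; the `verified` flag is a hypothetical stand-in for that proof.

```python
class Node:
    def __init__(self, val, nxt=None, verified=False):
        self.val = val
        self.next = nxt
        # hypothetical stand-in for a static proof that the invariant
        # already holds for the sub-list rooted at this node
        self.verified = verified

def check_sorted(node):
    """Short-circuiting checker: validate the sortedness invariant
    dynamically, but stop as soon as a statically proven sub-list is
    reached. Returns (holds, number_of_dynamic_checks)."""
    checks = 0
    while node is not None and node.next is not None:
        if node.verified:  # proven statically: skip the rest of the list
            return True, checks
        checks += 1
        if node.val > node.next.val:
            return False, checks
        node = node.next
    return True, checks

# 1 -> 2 -> 3 -> 4 -> 5, with the tail from 3 onward proven statically
tail = Node(3, Node(4, Node(5)), verified=True)
ok, n_checks = check_sorted(Node(1, Node(2, tail)))  # only 2 dynamic checks
```

When the static proof covers a suffix of length k out of n nodes, the dynamic cost drops from O(n) to O(n - k), which is the source of the asymptotic improvements the abstract mentions.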
Validation issues in educational data mining: the case of HTML-Tutor and iHelp
Validation is one of the key aspects in data mining, and even more so in educational data mining (EDM), owing to the nature of the data. In this chapter, a brief overview of validation in the context of EDM is given and a case study is presented. The field of the case study is related to motivational issues in general, and disengagement detection in particular. There are several approaches to eliciting motivational knowledge from a learner's activity trace; in this chapter the validation of such an approach is presented and discussed.
On Regularization Parameter Estimation under Covariate Shift
This paper identifies a problem with the usual procedure for
L2-regularization parameter estimation in a domain adaptation setting. In such
a setting, there are differences between the distributions generating the
training data (source domain) and the test data (target domain). The usual
cross-validation procedure requires validation data, which cannot be obtained
from the unlabeled target data. The problem is that if one decides to use
source validation data, the regularization parameter is underestimated. One
possible solution is to scale the source validation data through importance
weighting, but we show that this correction is not sufficient. We conclude the
paper with an empirical analysis of the effect of several importance weight
estimators on the estimation of the regularization parameter.

Comment: 6 pages, 2 figures, 2 tables. Accepted to ICPR 201
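A minimal sketch of the procedure under discussion, assuming Gaussian source and target densities so the importance weights are known in closed form; `cv_lambda`, the polynomial model, and the 5-fold layout are illustrative choices, not the paper's experimental setup. Passing unit weights recovers the usual cross-validation; passing the importance weights gives the weighted variant.

```python
import numpy as np

rng = np.random.default_rng(0)

def ridge_fit(Phi, y, lam):
    # closed-form L2-regularized least squares
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ y)

def cv_lambda(x, y, lams, w, degree=6):
    """Pick lambda by 5-fold cross-validation on source data, with the
    validation losses scaled by weights w (w = 1 gives the usual procedure)."""
    Phi = np.vander(x, degree)  # polynomial features
    folds = np.array_split(rng.permutation(len(y)), 5)
    scores = []
    for lam in lams:
        err = 0.0
        for k, te in enumerate(folds):
            tr = np.concatenate([f for j, f in enumerate(folds) if j != k])
            beta = ridge_fit(Phi[tr], y[tr], lam)
            err += np.sum(w[te] * (Phi[te] @ beta - y[te]) ** 2)
        scores.append(err)
    return lams[int(np.argmin(scores))]

# source sample; the target distribution is shifted right (covariate shift)
n = 150
xs = rng.normal(0.0, 1.0, n)
ys = np.sin(2 * xs) + 0.1 * rng.normal(size=n)
iw = np.exp(-0.5 * ((xs - 1.0) ** 2 - xs ** 2))  # p_target / p_source

lams = [1e-3, 1e-2, 1e-1, 1.0, 10.0]
lam_plain = cv_lambda(xs, ys, lams, np.ones(n))
lam_weighted = cv_lambda(xs, ys, lams, iw)
```

The weighted variant up-weights validation errors made where target data are likely; the abstract's point is that even this correction can leave the regularization parameter underestimated.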
Bayesian leave-one-out cross-validation for large data
Model inference, such as model comparison, model checking, and model
selection, is an important part of model development. Leave-one-out
cross-validation (LOO) is a general approach for assessing the generalizability
of a model, but unfortunately, LOO does not scale well to large datasets. We
propose a combination of using approximate inference techniques and
probability-proportional-to-size-sampling (PPS) for fast LOO model evaluation
for large datasets. We provide both theoretical and empirical results showing
good properties for large data.

Comment: Accepted to ICML 2019. This version is the submitted paper
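The flavor of PPS subsampling can be conveyed with a Hansen-Hurwitz style estimator. This toy stands in for the paper's method: `proxy` plays the role of a cheap approximation (e.g. from approximate inference) to each observation's LOO contribution, and all names and the synthetic data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def pps_loo(proxy, exact_loo, m):
    """Hansen-Hurwitz estimate of the total LOO score: draw m observations
    with probability proportional to a cheap proxy of their contribution,
    and run the expensive exact LOO computation only on that sample."""
    p = proxy / proxy.sum()  # sampling probabilities, proportional to size
    idx = rng.choice(len(proxy), size=m, replace=True, p=p)
    return np.mean([exact_loo(i) / p[i] for i in idx])

# toy setup: pretend each exact LOO value is expensive to compute; the
# proxy is a noisy but well-correlated approximation of its magnitude
n = 10_000
true_loo = -np.abs(rng.normal(size=n)) - 0.1
proxy = np.abs(true_loo) * np.exp(0.1 * rng.normal(size=n))

est = pps_loo(proxy, lambda i: true_loo[i], m=500)
# est approximates true_loo.sum() using only 500 "exact" evaluations
```

Because the sampling probabilities track the magnitudes of the LOO terms, the estimator's variance stays small even though only a small fraction of the data receives the expensive computation.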
Added predictive value of high-throughput molecular data to clinical data, and its validation
Hundreds of "molecular signatures" have been proposed in the literature to predict patient outcome in clinical settings from high-dimensional data, many of which eventually failed to be validated. Validation of such molecular research findings is thus becoming an increasingly important branch of clinical bioinformatics. Moreover, in practice well-known clinical predictors are often already available. From a statistical and bioinformatics point of view, little attention has been given to the evaluation of the added predictive value of a molecular signature given that clinical predictors are available. This article reviews procedures that assess and validate the added predictive value of high-dimensional molecular data. It critically surveys various approaches for the construction of combined prediction models using both clinical and molecular data, for validating added predictive value based on independent data, and for assessing added predictive value using a single data set.
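One of the surveyed questions, quantifying what molecular data add on top of clinical predictors using independent validation data, can be sketched as follows; the least-squares scores, the AUC helper, and the synthetic data are illustrative simplifications, not any specific surveyed method.

```python
import numpy as np

rng = np.random.default_rng(2)

def fit_scores(X, y):
    # least-squares linear risk score (stand-in for a proper clinical model)
    return np.linalg.lstsq(X, y, rcond=None)[0]

def auc(score, y):
    # probability that a randomly chosen case outscores a random control
    pos, neg = score[y == 1], score[y == 0]
    return float(np.mean(pos[:, None] > neg[None, :]))

n, p_mol = 400, 20
clin = rng.normal(size=(n, 2))              # two clinical predictors
mol = rng.normal(size=(n, p_mol))           # molecular features
signal = clin @ np.array([1.0, 0.5]) + 0.8 * mol[:, 0]
y = (signal + rng.normal(size=n) > 0).astype(int)

tr, va = np.arange(200), np.arange(200, 400)  # independent validation half
full = np.hstack([clin, mol])
auc_clin = auc(clin[va] @ fit_scores(clin[tr], y[tr]), y[va])
auc_full = auc(full[va] @ fit_scores(full[tr], y[tr]), y[va])
added_value = auc_full - auc_clin  # added predictive value on held-out data
```

The key design point mirrored from the abstract is that the combined model is compared against the clinical-only baseline on data not used for fitting, so that `added_value` reflects genuine incremental signal rather than overfitting to the molecular features.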
