Optimal Data Split Methodology for Model Validation

Author: Bryant Corey
Miki Kenji
Morrison Rebecca
Prudhomme Serge
Terejanu Gabriel
Publication date: 01/01/2011

The decision to incorporate cross-validation into the validation process for mathematical models raises an immediate question: how should one partition the data into calibration and validation sets? We answer this question systematically by presenting an algorithm that finds the optimal partition of the data subject to certain constraints. In doing so, we address two critical issues: 1) that the model be evaluated with respect to its predictions of a given quantity of interest and its ability to reproduce the data, and 2) that the model be highly challenged by the validation set, assuming it is properly informed by the calibration set. This framework also relies on interaction among three parties. The experimentalist and/or modeler understand the physical system and the limitations of the model. The decision-maker understands and can quantify the cost of model failure. The computational scientists strive to determine whether the model satisfies both the modeler's and the decision-maker's requirements. The framework is general and may be applied to a wide range of problems. Here, we illustrate it through a specific example involving a data reduction model for an ICCD camera from a shock-tube experiment located at the NASA Ames Research Center (ARC).
Comment: Submitted to International Conference on Modeling, Simulation and Control 2011 (ICMSC'11), San Francisco, USA, 19-21 October, 201

arXiv.org e-Print Archive

CiteSeerX
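The abstract does not spell out the authors' partitioning algorithm. The sketch below shows only the general idea of searching over calibration/validation partitions for the one that most "challenges" the model. It is a minimal brute-force sketch, not the paper's method, and rests on two simplifications. First, the scoring function `extrapolation_challenge` is hypothetical: it rewards validation points that lie far from every calibration point. Second, the only constraint enforced is a fixed validation-set size; the requirement that the calibration set properly inform the model is not represented.

```python
from itertools import combinations
from typing import Callable, Sequence, Tuple

Split = Tuple[Tuple[float, ...], Tuple[float, ...]]

def extrapolation_challenge(calibration: Sequence[float], validation: Sequence[float]) -> float:
    """Hypothetical challenge score: total distance from each validation point
    to its nearest calibration point (larger means more extrapolation)."""
    return sum(min(abs(v - c) for c in calibration) for v in validation)

def best_split(
    data: Sequence[float],
    n_validation: int,
    challenge: Callable[[Sequence[float], Sequence[float]], float] = extrapolation_challenge,
) -> Split:
    """Enumerate every partition with `n_validation` validation points and
    return the (calibration, validation) pair with the highest challenge score.
    The cost grows as C(len(data), n_validation), so this suits small data only."""
    if not 1 <= n_validation < len(data):
        raise ValueError("need at least one calibration and one validation point")
    best: Split = ((), ())
    best_score = float("-inf")
    for val_idx in combinations(range(len(data)), n_validation):
        chosen = set(val_idx)
        calibration = tuple(x for i, x in enumerate(data) if i not in chosen)
        validation = tuple(data[i] for i in val_idx)
        score = challenge(calibration, validation)
        if score > best_score:
            best_score, best = score, (calibration, validation)
    return best

print(best_split([1.0, 2.0, 3.0, 10.0], n_validation=1))
# → ((1.0, 2.0, 3.0), (10.0,))
```

Exhaustive search is the simplest correct way to find an optimum over partitions. Any practical use would replace it with a smarter search and a problem-specific challenge measure.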

Software Engineering Laboratory: Data validation

Author: Chen E.
Zelkowitz M. V.

NASA Technical Reports Server

Synthesizing Short-Circuiting Validation of Data Structure Invariants

Author: Chang Bor-Yuh Evan
Coughlin Devin
Rival Xavier
Tsai Yi-Fan
Publication date: 01/01/2015

This paper presents incremental verification-validation, a novel approach for checking rich data structure invariants expressed as separation logic assertions. Incremental verification-validation combines static verification of separation properties with efficient, short-circuiting dynamic validation of arbitrarily rich data constraints. A data structure invariant checker is an inductive predicate in separation logic with an executable interpretation. A short-circuiting checker is an invariant checker that stops checking whenever it detects at run time that an assertion for some sub-structure has been fully proven statically. At a high level, our approach does two things. It statically proves the separation properties of data structure invariants using a static shape analysis in a standard way. It then leverages this proof in a novel manner to synthesize short-circuiting dynamic validation of the data properties. As a consequence, dynamic validation can make up for imprecision in sound static analysis, while the static verification makes the remaining dynamic validation efficient. We show empirically that short-circuiting can yield asymptotic improvements in dynamic validation, with low overhead compared with no validation, even in cases where static verification is incomplete.

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server
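The sketch below illustrates the short-circuiting idea on a simple invariant: a singly linked list sorted in ascending order. The paper synthesizes such checkers from separation-logic proofs. Here, the `proven` flag on `Node` is a hypothetical stand-in for the static analysis result and is set by hand. The flag means "the list starting at this node is already known to be sorted." The checker still verifies the boundary between the unproven prefix and the proven suffix, then stops.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

@dataclass
class Node:
    value: int
    next: Optional["Node"] = None
    # Hypothetical flag: True if static analysis already proved that the
    # sub-list starting here is sorted, so the checker may stop at this node.
    proven: bool = False

def build(values: Sequence[int], proven_at: Optional[int] = None) -> Optional[Node]:
    """Build a list from `values`, marking at most one node as statically proven."""
    head: Optional[Node] = None
    for i in reversed(range(len(values))):
        head = Node(values[i], head, proven=(i == proven_at))
    return head

def check_sorted(head: Optional[Node]) -> Tuple[bool, int]:
    """Return (is_sorted, nodes_visited), short-circuiting at a proven node."""
    prev = float("-inf")
    visited = 0
    node = head
    while node is not None:
        visited += 1
        if node.value < prev:  # boundary check still needed at every visited node
            return False, visited
        if node.proven:  # rest of the list proven statically: stop here
            return True, visited
        prev = node.value
        node = node.next
    return True, visited

print(check_sorted(build([1, 2, 3, 4, 5], proven_at=2)))  # → (True, 3)
print(check_sorted(build([1, 2, 3, 4, 5])))               # → (True, 5)
```

The visit count shows where the savings come from. Only the unproven prefix is traversed at run time, which is how short-circuiting can reduce an O(n) check to the length of that prefix.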

Validation issues in educational data mining: the case of HTML-Tutor and iHelp

Author: Cocea Mihaela
Weibelzahl Stephan
Publication venue: CRC Press Inc
Publication date: 01/01/2010

Validation is one of the key aspects of data mining, and even more so of educational data mining (EDM), owing to the nature of the data. This chapter gives a brief overview of validation in the context of EDM and presents a case study. The case study concerns motivational issues in general and disengagement detection in particular. There are several approaches to eliciting motivational knowledge from a learner’s activity trace; this chapter presents and discusses the validation of one such approach.

TRAP

Portsmouth University Research Portal (Pure)

On Regularization Parameter Estimation under Covariate Shift

Author: Kouw Wouter M.
Loog Marco
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 31/07/2016

This paper identifies a problem with the usual procedure for L2-regularization parameter estimation in a domain adaptation setting. In such a setting, the distribution generating the training data (source domain) differs from the distribution generating the test data (target domain). The usual cross-validation procedure requires validation data, which cannot be obtained from the unlabeled target data. If one uses source validation data instead, the regularization parameter is underestimated. One possible solution is to reweight the source validation data through importance weighting, but we show that this correction is not sufficient. We conclude the paper with an empirical analysis of the effect of several importance-weight estimators on the estimation of the regularization parameter.
Comment: 6 pages, 2 figures, 2 tables. Accepted to ICPR 201

arXiv.org e-Print Archive

Crossref

Copenhagen University Research Information System
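The abstract mentions importance weighting as a correction, which the authors show is not sufficient on its own. The sketch below shows only what that correction looks like: each source validation point is weighted by the density ratio between the target and source domains. The sketch makes several assumptions. The model is 1-D ridge regression without an intercept. Both domains are Gaussians with known parameters, so the weights are exact rather than estimated. The helper names are hypothetical.

```python
import math
from typing import Sequence, Tuple

def gaussian_pdf(x: float, mean: float, std: float) -> float:
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def ridge_fit(xs: Sequence[float], ys: Sequence[float], lam: float) -> float:
    """Closed-form 1-D ridge (L2-regularized) slope with no intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def weighted_val_error(w: float, xs: Sequence[float], ys: Sequence[float],
                       weights: Sequence[float]) -> float:
    """Importance-weighted mean squared error on source validation data."""
    return sum(wt * (y - w * x) ** 2 for x, y, wt in zip(xs, ys, weights)) / sum(weights)

def select_lambda(train_x, train_y, val_x, val_y, grid: Sequence[float],
                  source: Tuple[float, float], target: Tuple[float, float]) -> float:
    """Pick the lambda minimizing validation error reweighted by p_target / p_source.
    `source` and `target` are assumed-known (mean, std) pairs."""
    weights = [gaussian_pdf(x, *target) / gaussian_pdf(x, *source) for x in val_x]
    return min(grid, key=lambda lam: weighted_val_error(
        ridge_fit(train_x, train_y, lam), val_x, val_y, weights))
```

In practice the density ratio must itself be estimated, and the paper's analysis concerns exactly how such estimators affect the chosen parameter.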

Bayesian leave-one-out cross-validation for large data

Author: Andersen Michael Riis
Jonasson Johan
Magnusson Måns
Vehtari Aki
Publication date: 01/01/2019

Model inference, such as model comparison, model checking, and model selection, is an important part of model development. Leave-one-out cross-validation (LOO) is a general approach for assessing the generalizability of a model, but unfortunately, LOO does not scale well to large datasets. We propose combining approximate inference techniques with probability-proportional-to-size (PPS) sampling for fast LOO model evaluation on large datasets. We provide both theoretical and empirical results showing good properties for large data.
Comment: Accepted to ICML 2019. This version is the submitted paper

arXiv.org e-Print Archive

Aaltodoc Publication Archive

Chalmers Research

Online Research Database In Technology
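The sketch below illustrates PPS subsampling for a LOO total using the Hansen-Hurwitz estimator under sampling with replacement. It is a minimal sketch under stated assumptions, not the paper's implementation. Cheap per-observation approximations (`approx`) set the sampling probabilities. The more expensive per-observation LOO values are computed only for the sampled points, through `exact_fn`, a hypothetical placeholder for that costly computation.

```python
import random
from typing import Callable, Sequence

def pps_loo_estimate(approx: Sequence[float], exact_fn: Callable[[int], float],
                     m: int, rng: random.Random) -> float:
    """Hansen-Hurwitz estimate of sum_i exact_fn(i) from m PPS draws with replacement.

    Sampling probabilities are proportional to |approx[i]|. Observations with
    approx[i] == 0 are never drawn, so their exact contribution is assumed to be ~0.
    """
    sizes = [abs(a) for a in approx]
    total_size = sum(sizes)
    probs = [s / total_size for s in sizes]
    draws = rng.choices(range(len(approx)), weights=probs, k=m)
    # Each draw contributes exact_fn(i) / p_i; averaging gives an unbiased total.
    return sum(exact_fn(i) / probs[i] for i in draws) / m

# Usage: pointwise log predictive densities are negative, so |approx| is a natural size.
approx = [-1.0, -2.0, -3.0, -4.0]
estimate = pps_loo_estimate(approx, lambda i: approx[i], m=2, rng=random.Random(0))
```

When the approximations are proportional to the exact values, every draw returns the exact total. This is why good approximations make the subsampled estimator low-variance.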

Added predictive value of high-throughput molecular data to clinical data, and its validation

Author: Boulesteix Anne-Laure
Sauerbrei Willi
Publication date: 01/01/2010

Hundreds of "molecular signatures" have been proposed in the literature to predict patient outcome in clinical settings from high-dimensional data, and many of them eventually failed validation. Validation of such molecular research findings is thus becoming an increasingly important branch of clinical bioinformatics. Moreover, in practice, well-known clinical predictors are often already available. From a statistical and bioinformatics point of view, little attention has been given to evaluating the added predictive value of a molecular signature when clinical predictors are available. This article reviews procedures that assess and validate the added predictive value of high-dimensional molecular data. It critically surveys approaches for constructing combined prediction models from both clinical and molecular data, for validating added predictive value on independent data, and for assessing added predictive value using a single data set.

Open Access LMU ( Ludwig-Maximilians-Univ. München)
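The review surveys several ways to validate added predictive value on independent data. The sketch below shows one simple option among them: comparing the Brier scores of a clinical-only model and a combined clinical-plus-molecular model on the same validation set. The sketch assumes both models were fitted on separate training data and now output predicted event probabilities for the validation patients. The function names are hypothetical.

```python
from typing import Sequence

def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or not outcomes:
        raise ValueError("probs and outcomes must be non-empty and the same length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(outcomes)

def added_predictive_value(clinical_probs: Sequence[float],
                           combined_probs: Sequence[float],
                           outcomes: Sequence[int]) -> float:
    """Brier-score reduction from adding molecular data (positive = improvement)."""
    return brier_score(clinical_probs, outcomes) - brier_score(combined_probs, outcomes)

outcomes = [1, 0, 1, 0]
gain = added_predictive_value([0.6, 0.4, 0.6, 0.4], [0.9, 0.1, 0.9, 0.1], outcomes)
```

On a small independent set, such a difference is noisy. It would normally be reported with an uncertainty estimate, such as a bootstrap interval.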