    Optimal Data Split Methodology for Model Validation

    The decision to incorporate cross-validation into the validation process of a mathematical model raises an immediate question: how should one partition the data into calibration and validation sets? We answer this question systematically by presenting an algorithm that finds the optimal partition of the data subject to certain constraints. In doing so, we address two critical issues: 1) that the model be evaluated with respect to predictions of a given quantity of interest and its ability to reproduce the data, and 2) that the model be highly challenged by the validation set, assuming it is properly informed by the calibration set. This framework also relies on the interaction between the experimentalist and/or modeler, who understands the physical system and the limitations of the model; the decision-maker, who understands and can quantify the cost of model failure; and the computational scientist, who strives to determine whether the model satisfies both the modeler's and the decision-maker's requirements. We also note that our framework is quite general and may be applied to a wide range of problems. Here, we illustrate it through a specific example involving a data reduction model for an ICCD camera from a shock-tube experiment located at the NASA Ames Research Center (ARC). Comment: Submitted to the International Conference on Modeling, Simulation and Control 2011 (ICMSC'11), San Francisco, USA, 19-21 October 2011.
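
    A minimal sketch of the idea (illustrative only; the exhaustive search, the fit_model callable, and the squared-error "challenge" metric are assumptions, not the authors' algorithm): enumerate admissible calibration/validation partitions, fit on the calibration set, and keep the partition whose validation set most strongly challenges the model.

        # Illustrative sketch only: an exhaustive search over calibration/validation
        # splits that keeps the partition whose validation set most strongly
        # challenges the calibrated model. fit_model and the squared-error
        # "challenge" metric are placeholders, not the authors' algorithm.
        from itertools import combinations
        import numpy as np

        def optimal_split(X, y, fit_model, min_calib=5, n_valid=3):
            n = len(X)
            best_err, best_split = -np.inf, None
            for valid_idx in combinations(range(n), n_valid):
                calib_idx = [i for i in range(n) if i not in valid_idx]
                if len(calib_idx) < min_calib:          # partition constraint
                    continue
                model = fit_model(X[calib_idx], y[calib_idx])
                pred = model.predict(X[list(valid_idx)])
                err = np.mean((pred - y[list(valid_idx)]) ** 2)
                if err > best_err:                      # most challenging split so far
                    best_err, best_split = err, (calib_idx, list(valid_idx))
            return best_split, best_err

    For realistic data sizes the exhaustive enumeration would of course be replaced by the constrained optimization the paper develops.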

    Synthesizing Short-Circuiting Validation of Data Structure Invariants

    This paper presents incremental verification-validation, a novel approach for checking rich data structure invariants expressed as separation logic assertions. Incremental verification-validation combines static verification of separation properties with efficient, short-circuiting dynamic validation of arbitrarily rich data constraints. A data structure invariant checker is an inductive predicate in separation logic with an executable interpretation; a short-circuiting checker is an invariant checker that stops checking whenever it detects at run time that an assertion for some sub-structure has been fully proven statically. At a high level, our approach does two things: it statically proves the separation properties of data structure invariants using a static shape analysis in a standard way but then leverages this proof in a novel manner to synthesize short-circuiting dynamic validation of the data properties. As a consequence, we enable dynamic validation to make up for imprecision in sound static analysis while simultaneously leveraging the static verification to make the remaining dynamic validation efficient. We show empirically that short-circuiting can yield asymptotic improvements in dynamic validation, with low overhead relative to no validation, even in cases where static verification is incomplete.
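
    To make the mechanism concrete, here is a toy sketch (the Node class, flag name, and sorted-list invariant are assumptions, not the paper's synthesized checker): nodes carry a flag marking suffixes whose invariant was already discharged by the static shape analysis, so the dynamic check can stop as soon as it reaches a verified sub-structure.

        # Toy sketch, not the paper's synthesized checker: a sorted-list invariant
        # checker whose nodes carry a flag set when static analysis has fully
        # discharged the invariant for the remaining suffix, allowing the dynamic
        # check to short-circuit instead of traversing the whole structure.
        class Node:
            def __init__(self, value, next=None, statically_verified=False):
                self.value = value
                self.next = next
                self.statically_verified = statically_verified

        def check_sorted(node):
            while node is not None and node.next is not None:
                if node.statically_verified:
                    return True        # suffix invariant already proven statically
                if node.value > node.next.value:
                    return False       # dynamic validation catches the violation
                node = node.next
            return True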

    Geant4 validation with CMS calorimeters test-beam data

    The CMS experiment uses Geant4 for Monte Carlo simulation of the detector setup. Validation of the physics processes describing hadronic showers is a major concern in view of obtaining a proper description of jets and missing energy for signal and background events. This is done by carrying out extensive test-beam studies using prototypes or real detector modules of the CMS calorimeters. These data are compared with Geant4 predictions. Tuning of the Geant4 models is carried out, and the steps to be used in reproducing detector signals are defined, in view of measurements of energy response, energy resolution, and transverse and longitudinal shower profiles for a variety of hadron beams over a broad momentum range between 2 and 300 GeV/c. Comment: Poster presented at the Hadron Collider Physics Symposium (HCP2008), Galena, Illinois, USA, May 27-31, 2008; 5 pages, LaTeX, 28 eps figures.
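
    An illustrative sketch of this kind of data/MC comparison (toy numbers and placeholder names only; this is not CMS or Geant4 code): compute the mean response and fractional resolution from measured and simulated energy deposits at a few beam momenta and report the data/MC ratios.

        # Toy comparison only (not CMS/Geant4 code): mean response E/p and
        # fractional resolution sigma(E)/<E> for measured vs. simulated energy
        # deposits at a few beam momenta, reported as data/MC ratios.
        import numpy as np

        rng = np.random.default_rng(0)
        beam_momenta = [2.0, 20.0, 300.0]                                     # GeV/c
        measured  = {p: rng.normal(0.95 * p, 0.10 * np.sqrt(p), 1000) for p in beam_momenta}
        simulated = {p: rng.normal(0.97 * p, 0.11 * np.sqrt(p), 1000) for p in beam_momenta}

        def response_and_resolution(deposits, p):
            mean_e = deposits.mean()
            return mean_e / p, deposits.std() / mean_e

        for p in beam_momenta:
            d_resp, d_res = response_and_resolution(measured[p], p)
            s_resp, s_res = response_and_resolution(simulated[p], p)
            print(f"{p:6.1f} GeV/c  response data/MC = {d_resp / s_resp:.3f}  "
                  f"resolution data/MC = {d_res / s_res:.3f}")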

    On Regularization Parameter Estimation under Covariate Shift

    This paper identifies a problem with the usual procedure for L2-regularization parameter estimation in a domain adaptation setting. In such a setting, there are differences between the distributions generating the training data (source domain) and the test data (target domain). The usual cross-validation procedure requires validation data, which cannot be obtained from the unlabeled target data. The problem is that if one decides to use source validation data, the regularization parameter is underestimated. One possible solution is to scale the source validation data through importance weighting, but we show that this correction is not sufficient. We conclude the paper with an empirical analysis of the effect of several importance-weight estimators on the estimation of the regularization parameter. Comment: 6 pages, 2 figures, 2 tables. Accepted to ICPR 201
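
    A brief sketch of the importance-weighting correction discussed above (illustrative only; the ridge model, parameter grid, and weight vector are assumptions, not the paper's estimator): each source-domain validation residual is reweighted by w(x) = p_target(x) / p_source(x) before the L2 parameter is selected.

        # Sketch of the importance-weighting correction (illustrative, not the
        # paper's estimator): reweight source-domain validation residuals by
        # w(x) = p_target(x) / p_source(x) before choosing the L2 parameter.
        import numpy as np
        from sklearn.linear_model import Ridge

        def select_lambda(X_tr, y_tr, X_val, y_val, weights, lambdas):
            best_lam, best_loss = None, np.inf
            for lam in lambdas:
                model = Ridge(alpha=lam).fit(X_tr, y_tr)
                residuals = (model.predict(X_val) - y_val) ** 2
                loss = np.average(residuals, weights=weights)  # importance-weighted MSE
                if loss < best_loss:
                    best_lam, best_loss = lam, loss
            return best_lam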

    Validation issues in educational data mining: the case of HTML-Tutor and iHelp

    Validation is one of the key aspects of data mining, and even more so of educational data mining (EDM), owing to the nature of the data. In this chapter, a brief overview of validation in the context of EDM is given and a case study is presented. The field of the case study is related to motivational issues in general and disengagement detection in particular. There are several approaches to eliciting motivational knowledge from a learner's activity trace; in this chapter the validation of such an approach is presented and discussed.

    HepData and JetWeb: HEP data archiving and model validation

    The CEDAR collaboration is extending and combining the JetWeb and HepData systems to provide a single service for tuning and validating models of high-energy physics processes. The centrepiece of this activity is the fitting by JetWeb of observables computed from Monte Carlo event generator events against their experimentally determined distributions, as stored in HepData. Caching the results of the JetWeb simulation and comparison stages provides a single cumulative database of event generator tunings, fitted against a wide range of experimental quantities. An important feature of this integration is a family of XML data formats, called HepML. Comment: 4 pages, 0 figures. To be published in the proceedings of CHEP0
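
    A rough sketch of the fit-and-cache idea (hypothetical names throughout; this is not the JetWeb/HepData implementation): score a generator tuning by a chi-square between its binned observable and the archived measurement, caching each tuning/observable result so the cumulative database only grows.

        # Rough sketch with hypothetical names (not the JetWeb/HepData API): score
        # a generator tuning by a chi-square against the archived measurement and
        # cache the result so each tuning/observable pair is simulated only once.
        import numpy as np

        _cache = {}

        def chi2(mc_hist, data_hist, data_err):
            return float(np.sum(((mc_hist - data_hist) / data_err) ** 2))

        def score_tuning(tuning_id, observable, run_generator, data_hist, data_err):
            key = (tuning_id, observable)
            if key not in _cache:                  # cumulative database of tunings
                mc_hist = run_generator(tuning_id, observable)
                _cache[key] = chi2(mc_hist, data_hist, data_err)
            return _cache[key]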