
    Systems validation: application to statistical programs

    BACKGROUND: In 2003, the United States Food and Drug Administration (FDA) released a guidance document on the scope of "Part 11" enforcement. In this guidance document, the FDA indicates an expectation of a risk-based approach to determining which systems should undergo validation. Since statistical programs manage and manipulate raw data, their implementation should be critically reviewed to determine whether they should undergo validation. However, the concepts of validation are rarely discussed in the biostatistics curriculum. DISCUSSION: This paper summarizes a "Plan, Do, Say" approach to validation that can be incorporated into statistical training so that biostatisticians can understand and implement validation principles in their research. SUMMARY: Validation is a process that requires dedicated attention. The process of validation can be easily understood in the context of the scientific method.
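
A minimal sketch of what a "Plan, Do, Say" validation step for a statistical program might look like in practice: plan the expected result, run the program, and record whether the two agree. The routine, dataset, and tolerance below are hypothetical illustrations, not taken from the paper.

```python
# Hypothetical known-answer test; "weighted_mean" and the tolerance
# are illustrative assumptions, not from the paper.
import statistics

def weighted_mean(values, weights):
    """The statistical routine under validation."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def validate_weighted_mean():
    # Plan: with equal weights, the result must equal the ordinary mean.
    values = [2.0, 4.0, 6.0, 8.0]
    expected = statistics.mean(values)
    # Do: run the program on the planned input.
    observed = weighted_mean(values, weights=[1.0] * len(values))
    # Say: document whether the observed output matches the plan.
    passed = abs(observed - expected) < 1e-12
    print(f"expected={expected}, observed={observed}, passed={passed}")
    return passed

if __name__ == "__main__":
    assert validate_weighted_mean()
```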

    Multiplicative local linear hazard estimation and best one-sided cross-validation

    This paper develops detailed mathematical-statistical theory for a new class of cross-validation techniques for local linear kernel hazard estimators and their multiplicative bias corrections. The new class combines principles of local information with recent advances in indirect cross-validation. A few applications of cross-validating multiplicative kernel hazard estimators do exist in the literature; however, detailed theory and small-sample performance are first introduced in this paper and then extended to our new class of best one-sided cross-validation. Best one-sided cross-validation turns out to perform excellently in the practical illustrations, in the small-sample studies, and in the mathematical theory.
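
As a rough illustration of the one-sided cross-validation idea, the sketch below compares ordinary least-squares cross-validation with a one-sided variant, using kernel density estimation in place of hazard estimation for simplicity. In the actual method the one-sided minimizer is rescaled by a known kernel-dependent constant to obtain the final bandwidth; that constant is omitted here, and all data are simulated.

```python
import numpy as np

def kde(points, data, h, kernel):
    """Kernel density estimate at `points` with bandwidth h."""
    u = (points[:, None] - data[None, :]) / h
    return kernel(u).sum(axis=1) / (len(data) * h)

def lscv(data, h, kernel, grid):
    """Least-squares cross-validation score for bandwidth h."""
    integral = np.trapz(kde(grid, data, h, kernel) ** 2, grid)
    loo = sum(kernel((data[i] - np.delete(data, i)) / h).sum()
              for i in range(len(data))) / ((len(data) - 1) * h)
    return integral - 2.0 * loo / len(data)

gauss = lambda u: np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
# One-sided kernel: uses only observations to the left of the point.
left = lambda u: 2.0 * gauss(u) * (u >= 0)

rng = np.random.default_rng(0)
data = rng.normal(size=200)                      # simulated sample
grid = np.linspace(-5.0, 5.0, 512)
bandwidths = np.linspace(0.1, 1.0, 19)

h_cv = bandwidths[np.argmin([lscv(data, h, gauss, grid) for h in bandwidths])]
h_os = bandwidths[np.argmin([lscv(data, h, left, grid) for h in bandwidths])]
print(f"ordinary CV bandwidth: {h_cv:.2f}, one-sided CV minimizer: {h_os:.2f}")
```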

    Validation of Geant4-based Radioactive Decay Simulation

    Radioactive decays are of concern in a wide variety of applications using Monte Carlo simulations. In order to properly estimate the quality of such simulations, knowledge of the accuracy of the decay simulation is required. We present a validation of the original Geant4 Radioactive Decay Module, which uses a per-decay sampling approach, and of an extended package for Geant4-based simulation of radioactive decays which, in addition to a refactored per-decay sampling, is capable of using a statistical sampling approach. The validation is based on measurements of calibration isotope sources using a high-purity germanium (HPGe) detector; no calibration of the simulation is performed. For the validation experiment considered, equivalent simulation accuracy can be achieved with per-decay and statistical sampling.
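
The kind of spectrum comparison such a validation implies can be sketched as a Pearson chi-square test between binned measured and simulated counts. The count arrays below are hypothetical placeholders, not data from the paper, and a real validation involves detector-response modelling well beyond this.

```python
import numpy as np

def reduced_chi_square(measured, simulated):
    """Chi-square per degree of freedom for Poisson-distributed bin counts."""
    measured = np.asarray(measured, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    mask = simulated > 0                     # skip empty bins
    chi2 = np.sum((measured[mask] - simulated[mask]) ** 2 / simulated[mask])
    return chi2 / (mask.sum() - 1)

measured  = np.array([102, 480, 1510, 495, 98])   # hypothetical photopeak region
simulated = np.array([110, 470, 1490, 510, 105])
print(f"reduced chi-square = {reduced_chi_square(measured, simulated):.2f}")
```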

    Multilayer Aggregation with Statistical Validation: Application to Investor Networks

    Multilayer networks are attracting growing attention in many fields, including finance. In this paper, we develop a new tractable procedure for multilayer aggregation based on statistical validation, which we apply to investor networks. Moreover, we propose two other improvements to the analysis of such networks: transaction bootstrapping and investor categorization. The aggregation procedure can be used to integrate security-wise and time-wise information about investor trading networks, but it is not limited to finance. In fact, it can be used for different applications, such as gene, transportation, and social networks, whether they are inferred or directly observable. Additionally, in the investor network inference, we use transaction bootstrapping for better statistical validation. Investor categorization allows for constant-size networks and more observations per node, which is important for inference, especially for less liquid securities. Furthermore, we observe that the window size used for averaging has a substantial effect on the number of inferred relationships. We apply this procedure to a unique data set of Finnish shareholders covering the period 2004-2009. We find that households in the capital have high centrality in investor networks, which, under the theory of information channels in investor networks, suggests that they are well-informed investors.
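
A minimal sketch of bootstrap-based statistical validation for a single network link: resample paired trading records and keep the edge only if the bootstrap confidence interval for the correlation lies above zero. The data, resampling scheme, and threshold are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def validated_edge(x, y, n_boot=2000, alpha=0.05):
    """Keep the edge only if the bootstrap CI for corr(x, y) lies above zero."""
    n = len(x)
    corrs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)     # resample transactions with replacement
        corrs[b] = np.corrcoef(x[idx], y[idx])[0, 1]
    return np.quantile(corrs, alpha / 2) > 0.0

# Two hypothetical investors' net trading volumes over 50 periods,
# correlated by construction.
x = rng.normal(size=50)
y = 0.6 * x + rng.normal(scale=0.8, size=50)
print("edge validated:", validated_edge(x, y))
```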

    Reducing the Probability of False Positive Research Findings by Pre-Publication Validation - Experience with a Large Multiple Sclerosis Database

    *Objective*
We have assessed the utility of a pre-publication validation policy in reducing the probability of publishing false positive research findings. 
*Study design and setting*
The large database of the Sylvia Lawry Centre for Multiple Sclerosis Research was split into two parts: one for hypothesis generation and a validation part for confirmation of selected results. We present case studies from five finalized projects that have used the validation policy, and results from a simulation study.
*Results*
In one project, the "relapse and disability" project described in section II (example 3), findings could not be confirmed in the validation part of the database. The simulation study showed that the percentage of false-positive findings can exceed 20% depending on variable selection.
*Conclusion*
We conclude that, over the past three years, the validation policy has prevented the publication of at least one research finding that could not be validated in an independent data set (and probably would have been a "true" false-positive finding), and has led to improved data analysis, statistical programming, and selection of hypotheses. The advantages outweigh the loss of statistical power inherent in the process.
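
A minimal sketch of this split-sample policy on simulated data: a finding is generated on the exploration half and counted as confirmed only if it replicates on the held-back validation half. The data, effect size, and test are hypothetical, not from the Centre's database.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group = rng.integers(0, 2, size=400)            # hypothetical treatment indicator
outcome = 0.2 * group + rng.normal(size=400)    # weak true group effect

def group_test(g, y):
    """p-value of a two-sample t-test between the groups."""
    t, p = stats.ttest_ind(y[g == 1], y[g == 0])
    return p

half = len(group) // 2
p_explore  = group_test(group[:half], outcome[:half])    # hypothesis generation
p_validate = group_test(group[half:], outcome[half:])    # confirmation
print(f"exploration p={p_explore:.3f}, validation p={p_validate:.3f}")
print("confirmed" if p_explore < 0.05 and p_validate < 0.05 else "not confirmed")
```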

    Model selection in neural networks

    In this article we examine how model selection in neural networks can be guided by statistical procedures such as hypothesis tests, information criteria, and cross-validation. The application of these methods to neural network models is discussed, paying particular attention to the identification problems encountered. We then propose five specification strategies based on different statistical procedures and compare them in a simulation study. As the results of the study are promising, it is suggested that statistical analysis should become an integral part of neural network modelling.
    Keywords: Neural Networks, Statistical Inference, Model Selection, Identification, Information Criteria, Cross-Validation
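
One of the strategies the abstract mentions, selecting model size by cross-validation, can be sketched as follows. This uses scikit-learn rather than the authors' code, and the data, candidate grid, and fold count are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)
X = rng.uniform(-3.0, 3.0, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=300)   # noisy nonlinear target

scores = {}
for hidden in (1, 2, 4, 8, 16):                         # candidate model sizes
    net = MLPRegressor(hidden_layer_sizes=(hidden,), max_iter=2000, random_state=0)
    # 5-fold CV; a higher (less negative) score means a better model.
    scores[hidden] = cross_val_score(net, X, y, cv=5,
                                     scoring="neg_mean_squared_error").mean()

best = max(scores, key=scores.get)
print({h: round(s, 3) for h, s in scores.items()}, "-> selected:", best)
```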

    Statistical validation of simulation models: A case study

    Rigorous statistical validation requires that the responses of the model and the real system have the same expected values. However, the modeled and actual responses are not comparable if they are obtained under different scenarios (environmental conditions). Moreover, data on the real system may be unavailable; sensitivity analysis can then be applied to find out whether the model inputs have effects on the model outputs that agree with the experts' intuition. Not only the total model, but also its modules may be submitted to such sensitivity analyses. This article illustrates these issues through a case study, namely a simulation model for the use of sonar to search for mines on the sea bottom. The methodology, however, applies to models in general.
    Keywords: Simulation Models, Statistical Validation, Statistics
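
The comparison of expected values that the abstract describes can be sketched as a two-sample t-test between real-system measurements and simulation replications obtained under the same scenario. The response data below are simulated placeholders, not from the sonar case study.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
real_responses  = rng.normal(loc=10.0, scale=2.0, size=30)   # field measurements
model_responses = rng.normal(loc=10.3, scale=2.0, size=30)   # simulation runs

t, p = stats.ttest_ind(real_responses, model_responses, equal_var=False)
print(f"t={t:.2f}, p={p:.3f}:",
      "no evidence against the model" if p > 0.05 else "model rejected")
```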

    Fast Cross-Validation via Sequential Testing

    With the increasing size of today's data sets, finding the right parameter configuration in model selection via cross-validation can be an extremely time-consuming task. In this paper we propose an improved cross-validation procedure which uses nonparametric testing coupled with sequential analysis to determine the best parameter set on linearly increasing subsets of the data. By eliminating underperforming candidates quickly and keeping promising candidates as long as possible, the method speeds up the computation while preserving the capability of the full cross-validation. Theoretical considerations underline the statistical power of our procedure. The experimental evaluation shows that our method reduces the computation time by a factor of up to 120 compared to a full cross-validation, with negligible impact on accuracy.
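
A much-simplified sketch of the racing idea behind this procedure: candidates are evaluated on linearly growing subsets and dropped once they are significantly worse than the current best. A Welch t-test stands in here for the paper's nonparametric sequential test, and the per-example losses are synthetic.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
# Synthetic per-example losses for four candidate configurations.
true_means = [0.30, 0.31, 0.45, 0.60]
losses = {c: rng.normal(m, 0.1, size=5000) for c, m in enumerate(true_means)}

alive = set(losses)
for n in (200, 400, 600, 800, 1000):          # linearly increasing subsets
    best = min(alive, key=lambda c: losses[c][:n].mean())
    for c in list(alive - {best}):
        t, p = stats.ttest_ind(losses[c][:n], losses[best][:n], equal_var=False)
        if p < 0.01 and losses[c][:n].mean() > losses[best][:n].mean():
            alive.discard(c)                  # drop significantly worse candidates
    print(f"n={n}: surviving candidates {sorted(alive)}")
print("selected:", min(alive, key=lambda c: losses[c].mean()))
```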