    Calibration and improved prediction of computer models by universal Kriging

    This paper addresses the use of experimental data for calibrating a computer model and improving its predictions of the underlying physical system. A global statistical approach is proposed in which the bias between the computer model and the physical system is modeled as a realization of a Gaussian process. Applying classical statistical inference to this model yields a rigorous method for calibrating the computer model and for adding to its predictions a statistical correction based on experimental data. This correction can substantially improve the calibrated computer model's predictions of the physical system under new experimental conditions, and a quantification of the uncertainty of the prediction is provided. Physical expertise on the calibration parameters can also be taken into account in a Bayesian framework. Finally, the method is applied to the thermal-hydraulic code FLICA 4 in a single-phase friction model framework, where it significantly improves the code's predictions.
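    The bias-correction idea in this abstract can be sketched in a few lines. The toy simulator, the linear bias, and the kernel length-scale below are illustrative assumptions, not the paper's FLICA 4 setup; the point is that the Gaussian-process posterior mean of the bias is added to the simulator's output:

```python
import numpy as np

rng = np.random.default_rng(0)

def computer_model(x):
    # hypothetical cheap simulator with a systematic bias (assumed)
    return np.sin(x)

def physical_system(x):
    # "true" system = simulator plus a smooth bias (assumed)
    return np.sin(x) + 0.3 * x

def rbf(a, b, ell=1.5):
    # squared-exponential covariance between two sets of 1-D inputs
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

# noisy field measurements on a small experimental design
x_obs = np.linspace(0.0, 5.0, 8)
y_obs = physical_system(x_obs) + rng.normal(0.0, 0.05, x_obs.size)

# model the bias y - f(x) as a Gaussian-process realization
bias = y_obs - computer_model(x_obs)
K = rbf(x_obs, x_obs) + 0.05 ** 2 * np.eye(x_obs.size)
alpha = np.linalg.solve(K, bias)

def corrected_prediction(x_new):
    # simulator output plus the GP posterior mean of the bias
    return computer_model(x_new) + rbf(x_new, x_obs) @ alpha
```

    Within the range of the experiments, the corrected prediction tracks the physical system far more closely than the raw simulator; the GP's posterior variance (not computed here) supplies the uncertainty quantification the abstract mentions.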

    Potential of ALOS2 and NDVI to estimate forest above-ground biomass, and comparison with lidar-derived estimates

    Remote sensing supports carbon estimation, allowing the upscaling of field measurements to large extents. Lidar is considered the premier instrument for estimating above-ground biomass, but data are expensive and collected on demand, with limited spatial and temporal coverage. Data from the previous JERS and ALOS SAR satellites were extensively employed to model forest biomass, with the literature suggesting signal saturation at low-to-moderate biomass values and an influence of plot size on estimate accuracy. The ALOS2 continuity mission, in operation since May 2014, produces data with improved features with respect to the former ALOS, such as increased spatial resolution and reduced revisit time. We used ALOS2 backscatter data, also testing the integration of additional features (SAR textures and NDVI from Landsat 8 data), together with ground truth, to model and map above-ground biomass in two mixed forest sites: Tahoe (California) and Asiago (Alps). While texture improved model performance, the best model was obtained by combining SAR and NDVI (R2 = 0.66). In this model only slight saturation was observed, at higher levels than usually reported in the literature for SAR; the trend requires further investigation, but the model confirmed the complementarity of the optical and SAR data types. For comparison, we also generated a biomass map for Asiago using lidar data and considered a previous lidar-based study for Tahoe; in these areas the observed R2 values were 0.92 for Tahoe and 0.75 for Asiago. The range of local variation captured by lidar is higher than that captured by SAR and NDVI, with the latter showing overestimation. However, this overestimation is very limited for one of the study areas, suggesting that when the purpose is the overall quantification of stored carbon, especially in areas with high carbon density, satellite data with lower cost and broad coverage can be as effective as lidar.
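    The model comparison described here comes down to regressing plot-level biomass on SAR backscatter and NDVI and scoring with R2. A minimal sketch on synthetic plot data (the value ranges and coefficients are invented for illustration, not taken from the Tahoe or Asiago sites):

```python
import numpy as np

rng = np.random.default_rng(1)

# synthetic plot-level data (assumed distributions)
n = 200
backscatter = rng.uniform(-15.0, -6.0, n)   # SAR backscatter (dB), assumed
ndvi = rng.uniform(0.3, 0.9, n)             # Landsat 8 NDVI, assumed
agb = (10.0 + 8.0 * (backscatter + 15.0)
       + 120.0 * ndvi + rng.normal(0.0, 12.0, n))  # biomass (t/ha), assumed

def fit_r2(X, y):
    # ordinary least squares plus the coefficient of determination
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

ones = np.ones(n)
r2_sar = fit_r2(np.column_stack([ones, backscatter]), agb)
r2_joint = fit_r2(np.column_stack([ones, backscatter, ndvi]), agb)
# the joint SAR + NDVI model explains more variance than SAR alone,
# mirroring the complementarity the abstract reports
```

    On real plots the gain from adding NDVI depends on canopy conditions and saturation, but the comparison machinery is exactly this: two nested regressions scored by R2.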

    High Dimensional Classification with combined Adaptive Sparse PLS and Logistic Regression

    Motivation: The high dimensionality of genomic data calls for the development of specific classification methodologies, especially to prevent over-optimistic predictions. This challenge can be tackled by compression and variable selection, which combined constitute a powerful framework for classification, as well as for data visualization and interpretation. However, current proposed combinations lead to unstable and non-convergent methods due to inappropriate computational frameworks. We hereby propose a stable and convergent approach for classification in high-dimensional settings based on sparse Partial Least Squares (sparse PLS). Results: We start by proposing a new solution for the sparse PLS problem, based on proximal operators for the case of univariate responses. We then develop an adaptive version of sparse PLS for classification, which combines iterative optimization of logistic regression and sparse PLS to ensure convergence and stability. Our results are confirmed on synthetic and experimental data. In particular, we show how crucial convergence and stability can be when cross-validation is involved for calibration purposes. Using gene expression data, we explore the prediction of breast cancer relapse. We also propose a multicategorical version of our method for the prediction of cell types from single-cell expression data. Availability: Our approach is implemented in the plsgenomics R package.
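    The two ingredients the abstract combines can be sketched together: a sparse weight vector obtained via soft-thresholding (the proximal operator of the l1 norm), and a logistic model fitted on the resulting latent score. This is a simplified one-component illustration on invented data, not the paper's adaptive algorithm:

```python
import numpy as np

rng = np.random.default_rng(2)

# synthetic high-dimensional data: 100 samples, 500 variables, with only
# the first 10 variables carrying the class signal (all invented here)
n, p = 100, 500
X = rng.normal(size=(n, p))
y = (X[:, :10].sum(axis=1) + rng.normal(0.0, 1.0, n) > 0).astype(float)

def soft_threshold(v, lam):
    # proximal operator of the l1 norm: the sparsity mechanism behind
    # sparse PLS weight vectors
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

# first sparse PLS direction for a univariate (pseudo-)response:
# soft-threshold the covariances X'y, then normalize
Xc = X - X.mean(axis=0)
yc = y - y.mean()
cov = Xc.T @ yc
w = soft_threshold(cov, 0.5 * np.abs(cov).max())
w /= np.linalg.norm(w)

# compress the data onto the sparse component, then fit a logistic model
t = Xc @ w
Z = np.column_stack([np.ones(n), t])
beta = np.zeros(2)
for _ in range(100):                     # plain gradient ascent
    prob = 1.0 / (1.0 + np.exp(-Z @ beta))
    beta += 0.2 * Z.T @ (y - prob) / n

accuracy = float(np.mean((Z @ beta > 0) == (y == 1.0)))
```

    The weight vector `w` is sparse, so the latent score `t` depends on a small subset of variables, which is what makes the combination interpretable; the paper's contribution is making the alternation between this step and the logistic fit stable and convergent.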

    SNPredict: A Machine Learning Approach for Detecting Low Frequency Variants in Cancer

    Cancer is a genetic disease caused by the accumulation of DNA variants such as single nucleotide changes or insertions/deletions. DNA variants can silence tumor suppressor genes or increase the activity of oncogenes. To develop successful therapies for cancer patients, these variants need to be identified accurately. They can be identified by comparing the DNA sequence of tumor tissue to that of non-tumor tissue using Next Generation Sequencing (NGS) technology. Detecting variants in cancer is hard, however, because many of them occur only in a small subpopulation of the tumor tissue. It becomes a challenge to distinguish these low-frequency variants from sequencing errors, which are common in today's NGS methods. Several algorithms have been developed and implemented as tools to identify such variants in cancer, but it has previously been shown that there is low concordance in the results produced by these tools. Moreover, the number of false positives tends to increase significantly when these tools are faced with low-frequency variants. This study presents SNPredict, a single nucleotide polymorphism (SNP) detection pipeline that uses machine learning to combine the results of multiple variant callers into a consensus output with higher accuracy than any of the individual tools. By extracting features from the consensus output that describe traits associated with an individual variant call, it builds binary classifiers that predict a SNP's true state and thereby help distinguish sequencing errors from true variants.
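    The consensus idea can be sketched as a binary classifier over per-call features. The features below (caller agreement, variant allele frequency, mean base quality) and their distributions are assumptions for illustration, not SNPredict's actual feature set or model:

```python
import numpy as np

rng = np.random.default_rng(3)

# simulated candidate variant calls: half true variants, half errors
n = 400
true_variant = rng.random(n) < 0.5
callers = np.where(true_variant, rng.integers(2, 4, n), rng.integers(0, 2, n))
vaf = np.where(true_variant, rng.uniform(0.02, 0.30, n), rng.uniform(0.0, 0.05, n))
qual = np.where(true_variant, rng.normal(35.0, 3.0, n), rng.normal(25.0, 3.0, n))

# standardize features and fit a logistic consensus classifier
feats = np.column_stack([callers, vaf, qual]).astype(float)
feats = (feats - feats.mean(axis=0)) / feats.std(axis=0)
X = np.column_stack([np.ones(n), feats])
y = true_variant.astype(float)

beta = np.zeros(X.shape[1])
for _ in range(200):                     # plain gradient ascent
    prob = 1.0 / (1.0 + np.exp(-X @ beta))
    beta += 0.5 * X.T @ (y - prob) / n

predicted_true = 1.0 / (1.0 + np.exp(-X @ beta)) > 0.5
accuracy = float(np.mean(predicted_true == true_variant))
```

    The classifier learns that agreement among callers and higher allele frequency and base quality all push a call toward "true variant", which is the intuition behind using a learned consensus instead of a fixed voting rule.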