
    Essential guidelines for computational method benchmarking

    In computational biology and other sciences, researchers are frequently faced with a choice between several computational methods for performing data analyses. Benchmarking studies aim to rigorously compare the performance of different methods using well-characterized benchmark datasets, to determine the strengths of each method or to provide recommendations regarding suitable choices of methods for an analysis. However, benchmarking studies must be carefully designed and implemented to provide accurate, unbiased, and informative results. Here, we summarize key practical guidelines and recommendations for performing high-quality benchmarking analyses, based on our experiences in computational biology.

    A posteriori agreement as a quality measure for readability prediction systems

    All readability research is ultimately concerned with the question of whether a prediction system can automatically determine the readability level of an unseen text. A significant problem for such a system is that readability might depend in part on the reader. If different readers assess the readability of texts in fundamentally different ways, there is insufficient a priori agreement to justify the correctness of a readability prediction system based on the texts assessed by those readers. We built a data set of readability assessments by expert readers. We clustered the experts into groups with greater a priori agreement and then measured, for each group, whether classifiers trained only on data from this group exhibited a classification bias. As this was found to be the case, the classification mechanism cannot be unproblematically generalized to a different user group.

    Consensus clustering and functional interpretation of gene-expression data

    Microarray analysis using clustering algorithms can suffer from a lack of inter-method consistency in assigning related gene-expression profiles to clusters. Obtaining a consensus set of clusters from a number of clustering methods should improve confidence in gene-expression analysis. Here we introduce consensus clustering, which provides such an advantage. When coupled with a statistically based gene functional analysis, our method allowed the identification of novel genes regulated by NFκB and the unfolded protein response in certain B-cell lymphomas.
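
The abstract above does not say how the consensus set is built, so the following is only a minimal sketch of one common approach, a co-association matrix, and not the authors' method. It assumes each clustering method returns one label per gene; the function names `co_association` and `consensus_clusters` and the `threshold` parameter are hypothetical.

```python
from itertools import combinations


def co_association(labelings):
    """Return, for each item pair, the fraction of clusterings that co-cluster it."""
    n = len(labelings[0])
    counts = {pair: 0 for pair in combinations(range(n), 2)}
    for labels in labelings:
        for i, j in counts:
            if labels[i] == labels[j]:
                counts[(i, j)] += 1
    return {pair: c / len(labelings) for pair, c in counts.items()}


def consensus_clusters(labelings, threshold=1.0):
    """Group items connected by pairs whose co-association is >= threshold.

    Assumption: consensus groups are the connected components of the
    thresholded co-association graph (found here with union-find).
    """
    n = len(labelings[0])
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (i, j), frac in co_association(labelings).items():
        if frac >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Three hypothetical clusterings of five genes (indices 0..4).
labelings = [
    [0, 0, 1, 1, 2],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
]
strict = consensus_clusters(labelings, threshold=1.0)
majority = consensus_clusters(labelings, threshold=0.6)
```

With `threshold=1.0` only genes that every method places together are merged; lowering the threshold trades strictness for larger consensus groups.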

    Quantification of Uncertainties in Inline Inspection Data for Metal-loss Corrosion on Energy Pipelines and Implications for Reliability Analysis

    Metal-loss corrosion is one of the major threats to the integrity of oil and gas transmission pipelines. Pipeline operators periodically measure the size of metal-loss corrosion in a pipeline using in-line inspection (ILI) tools to avoid pipe failure, which may lead to severe consequences. To predict pipe failure efficiently, reliability-based corrosion management programs are gaining popularity because they incorporate the uncertainties involved in pipe failure prediction. The focus of the research reported in this thesis is to investigate unaddressed issues in reliability-based corrosion assessment in order to better predict pipe failure. First, a methodology is proposed to facilitate the use of the RSTRENG (Remaining Strength of Corroded Pipe) and CSA (Canadian Standards Association) burst pressure capacity models in reliability-based failure prediction of pipelines. The RSTRENG and CSA models require detailed geometric information about a corrosion defect, which may not be available in ILI reports. To facilitate the use of the CSA and RSTRENG models in reliability analysis, the probabilistic characteristics of the parameters that relate the detailed defect geometry to its simplified characterizing parameters were derived from high-resolution geometric data for a large set of external metal-loss corrosion defects identified on an in-service pipeline in Alberta, Canada. Next, a complete framework is proposed to quantify the measurement error associated with the ILI-measured corrosion defect length, effective length, and effective depth for oil and gas pipelines. A relatively large set of ILI-reported and field-measured defect data was collected from different in-service pipelines in Canada and used to develop the measurement error models. Each proposed measurement error model, for the ILI-reported corrosion defect length, effective length, and effective depth, is the weighted average of the measurement errors of the corresponding Type I and Type II defects, where the weighting factor is the likelihood of an ILI-reported corrosion defect being a Type I defect (without clustering error) rather than a Type II defect (with clustering error). A log-logistic model is proposed to quantify the weighting factor. The application of the proposed measurement error models is demonstrated by evaluating the probability of failure of a real corroded pipe joint through system reliability analysis.
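
The weighted-average structure described in the abstract above can be illustrated with a small sketch. Everything beyond that structure is an assumption made for illustration: the abstract does not state which variable the log-logistic model depends on, its parameter values, or whether the Type I weight rises or falls with that variable. Here the weight is taken as one minus a log-logistic CDF of a hypothetical covariate `x` (for example, the reported defect length), and `alpha`, `beta`, `err_type1`, and `err_type2` are placeholder names.

```python
def log_logistic_cdf(x, alpha, beta):
    """Log-logistic CDF with scale alpha > 0 and shape beta > 0."""
    if x <= 0:
        return 0.0
    return 1.0 / (1.0 + (x / alpha) ** (-beta))


def type1_weight(x, alpha, beta):
    """Assumed likelihood that a reported defect is Type I (no clustering error).

    Assumption: the weight decreases as the covariate x grows; the source
    only says that a log-logistic model quantifies the weighting factor.
    """
    return 1.0 - log_logistic_cdf(x, alpha, beta)


def weighted_measurement_error(x, err_type1, err_type2, alpha, beta):
    """Weighted average of the Type I and Type II measurement errors."""
    w = type1_weight(x, alpha, beta)
    return w * err_type1 + (1.0 - w) * err_type2


# Placeholder numbers only, not values from the thesis.
mid = weighted_measurement_error(50.0, err_type1=2.0, err_type2=10.0,
                                 alpha=50.0, beta=3.0)
low = weighted_measurement_error(0.0, err_type1=2.0, err_type2=10.0,
                                 alpha=50.0, beta=3.0)
```

At `x == alpha` the two defect types are weighted equally under this assumed form, so the combined error is the plain mean of the two inputs.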