7 research outputs found

    A peptide-level multiple imputation strategy accounting for the different natures of missing values in proteomics data

    Motivation: Quantitative mass spectrometry-based proteomics data are characterized by high rates of missing values, which may be of two kinds: missing completely at random (MCAR) and missing not at random (MNAR). Despite the numerous imputation methods available in the literature, none accounts for this duality, because doing so would require diagnosing the missingness mechanism behind each missing value. Results: A multiple imputation strategy is proposed that combines MCAR-devoted and MNAR-devoted imputation algorithms. First, we propose an estimator for the proportion of MCAR values and show that it is asymptotically unbiased under assumptions adapted to label-free proteomics data. This allows us to estimate the number of MCAR values in each sample and to take the nature of the missing values into account through an original multiple imputation method. We evaluate this approach on simulated data and show that it outperforms traditionally used imputation algorithms. Availability: The proposed methods are implemented in the R package imp4p (available on CRAN; Giai Gianetto (2020)), which is itself accessible through the Prostar software. Contact: [email protected]; [email protected]
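To make the idea of mixing MCAR-devoted and MNAR-devoted imputation more concrete, here is a toy sketch in Python. It is not the estimator or the algorithm implemented in imp4p: the function names, the use of the observed mean for MCAR values, the use of the observed minimum for MNAR values, and the assumption that the MCAR proportion `pi_mcar` is already known are all illustrative choices.

```python
import random
import statistics


def impute_once(values, pi_mcar, rng):
    """One random draw: label a share pi_mcar of the missing entries as MCAR.

    Hypothetical rules: MCAR entries get the observed mean, MNAR entries
    get the observed minimum (a crude stand-in for left-censored values).
    """
    observed = [v for v in values if v is not None]
    missing_idx = [i for i, v in enumerate(values) if v is None]
    n_mcar = round(pi_mcar * len(missing_idx))
    mcar_idx = set(rng.sample(missing_idx, n_mcar))
    mean_obs = statistics.mean(observed)
    low_obs = min(observed)
    out = list(values)
    for i in missing_idx:
        out[i] = mean_obs if i in mcar_idx else low_obs
    return out


def multiple_impute(values, pi_mcar, n_draws=20, seed=0):
    """Average several random draws, in the spirit of multiple imputation."""
    rng = random.Random(seed)
    draws = [impute_once(values, pi_mcar, rng) for _ in range(n_draws)]
    return [statistics.mean(column) for column in zip(*draws)]


intensities = [20.0, None, 22.0, None, 24.0, None]
imputed = multiple_impute(intensities, pi_mcar=0.5)
```

Observed intensities are left untouched, and each imputed value ends up between the MNAR-style and MCAR-style fill values, weighted by how often that entry was drawn as MCAR.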

    Machine learning to improve the interpretation of intercalating dye-based quantitative PCR results

    This study aimed to evaluate the contribution of a machine learning (ML) approach to the interpretation of intercalating dye-based quantitative PCR (IDqPCR) signals applied to the diagnosis of mucormycosis. The ML-based classification approach was applied to 734 IDqPCR results categorized as positive (n = 74) or negative (n = 660) for mucormycosis after combining "visual reading" of the amplification and denaturation curves with clinical, radiological and microbiological criteria. Fourteen features were calculated to characterize the curves and fed into several pipelines including four ML algorithms. An initial subset (n = 345) was used to design the classifiers. The classifier predictions were combined by majority voting to estimate the performance of 48 meta-classifiers on an external dataset (n = 389). The visual reading returned 57 (7.7%), 568 (77.4%) and 109 (14.8%) positive, negative and doubtful results, respectively. The Kappa coefficients of all the meta-classifiers were greater than 0.83 for the classification of IDqPCR results on the external dataset. Among these meta-classifiers, 6 exhibited Kappa coefficients of 1. The proposed ML-based approach allows a rigorous interpretation of IDqPCR curves, making the diagnosis of mucormycosis available to non-specialists in molecular diagnosis. A free online application was developed to classify IDqPCR results from the raw data of the thermal cycler output (http://gepamy-sat.asso.st/).
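The two generic building blocks named in this abstract, majority voting and the Kappa coefficient, can be sketched as follows. This is a minimal illustration under assumptions: the function names and the toy labels are hypothetical, the Kappa shown is Cohen's kappa (the abstract does not say which variant was used), and an odd number of binary classifiers is assumed so that votes cannot tie.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample."""
    combined = []
    for votes in zip(*predictions):
        # most_common(1) returns the label with the highest vote count
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def cohen_kappa(a, b):
    """Cohen's kappa between two label sequences of equal length."""
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    labels = set(a) | set(b)
    p_expected = sum(count_a[l] * count_b[l] for l in labels) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both sequences use a single identical label
    return (p_observed - p_expected) / (1 - p_expected)


# Toy data: 1 = positive, 0 = negative (not taken from the study)
reference = [1, 0, 0, 1, 0, 0]
classifier_outputs = [
    [1, 0, 0, 1, 0, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 0, 1, 0, 0],
]
meta_prediction = majority_vote(classifier_outputs)
kappa = cohen_kappa(meta_prediction, reference)
```

In this toy case each individual classifier makes one error, yet the voted prediction matches the reference exactly, which is the kind of gain a meta-classifier is meant to deliver.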

    Protein-Level Statistical Analysis of Quantitative Label-Free Proteomics Data with ProStaR

    ProStaR is a software tool dedicated to differential analysis in label-free quantitative proteomics. In practice, once biological samples have been analyzed by bottom-up mass spectrometry-based proteomics, the raw mass spectrometer outputs are processed by bioinformatics tools to identify peptides and quantify them by means of precursor ion chromatogram integration. These peptide-level data are then classically used to derive the identity and quantity of the sample proteins before proceeding with refined statistical processing at the protein level, so as to bring out proteins whose abundance differs significantly between groups of samples. To carry out this statistical step, it is possible to rely on ProStaR, which allows the user to (1) load correctly formatted data, (2) clean them by means of various filters, (3) normalize the sample batches, (4) impute the missing values, (5) perform null hypothesis significance testing, (6) check that the resulting p-values are well calibrated, (7) select a subset of differentially abundant proteins according to some false discovery rate, and (8) contextualize these selected proteins in the Gene Ontology. This chapter provides a detailed protocol for performing these eight processing steps with ProStaR.
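Step (7), selecting proteins at a given false discovery rate, can be illustrated with the standard Benjamini-Hochberg step-up procedure. The abstract does not state which FDR procedure ProStaR applies, so this is a sketch under that assumption; the function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return the indices of hypotheses selected at the given FDR level."""
    m = len(p_values)
    # Indices of the p-values sorted in increasing order
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        # Step-up rule: keep the largest rank whose p-value is under its threshold
        if p_values[i] <= rank * fdr / m:
            k_max = rank
    return sorted(order[:k_max])


# Toy p-values for five proteins (not taken from any dataset)
p_vals = [0.001, 0.045, 0.025, 0.2, 0.008]
selected = benjamini_hochberg(p_vals, fdr=0.05)
```

The function returns the positions of the selected proteins in the input list, so the result can be used directly to subset a protein table.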