89 research outputs found

    Variance component score test for time-course gene set analysis of longitudinal RNA-seq data

    As gene expression measurement technology shifts from microarrays to sequencing, the statistical tools available for analysis must be adapted, since RNA-seq data are measured as counts. Recently, it has been proposed to handle the count nature of these data by modeling log-counts per million (log-CPM) as continuous variables, using nonparametric regression to account for their inherent heteroscedasticity. Adopting this framework, we propose tcgsaseq, a principled, model-free and efficient top-down method for detecting longitudinal changes in RNA-seq gene sets. Considering gene sets defined a priori, tcgsaseq identifies those whose expression varies over time, based on an original variance component score test that accounts for both covariates and heteroscedasticity without assuming any specific parametric distribution for the transformed counts. We demonstrate that, despite the presence of a nonparametric component, our test statistic has a simple form and limiting distribution, both of which can be computed quickly. A permutation version of the test is also proposed for very small sample sizes. Applied to simulated data and two real datasets, the proposed method exhibits very good statistical properties, with increased stability and power compared with the state-of-the-art methods ROAST, edgeR and DESeq2, which can fail to control the type I error in certain realistic settings. We have made the method available to the community in the R package tcgsaseq.
    Comment: 23 pages, 6 figures, typo corrections & acceptance acknowledgement
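
A minimal sketch of a variance-component-type score statistic for one gene set, under simplifying assumptions: null-model residuals from covariates X are projected on a time basis Phi, and squared scores are summed over genes. The function names, the per-gene standardization (a crude stand-in for the nonparametric heteroscedasticity weights described above) and the row-shuffling permutation scheme are all hypothetical illustrations, not the tcgsaseq implementation; shuffling time is only justified when samples are exchangeable under the null, which ignores within-subject correlation.

```python
import numpy as np

def vc_score_stat(Y, X, Phi):
    """Score-type statistic Q = sum_g ||Phi^T r_g||^2 for one gene set.

    Y   : (n, G) transformed expression (e.g. log-CPM) of the G genes in the set
    X   : (n, p) null-model covariates, including an intercept column
    Phi : (n, K) basis functions of time (e.g. columns t and t**2)
    """
    # Null model: regress every gene on the covariates only, keep residuals.
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    R = Y - X @ beta
    # Crude per-gene scaling (hypothetical stand-in for heteroscedasticity weights).
    R = R / R.std(axis=0, ddof=X.shape[1])
    # Project residuals on the time basis and sum squared scores over genes.
    S = Phi.T @ R  # (K, G)
    return float(np.sum(S ** 2))

def permutation_pvalue(Y, X, Phi, n_perm=999, seed=0):
    """Permutation p-value obtained by shuffling the rows of the time basis.

    Assumes samples are exchangeable under H0 (hypothetical simplification).
    """
    rng = np.random.default_rng(seed)
    q_obs = vc_score_stat(Y, X, Phi)
    q_perm = np.array([
        vc_score_stat(Y, X, Phi[rng.permutation(Phi.shape[0])])
        for _ in range(n_perm)
    ])
    # Add-one correction keeps the p-value strictly positive.
    p_value = (1 + np.sum(q_perm >= q_obs)) / (n_perm + 1)
    return q_obs, float(p_value)

# Hypothetical example: 10 subjects x 4 time points, 10 genes with a linear trend.
rng = np.random.default_rng(1)
time = np.tile(np.arange(4.0), 10)
X = np.ones((time.size, 1))                      # intercept-only null model
Phi = np.column_stack([time, time ** 2])
Y = rng.normal(size=(time.size, 10)) + 0.8 * time[:, None]
print(permutation_pvalue(Y, X, Phi))
```

Standardizing each gene's residuals makes Q invariant to rescaling a gene, which is the property a weighting scheme is meant to provide; the asymptotic (non-permutation) distribution mentioned in the abstract is not reproduced here.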

    Sequential Dirichlet Process Mixtures of Multivariate Skew t-distributions for Model-based Clustering of Flow Cytometry Data

    39 pages, 11 figures
    Flow cytometry is a high-throughput technology used to quantify multiple surface and intracellular markers at the level of a single cell. This makes it possible to identify cell subtypes and to determine their relative proportions. Improvements in this technology allow millions of individual cells from a blood sample to be described using multiple markers. This results in high-dimensional datasets whose manual analysis is highly time-consuming and poorly reproducible. While several methods have been developed to perform automatic recognition of cell populations, most of them treat and analyze each sample independently. In practice, however, individual samples are rarely independent (e.g., in longitudinal studies). Here, we propose a Bayesian nonparametric approach using a Dirichlet process mixture (DPM) of multivariate skew t-distributions to perform model-based clustering of flow cytometry data. DPM models estimate the number of cell populations directly from the data, avoiding model selection issues, and skew t-distributions provide robustness to outliers and to the non-elliptical shape of cell populations. To accommodate repeated measurements, we propose a sequential strategy relying on a parametric approximation of the posterior. We illustrate the good performance of our method on simulated data, on an experimental benchmark dataset, and on new longitudinal data from the DALIA-1 trial, which evaluates a therapeutic vaccine against HIV. On the benchmark dataset, the sequential strategy outperforms all other methods evaluated, and it similarly leads to improved performance on the DALIA-1 data. We have made the method available to the community in the R package NPflow.
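
Why a DPM "directly estimates the number of cell populations" can be seen from the stick-breaking construction of the Dirichlet process prior: infinitely many components receive weights, and the data determine how many are actually occupied. The sketch below is a minimal, truncated version of that prior only; the concentration values, truncation level and helper names are hypothetical, and neither the skew t kernels nor the sequential posterior approximation used in NPflow are shown.

```python
import numpy as np

def stick_breaking_weights(alpha, n_atoms, rng):
    """Truncated stick-breaking draw of Dirichlet process mixture weights.

    v_k ~ Beta(1, alpha),  pi_k = v_k * prod_{j<k} (1 - v_j).
    The mass left after n_atoms breaks, prod_k (1 - v_k), is simply dropped.
    """
    v = rng.beta(1.0, alpha, size=n_atoms)
    # Stick length remaining before each break: 1, (1-v_1), (1-v_1)(1-v_2), ...
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    return v * remaining

def n_occupied_clusters(alpha, n_cells, n_atoms=500, seed=0):
    """Number of distinct components used when n_cells are assigned to
    components drawn from one stick-breaking realization (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    w = stick_breaking_weights(alpha, n_atoms, rng)
    z = rng.choice(n_atoms, size=n_cells, p=w / w.sum())  # renormalize truncated weights
    return len(np.unique(z))

for alpha in (0.5, 2.0, 10.0):
    print(alpha, n_occupied_clusters(alpha, n_cells=10_000))
```

Larger concentration values alpha spread the prior mass over more components, so more clusters tend to be occupied; in a fitted DPM the posterior, not the user, settles on the number of populations.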

    Population modeling of early COVID-19 epidemic dynamics in French regions and estimation of the lockdown impact on infection rate

    We propose a population approach to model the beginning of the French COVID-19 epidemic at the regional level. We rely on an extended Susceptible-Exposed-Infectious-Recovered (SEIR) mechanistic model, a simplified representation of the average epidemic process. Combining several French public datasets on the early dynamics of the epidemic, we estimate region-specific key parameters conditionally on this mechanistic model through Stochastic Approximation Expectation Maximization (SAEM) optimization using the Monolix software. We thus estimate the basic reproductive number by region before isolation (between 2.4 and 3.1), the percentage of infected people over time (between 2.0 and 5.9% as of May 11th, 2020), and the impact of the nationwide lockdown on the infection rate (a 72% decrease in the transmission rate, leading to an effective reproductive number R_e ranging from 0.7 to 0.9). We conclude that lifting the lockdown should be accompanied by further interventions to avoid an epidemic rebound.
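
The link between the reported 72% reduction in transmission and the R_e range can be illustrated with a basic SEIR model, where R0 = beta / gamma, so cutting beta by 72% scales the reproductive number by 0.28. This is a minimal sketch assuming a plain (not extended) SEIR with forward Euler integration; the population size, latent and infectious periods, lockdown day and initial exposed count are hypothetical values, and no SAEM estimation is performed. Only the 72% reduction and the 2.4 to 3.1 range of R0 come from the abstract.

```python
import numpy as np

def seir_with_lockdown(N, R0, latent_days, infectious_days,
                       lockdown_day, reduction, days, E0=100.0, dt=0.1):
    """Deterministic SEIR integrated with forward Euler steps of size dt (days).

    The transmission rate beta drops by `reduction` from `lockdown_day` onwards.
    Returns an array of shape (days, 4) with S, E, I, R at the end of each day.
    """
    sigma = 1.0 / latent_days        # E -> I transition rate
    gamma = 1.0 / infectious_days    # I -> R removal rate
    beta0 = R0 * gamma               # basic SEIR identity: R0 = beta / gamma
    S, E, I, R = N - E0, E0, 0.0, 0.0
    steps = int(round(1.0 / dt))
    out = np.empty((days, 4))
    for day in range(days):
        beta = beta0 * (1.0 - reduction) if day >= lockdown_day else beta0
        for _ in range(steps):
            infections = beta * S * I / N * dt
            onsets = sigma * E * dt
            removals = gamma * I * dt
            S -= infections
            E += infections - onsets
            I += onsets - removals
            R += removals
        out[day] = (S, E, I, R)
    return out

# Reproductive number right after lockdown, ignoring depletion of susceptibles.
for R0 in (2.4, 3.1):
    print(R0, round(R0 * (1 - 0.72), 2))   # 0.67 and 0.87

# Hypothetical region: 6 million inhabitants, lockdown on day 30.
traj = seir_with_lockdown(N=6e6, R0=2.8, latent_days=5, infectious_days=4,
                          lockdown_day=30, reduction=0.72, days=150)
```

In this simplified model the post-lockdown values 0.67 to 0.87 are roughly in line with the reported 0.7 to 0.9; the extended model in the paper has additional compartments, so the correspondence is only indicative.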

    Probabilistic record linkage of de-identified research datasets with discrepancies using diagnosis codes

    We develop an algorithm for probabilistic linkage of de-identified research datasets at the patient level when only diagnosis codes with discrepancies are available, and no personal health identifiers such as name or date of birth. It relies on Bayesian modelling of binarized diagnosis codes and provides a posterior probability of matching for each patient pair while considering all the data at once. Both in our simulation study (using an administrative claims dataset for data generation) and in two real use cases linking patient electronic health records from a large tertiary care network, our method performs well and compares favourably to the standard baseline Fellegi-Sunter algorithm. We provide a scalable, fast and efficient open-source implementation in the ludic R package, available on CRAN, which also includes the anonymized diagnosis code data from our real use case. This work suggests that it is possible to link de-identified research databases stripped of any personal health identifiers using only diagnosis codes, provided sufficient information is shared between the data sources.
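
For context on the baseline mentioned above, the classical Fellegi-Sunter approach scores each patient pair by summing log-likelihood ratios over codes: log(m/u) when a code agrees and log((1-m)/(1-u)) when it disagrees. The sketch below shows that baseline on binarized codes only; it is not the Bayesian model implemented in ludic. The m and u values, the simulated prevalence and the two discrepant codes per patient are hypothetical (in practice m and u are estimated, for example by EM).

```python
import numpy as np

def fellegi_sunter_weights(A, B, m, u):
    """Pairwise Fellegi-Sunter match weights between two sets of patients.

    A : (nA, C) binary diagnosis-code matrix of dataset 1
    B : (nB, C) binary diagnosis-code matrix of dataset 2
    m : (C,) P(code agrees | same patient)
    u : (C,) P(code agrees | different patients)
    Returns an (nA, nB) matrix of summed log-likelihood ratios.
    """
    agree_w = np.log(m / u)                  # evidence for a match when a code agrees
    disagree_w = np.log((1 - m) / (1 - u))   # evidence against when it disagrees
    agree = A[:, None, :] == B[None, :, :]   # (nA, nB, C) agreement pattern
    return np.where(agree, agree_w, disagree_w).sum(axis=2)

# Hypothetical simulation: dataset B copies A with 2 discrepant codes per patient.
rng = np.random.default_rng(0)
n, C, prevalence = 20, 50, 0.2
A = (rng.random((n, C)) < prevalence).astype(int)
B = A.copy()
for i in range(n):
    flip = rng.choice(C, size=2, replace=False)
    B[i, flip] = 1 - B[i, flip]
m = np.full(C, 0.95)
u = np.full(C, prevalence**2 + (1 - prevalence)**2)  # chance agreement of unrelated records
W = fellegi_sunter_weights(A, B, m, u)
best = W.argmax(axis=1)                  # highest-weight candidate for each row of A
print(np.mean(best == np.arange(n)))     # fraction of patients correctly linked
```

Each pair is scored independently here, whereas the abstract's method yields posterior match probabilities while considering all the data at once.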

    Front Immunol

    The goal of HIV therapeutic vaccination is to induce HIV-specific immune responses able to control HIV replication. We previously reported that vaccination with ex vivo generated dendritic cells (DC) loaded with HIV lipopeptides in HIV-infected patients (n = 19) on antiretroviral therapy (ART) was well tolerated and immunogenic. Vaccine-elicited HIV-specific T cell responses were associated with improved control of viral replication following antiretroviral treatment interruption (ATI, from w24 to w48). We show an inverse relationship between HIV-specific responses (production of IL-2, IL-13, IL-21, IFN-γ, and CD4 polyfunctionality, i.e., production of at least two cytokines) and the peak viral load during ATI. Here we performed an integrative systems vaccinology analysis including: (i) post-vaccination (w16) immune responses assessed by cytometry, cytokine secretion, and interferon-γ ELISPOT assays; (ii) whole-blood and cellular gene expression measured during vaccination; and (iii) viral parameters following ATI, with the objective of disentangling the relationships between these markers and identifying vaccine signatures. During vaccination, 69 out of 260 gene expression modules varied significantly, including (in order of significance) modules related to inflammation (Chaussabel modules M3.2, M4.13, M4.6, M5.7, M7.1, M4.2), plasma cells (M4.11) and T cells (M4.1, M4.15). Cellular immune responses at w16 were positively correlated with genes belonging to T cell functional modules (M4.1, M4.15) and negatively correlated with genes belonging to inflammation modules (M7.1, M5.7, M3.2, M4.13, M4.2). More specifically, we show that a prolonged increase in the abundance of inflammatory gene pathways related to toll-like receptor signaling (especially TLR4) is associated with both lower vaccine immune responses and control of viral replication post-ATI. Further comparison of the DC vaccine gene signatures with previously reported non-HIV vaccine signatures, such as those of flu and pneumococcal vaccines, revealed pathways common across vaccines. Overall, these results show that vaccine inflammatory responses of excessive duration and intensity hamper the magnitude of effector responses.

    Diet-Related Metabolites Associated with Cognitive Decline Revealed by Untargeted Metabolomics in a Prospective Cohort

    Scope: Untargeted metabolomics may reveal preventive targets in cognitive aging, including within the food metabolome. Methods and results: A case-control study nested in the prospective Three-City study includes participants aged 65 years or older and initially free of dementia. A total of 209 cases of cognitive decline are contrasted with 209 controls (matched for age, gender, and education) who showed slower cognitive decline over up to 12 years. Using untargeted metabolomics and bootstrap-enhanced penalized regression, a baseline serum signature of 22 metabolites associated with subsequent cognitive decline is identified. The signature includes three coffee metabolites, a biomarker of citrus intake, a cocoa metabolite, two metabolites putatively derived from fish and wine, three medium-chain acylcarnitines, glycodeoxycholic acid, lysoPC(18:3), trimethyllysine, glucose, cortisol, creatinine, and arginine. Adding the 22 metabolites to a reference predictive model for cognitive decline (conditioned on age, gender, and education and including ApoE-ε4, diabetes, BMI, and number of medications) substantially increases predictive performance: the cross-validated area under the receiver operating characteristic curve (AUC) is 75% [95% CI 70-80%] compared with 62% [95% CI 56-67%]. Conclusions: This untargeted metabolomics study supports a protective role of specific foods (e.g., coffee, cocoa, fish) and various alterations in the endogenous metabolism responsive to diet in cognitive aging.
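
The AUC figures above (75% versus 62%) have a direct probabilistic reading: the AUC is the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control, which equals the Mann-Whitney statistic divided by the number of case-control pairs. The sketch below computes it from scores and labels; the toy scores are hypothetical, and in a cross-validated analysis the scores would come from models fitted on held-out folds.

```python
import numpy as np

def auc_mann_whitney(scores, is_case):
    """AUC as the probability that a randomly chosen case scores higher than a
    randomly chosen control (ties count 1/2), i.e. the Mann-Whitney statistic
    divided by n_cases * n_controls."""
    scores = np.asarray(scores, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    diff = scores[is_case][:, None] - scores[~is_case][None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

print(auc_mann_whitney([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

An AUC of 0.5 corresponds to scores that do not separate cases from controls at all, and 1.0 to perfect separation.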

    Early signature in the blood lipidome associated with subsequent cognitive decline in the elderly: A case-control analysis nested within the Three-City cohort study

    Background: Brain lipid metabolism appears critical for cognitive aging, but whether alterations in the lipidome relate to cognitive decline remains unclear at the system level. Methods: We studied participants from the Three-City study, a multicentric cohort of older persons who were free of dementia at the time of blood sampling and who provided repeated measures of cognition over the 12 subsequent years. We measured 189 serum lipids from 13 lipid classes using shotgun lipidomics in a case-control sample on cognitive decline (matched on age, sex and level of education) nested within the Bordeaux study center (discovery, n = 418). Associations with cognitive decline were investigated using bootstrapped penalized regression and tested for validation in the Dijon study center (validation, n = 314). Findings: Among 17 lipids identified in the discovery stage, lower levels of the triglyceride TAG50:5 and of four membrane lipids (sphingomyelin SM40:2,2, phosphatidylethanolamine PE38:5(18:1/20:4), ether-phosphatidylethanolamine PEO34:3(16:1/18:2), and ether-phosphatidylcholine PCO34:1(16:1/18:0)), as well as higher levels of PCO32:0(16:0/16:0), were associated with greater odds of cognitive decline and were replicated in our validation sample. Interpretation: These findings indicate that, in the blood lipidome of non-demented older persons, a specific profile of lipids involved in membrane fluidity, myelination, and lipid rafts is associated with subsequent cognitive decline. Funding: The complete list of funders is available at the end of the manuscript, in the Acknowledgement section.
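
The abstracts do not specify the exact "bootstrapped penalized regression" procedure, so the following is only a minimal sketch of one common variant under stated assumptions: an L1-penalized logistic regression, fitted by proximal gradient descent, is refitted on bootstrap resamples, and each lipid's selection frequency is recorded. The fixed penalty lam=0.05, the number of resamples and the simulated data are hypothetical; the matched case-control design is ignored here, whereas a conditional logistic model would respect the matching.

```python
import numpy as np

def l1_logistic(X, y, lam, n_iter=500):
    """L1-penalized logistic regression fitted by proximal gradient descent (ISTA).

    Minimizes mean log-loss + lam * ||w||_1; the intercept is not penalized.
    Returns the coefficient vector without the intercept.
    """
    n = X.shape[0]
    Xa = np.column_stack([np.ones(n), X])            # intercept in column 0
    step = 4.0 * n / np.linalg.norm(Xa, 2) ** 2      # 1 / Lipschitz constant of the gradient
    w = np.zeros(Xa.shape[1])
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(Xa @ w)))
        z = w - step * (Xa.T @ (prob - y) / n)        # gradient step on the log-loss
        # Soft-thresholding = proximal operator of the L1 penalty (skip intercept).
        z[1:] = np.sign(z[1:]) * np.maximum(np.abs(z[1:]) - step * lam, 0.0)
        w = z
    return w[1:]

def bootstrap_selection_frequency(X, y, lam, n_boot=50, seed=0):
    """Fraction of bootstrap resamples in which each feature has a nonzero coefficient."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    counts = np.zeros(X.shape[1])
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)             # resample subjects with replacement
        counts += l1_logistic(X[idx], y[idx], lam) != 0
    return counts / n_boot

# Hypothetical data: 10 standardized lipids, only lipid 0 related to decline.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 10))
y = (rng.random(200) < 1 / (1 + np.exp(-2 * X[:, 0]))).astype(float)
freq = bootstrap_selection_frequency(X, y, lam=0.05)
print(freq.round(2))
```

Features selected in a large fraction of resamples are retained as the signature; the selection threshold is a further tuning choice not stated in the abstracts.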

    EBioMedicine
