712 research outputs found
Synergistic drug combinations from electronic health records and gene expression.
Objective: Using electronic health records (EHRs) and biomolecular data, we sought to discover drug pairs with synergistic repurposing potential. EHRs provide real-world treatment and outcome patterns, while complementary biomolecular data, including disease-specific gene expression and drug-protein interactions, provide mechanistic understanding.
Method: We applied Group Lasso INTERaction NETwork (glinternet), an overlap group lasso penalty on a logistic regression model, with pairwise interactions to identify variables and interacting drug pairs associated with reduced 5-year mortality using EHRs of 9945 breast cancer patients. We identified differentially expressed genes from 14 case-control human breast cancer gene expression datasets and integrated them with drug-protein networks. Drugs in the network were scored according to their association with breast cancer individually or in pairs. Lastly, we determined whether synergistic drug pairs found in the EHRs were enriched among synergistic drug pairs from gene-expression data using a method similar to gene set enrichment analysis.
Results: From EHRs, we discovered 3 drug-class pairs associated with lower mortality: anti-inflammatories and hormone antagonists, anti-inflammatories and lipid modifiers, and lipid modifiers and obstructive airway drugs. The first 2 pairs were also enriched among pairs discovered using gene expression data and are supported by molecular interactions in drug-protein networks and preclinical and epidemiologic evidence.
Conclusions: This is a proof-of-concept study demonstrating that a combination of complementary data sources, such as EHRs and gene expression, can corroborate discoveries and provide mechanistic insight into drug synergism for repurposing.
Model-based joint visualization of multiple compositional omics datasets
The integration of multiple omics datasets measured on the same samples is a challenging task: data come from heterogeneous sources and vary in signal quality. In addition, some omics data are inherently compositional, e.g. sequence count data. Most integrative methods are limited in their ability to handle covariates, missing values, compositional structure and heteroscedasticity. In this article we introduce COMBI, a flexible model-based approach to data integration that addresses these limitations. We combine concepts such as compositional biplots and log-ratio link functions with latent variable models, and propose an attractive visualization through multiplots to improve interpretation. Using real data examples and simulations, we illustrate our method and compare it with other data integration techniques. Our algorithm is available in the R package combi.
A flexible and versatile framework for statistical design and analysis of quantitative mass spectrometry-based proteomic experiments
Quantitative mass spectrometry (MS)-based proteomics is an indispensable technology for biological and clinical research. As the proteomics field grows, MS-based proteomic workflows are becoming more complex and diverse. The accuracy and throughput of MS measurements and of signal processing tools have increased dramatically. However, many existing statistical tools and workflows have not kept pace with these technological developments. Therefore, there is a need for flexible statistical tools that reflect diverse and complex workflows, are computationally efficient for large datasets, and maximize the reproducibility of the results.
We propose a family of linear mixed effects models, and a split-plot view of the experimental design, that represent measurements from quantitative mass spectrometry-based proteomics. The whole plot part of the design reflects the structure of the biological variation of the experiment, such as case-control design, paired design, or time-course design. The subplot part of the design reflects the structure of the technological variation, such as fragmentation patterns, labeling strategy, and presence of multiple peptides per protein. We propose an estimation procedure that separately estimates the parameters of the subplot and the whole plot parts of the design, to maximize the flexibility of the model, increase the speed of the analysis, and facilitate the interpretation.
The proposed modeling framework was validated using 9 controlled mixtures and 10 experimental datasets from targeted Selected Reaction Monitoring (SRM), Data-Dependent Acquisition (DDA or shotgun), and Data-Independent Acquisition (DIA or SWATH-MS), where signals were extracted with multiple signal processing tools. We implemented the proposed method in the software package MSstats, which checks the correctness of the user input, recognizes arbitrarily complex experimental designs, visualizes the data, and performs statistical modeling and inference. It is interoperable with other existing computational tools such as Skyline.
Development of data processing methods for high resolution mass spectrometry-based metabolomics with an application to human liver transplantation
Direct Infusion (DI) Fourier transform ion cyclotron resonance (FT-ICR) mass spectrometry (MS) is becoming a popular measurement platform in metabolomics. This thesis aims to advance the data processing and analysis pipeline of DI FT-ICR MS based metabolomics, and to broaden its applicability to clinical research. To meet the first objective, the thesis addresses the issue of missing data in the final data matrix, which contains the relative abundance of each metabolite measured in each sample analysed. The nature of these missing data and their effect on subsequent data analyses are investigated. Eight common and/or easily accessible missing data estimation algorithms are examined, and a three-stage approach is proposed to aid the identification of the optimal one. Finally, a novel survival analysis approach is introduced and assessed as an alternative way of treating missing data prior to univariate analysis. To address the second objective, DI FT-ICR MS based metabolomics is assessed in terms of its applicability to research investigating metabolomic changes occurring in liver grafts throughout human orthotopic liver transplantation (OLT). The feasibility of applying this approach in a clinical setting is validated, and its potential to provide a wealth of novel metabolic information associated with OLT is demonstrated.