69 research outputs found

    Unified feature association networks through integration of transcriptomic and proteomic data

    High-throughput multi-omics studies and corresponding network analyses of multi-omic data have rapidly expanded their impact over the last 10 years. As biological features of different types (e.g. transcripts, proteins, metabolites) interact within cellular systems, the greatest amount of knowledge can be gained from networks that incorporate multiple types of -omic data. However, biological and technical sources of variation diminish the ability to detect cross-type associations, yielding networks dominated by communities comprised of nodes of the same type. We describe here network building methods that can maximize edges between nodes of different data types leading to integrated networks, networks that have a large number of edges that link nodes of different -omic types (transcripts, proteins, lipids, etc.). We systematically rank several network inference methods and demonstrate that, in many cases, using a random forest method, GENIE3, produces the most integrated networks. This increase in integration does not come at the cost of accuracy as GENIE3 produces networks of approximately the same quality as the other network inference methods tested here. Using GENIE3, we also infer networks representing antibody-mediated Dengue virus cell invasion and receptor-mediated Dengue virus invasion. A number of functional pathways showed centrality differences between the two networks including genes responding to both GM-CSF and IL-4, which had a higher centrality value in an antibody-mediated vs. receptor-mediated Dengue network. Because a biological system involves the interplay of many different types of molecules, incorporating multiple data types into networks will improve their use as models of biological systems. The methods explored here are some of the first to specifically highlight and address the challenges associated with how such multi-omic networks can be assembled and how the greatest number of interactions can be inferred from different data types. The resulting networks can lead to the discovery of new host response patterns and interactions during viral infection, generate new hypotheses of pathogenic mechanisms and confirm mechanisms of disease.
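
The abstract above describes integrated networks as those with many edges linking nodes of different -omic types, but it does not state how integration was scored. As a minimal sketch, assuming integration is measured simply as the fraction of edges whose endpoints differ in type, the idea can be illustrated as follows. The function name, node labels, and toy edge list are hypothetical and are not taken from the paper.

```python
from typing import Dict, Iterable, Tuple


def cross_type_fraction(
    edges: Iterable[Tuple[str, str]], node_type: Dict[str, str]
) -> float:
    """Return the fraction of edges that connect nodes of different types.

    Assumption: this is one simple way to quantify "integration";
    it is not necessarily the measure used by the authors.
    """
    edge_list = list(edges)
    if not edge_list:
        return 0.0  # an empty network has no cross-type edges
    cross = sum(1 for u, v in edge_list if node_type[u] != node_type[v])
    return cross / len(edge_list)


# Hypothetical toy network: two transcripts (t1, t2) and two proteins (p1, p2).
node_type = {
    "t1": "transcript",
    "t2": "transcript",
    "p1": "protein",
    "p2": "protein",
}
edges = [("t1", "t2"), ("t1", "p1"), ("t2", "p2"), ("p1", "p2")]

# Two of the four edges (t1-p1 and t2-p2) join a transcript to a protein.
integration = cross_type_fraction(edges, node_type)
print(integration)
```

Under this assumed measure, a network inferred by any method (GENIE3 or otherwise) could be compared by passing its edge list to the same function.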

    Proteomic analysis of four Clostridium botulinum strains identifies proteins that link biological responses to proteomic signatures.

    Microorganisms alter gene and protein expression in response to environmental conditions to adapt and survive. Whereas the genetic composition of a microbe represents an organism's biological potential, the proteins expressed provide a functional readout of the organism's response to the environment. Understanding protein expression patterns in response to specific environmental conditions furthers fundamental knowledge about a microbe, which can be especially useful for understudied organisms such as Clostridium botulinum examined herein. In addition, protein expression patterns that reproducibly occur in certain growth conditions hold potential in fields such as microbial forensics, in which determination of conditions in which an unknown possible biothreat sample had been grown may be important. To investigate the identity and reproducibility of protein profile patterns for varied strains, we defined the proteomic profiles of four Group I strains of Clostridium botulinum, a Category A biothreat agent and the organism responsible for the production of the botulinum neurotoxin (BoNT), in two different culture media grown for five days. The four C. botulinum strains produced one of three neurotoxins (BoNT/A, /B, or /F), and their protein profiles were compared to that of a fifth non-toxigenic strain of C. sporogenes. These strains each had DNA sequences available to assist in accurate protein identification. Differing culture growth phase, bacterial strain, and growth medium resulted in reproducible protein profiles, which were used to calculate relative protein abundance ratios as an internally normalized metric of microbial growth in varying conditions. The resulting protein profiles provide functional information about how four Group I C. botulinum strains and a C. sporogenes strain respond to the culture environment during growth and explore the feasibility of using these proteins to characterize unknown samples.

    LipidOz enables automated elucidation of lipid carbon–carbon double bond positions from ozone-induced dissociation mass spectrometry data

    Lipids play essential roles in many biological processes and disease pathology, but unambiguous identification of lipids is complicated by the presence of multiple isomeric species differing by fatty acyl chain length, stereospecifically numbered (sn) position, and position/stereochemistry of double bonds. Conventional liquid chromatography-mass spectrometry (LC-MS/MS) analyses enable the determination of fatty acyl chain lengths (and in some cases sn position) and number of double bonds, but not carbon-carbon double bond positions. Ozone-induced dissociation (OzID) is a gas-phase oxidation reaction that produces characteristic fragments from lipids containing double bonds. OzID can be incorporated into ion mobility spectrometry (IMS)-MS instruments for the structural characterization of lipids, including additional isomer separation and confident assignment of double bond positions. The complexity and repetitive nature of OzID data analysis and lack of software tool support have limited the application of OzID for routine lipidomics studies. Here, we present an open-source Python tool, LipidOz, for the automated determination of lipid double bond positions from OzID-IMS-MS data, which employs a combination of traditional automation and deep learning approaches. Our results demonstrate the ability of LipidOz to robustly assign double bond positions for lipid standard mixtures and complex lipid extracts, enabling practical application of OzID for future lipidomics

    Investigation of Yersinia pestis Laboratory Adaptation through a Combined Genomics and Proteomics Approach

    The bacterial pathogen Yersinia pestis, the cause of plague in humans and animals, normally has a sylvatic lifestyle, cycling between fleas and mammals. In contrast, laboratory-grown Y. pestis experiences a more constant environment and conditions that it would not normally encounter. The transition from the natural environment to the laboratory results in a vastly different set of selective pressures, and represents what could be considered domestication. Understanding the kinds of adaptations Y. pestis undergoes as it becomes domesticated will contribute to understanding the basic biology of this important pathogen. In this study, we performed a parallel serial passage experiment (PSPE) to explore the mechanisms by which Y. pestis adapts to laboratory conditions, hypothesizing that cells would undergo significant changes in virulence and nutrient acquisition systems. Two wild strains were serially passaged in 12 independent populations each for ~750 generations, after which each population was analyzed using whole-genome sequencing, LC-MS/MS proteomic analysis, and GC/MS metabolomics. We observed considerable parallel evolution in the endpoint populations, detecting multiple independent mutations in ail, pepA, and zwf, suggesting that specific selective pressures are shaping evolutionary responses. Complementary LC-MS/MS proteomic data provide physiological context to the observed mutations, and reveal regulatory changes not necessarily associated with specific mutations, including changes in amino acid metabolism and cell envelope biogenesis. Proteomic data support hypotheses generated by genomic data in addition to suggesting future mechanistic studies, indicating that future whole-genome sequencing studies be designed to leverage proteomics as a critical complement.

    Protein abundances can distinguish between naturally-occurring and laboratory strains of Yersinia pestis, the causative agent of plague

    The rapid pace of bacterial evolution enables organisms to adapt to the laboratory environment with repeated passage and thus diverge from naturally-occurring environmental (“wild”) strains. Distinguishing wild and laboratory strains is clearly important for biodefense and bioforensics; however, DNA sequence data alone has thus far not provided a clear signature, perhaps due to lack of understanding of how diverse genome changes lead to convergent phenotypes, difficulty in detecting certain types of mutations, or perhaps because some adaptive modifications are epigenetic. Monitoring protein abundance, a molecular measure of phenotype, can overcome some of these difficulties. We have assembled a collection of Yersinia pestis proteomics datasets from our own published and unpublished work, and from a proteomics data archive, and demonstrated that protein abundance data can clearly distinguish laboratory-adapted from wild. We developed a lasso logistic regression classifier that uses binary (presence/absence) or quantitative protein abundance measures to predict whether a sample is laboratory-adapted or wild that proved to be ~98% accurate, as judged by replicated 10-fold cross-validation. Protein features selected by the classifier accord well with our previous study of laboratory adaptation in Y. pestis. The input data was derived from a variety of unrelated experiments and contained significant confounding variables. We show that the classifier is robust with respect to these variables. The methodology is able to discover signatures for laboratory facility and culture medium that are largely independent of the signature of laboratory adaptation. Going beyond our previous laboratory evolution study, this work suggests that proteomic differences between laboratory-adapted and wild Y. pestis are general, potentially pointing to a process that could apply to other species as well. Additionally, we show that proteomics datasets (even archived data collected for different purposes) contain the information necessary to distinguish wild and laboratory samples. This work has clear applications in biomarker detection as well as biodefense.

    More protein features than those reported in Table 2 can accurately classify laboratory vs. wild samples.

    The Lasso logistic regression classifier (LRC) was constructed in iterations, with the input data for each iteration consisting of all protein features not selected by the LRC in any previous iteration. The plots show the classifier accuracy on the vertical axis plotted against the number of iterations on the horizontal axis. The number of features selected in each iteration is the plotted symbol. A, LRCs using quantitative protein abundance data; B, LRCs using presence/absence data. Note that the accuracy value in the limit of large numbers of iterations is equal to the proportion of laboratory samples in the data, and represents the limit where the features used contain no information useful for classification.
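
The limiting accuracy mentioned in this caption can be illustrated with a short worked example. This is only a sketch under stated assumptions: it assumes laboratory samples are the majority class and that a classifier with no informative features falls back to always predicting that class. The sample counts and function name are hypothetical and are not taken from the study.

```python
from collections import Counter
from typing import Sequence


def majority_baseline_accuracy(labels: Sequence[str]) -> float:
    """Accuracy obtained by always predicting the most common label.

    With no informative features, this is the best a classifier can do
    (assumption: it falls back to the majority class).
    """
    counts = Counter(labels)
    _, majority_count = counts.most_common(1)[0]
    return majority_count / len(labels)


# Hypothetical data set: 7 laboratory samples and 3 wild samples.
labels = ["laboratory"] * 7 + ["wild"] * 3

# The baseline equals the proportion of laboratory samples, 7 / 10.
baseline = majority_baseline_accuracy(labels)
print(baseline)
```

In this toy case the baseline matches the proportion of laboratory samples, which is the floor the caption describes for iterations whose remaining features carry no classification signal.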