2,479 research outputs found

    Multiplex methods provide effective integration of multi-omic data in genome-scale models

    Background: Genomic, transcriptomic, and metabolic variations shape the complex adaptation landscape of bacteria to varying environmental conditions. Elucidating the genotype-phenotype relation paves the way for the prediction of such effects, but methods for characterizing the relationship between multiple environmental factors are still lacking. Here, we tackle the problem of extracting network-level information from collections of environmental conditions, by integrating the multiple omic levels at which the bacterial response is measured. Results: To this end, we model a large compendium of growth conditions as a multiplex network consisting of transcriptomic and fluxomic layers, and we propose a multi-omic network approach to infer similarity of growth conditions by integrating layers of the multiplex network. Each node of the network represents a single condition, while edges are similarities between conditions, as measured by phenotypic and transcriptomic properties on different layers of the network. We then fuse these layers into one network, thereby capturing a global network of conditions and the associated similarities across two omic levels. We apply this multi-omic fusion to an updated genome-scale reconstruction of Escherichia coli that includes underground metabolism and new gene-protein-reaction associations. Conclusions: Our method can be readily used to evaluate and cross-compare different collections of conditions among different species. Acquiring multi-omic information on the topology of the space of experimental conditions makes it possible to infer the position and to build condition-specific models of untested or incomplete profiles for which experimental data is not available. Our weighted network fusion method for genome-scale models is freely available at https://github.com/maxconway/SNFtool.
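
The fusion of two similarity layers described in this abstract can be pictured with a minimal sketch. It assumes each layer is a symmetric, strictly positive similarity matrix over the same conditions, and it uses a simplified cross-diffusion update in the spirit of similarity network fusion. It is not the SNFtool implementation (which, among other things, works with sparsified nearest-neighbour kernels), and the function names, toy matrices, and iteration count are hypothetical.

```python
def row_normalize(matrix):
    """Scale each row so that it sums to 1."""
    return [[value / sum(row) for value in row] for row in matrix]


def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def fuse_layers(layer_a, layer_b, iterations=3):
    """Simplified cross-diffusion of two similarity layers (hypothetical helper)."""
    kernel_a, kernel_b = row_normalize(layer_a), row_normalize(layer_b)
    state_a, state_b = kernel_a, kernel_b
    for _ in range(iterations):
        # Each layer is updated by diffusing the other layer's current
        # state through its own (fixed) kernel.
        next_a = row_normalize(matmul(matmul(kernel_a, state_b), transpose(kernel_a)))
        next_b = row_normalize(matmul(matmul(kernel_b, state_a), transpose(kernel_b)))
        state_a, state_b = next_a, next_b
    n = len(layer_a)
    averaged = [[(state_a[i][j] + state_b[i][j]) / 2 for j in range(n)] for i in range(n)]
    # Symmetrize so the result reads as an undirected weighted network.
    return [[(averaged[i][j] + averaged[j][i]) / 2 for j in range(n)] for i in range(n)]


# Toy similarities between three growth conditions (made-up numbers).
transcriptomic = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.2], [0.1, 0.2, 1.0]]
fluxomic = [[1.0, 0.8, 0.3], [0.8, 1.0, 0.1], [0.3, 0.1, 1.0]]
fused = fuse_layers(transcriptomic, fluxomic)
```

Keeping the iteration count small matters in this simplified form: without the sparsification used by the full method, repeated diffusion gradually flattens the differences between edges.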

    Systems Biology and Pangenome of Salmonella O-Antigens.

    O-antigens are glycopolymers in lipopolysaccharides expressed on the cell surface of Gram-negative bacteria. Variability in the O-antigen structure constitutes the basis for the establishment of the serotyping schema. We pursued a two-pronged approach to define the basis for O-antigen structural diversity. First, we developed a bottom-up systems biology approach to O-antigen metabolism by building a reconstruction of Salmonella O-antigen biosynthesis and used it to (i) update 410 existing Salmonella strain-specific metabolic models, (ii) predict a strain's serogroup and its O-antigen glycan synthesis capability (yielding 98% agreement with experimental data), and (iii) extend our workflow to more than 1,400 Gram-negative strains. Second, we used a top-down pangenome analysis to elucidate the genetic basis for intraserogroup O-antigen structural variations. We assembled a database of O-antigen gene islands from over 11,000 sequenced Salmonella strains, revealing (i) that gene duplication, pseudogene formation, gene deletion, and bacteriophage insertion elements occur ubiquitously across serogroups; (ii) novel serotypes in the group O:4 B2 variant, as well as an additional genotype variant for group O:4; and (iii) two novel O-antigen gene islands in understudied subspecies. We thus comprehensively defined the genetic basis for O-antigen diversity. IMPORTANCE: Lipopolysaccharides are a major component of the outer membrane in Gram-negative bacteria. They are composed of a conserved lipid structure that is embedded in the outer leaflet of the outer membrane and a polysaccharide known as the O-antigen. O-antigens are highly variable in structure across strains of a species and are crucial to a bacterium's interactions with its environment. They constitute the first line of defense against both the immune system and bacteriophage infections and have been shown to mediate antimicrobial resistance.
The significance of our research is in identifying the metabolic and genetic differences within and across O-antigen groups in Salmonella strains. Our effort constitutes a first step toward characterizing the O-antigen metabolic network across Gram-negative organisms and a comprehensive overview of genetic variations in Salmonella.

    Multilayer modelling of the human transcriptome and biological mechanisms of complex diseases and traits.

    Here, we performed a comprehensive intra-tissue and inter-tissue multilayer network analysis of the human transcriptome. We generated an atlas of communities in gene co-expression networks in 49 tissues (GTEx v8), evaluated their tissue specificity, and investigated their methodological implications. UMAP embeddings of gene expression from the communities (representing nearly 18% of all genes) robustly identified biologically-meaningful clusters. Notably, new gene expression data can be embedded into our algorithmically derived models to accelerate discoveries in high-dimensional molecular datasets and downstream diagnostic or prognostic applications. We demonstrate the generalisability of our approach through systematic testing in external genomic and transcriptomic datasets. Methodologically, prioritisation of the communities in a transcriptome-wide association study of the biomarker C-reactive protein (CRP) in 361,194 individuals in the UK Biobank identified genetically-determined expression changes associated with CRP and led to considerably improved performance. Furthermore, a deep learning framework applied to the communities in nearly 11,000 tumors profiled by The Cancer Genome Atlas across 33 different cancer types learned biologically-meaningful latent spaces, representing metastasis (p < 2.2 × 10^-16) and stemness (p < 2.2 × 10^-16). Our study provides a rich genomic resource to catalyse research into inter-tissue regulatory mechanisms, and their downstream consequences on human disease.

    A new multi locus variable number of tandem repeat analysis scheme for epidemiological surveillance of Xanthomonas vasicola pv. musacearum, the plant pathogen causing bacterial wilt on banana and enset

    Xanthomonas vasicola pv. musacearum (Xvm), which causes Xanthomonas wilt (XW) on banana (Musa acuminata x balbisiana) and enset (Ensete ventricosum), is closely related to the species Xanthomonas vasicola that contains the pathovars vasculorum (Xvv) and holcicola (Xvh), respectively pathogenic to sugarcane and sorghum. Xvm is considered a monomorphic bacterium whose intra-pathovar diversity remains poorly understood. With the sudden emergence of Xvm within East and Central Africa, coupled with the unknown origin of one of the two sublineages suggested for Xvm, attention has shifted to adapting technologies that focus on identifying the origin and distribution of the genetic diversity within this pathogen. Although microbiological and conventional molecular diagnostics have been useful in pathogen identification, recent advances have ushered in an era of genomic epidemiology that aids in characterizing monomorphic pathogens. To unravel the origin and pathways of the recent emergence of XW in East and Central Africa, there was a need for a genotyping tool adapted for molecular epidemiology. Multi-Locus Variable Number of Tandem Repeat Analysis (MLVA) is able to resolve the evolutionary patterns and invasion routes of a pathogen. In this study, we identified microsatellite loci from nine published Xvm genome sequences. Of the 36 detected microsatellite loci, 21 were selected for primer design and 19 were determined to be highly typeable, specific, reproducible and polymorphic, with two to four alleles per locus on a sub-collection. The 19 markers were multiplexed and applied to genotype 335 Xvm strains isolated from seven countries over several years. The microsatellite markers grouped the Xvm collection into three clusters, with two similar to the SNP-based sublineages 1 and 2 and a new cluster 3, revealing previously unknown diversity in Ethiopia.
Five of the 19 markers had alleles present in both Xvm and Xanthomonas vasicola pathovars holcicola and vasculorum, supporting the phylogenetic closeness of these three pathovars. Thanks to the public availability of the haplotypes on the MLVABank database, this highly reliable and polymorphic genotyping tool can be further used in a transnational surveillance network to monitor the spread and evolution of XW throughout Africa. It will inform and guide management of Xvm both in banana-based and enset-based cropping systems. Due to the suitability of MLVA-19 markers for population genetic analyses, this genotyping tool will also be used in future microevolution studies.
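
To illustrate how MLVA haplotypes can be compared, the sketch below treats each strain as a vector of repeat counts, one per locus, and counts the loci at which two strains differ (a categorical, Hamming-style distance). The strain names, four-locus profiles, fragment sizes, and helper names are all hypothetical. The scheme in the abstract uses 19 loci, and the abstract does not state which distance the authors applied.

```python
from itertools import combinations


def repeat_count(amplicon_bp, flank_bp, unit_bp):
    """Estimate the number of tandem repeats from a fragment size.

    Assumes the amplicon is the flanking sequence plus a whole number
    of repeat units; all three sizes are in base pairs.
    """
    return round((amplicon_bp - flank_bp) / unit_bp)


def locus_differences(profile_a, profile_b):
    """Number of loci at which two MLVA profiles carry different alleles."""
    return sum(1 for a, b in zip(profile_a, profile_b) if a != b)


# Hypothetical repeat-count profiles over four loci.
profiles = {
    "strain_1": (5, 3, 7, 2),
    "strain_2": (5, 3, 8, 2),
    "strain_3": (4, 6, 8, 2),
}

# Pairwise distances, keyed by strain pair.
distances = {
    (a, b): locus_differences(profiles[a], profiles[b])
    for a, b in combinations(profiles, 2)
}
```

A distance table of this kind is a typical starting point for clustering strains or drawing a minimum spanning tree of haplotypes.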

    Network-based methods for biological data integration in precision medicine

    The vast and continuously increasing volume of available biomedical data produced during the last decades opens new opportunities for large-scale modeling of disease biology, facilitating a more comprehensive and integrative understanding of its processes. Nevertheless, this type of modelling requires highly efficient computational systems capable of dealing with such data volumes. Computational approximations commonly used in machine learning and data analysis, namely dimensionality reduction and network-based approaches, have been developed with the goal of effectively integrating biomedical data. Among these methods, network-based machine learning stands out due to its major advantage in terms of biomedical interpretability. These methodologies provide a highly intuitive framework for the integration and modelling of biological processes. This PhD thesis aims to explore the potential of integrating complementary available biomedical knowledge with patient-specific data to provide novel computational approaches to solve biomedical scenarios characterized by data scarcity. The primary focus is on studying how high-order graph analysis (i.e., community detection in multiplex and multilayer networks) may help elucidate the interplay of different types of data in contexts where statistical power is heavily impacted by small sample sizes, such as rare diseases and precision oncology. The central focus of this thesis is to illustrate how network biology, among the several data integration approaches with the potential to achieve this task, can play a pivotal role in addressing this challenge given its advantages in molecular interpretability.
Through its insights and methodologies, it introduces how network biology, and in particular models based on multilayer networks, facilitates bringing the vision of precision medicine to these complex scenarios, providing a natural approach for the discovery of new biomedical relationships that overcomes the difficulties of studying cohorts with limited sample sizes (data-scarce scenarios). Delving into the potential of current artificial intelligence (AI) and network biology applications to address data granularity issues in the precision medicine field, this PhD thesis presents pivotal research works, based on multilayer networks, for the analysis of two rare disease scenarios with specific data granularities, effectively overcoming the classical constraints hindering rare disease and precision oncology research. The first research article presents a personalized medicine study of the molecular determinants of severity in congenital myasthenic syndromes (CMS), a group of rare disorders of the neuromuscular junction (NMJ). The analysis of severity in rare diseases, despite its importance, is typically neglected due to limited data availability. In this study, modelling of biomedical knowledge via multilayer networks made it possible to understand the functional implications of individual mutations in the cohort under study, as well as their relationships with the causal mutations of the disease and the different levels of severity observed. Moreover, the study presents experimental evidence of the role of a previously unsuspected gene in NMJ activity, validating the hypothetical role predicted using the newly introduced methodologies. The second research article focuses on the applicability of multilayer networks for gene prioritization.
Building on concepts for the analysis of different data granularities first introduced in the previous article, the presented research provides a methodology based on the persistence of network community structures across a range of modularity resolutions, effectively providing a new framework for gene prioritization for patient stratification. In summary, this PhD thesis presents major advances in the use of multilayer network-based approaches for the application of precision medicine to data-scarce scenarios, exploring the potential of integrating extensive available biomedical knowledge with patient-specific data.
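
The idea of ranking genes by how persistently they share a community with a gene of interest across a range of modularity resolutions can be sketched as follows. The partitions are hand-written toy inputs standing in for the output of a community detection run at each resolution. The gene labels, resolution values, and function name are hypothetical, and the scoring used in the thesis may well differ.

```python
def co_membership_persistence(partitions, seed_gene):
    """Fraction of resolutions at which each gene shares a community with seed_gene.

    `partitions` maps a resolution value to a list of communities (sets of genes).
    Genes that never co-occur with the seed gene are omitted from the result.
    """
    counts = {}
    for communities in partitions.values():
        for community in communities:
            if seed_gene in community:
                for gene in community:
                    if gene != seed_gene:
                        counts[gene] = counts.get(gene, 0) + 1
    total = len(partitions)
    return {gene: n / total for gene, n in counts.items()}


# Toy partitions: communities tend to split as the resolution increases.
partitions = {
    0.5: [{"G1", "G2", "G3", "G4"}, {"G5"}],
    1.0: [{"G1", "G2", "G3"}, {"G4", "G5"}],
    2.0: [{"G1", "G2"}, {"G3"}, {"G4", "G5"}],
}

scores = co_membership_persistence(partitions, "G1")
# Genes ordered from most to least persistent association with G1.
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy case G2 stays with G1 at every resolution, so it would be prioritized ahead of G3 and G4, whose association breaks as communities become finer.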

    Detection of gene communities in multi-networks reveals cancer drivers

    We propose a new multi-network-based strategy to integrate different layers of genomic information and use them in a coordinated way to identify cancer driver genes. The multi-networks that we consider combine transcription factor co-targeting, microRNA co-targeting, protein-protein interaction and gene co-expression networks. The rationale behind this choice is that gene co-expression and protein-protein interactions require a tight coregulation of the partners, and that such fine-tuned regulation can be obtained only by combining both the transcriptional and post-transcriptional layers of regulation. To extract the relevant biological information from the multi-network, we studied its partition into communities. To this end, we applied a consensus clustering algorithm based on state-of-the-art community detection methods. Although our procedure is valid in principle for any pathology, in this work we concentrate on gastric, lung, pancreas and colorectal cancer, and identified from the enrichment analysis of the multi-network communities a set of candidate cancer driver genes. Some of them were already known oncogenes, while a few are new. The combination of the different layers of information allowed us to extract from the multi-network indications on the regulatory pattern and functional role of both the already known and the new candidate driver genes.
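
A minimal sketch of the consensus idea: given several partitions of the same genes (for example, one per community detection method), count how often each pair of genes is placed together, then keep the groups connected by pairs that co-occur in more than a chosen fraction of partitions. This co-association approach is a generic stand-in and not necessarily the algorithm the authors used. The gene labels, toy partitions, threshold, and function names are hypothetical.

```python
from itertools import combinations


def co_association(partitions):
    """Fraction of partitions in which each pair of genes shares a community."""
    counts = {}
    for communities in partitions:
        for community in communities:
            for a, b in combinations(sorted(community), 2):
                counts[(a, b)] = counts.get((a, b), 0) + 1
    return {pair: n / len(partitions) for pair, n in counts.items()}


def consensus_communities(partitions, genes, threshold=0.5):
    """Connected components of the graph linking pairs co-clustered above threshold."""
    neighbours = {gene: set() for gene in genes}
    for (a, b), frequency in co_association(partitions).items():
        if frequency > threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)
    seen, result = set(), []
    for gene in sorted(genes):
        if gene in seen:
            continue
        # Depth-first traversal collects one consensus community.
        stack, component = [gene], set()
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(neighbours[node] - component)
        seen |= component
        result.append(component)
    return result


# Three toy partitions of five genes, e.g. from three detection methods.
partitions = [
    [{"A", "B", "C"}, {"D", "E"}],
    [{"A", "B"}, {"C", "D", "E"}],
    [{"A", "B", "C"}, {"D", "E"}],
]
consensus = consensus_communities(partitions, {"A", "B", "C", "D", "E"})
```

Here gene C is grouped with A and B in two of three partitions, so it joins their consensus community, while its single co-occurrence with D and E falls below the threshold.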