    MorphDB : prioritizing genes for specialized metabolism pathways and gene ontology categories in plants

    Recent times have seen an enormous growth of "omics" data, of which high-throughput gene expression data are arguably the most important from a functional perspective. Despite huge improvements in computational techniques for the functional classification of gene sequences, common similarity-based methods often fall short of providing full and reliable functional information. Recently, the combination of comparative genomics with approaches in functional genomics has received considerable interest for gene function analysis, leveraging both gene-expression-based guilt-by-association methods and annotation efforts in closely related model organisms. Besides the identification of missing genes in pathways, these methods also typically enable the discovery of biological regulators (i.e., transcription factors or signaling genes). A previously built guilt-by-association method is MORPH, which was proven to be an efficient algorithm that performs particularly well in identifying and prioritizing missing genes in plant metabolic pathways. Here, we present MorphDB, a resource where MORPH-based candidate genes for large-scale functional annotations (Gene Ontology, MapMan bins) are integrated across multiple plant species. Besides a gene-centric query utility, we present a comparative network approach that enables researchers to efficiently browse MORPH predictions across functional gene sets and species, facilitating efficient gene discovery and candidate gene prioritization. MorphDB is available at http://bioinformatics.psb.ugent.be/webtools/morphdb/morphDB/index/. We also provide a toolkit, named "MORPH bulk" (https://github.com/arzwa/morph-bulk), for running MORPH in bulk mode on novel data sets, enabling researchers to apply MORPH to their own species of interest.
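The guilt-by-association idea mentioned in this abstract can be illustrated with a minimal sketch. This is not the MORPH algorithm itself, whose details are not given here; it is a generic co-expression ranking in which candidate genes are scored by their mean Pearson correlation with known pathway genes. The function names, gene labels, and expression values below are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / (sd_x * sd_y)


def rank_candidates(expression, pathway_genes):
    """Rank non-pathway genes by mean correlation with the pathway genes."""
    scores = {}
    for gene, profile in expression.items():
        if gene in pathway_genes:
            continue
        correlations = [pearson(profile, expression[p]) for p in pathway_genes]
        scores[gene] = sum(correlations) / len(correlations)
    # Highest mean correlation first.
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: two known pathway genes and three candidates.
toy_expression = {
    "P1": [1.0, 2.0, 3.0, 4.0],
    "P2": [2.0, 4.0, 6.0, 8.5],
    "C1": [1.1, 2.0, 3.2, 3.9],  # rises with the pathway genes
    "C2": [4.0, 3.0, 2.0, 1.0],  # falls as the pathway genes rise
    "C3": [1.0, 1.0, 1.0, 1.0],  # flat
}
ranking = rank_candidates(toy_expression, {"P1", "P2"})
```

In this toy setting the co-expressed candidate is ranked first and the anti-correlated one last; a real analysis would combine many expression data sets and validate the ranking against held-out pathway genes.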

    Fundamental properties of the mammalian innate immune system revealed by multispecies comparison of type I interferon responses

    The host innate immune response mediated by type I interferon (IFN) and the resulting up-regulation of hundreds of interferon-stimulated genes (ISGs) provide an immediate barrier to virus infection. Studies of the type I ‘interferome’ have mainly been carried out at a single species level, often lacking the power necessary to understand key evolutionary features of this pathway. Here, using a single experimental platform, we determined the properties of the interferomes of multiple vertebrate species and developed a webserver to mine the dataset. This approach revealed a conserved ‘core’ of 62 ISGs, including genes not previously associated with IFN, underscoring the ancestral functions associated with this antiviral host response. We show that gene expansion contributes to the evolution of the IFN system and that interferomes are shaped by lineage-specific pressures. Consequently, each mammal possesses a unique repertoire of ISGs, including genes common to all mammals and others unique to their specific species or phylogenetic lineages. An analysis of genes commonly down-regulated by IFN suggests that epigenetic regulation of transcription is a fundamental aspect of the IFN response. Our study provides a resource for the scientific community highlighting key paradigms of the type I IFN response.

    Integration and mining of malaria molecular, functional and pharmacological data: how far are we from a chemogenomic knowledge space?

    The organization and mining of malaria genomic and post-genomic data is highly motivated by the necessity to predict and characterize new biological targets and new drugs. Biological targets are sought in a biological space designed from the genomic data of Plasmodium falciparum, but also using the millions of genomic data from other species. Drug candidates are sought in a chemical space containing the millions of small molecules stored in public and private chemolibraries. Data management should therefore be as reliable and versatile as possible. In this context, we examined five aspects of the organization and mining of malaria genomic and post-genomic data: 1) the comparison of protein sequences, including compositionally atypical malaria sequences, 2) the high-throughput reconstruction of molecular phylogenies, 3) the representation of biological processes, particularly metabolic pathways, 4) the versatile methods to integrate genomic data, biological representations and functional profiling obtained from X-omic experiments after drug treatments and 5) the determination and prediction of protein structures and their molecular docking with drug candidate structures. Progress toward a grid-enabled chemogenomic knowledge space is discussed. Comment: 43 pages, 4 figures, to appear in Malaria Journal

    A Systemic Receptor Network Triggered by Human cytomegalovirus Entry

    Virus entry is a multistep process that triggers a variety of cellular pathways interconnecting into a complex network, yet the molecular complexity of this network remains largely unsolved. Here, by employing a systems biology approach, we reveal a systemic virus-entry network initiated by human cytomegalovirus (HCMV), a widespread opportunistic pathogen. This network contains all known interactions and functional modules (i.e. groups of proteins) coordinately responding to HCMV entry. The number of both genes and functional modules activated in this network declines dramatically within 25 min post-infection. While modules annotated as receptor system, ion transport, and immune response are continuously activated during the entire process of HCMV entry, those for cell adhesion and skeletal movement are specifically activated during early viral attachment, and those for immune response during virus entry. HCMV entry requires a complex receptor network involving different cellular components, comprising not only cell surface receptors, but also pathway components in signal transduction, skeletal development, immune response, endocytosis, ion transport, macromolecule metabolism and chromatin remodeling. Interestingly, genes that function in chromatin remodeling are the most abundant in this receptor system, suggesting that global modulation of transcription is one of the most important events in HCMV entry. Results of in silico knockout further reveal that this entire receptor network is primarily controlled by multiple elements, such as EGFR (epidermal growth factor receptor) and SLC10A1 (sodium/bile acid cotransporter family, member 1). Thus, our results demonstrate that a complex systemic network, in which components coordinate efficiently in time and space, contributes to virus entry. Comment: 26 pages

    Sequence-based Multiscale Model (SeqMM) for High-throughput chromosome conformation capture (Hi-C) data analysis

    In this paper, I introduce a Sequence-based Multiscale Model (SeqMM) for biomolecular data analysis. In combination with a spectral graph method, I reveal the essential difference between global scale models and local scale ones in structure clustering, i.e., different optimization on Euclidean (or spatial) distances and sequential (or genomic) distances. More specifically, clusters from global scale models optimize Euclidean distance relations. Local scale models, on the other hand, result in clusters that optimize the genomic distance relations. For biomolecular data, Euclidean distances and sequential distances are two independent variables, which can never be optimized simultaneously in data clustering. However, the sequence scale in my SeqMM can work as a tuning parameter that balances these two variables and delivers different clusterings depending on the purpose. Further, my SeqMM is used to explore the hierarchical structures of chromosomes. I find that at the global scale, the Fiedler vector from my SeqMM bears a great similarity to the principal vector from principal component analysis, and can be used to study genomic compartments. In TAD analysis, I find that TADs evaluated at different scales are not consistent and vary a lot. Particularly when the sequence scale is small, the calculated TAD boundaries are dramatically different. Even for regions with high contact frequencies, TAD regions show no obvious consistency. However, when the scale value increases further, although TADs are still quite different, TAD boundaries in these high contact frequency regions become more and more consistent. Finally, I find that for a fixed local scale, my method can deliver very robust TAD boundaries for different cluster numbers. Comment: 22 pages, 13 figures
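The Fiedler vector mentioned in this abstract is the eigenvector of the graph Laplacian with the second-smallest eigenvalue, and its signs give a two-way spectral partition. The sketch below is not SeqMM, whose kernel and sequence-scale parameter are not specified here. It is plain spectral bipartition of a symmetric, non-negative contact matrix, computed with power iteration so that it needs only the standard library. The function name and the toy matrix are hypothetical.

```python
import math


def fiedler_vector(contacts, iters=500):
    """Approximate the Fiedler vector of the Laplacian L = D - W.

    Runs power iteration on (shift*I - L) while projecting out the constant
    vector, so the iterate converges to the eigenvector of L with the
    second-smallest eigenvalue (assuming a connected graph).
    """
    n = len(contacts)
    degrees = [sum(row) for row in contacts]
    shift = 2.0 * max(degrees)  # upper bound on the largest eigenvalue of L
    # Deterministic, non-constant starting vector with zero mean.
    v = [float(i) - (n - 1) / 2.0 for i in range(n)]
    for _ in range(iters):
        # w = (shift*I - L) v, using L v = D v - W v
        w = [
            (shift - degrees[i]) * v[i]
            + sum(contacts[i][j] * v[j] for j in range(n))
            for i in range(n)
        ]
        mean = sum(w) / n
        w = [x - mean for x in w]  # remove the constant component
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Hypothetical toy contact matrix: two blocks of three bins each, with
# strong contacts (10) inside a block and weak contacts (1) between blocks.
toy_contacts = [
    [0 if i == j else (10 if (i < 3) == (j < 3) else 1) for j in range(6)]
    for i in range(6)
]
vector = fiedler_vector(toy_contacts)
# The sign of each entry assigns the bin to one of two groups.
groups = [x > 0 for x in vector]
```

On this toy matrix the sign pattern separates the first three bins from the last three. For real Hi-C matrices a dense or sparse eigensolver would be the usual choice; power iteration is used here only to keep the sketch dependency-free.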