MorphDB: prioritizing genes for specialized metabolism pathways and gene ontology categories in plants
Recent times have seen an enormous growth of "omics" data, of which high-throughput gene expression data are arguably the most important from a functional perspective. Despite huge improvements in computational techniques for the functional classification of gene sequences, common similarity-based methods often fall short of providing full and reliable functional information. Recently, the combination of comparative genomics with approaches in functional genomics has received considerable interest for gene function analysis, leveraging both gene expression based guilt-by-association methods and annotation efforts in closely related model organisms. Besides the identification of missing genes in pathways, these methods also typically enable the discovery of biological regulators (i.e., transcription factors or signaling genes). A previously built guilt-by-association method is MORPH, which has proven to be an efficient algorithm that performs particularly well in identifying and prioritizing missing genes in plant metabolic pathways. Here, we present MorphDB, a resource where MORPH-based candidate genes for large-scale functional annotations (Gene Ontology, MapMan bins) are integrated across multiple plant species. Besides a gene-centric query utility, we present a comparative network approach that enables researchers to browse MORPH predictions across functional gene sets and species, facilitating efficient gene discovery and candidate gene prioritization. MorphDB is available at http://bioinformatics.psb.ugent.be/webtools/morphdb/morphDB/index/. We also provide a toolkit, named "MORPH bulk" (https://github.com/arzwa/morph-bulk), for running MORPH in bulk mode on novel data sets, enabling researchers to apply MORPH to their own species of interest.
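The guilt-by-association idea mentioned in this abstract can be illustrated with a minimal sketch. This is not the MORPH algorithm itself; it only assumes that candidate genes are scored by their mean Pearson correlation with known pathway genes. The gene names, expression values, and the function names `pearson` and `rank_candidates` are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def rank_candidates(expression, pathway_genes):
    """Rank non-pathway genes by mean correlation with the pathway genes."""
    scores = {}
    for gene, profile in expression.items():
        if gene in pathway_genes:
            continue  # known members are the bait, not candidates
        corrs = [pearson(profile, expression[p]) for p in pathway_genes]
        scores[gene] = sum(corrs) / len(corrs)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy expression matrix (hypothetical genes, four conditions each).
expression = {
    "pathA": [1.0, 2.0, 3.0, 4.0],
    "pathB": [2.0, 4.1, 5.9, 8.0],
    "cand1": [1.1, 2.1, 2.9, 4.2],
    "cand2": [4.0, 3.0, 2.0, 1.0],
    "cand3": [1.0, 3.0, 1.0, 3.0],
}
ranking = rank_candidates(expression, {"pathA", "pathB"})
for gene, score in ranking:
    print(f"{gene}\t{score:.3f}")
```

In this toy data set the co-expressed candidate ranks first and the anti-correlated one ranks last. A real analysis would need many more conditions and some form of validation.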
Fundamental properties of the mammalian innate immune system revealed by multispecies comparison of type I interferon responses
The host innate immune response mediated by type I interferon (IFN) and the resulting up-regulation of hundreds of interferon-stimulated genes (ISGs) provide an immediate barrier to virus infection. Studies of the type I ‘interferome’ have mainly been carried out at a single species level, often lacking the power necessary to understand key evolutionary features of this pathway. Here, using a single experimental platform, we determined the properties of the interferomes of multiple vertebrate species and developed a webserver to mine the dataset. This approach revealed a conserved ‘core’ of 62 ISGs, including genes not previously associated with IFN, underscoring the ancestral functions associated with this antiviral host response. We show that gene expansion contributes to the evolution of the IFN system and that interferomes are shaped by lineage-specific pressures. Consequently, each mammal possesses a unique repertoire of ISGs, including genes common to all mammals and others unique to their specific species or phylogenetic lineages. An analysis of genes commonly down-regulated by IFN suggests that epigenetic regulation of transcription is a fundamental aspect of the IFN response. Our study provides a resource for the scientific community highlighting key paradigms of the type I IFN response.
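The notions of a conserved "core" and of species-specific ISGs can be sketched as set operations. This is an illustration only, assuming that up-regulated genes have already been mapped to shared ortholog groups; the species labels, group identifiers, and the function name `core_and_unique` are hypothetical and do not come from the study.

```python
def core_and_unique(upregulated):
    """Split up-regulated ortholog groups into a shared core and per-species sets.

    `upregulated` maps a species label to the ortholog groups induced in it.
    """
    species = list(upregulated)
    # Core: groups induced in every species.
    core = set.intersection(*(set(upregulated[s]) for s in species))
    unique = {}
    for s in species:
        # Groups induced in any other species.
        others = set().union(*(set(upregulated[o]) for o in species if o != s))
        unique[s] = set(upregulated[s]) - others
    return core, unique

# Toy input (hypothetical ortholog group identifiers).
upregulated = {
    "species_a": ["OG1", "OG2", "OG3"],
    "species_b": ["OG1", "OG2", "OG5"],
    "species_c": ["OG1", "OG2", "OG3", "OG6"],
}
core, unique = core_and_unique(upregulated)
print(sorted(core))
print({s: sorted(g) for s, g in unique.items()})
```

Groups shared by some but not all species (here `OG3`) fall into neither set, which corresponds loosely to the lineage-specific genes the abstract describes.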
Integration and mining of malaria molecular, functional and pharmacological data: how far are we from a chemogenomic knowledge space?
The organization and mining of malaria genomic and post-genomic data is
highly motivated by the necessity to predict and characterize new biological
targets and new drugs. Biological targets are sought in a biological space
designed from the genomic data from Plasmodium falciparum, but also using the
millions of genomic data from other species. Drug candidates are sought in a
chemical space containing the millions of small molecules stored in public and
private chemolibraries. Data management should therefore be as reliable and
versatile as possible. In this context, we examined five aspects of the
organization and mining of malaria genomic and post-genomic data: 1) the
comparison of protein sequences including compositionally atypical malaria
sequences, 2) the high throughput reconstruction of molecular phylogenies, 3)
the representation of biological processes particularly metabolic pathways, 4)
the versatile methods to integrate genomic data, biological representations and
functional profiling obtained from X-omic experiments after drug treatments and
5) the determination and prediction of protein structures and their molecular
docking with drug candidate structures. Progress toward a grid-enabled
chemogenomic knowledge space is discussed. Comment: 43 pages, 4 figures, to appear in Malaria Journa
A Systemic Receptor Network Triggered by Human cytomegalovirus Entry
Virus entry is a multistep process that triggers a variety of cellular
pathways interconnecting into a complex network, yet the molecular complexity
of this network remains largely unsolved. Here, by employing a systems biology
approach, we reveal a systemic virus-entry network initiated by human
cytomegalovirus (HCMV), a widespread opportunistic pathogen. This network
contains all known interactions and functional modules (i.e. groups of
proteins) coordinately responding to HCMV entry. The number of both genes and
functional modules activated in this network declines dramatically
within 25 min post-infection. While modules annotated as receptor system, ion
transport, and immune response are continuously activated during the entire
process of HCMV entry, those for cell adhesion and skeletal movement are
specifically activated during viral early attachment, and those for immune
response during virus entry. HCMV entry requires a complex receptor network
involving different cellular components, comprising not only cell surface
receptors, but also pathway components in signal transduction, skeletal
development, immune response, endocytosis, ion transport, macromolecule
metabolism and chromatin remodeling. Interestingly, genes that function in
chromatin remodeling are the most abundant in this receptor system, suggesting
that global modulation of transcription is one of the most important events in
HCMV entry. Results of in silico knock out further reveal that this entire
receptor network is primarily controlled by multiple elements, such as EGFR
(Epidermal Growth Factor Receptor) and SLC10A1 (sodium/bile acid cotransporter family,
member 1). Thus, our results demonstrate that a complex systemic network, in
which components coordinate efficiently in time and space, contributes to
virus entry. Comment: 26 page
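An in silico knockout of the kind mentioned in this abstract can be sketched as node removal from a graph. The abstract does not say how knockouts were scored, so the measure used here (how much the largest connected component shrinks when a node is removed) is an assumption, and the node names and function names are hypothetical.

```python
from collections import deque

def largest_component(adj, removed=frozenset()):
    """Size of the largest connected component, ignoring `removed` nodes."""
    seen, best = set(removed), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, size)
    return best

def knockout_impact(adj):
    """Drop in largest-component size caused by removing each node in turn."""
    baseline = largest_component(adj)
    return {n: baseline - largest_component(adj, {n}) for n in adj}

# Toy undirected network (hypothetical nodes): H is a hub, D hangs off C.
adj = {
    "H": ["A", "B", "C"],
    "A": ["H"],
    "B": ["H"],
    "C": ["H", "D"],
    "D": ["C"],
}
impact = knockout_impact(adj)
print(sorted(impact.items(), key=lambda kv: -kv[1]))
```

Nodes whose removal fragments the network most are candidates for controlling elements under this particular measure; other measures, such as changes in path lengths or module activity, would give different rankings.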
Sequence-based Multiscale Model (SeqMM) for High-throughput chromosome conformation capture (Hi-C) data analysis
In this paper, I introduce a Sequence-based Multiscale Model (SeqMM) for
biomolecular data analysis. In combination with a spectral graph method, I
reveal the essential difference between the global scale models and local scale
ones in structure clustering, i.e., different optimization on Euclidean (or
spatial) distances and sequential (or genomic) distances. More specifically,
clusters from global scale models optimize Euclidean distance relations. Local
scale models, on the other hand, result in clusters that optimize the genomic
distance relations. For biomolecular data, Euclidean distances and sequential
distances are two independent variables, which can never be optimized
simultaneously in data clustering. However, the sequence scale in my SeqMM can work
as a tuning parameter that balances these two variables and delivers different
clusterings depending on the purpose. Further, my SeqMM is used to explore the
hierarchical structures of chromosomes. I find that at the global scale, the
Fiedler vector from my SeqMM bears a great similarity with the principal vector
from principal component analysis, and can be used to study genomic
compartments. In TAD analysis, I find that TADs evaluated from different scales
are not consistent and vary a lot. Particularly when the sequence scale is
small, the calculated TAD boundaries are dramatically different. Even for
regions with high contact frequencies, TAD regions show no obvious consistency.
However, when the scale value increases further, although TADs are still quite
different, TAD boundaries in these high contact frequency regions become more
and more consistent. Finally, I find that for a fixed local scale, my method
can deliver very robust TAD boundaries across different cluster numbers. Comment: 22 PAGES, 13 FIGURE
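The role of the Fiedler vector in spectral clustering can be illustrated with a small sketch. This is not SeqMM: it omits the sequence-scale kernel entirely and only assumes a symmetric, connected, contact-weighted graph. The Fiedler vector (the eigenvector of the second-smallest eigenvalue of the graph Laplacian L = D - W) is approximated by shifted power iteration, and its signs split the bins into two groups. The toy contact matrix and the function name `fiedler_vector` are hypothetical.

```python
from math import sqrt

def fiedler_vector(weights, iterations=500):
    """Approximate the Fiedler vector of L = D - W by shifted power iteration.

    Assumes `weights` is a symmetric, non-negative matrix of a connected graph.
    """
    n = len(weights)
    degree = [sum(row) for row in weights]
    # Gershgorin bound: every Laplacian eigenvalue is at most 2 * max degree,
    # so (shift*I - L) is positive semidefinite and reverses the eigenvalue order.
    shift = 2.0 * max(degree)
    v = [i - (n - 1) / 2.0 for i in range(n)]  # deterministic, zero-mean start
    for _ in range(iterations):
        # w = (shift*I - L) v = (shift - D) v + W v
        w = [(shift - degree[i]) * v[i]
             + sum(weights[i][j] * v[j] for j in range(n))
             for i in range(n)]
        mean = sum(w) / n
        w = [x - mean for x in w]  # project out the constant eigenvector
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Toy contact matrix (hypothetical): two dense blocks joined by a weak contact.
contacts = [
    [0, 1, 1, 0,   0, 0],
    [1, 0, 1, 0,   0, 0],
    [1, 1, 0, 0.1, 0, 0],
    [0, 0, 0.1, 0, 1, 1],
    [0, 0, 0, 1,   0, 1],
    [0, 0, 0, 1,   1, 0],
]
fiedler = fiedler_vector(contacts)
labels = [0 if x < 0 else 1 for x in fiedler]
print(labels)
```

For real Hi-C matrices a dense eigensolver from a numerical library would be the usual choice; the power iteration here is only meant to keep the sketch dependency-free.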