5,550 research outputs found

    Transcriptional Regulation: a Genomic Overview

    The availability of the Arabidopsis thaliana genome sequence allows a comprehensive analysis of transcriptional regulation in plants using novel genomic approaches and methodologies. Such a genomic view of transcription first necessitates the compilation of lists of elements. Transcription factors are the most numerous of the different types of proteins involved in transcription in eukaryotes, and the Arabidopsis genome codes for more than 1,500 of them, or approximately 6% of its total number of genes. A genome-wide comparison of transcription factors across the three eukaryotic kingdoms reveals the evolutionary generation of diversity in the components of the regulatory machinery of transcription. However, as illustrated by Arabidopsis, transcription in plants follows similar basic principles and logic to those in animals and fungi. A global view and understanding of transcription at a cellular and organismal level requires the characterization of the Arabidopsis transcriptome and promoterome, as well as of the interactome, the localizome, and the phenome of the proteins involved in transcription

    Gene fusions and gene duplications: relevance to genomic annotation and functional analysis

    BACKGROUND: Escherichia coli, a model organism, provides information for annotation of other genomes. Our analysis of its genome has shown that proteins encoded by fused genes need special attention. Such composite (multimodular) proteins consist of two or more components (modules) encoding distinct functions. Multimodular proteins have been found to complicate both annotation and generation of sequence-similar groups. Previous work overstated the number of multimodular proteins in E. coli. This work corrects the identification of modules by including sequence information from proteins in 50 sequenced microbial genomes. RESULTS: Multimodular E. coli K-12 proteins were identified from sequence similarities between their component modules and non-fused proteins in 50 genomes and from the literature. We found 109 multimodular proteins in E. coli containing either two or three modules. Most modules had standalone sequence relatives in other genomes. The separated modules together with all the single (un-fused) proteins constitute the sum of all unimodular proteins of E. coli. Pairwise sequence relationships among all E. coli unimodular proteins generated 490 sequence-similar, paralogous groups. Groups ranged in size from 92 to 2 members and had varying degrees of relatedness among their members. Some E. coli enzyme groups were compared to homologs in other bacterial genomes. CONCLUSION: The deleterious effects of multimodular proteins on annotation and on the formation of groups of paralogs are emphasized. To improve annotation results, all multimodular proteins in an organism should be detected and, when known, each function should be connected with its location in the sequence of the protein. When transferring functions by sequence similarity, alignment locations must be noted, particularly when alignments cover only part of the sequences, in order to enable transfer of the correct function. Separating multimodular proteins into module units makes it possible to generate protein groups related by both sequence and function, avoiding mixing of unrelated sequences. Organisms differ in sizes of groups of sequence-related proteins. A sample comparison of orthologs to selected E. coli paralogous groups correlates with known physiological and taxonomic relationships between the organisms

    MorphDB : prioritizing genes for specialized metabolism pathways and gene ontology categories in plants

    Recent times have seen an enormous growth of "omics" data, of which high-throughput gene expression data are arguably the most important from a functional perspective. Despite huge improvements in computational techniques for the functional classification of gene sequences, common similarity-based methods often fall short of providing full and reliable functional information. Recently, the combination of comparative genomics with approaches in functional genomics has received considerable interest for gene function analysis, leveraging both gene expression based guilt-by-association methods and annotation efforts in closely related model organisms. Besides the identification of missing genes in pathways, these methods also typically enable the discovery of biological regulators (i.e., transcription factors or signaling genes). A previously built guilt-by-association method is MORPH, which was proven to be an efficient algorithm that performs particularly well in identifying and prioritizing missing genes in plant metabolic pathways. Here, we present MorphDB, a resource where MORPH-based candidate genes for large-scale functional annotations (Gene Ontology, MapMan bins) are integrated across multiple plant species. Besides a gene centric query utility, we present a comparative network approach that enables researchers to efficiently browse MORPH predictions across functional gene sets and species, facilitating efficient gene discovery and candidate gene prioritization. MorphDB is available at http://bioinformatics.psb.ugent.be/webtools/morphdb/morphDB/index/. We also provide a toolkit, named "MORPH bulk" (https://github.com/arzwa/morph-bulk), for running MORPH in bulk mode on novel data sets, enabling researchers to apply MORPH to their own species of interest

    Assignment of new roles for malectin-like domains to understand their divergent evolution

    Malectin is a highly conserved animal lectin from the endoplasmic reticulum (ER), with a quality control function in the N-glycosylation process. It has a β-sandwich core with long loops connecting the β-sheets. The malectin binding pocket lies in the loop region. Several carbohydrate-binding modules (CBMs) discovered in other domains of life that share sequence homology with malectin were classified and grouped as a novel CBM57 family by the Carbohydrate-Active Enzymes (CAZy) database. The members of this family are expected to have a highly conserved β-sandwich core, but high variance in the binding-pocket residues. To investigate whether the specificity of these modules is the same as that of malectin, a bioinformatic analysis was performed with 315 members of the CBM57 family found in the CAZy database. Several programs were used to predict the protein architecture and to analyse the conservation of amino acid sequences, especially in the binding pocket. Based on this analysis, we predict animal CBM57 modules to have the same specificity as malectin. However, CBM57 modules from the bacterial domain are predicted, after highlighting the modules associated with glycoside hydrolases from family 2, to have various specificities, and thus different biological functions. To verify these assumptions, a total of 7 CBMs (family 57 and homologous) associated with glycoside hydrolases from family 2 and belonging to the human gut microbiome (Bacteroides ovatus and Bacteroides thetaiotaomicron) were chosen for characterization studies. The recombinant DNAs were first re-cloned to change the His-tag position. Afterwards, expression tests were carried out, in which 2 CBMs from different bacteria were expressed in soluble form. The proteins were then produced at a larger scale and purified by affinity chromatography. Gel analysis showed that the eluted samples had high purity and were suitable for characterization studies. Glycan microarrays were used to determine the binding specificities of the 2 CBM modules. The CBM module from B. thetaiotaomicron revealed high specificity for pectin polysaccharides, possibly recognizing α 1-3 linked galacturonic acid and rhamnose. For structural characterization by X-ray crystallography, several crystallization trials were performed. Crystals were obtained for the B. thetaiotaomicron CBM module, which diffracted to high resolution. The structure is yet to be solved

    Comparative genomic analysis reveals independent expansion of a lineage-specific gene family in vertebrates: The class II cytokine receptors and their ligands in mammals and fish

    BACKGROUND: The high degree of sequence conservation between coding regions in fish and mammals can be exploited to identify genes in mammalian genomes by comparison with the sequence of similar genes in fish. Conversely, experimentally characterized mammalian genes may be used to annotate fish genomes. However, gene families that escape this principle include the rapidly diverging cytokines that regulate the immune system, and their receptors. A classic example is the class II helical cytokines (HCII) including type I, type II and lambda interferons, IL10-related cytokines (IL10, IL19, IL20, IL22, IL24 and IL26) and their receptors (HCRII). Despite the report of a near-complete pufferfish (Takifugu rubripes) genome sequence, these genes remain undescribed in fish. RESULTS: We have used an original strategy based both on conserved amino acid sequence and gene structure to identify HCII and HCRII in the genome of another pufferfish, Tetraodon nigroviridis, which is amenable to laboratory experiments. The 15 genes that were identified are highly divergent and include a single interferon molecule, three IL10-related cytokines and their potential receptors together with two Tissue Factor (TF). Some of these genes form tandem clusters on the Tetraodon genome. Their expression pattern was determined in different tissues. Most importantly, Tetraodon interferon was identified and we show that the recombinant protein can induce antiviral MX gene expression in Tetraodon primary kidney cells. Similar results were obtained in zebrafish, which has 7 MX genes. CONCLUSION: We propose a scheme for the evolution of HCII and their receptors during the radiation of bony vertebrates and suggest that the diversification that played an important role in the fine-tuning of the ancestral mechanism for host defense against infections probably followed different pathways in amniotes and fish

    Identification of novel regulatory factor X (RFX) target genes by comparative genomics in Drosophila species

    An RFX-binding site is shown to be conserved in the promoters of a subset of ciliary genes and a subsequent screen for this site in two Drosophila species identified novel RFX target genes that are involved in sensory ciliogenesis

    Evolutionarily Conserved Transcriptional Co-Expression Guiding Embryonic Stem Cell Differentiation

    Understanding the molecular mechanisms controlling pluripotency in embryonic stem cells (ESCs) is central to realizing their potential in medicine and science. Cross-species examination of transcriptional co-expression allows elucidation of fundamental and species-specific mechanisms regulating ESC self-renewal or differentiation. We examined transcriptional co-expression of ESCs from pathways to global networks under the framework of human-mouse comparisons. Using generalized singular value decomposition and comparative partition around medoids algorithms, evolutionarily conserved and divergent transcriptional co-expression regulating pluripotency were identified from ESC-critical pathways including ACTIVIN/NODAL, ATK/PTEN, BMP, CELL CYCLE, JAK/STAT, PI3K, TGFbeta and WNT. A set of transcription factors, including FOX, GATA, MYB, NANOG, OCT, PAX, SOX and STAT, and the FGF response element were identified that represent key regulators underlying the transcriptional co-expression. By transcriptional intervention conducted in silico, the dynamic behavior of pathways was examined, which demonstrates how much and in which specific ways each gene or gene combination affects the behavior transition of a pathway in response to ESC differentiation or pluripotency induction. The global co-expression networks of ESCs were dominated by highly connected hub genes such as IGF2, JARID2, LCK, MYCN, NASP, OCT4, ORC1L, PHC1 and RUVBL1, which are possibly critical in determining the fate of ESCs. Through these studies, evolutionary conservation at genomic, transcriptomic, and network levels is shown to be an effective predictor of molecular factors and mechanisms controlling ESC development. Various hypotheses regarding mechanisms controlling ESC development were generated, which could be further validated by in vitro experiments. Our findings shed light on the systems-level understanding of how ESC differentiation or pluripotency arises from the connectivity or networks of genes, and provide a "road-map" for further experimental investigation

    Molecular Evolution Within Protease Family C2, or Calpains


    Phylogeographic diversity and mosaicism of the Helicobacter pylori tfs integrative and conjugative elements.

    Background: The genome of the gastric pathogen Helicobacter pylori is characterised by considerable variation of both gene sequence and content, much of which is contained within three large genomic islands comprising the cag pathogenicity island (cagPAI) and two mobile integrative and conjugative elements (ICEs) termed tfs3 and tfs4. All three islands are implicated as virulence factors, although whereas the cagPAI is well characterised, understanding of how the tfs elements influence H. pylori interactions with different human hosts is significantly confounded by limited definition of their distribution, diversity and structural representation in the global H. pylori population. Results: To gain a global perspective of tfs ICE population dynamics we established a bioinformatics workflow to extract and precisely define the full tfs pan-gene content contained within a global collection of 221 draft and complete H. pylori genome sequences. Complete (ca. 35-55 kbp) and remnant tfs ICE clusters were reconstructed from a dataset comprising >12,000 genes, from which orthologous gene complements and distinct alleles descriptive of different tfs ICE types were defined and classified in comparative analyses. The genetic variation within defined ICE modular segments was subsequently used to provide a complete description of tfs ICE diversity and a comprehensive assessment of their phylogeographic context. Our further examination of the apparent ICE modular types identified an ancient and complex history of ICE residence, mobility and interaction within particular H. pylori phylogeographic lineages and further, provided evidence of both contemporary inter-lineage and inter-species ICE transfer and displacement. Conclusions: Our collective results establish a clear view of tfs ICE diversity and phylogeographic representation in the global H. pylori population, and provide a robust contextual framework for elucidating the functional role of the tfs ICEs particularly as it relates to the risk of gastric disease associated with different tfs ICE genotypes

    A Tangled Web: Origins of Reproductive Parasitism

    While typically a flea parasite and opportunistic human pathogen, the presence of Rickettsia felis (strain LSU-Lb) in the non-blood-feeding, parthenogenetically reproducing booklouse, Liposcelis bostrychophila, provides a system to ascertain factors governing not only host transitions but also obligate reproductive parasitism (RP). Analysis of plasmid pLbAR, unique to R. felis str. LSU-Lb, revealed a toxin–antitoxin module with similar features to prophage-encoded toxin–antitoxin modules utilized by parasitic Wolbachia strains to induce another form of RP, cytoplasmic incompatibility, in their arthropod hosts. Curiously, multiple deubiquitinase and nuclease domains of the large (3,841 aa) pLbAR toxin, as well as the entire antitoxin, facilitated the detection of an assortment of related proteins from diverse intracellular bacteria, including other reproductive parasites. Our description of these remarkable components of the intracellular mobilome, including their presence in certain arthropod genomes, lends insight on the evolution of RP, while invigorating research on parasite-mediated biocontrol of arthropod-borne viral and bacterial pathogens