    Ortholog identification in the presence of domain architecture rearrangement

    Ortholog identification is used in gene functional annotation, species phylogeny estimation, phylogenetic profile construction and many other analyses. Bioinformatics methods for ortholog identification are commonly based on pairwise protein sequence comparisons between whole genomes. Phylogenetic methods of ortholog identification have also been developed; these methods can be applied to protein data sets that share a common domain architecture or that share a single functional domain but differ outside this region of homology. While promiscuous domains represent a challenge to all orthology prediction methods, overall structural similarity is highly correlated with proximity in a phylogenetic tree, conferring a degree of robustness to phylogenetic methods. In this article, we review the issues involved in orthology prediction when data sets include sequences with structurally heterogeneous domain architectures, with particular attention to automated methods designed for high-throughput application, and present a case study to illustrate the challenges in this area.
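    The abstract does not name a specific pairwise method. As one illustration of the pairwise-comparison family, the sketch below implements the widely used reciprocal-best-hit heuristic on toy similarity scores. The function names, protein identifiers and scores are hypothetical and are not taken from the article.

```python
# Minimal sketch of reciprocal best hits (RBH), assuming similarity scores
# between proteins of genome A and genome B have already been computed
# (for example by a sequence search tool). All data here are made up.

def best_hits(scores):
    """Map each query to its highest-scoring subject.

    scores: dict mapping (query, subject) -> similarity score.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Toy scores: a2 and a3 both prefer b2, but b2 prefers a3.
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 150,
          ("a2", "b2"): 90, ("a3", "b2"): 120}
b_vs_a = {("b1", "a1"): 198, ("b2", "a3"): 118, ("b2", "a2"): 95}

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

    A heuristic of this kind scores whole sequences, so a shared promiscuous domain can inflate the score between otherwise unrelated proteins, which is the difficulty the abstract raises.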

    Fibronectin Contributes To Notochord Intercalation In The Invertebrate Chordate, Ciona Intestinalis

    Background: Genomic analysis has upended chordate phylogeny, placing the tunicates as the sister group to the vertebrates. This taxonomic rearrangement raises questions about the emergence of a tunicate/vertebrate ancestor. Results: Characterization of developmental genes uniquely shared by tunicates and vertebrates is one promising approach for deciphering developmental shifts underlying acquisition of novel, ancestral traits. The matrix glycoprotein Fibronectin (FN) has long been considered a vertebrate-specific gene, playing a major instructive role in vertebrate embryonic development. However, the recent computational prediction of an orthologous “vertebrate-like” Fn gene in the genome of a tunicate, Ciona savignyi, challenges this viewpoint, suggesting that Fn may have arisen in the shared tunicate/vertebrate ancestor. Here we verify the presence of a tunicate Fn ortholog. Transgenic reporter analysis was used to characterize a Ciona Fn enhancer driving expression in the notochord. Targeted knockdown in the notochord lineage indicates that FN is required for proper convergent extension. Conclusions: These findings suggest that acquisition of Fn was associated with altered notochord morphogenesis in the vertebrate/tunicate ancestor.

    Unveiling the structural features of CysE: a novel target for therapeutic interventions against persistent mycobacteria

    The World Health Organization (WHO) reports that one-third of the world’s population is infected with a persistent form of Mycobacterium tuberculosis (M.tb), the bacterium that causes tuberculosis. Targeting mycobacterial persisters is important for achieving WHO’s End TB target. The de novo cysteine biosynthetic pathway is a novel target for addressing M.tb persistence. The two-step pathway comprises serine acetyltransferase (CysE) and O-acetyl-serine-sulfhydrylase (OASS/CysK). The present study attempts to understand the structural features of mycobacterial CysE by investigating the divergence among its orthologs through phylogenetic analysis. Mapping mycobacterial CysE sequences onto the whole orthologous (COG1045) tree segregated the species into four clusters and several isoforms, leading to the identification of their descendants. Interestingly, the analysis revealed that the extended C-terminal α-helix believed to be unique to M.tb is also present in other organisms, such as Campylobacter ureolyticus, Bacillus cereus, Geminocystis herdmanii and Paenibacillus borealis. Further, a hidden Markov model search against the whole UniProt database suggests a plausible role for the C-terminal α-helix of CysE in strengthening substrate and/or cofactor binding. In addition, phylogenetic analysis of CysE sequences from the Mycobacteriaceae family groups them into ten well-formed and six monophyletic clades, each based on characteristic features of domain architecture, oligomeric assembly, the C-terminal tetrapeptide tail, and regulatory and feedback mechanisms. Employing molecular phylogeny in conjunction with structural analysis has provided detailed insights into mycobacterial CysEs as drug targets.

    The bromodomain-containing protein Ibd1 links multiple chromatin related protein complexes to highly expressed genes in Tetrahymena thermophila

    Background: The chromatin remodelers of the SWI/SNF family are critical transcriptional regulators. Recognition of lysine acetylation through a bromodomain (BRD) component is key to SWI/SNF function; in most eukaryotes, this function is attributed to SNF2/Brg1. Results: Using affinity purification coupled to mass spectrometry (AP-MS), we identified members of a SWI/SNF complex (SWI/SNFTt) in Tetrahymena thermophila. SWI/SNFTt is composed of 11 proteins: Snf5Tt, Swi1Tt, Swi3Tt, Snf12Tt, Brg1Tt, two proteins with potential chromatin-interacting domains and four proteins without orthologs to SWI/SNF proteins in yeast or mammals. SWI/SNFTt subunits localize exclusively to the transcriptionally active macronucleus (MAC) during growth and development, consistent with a role in transcription. While Tetrahymena Brg1 does not contain a BRD, our AP-MS results identified a BRD-containing SWI/SNFTt component, Ibd1, that associates with SWI/SNFTt during growth but not development. AP-MS analysis of epitope-tagged Ibd1 revealed it to be a subunit of several additional protein complexes, including putative SWRTt and SAGATt complexes as well as a putative H3K4-specific histone methyltransferase complex. Recombinant Ibd1 recognizes acetyl-lysine marks on histones correlated with active transcription. Consistent with our AP-MS and histone array data suggesting a role in regulation of gene expression, ChIP-Seq analysis of Ibd1 indicated that it primarily binds near promoters and within gene bodies of highly expressed genes during growth. Conclusions: Our results suggest that, by recognizing specific histone marks, Ibd1 targets active chromatin regions of highly expressed genes in Tetrahymena, where it might subsequently coordinate the recruitment of several chromatin remodeling complexes to regulate the transcriptional landscape of vegetatively growing Tetrahymena cells. Comment: Published in BMC Epigenetics & Chromatin.

    Lineage-specific expansion of proteins exported to erythrocytes in malaria parasites

    BACKGROUND: The apicomplexan parasite Plasmodium falciparum causes the most severe form of malaria in humans. After invasion into erythrocytes, asexual parasite stages drastically alter their host cell and export remodeling and virulence proteins. Previously, we have reported identification and functional analysis of a short motif necessary for export of proteins out of the parasite and into the red blood cell. RESULTS: We have developed software for the prediction of exported proteins in the genus Plasmodium, and identified exported proteins conserved between malaria parasites infecting rodents and the two major causes of human malaria, P. falciparum and P. vivax. This conserved 'exportome' is confined to a few subtelomeric chromosomal regions in P. falciparum and the synteny of these and surrounding regions is conserved in P. vivax. We have identified a novel gene family PHIST (for Plasmodium helical interspersed subtelomeric family) that shares a unique domain with 72 paralogs in P. falciparum and 39 in P. vivax; however, there is only one member in each of the three species studied from the P. berghei lineage. CONCLUSION: These data suggest radiation of genes encoding remodeling and virulence factors from a small number of loci in a common Plasmodium ancestor, and imply a closer phylogenetic relationship between the P. vivax and P. falciparum lineages than previously believed. The presence of a conserved 'exportome' in the genus Plasmodium has important implications for our understanding of both common mechanisms and species-specific differences in host-parasite interactions, and may be crucial in developing novel antimalarial drugs against this infectious disease.

    Comparative genomics of Burkholderia multivorans, a ubiquitous pathogen with a highly conserved genomic structure

    The natural environment serves as a reservoir of opportunistic pathogens. A well-established method for studying the epidemiology of such opportunists is multilocus sequence typing, which in many cases has defined strains predisposed to causing infection. Burkholderia multivorans is an important pathogen in people with cystic fibrosis (CF) and its epidemiology suggests that strains are acquired from non-human sources such as the natural environment. This raises the central question of whether the isolation source (CF or environment) or the multilocus sequence type (ST) of B. multivorans better predicts their genomic content and functionality. We identified four pairs of B. multivorans isolates, representing distinct STs and consisting of one CF and one environmental isolate each. All genomes were sequenced using the PacBio SMRT sequencing technology, which resulted in eight high-quality B. multivorans genome assemblies. The present study demonstrated that the genomic structure of the examined B. multivorans STs is highly conserved and that the B. multivorans genomic lineages are defined by their ST. Orthologous protein families were not uniformly distributed among chromosomes, with core orthologs being enriched on the primary chromosome and ST-specific orthologs being enriched on the second and third chromosome. The ST-specific orthologs were enriched in genes involved in defense mechanisms and secondary metabolism, corroborating the strain-specificity of these virulence characteristics. Finally, the same B. multivorans genomic lineages occur in both CF and environmental samples and on different continents, demonstrating their ubiquity and evolutionary persistence

    Big data and other challenges in the quest for orthologs

    Given the rapid increase of species with a sequenced genome, the need to identify orthologous genes between them has emerged as a central bioinformatics task. Many different methods exist for orthology detection, which makes it difficult to decide which one to choose for a particular application. Here, we review the latest developments and issues in the orthology field, and summarize the most recent results reported at the third 'Quest for Orthologs' meeting. We focus on community efforts such as the adoption of reference proteomes, standard file formats and benchmarking. Progress in these areas is good, and they are already beneficial to both orthology consumers and providers. However, a major current issue is that the massive increase in complete proteomes poses computational challenges to many of the ortholog database providers, as most orthology inference algorithms scale at least quadratically with the number of proteomes. The Quest for Orthologs consortium is an open community with a number of working groups that join efforts to enhance various aspects of orthology analysis, such as defining standard formats and datasets, documenting community resources and benchmarking. Availability and implementation: All such materials are available at http://questfororthologs.org.
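    To make the quadratic-scaling remark concrete, the short sketch below counts the unordered proteome pairs in an all-against-all comparison, n(n-1)/2. The function name is hypothetical, and the count is only a lower bound on real cost, since each pair itself involves many sequence comparisons.

```python
# Minimal sketch: number of proteome pairs in an all-against-all comparison.
# This illustrates the n*(n-1)/2 growth only; it is not a cost model for any
# particular orthology inference tool.

def proteome_pairs(n_proteomes: int) -> int:
    """Return the number of unordered pairs among n_proteomes proteomes."""
    if n_proteomes < 0:
        raise ValueError("n_proteomes must be non-negative")
    return n_proteomes * (n_proteomes - 1) // 2


for n in (10, 100, 1000):
    # A tenfold increase in proteomes gives roughly a hundredfold more pairs.
    print(n, proteome_pairs(n))
```

    Going from 100 to 1000 proteomes raises the pair count from 4950 to 499500, which is the scaling pressure on database providers that the abstract describes.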

    Host-Pathogen O-Methyltransferase Similarity and Its Specific Presence in Highly Virulent Strains of Francisella tularensis Suggests Molecular Mimicry

    Whole genome comparative studies of many bacterial pathogens have shown an overall high similarity of gene content (>95%) between phylogenetically distinct subspecies. In highly clonal species that share the bulk of their genomes, subtle changes in gene content and small-scale polymorphisms, especially those that may alter gene expression and protein-protein interactions, are more likely to have a significant effect on the pathogen's biology. In order to better understand molecular attributes that may mediate the adaptation of virulence in infectious bacteria, a comparative study was done to further analyze the evolution of a gene encoding an o-methyltransferase that was previously identified as a candidate virulence factor due to its conservation specifically in highly pathogenic Francisella tularensis subsp. tularensis strains. The o-methyltransferase gene is located in the genomic neighborhood of a known pathogenicity island and predicted site of rearrangement. Distinct o-methyltransferase subtypes are present in different Francisella tularensis subspecies. Related protein families were identified in several host species as well as species of pathogenic bacteria that are otherwise very distant phylogenetically from Francisella, including species of Mycobacterium. A conserved sequence motif profile is present in the mammalian host and pathogen protein sequences, and sites of non-synonymous variation conserved in Francisella subspecies-specific o-methyltransferases map proximally to the predicted active site of the orthologous human protein structure. Altogether, evidence suggests a role of the F. t. subsp. tularensis protein in a mechanism of molecular mimicry, similar perhaps to Legionella and Coxiella. These findings therefore provide insights into the evolution of niche-restriction and virulence in Francisella, and have broader implications regarding the molecular mechanisms that mediate host-pathogen relationships.

    Big Data Supervised Pairwise Ortholog Detection in Yeasts

    Orthologs are genes in different species that evolved from a common ancestor. Ortholog detection is essential for studying phylogenies and predicting the function of unknown genes. The scalability of pairwise gene (or protein) comparisons and of the classification process is a challenge because of the ever-increasing number of sequenced genomes. Ortholog detection algorithms based on sequence similarity alone tend to fail in classification, specifically in Saccharomycete yeasts, which have rampant paralogies and gene losses. In this book chapter, a new classification approach is proposed that combines pairwise similarity measures in a decision system that considers the extreme imbalance between ortholog and non-ortholog pairs. Some new gene pair similarity measures are defined based on protein physicochemical profiles, gene pair membership in conserved regions of related genomes, and protein lengths. The efficiency and scalability of calculating these measures are analyzed in order to propose their implementation for big data. In conclusion, the evaluated supervised algorithms that manage big and imbalanced data showed high effectiveness in Saccharomycete yeast genomes.
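    The chapter's actual measures and classifiers are not given in this abstract. As a rough illustration of two ideas it mentions, the sketch below shows a hypothetical protein-length similarity measure (a min/max ratio) and inverse-frequency class weights, a common way of handling imbalance between ortholog and non-ortholog pairs. Both functions and the toy label counts are assumptions, not the authors' definitions.

```python
# Minimal sketch, assuming a binary labelling of gene pairs
# (1 = ortholog pair, 0 = non-ortholog pair). Neither function is taken
# from the chapter; they only illustrate the general ideas.
from collections import Counter


def length_similarity(len_a: int, len_b: int) -> float:
    """Hypothetical length-based measure: ratio of shorter to longer protein."""
    if len_a <= 0 or len_b <= 0:
        raise ValueError("protein lengths must be positive")
    return min(len_a, len_b) / max(len_a, len_b)


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Toy data set: 2 ortholog pairs among 100 candidate pairs.
labels = [1] * 2 + [0] * 98
weights = balanced_class_weights(labels)
print(weights)
print(length_similarity(300, 400))
```

    With these weights, each rare ortholog pair counts far more than a non-ortholog pair during training, so a classifier cannot score well simply by predicting "non-ortholog" for everything.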