
    Interspecies gene function prediction using semantic similarity

    Combining Homolog and Motif Similarity Data with Gene Ontology Relationships for Protein Function Prediction

    Uncharacterized proteins pose a challenge not just to functional genomics, but also to biology in general. Knowing the biochemical functions of such proteins is critical for designing efficient therapeutic techniques. The bottleneck in annotating hypothetical proteins is the difficulty of collecting and aggregating enough biological information about the protein itself. In this paper, we propose and evaluate a protein annotation technique that aggregates different types of biological information conserved across many hypothetical proteins. To improve performance and increase prediction accuracy, we incorporate term-specific relationships based on the Gene Ontology (GO). Our method combines protein-protein interaction (PPI) data, protein motif information, protein sequence similarity and protein homology data with a context similarity measure based on GO to accurately infer functional information for unannotated proteins. We apply our method to proteins of the species Saccharomyces cerevisiae. Aggregating different sources of evidence with GO relationships increases the precision and accuracy of prediction compared with other methods reported in the literature. We predicted with a precision and accuracy of 100% for more than half of the proteins in the input set, with an overall precision of 81.35% and an overall accuracy of 80.04%.
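    The abstract does not say which GO-based context similarity measure is used. As one widely used example of GO semantic similarity (not necessarily the authors' measure), the sketch below computes Resnik similarity: the information content of the most informative common ancestor of two terms. The ontology fragment, term names and annotation counts are hypothetical placeholders, not real GO identifiers.

```python
import math

# Hypothetical GO fragment: term -> set of parent terms (is_a edges only).
PARENTS = {
    "root": set(),
    "binding": {"root"},
    "catalytic": {"root"},
    "dna_binding": {"binding"},
    "rna_binding": {"binding"},
    "kinase": {"catalytic"},
}

def ancestors(term):
    """Return the term together with all of its ancestors in the DAG."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def information_content(annotation_counts):
    """IC(t) = log(N_root / N_t), where N_t counts annotations to t or any descendant."""
    totals = {t: 0 for t in PARENTS}
    for term, n in annotation_counts.items():
        for anc in ancestors(term):
            totals[anc] += n
    root_count = totals["root"]
    return {t: math.log(root_count / c) for t, c in totals.items() if c > 0}

def resnik(t1, t2, ic):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(t1) & ancestors(t2)
    return max(ic[t] for t in common if t in ic)

counts = {"dna_binding": 10, "rna_binding": 10, "kinase": 20}  # hypothetical
ic = information_content(counts)
print(resnik("dna_binding", "rna_binding", ic))  # log 2, about 0.693
print(resnik("dna_binding", "kinase", ic))       # 0.0, only the root is shared
```

    The two binding terms score log 2 because their shared ancestor `binding` covers half of all annotations, while `dna_binding` and `kinase` share only the root, which carries no information.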

    A Combined Approach for Genome Wide Protein Function Annotation/Prediction

    Background: Today, large-scale genome sequencing technologies are uncovering an increasing number of new genes and proteins that remain uncharacterized. Experimental procedures for protein function prediction are low-throughput by nature and thus cannot keep up with the rate at which new proteins are discovered. At the same time, proteins are key players in almost all biological processes, so knowing their functions precisely is essential for a better understanding of the underlying biological mechanisms. The challenge of annotating uncharacterized proteins, in functional genomics and in biology in general, motivates well-orchestrated computational techniques that predict their functions accurately. Methods: We propose a computational flow for the functional annotation of a protein that assigns the most probable functions to it by aggregating heterogeneous information. The information considered includes protein motifs, protein sequence similarity and protein homology data gathered from interacting proteins, combined with data from highly similar non-interacting proteins (hereinafter called Similactors). Moreover, to increase the predictive power of our model, we also compute and integrate term-specific relationships among functional terms based on the Gene Ontology (GO). Results: We tested our method on proteins of Saccharomyces cerevisiae and Homo sapiens. Aggregating different structural and functional evidence with GO relationships outperforms the other methods reported in the literature in terms of prediction precision and accuracy. Precision and accuracy are 100% for more than half of the input set for both species; overall, we obtained 85.38% precision and 81.95% accuracy for Homo sapiens, and 79.73% precision and 80.06% accuracy for Saccharomyces cerevisiae proteins.
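    The abstract names the evidence sources (interactors, Similactors, motifs, sequence similarity and homology) but not how they are weighted. A minimal sketch of one common aggregation scheme, a weighted vote over the GO terms carried by related proteins, is shown below together with per-protein precision. The evidence weights, similarity values, threshold and term labels are all hypothetical.

```python
from collections import defaultdict

def score_terms(evidence, weights):
    """Sum weight * similarity for every GO term carried by a related protein,
    then rescale so that the best-supported term scores 1.0."""
    scores = defaultdict(float)
    for source, similarity, terms in evidence:
        for term in terms:
            scores[term] += weights[source] * similarity
    if not scores:
        return {}
    best = max(scores.values())
    return {term: s / best for term, s in scores.items()}

def predict(evidence, weights, threshold=0.5):
    """Keep the terms whose normalised score reaches the (hypothetical) threshold."""
    return {t for t, s in score_terms(evidence, weights).items() if s >= threshold}

def precision(predicted, true_terms):
    """Fraction of predicted terms that are correct."""
    return len(predicted & true_terms) / len(predicted) if predicted else 0.0

# Hypothetical weights per evidence source.
WEIGHTS = {"interactor": 1.0, "similactor": 0.8, "motif": 0.5}

# (evidence source, similarity to the query protein, GO terms of that protein)
evidence = [
    ("interactor", 0.9, {"term_A", "term_B"}),
    ("similactor", 0.7, {"term_A"}),
    ("motif", 0.6, {"term_C"}),
]
predicted = predict(evidence, WEIGHTS)
print(sorted(predicted))                 # ['term_A', 'term_B']
print(precision(predicted, {"term_A"}))  # 0.5
```

    Normalising by the top score makes the threshold relative to the strongest evidence for each protein; an absolute cut-off is an equally plausible design choice.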

    OrthoDB: the hierarchical catalog of eukaryotic orthologs in 2011

    The concept of homology drives speculation about a gene's function in a given species when its biological roles in other species have been characterized. With reference to a specific species radiation, homologous relations define orthologs, i.e. descendants of a single gene of the ancestor. The large-scale delineation of gene genealogies is a challenging task, and the numerous approaches to the problem reflect the importance of orthology as a cornerstone of comparative studies. Here, we present the updated OrthoDB catalog of eukaryotic orthologs, delineated at each radiation of the species phylogeny in an explicitly hierarchical manner, for over 100 species of vertebrates, arthropods and fungi (including the metazoan level). New database features include functional annotations and the quantification of evolutionary divergence and of relations among orthologous groups. The interface offers extended phyletic profile querying and enhanced text-based searches. The ever-increasing sampling of sequenced eukaryotic genomes gives a clearer account of the majority of gene genealogies, which will facilitate informed hypotheses of gene function in newly sequenced genomes. Furthermore, uniform analysis across lineages as different as vertebrates, arthropods and fungi, with divergence levels varying from several to hundreds of millions of years, will provide essential data for uncovering and quantifying long-term trends of gene evolution. OrthoDB is freely accessible from http://cegg.unige.ch/orthodb.
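    The abstract does not describe how OrthoDB delineates its orthologous groups. A common building block for pairwise ortholog detection is the reciprocal best hit (RBH): genes a and b are called orthologs when b is a's best-scoring hit in species B and a is b's best hit in species A. The sketch below assumes precomputed similarity scores (for example, BLAST bit scores) with hypothetical gene names; it covers a single species pair, not OrthoDB's hierarchical delineation across radiations.

```python
def best_hits(scores):
    """scores maps (query, target) -> similarity score for one search direction.
    Returns query -> best-scoring target (the first one seen wins on ties)."""
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}

def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return the set of (gene_a, gene_b) pairs that are each other's best hit."""
    ab = best_hits(a_vs_b)
    ba = best_hits(b_vs_a)
    return {(a, b) for a, b in ab.items() if ba.get(b) == a}

# Hypothetical scores from searching species A against B and B against A.
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 150, ("a2", "b2"): 90}
b_vs_a = {("b1", "a1"): 195, ("b2", "a1"): 140, ("b2", "a2"): 88}
print(reciprocal_best_hits(a_vs_b, b_vs_a))  # {('a1', 'b1')}
```

    Here a2's best hit is b2, but b2's best hit is a1, so (a2, b2) is not reciprocal. RBH only yields one-to-one pairs, which is one reason catalogs of orthologous groups go beyond it.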

    Genome-wide signatures of complex introgression and adaptive evolution in the big cats

    The great cats of the genus Panthera comprise a recent radiation whose evolutionary history is poorly understood. Their rapid diversification poses challenges to resolving their phylogeny while offering opportunities to investigate the historical dynamics of adaptive divergence. We report the sequence, de novo assembly, and annotation of the jaguar (Panthera onca) genome, a novel genome sequence for the leopard (Panthera pardus), and comparative analyses encompassing all living Panthera species. Demographic reconstructions indicated that all of these species have experienced variable episodes of population decline during the Pleistocene, ultimately leading to small effective population sizes in present-day genomes. We observed pervasive genealogical discordance across Panthera genomes, caused by both incomplete lineage sorting and complex patterns of historical interspecific hybridization. We identified multiple signatures of species-specific positive selection, affecting genes involved in craniofacial and limb development, protein metabolism, hypoxia, reproduction, pigmentation, and sensory perception. There was remarkable concordance in pathways enriched in genomic segments implicated in interspecies introgression and in positive selection, suggesting that these processes were connected. We tested this hypothesis by developing exome capture probes targeting ~19,000 Panthera genes and applying them to 30 wild-caught jaguars. We found at least two genes (DOCK3 and COL4A5, both related to optic nerve development) bearing significant signatures of interspecies introgression and within-species positive selection. These findings indicate that post-speciation admixture has contributed genetic material that facilitated the adaptive evolution of big cat lineages.
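    The abstract attributes genealogical discordance both to incomplete lineage sorting and to hybridization but does not name the tests used. One standard way to separate the two is Patterson's D statistic (the ABBA-BABA test): under incomplete lineage sorting alone, the discordant site patterns ABBA and BABA are expected equally often, so a consistent excess of one suggests gene flow. The sketch below assumes one allele call per taxon per site and uses hypothetical toy data; it omits the block-jackknife significance test used in practice.

```python
def patterson_d(sites):
    """Compute D = (ABBA - BABA) / (ABBA + BABA).

    Each site is a tuple (p1, p2, p3, outgroup) of allele calls. The outgroup
    allele is taken as ancestral ("A"); the other allele is derived ("B").
    """
    abba = baba = 0
    for p1, p2, p3, out in sites:
        if len({p1, p2, p3, out}) != 2:      # keep biallelic sites only
            continue
        if p1 == out and p2 == p3 != out:    # pattern ABBA
            abba += 1
        elif p2 == out and p1 == p3 != out:  # pattern BABA
            baba += 1
    if abba + baba == 0:
        return 0.0
    return (abba - baba) / (abba + baba)

# Toy alignment: 6 ABBA sites, 2 BABA sites, 5 uninformative AABA sites.
sites = (
    [("A", "G", "G", "A")] * 6
    + [("G", "A", "G", "A")] * 2
    + [("A", "A", "G", "A")] * 5
)
print(patterson_d(sites))  # 0.5
```

    A positive D indicates excess derived-allele sharing between P2 and P3, consistent with introgression between those lineages rather than incomplete lineage sorting alone.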

    Revealing mammalian evolutionary relationships by comparative analysis of gene clusters

    Many software tools for comparative analysis of genomic sequence data have been released in recent decades. Despite this, it remains challenging to determine evolutionary relationships in gene clusters due to their complex histories involving duplications, deletions, inversions, and conversions. One concept describing these relationships is orthology. Orthologs derive from a common ancestor by speciation, in contrast to paralogs, which derive from duplication. Discriminating orthologs from paralogs is a necessary step in most multispecies sequence analyses, but doing so accurately is impeded by the occurrence of gene conversion events. We propose a refined method of orthology assignment based on two paradigms for interpreting its definition: by genomic context or by sequence content. X-orthology (based on context) traces orthology resulting from speciation and duplication only, while N-orthology (based on content) includes the influence of conversion events.
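    The definition above (orthologs split by speciation, paralogs by duplication) can be applied directly to a gene tree whose internal nodes are labelled with events: two genes are orthologs if their lowest common ancestor is a speciation node and paralogs if it is a duplication node. The minimal sketch below uses a hypothetical four-gene tree. Because it ignores gene conversion, it corresponds in spirit to the context-based X-orthology view, not to N-orthology.

```python
# Hypothetical gene tree: a duplication followed by a human/mouse speciation
# in each paralogous copy. Each node maps to its parent.
PARENT = {
    "human_g1": "spec1", "mouse_g1": "spec1",
    "human_g2": "spec2", "mouse_g2": "spec2",
    "spec1": "dup", "spec2": "dup",
}
# Event at each internal node of the (hypothetical) reconciled tree.
EVENT = {"spec1": "speciation", "spec2": "speciation", "dup": "duplication"}

def path_to_root(node):
    """List the node and all its ancestors, ordered from the node upward."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path

def lowest_common_ancestor(a, b):
    """First node on b's path to the root that is also an ancestor of a."""
    ancestors_of_a = set(path_to_root(a))
    for node in path_to_root(b):
        if node in ancestors_of_a:
            return node
    raise ValueError(f"{a} and {b} are not in the same tree")

def relation(a, b):
    """Orthologs if the genes split at a speciation node, paralogs at a duplication."""
    event = EVENT[lowest_common_ancestor(a, b)]
    return "orthologs" if event == "speciation" else "paralogs"

print(relation("human_g1", "mouse_g1"))  # orthologs
print(relation("human_g1", "mouse_g2"))  # paralogs
```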

    Quorum sensing: An imperative longevity weapon in bacteria

    Bacterial cells exhibit complex patterns of cooperative behaviour, as shown by their capacity to communicate with each other. Quorum sensing (QS) is a generic term for bacterial cell-to-cell communication, which helps secure the survival of a species. Many QS bacteria produce and release autoinducers, such as acyl-homoserine lactone signalling molecules, to sense cell population density and regulate gene expression accordingly. Different bacterial species use different QS molecules to regulate their gene expression. The free-living marine bacterium Vibrio harveyi uses two QS systems, commonly classified as sensor and autoinducer systems, to control the density-dependent expression of bioluminescence (lux). In Pseudomonas aeruginosa, QS controls not only virulence factor production but also biofilm formation. It comprises two hierarchically organised systems, each consisting of an autoinducer synthetase (LasI/RhlI) and a corresponding regulator protein (LasR/RhlR). Biofilms produced by Pseudomonas under QS control are ubiquitous in nature and contribute to colonization in patients with cystic fibrosis. Other organisms, such as Haemophilus influenzae and Streptococcus, also use QS to control virulence in otitis, endocarditis and dental decay. Overall, QS plays a major role in controlling bacterial economy. It is a simple, practical and effective mechanism of production and control: when the concentration of the signalling molecule reaches a critical threshold, bacteria sense it and promptly activate or repress certain target genes to control their environment. This review focuses on QS mechanisms and their role in the survival of a few important bacterial species.

    Comparative transcriptomics in plants

    Comparative genomics is the study of the structural and functional relationships between the genomes of different species or strains. Recently, microarray experiments have yielded massive amounts of expression information for many genes under various conditions or in different tissues for different model species. Expression compendia grouping multiple microarray experiments performed under similar (or different) experimental conditions make it possible to define correlated expression patterns between genes. Genes within such a coexpression cluster are expected to be more similar in function than genes lacking expression similarity. In this thesis, the steps required to systematically compare expression data across species are described, and some future applications of plant comparative transcriptomics are highlighted. We then analyzed whether functionally related genes show coexpression in Arabidopsis and rice, and developed a general framework to measure expression context conservation (ECC) for orthologous genes. Additionally, we studied the evolutionary parameters influencing ECC and compared expression evolution with sequence evolution. Finally, a new method is presented to define high-quality tissue-specific genes in seven plant species, A. thaliana (Arabidopsis), Z. mays (Maize), M. truncatula (Medicago), P. trichocarpa (Poplar), O. sativa (Rice), G. max (Soybean) and V. vinifera (Grape), using Affymetrix microarray expression profiles. We also performed an in-depth study of the relationships between coexpression clusters of leaf-tissue-specific genes, within a species and in comparison with other species, for a set of strictly selected genes.
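    The abstract does not define how ECC is computed. As a hypothetical illustration of the general idea of comparing a gene's coexpression context with that of its ortholog, the sketch below ranks coexpression partners by Pearson correlation within each species, keeps the top k, maps one species' neighbourhood through a one-to-one ortholog table and scores the overlap with the Jaccard index. The expression values, gene names, ortholog map and k are all made up, and constant profiles are not handled (their correlation is undefined).

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def top_neighbours(profiles, k):
    """Map each gene to the set of its k most correlated genes in the same species."""
    result = {}
    for gene, x in profiles.items():
        others = [g for g in profiles if g != gene]
        others.sort(key=lambda g: pearson(x, profiles[g]), reverse=True)
        result[gene] = set(others[:k])
    return result

def context_overlap(gene_a, neigh_a, neigh_b, orthologs):
    """Jaccard overlap between gene_a's neighbourhood (mapped to species B via
    `orthologs`) and the neighbourhood of gene_a's ortholog in species B."""
    mapped = {orthologs[g] for g in neigh_a[gene_a] if g in orthologs}
    partner = neigh_b[orthologs[gene_a]]
    union = mapped | partner
    return len(mapped & partner) / len(union) if union else 0.0

# Hypothetical expression profiles over four matched conditions.
species_a = {"a1": [1, 2, 3, 4], "a2": [2, 4, 6, 8.5], "a3": [4, 3, 2, 1], "a4": [1, 3, 1, 3]}
species_b = {"b1": [5, 6, 7, 8], "b2": [1, 2, 3, 5], "b3": [8, 6, 4, 2], "b4": [2, 2, 3, 1]}
orthologs = {"a1": "b1", "a2": "b2", "a3": "b3", "a4": "b4"}

neigh_a = top_neighbours(species_a, k=1)
neigh_b = top_neighbours(species_b, k=1)
print(context_overlap("a1", neigh_a, neigh_b, orthologs))  # 1.0
```

    In this toy case a1's closest coexpression partner (a2) maps to b2, which is also the closest partner of a1's ortholog b1, so the coexpression context is fully conserved.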