Sequence Similarity Network Reveals Common Ancestry of Multidomain Proteins

We address the problem of homology identification in complex multidomain families with varied domain architectures. The challenge is to distinguish sequence pairs that share common ancestry from pairs that share an inserted domain but are otherwise unrelated. This distinction is essential for accuracy in gene annotation, function prediction, and comparative genomics. There are two major obstacles to multidomain homology identification: lack of a formal definition and lack of curated benchmarks for evaluating the performance of new methods. We offer preliminary solutions to both problems: 1) an extension of the traditional model of homology to include domain insertions; and 2) a manually curated benchmark of well-studied families in mouse and human. We further present Neighborhood Correlation, a novel method that exploits the local structure of the sequence similarity network to identify homologs with great accuracy based on the observation that gene duplication and domain shuffling leave distinct patterns in the sequence similarity network. In a rigorous, empirical comparison using our curated data, Neighborhood Correlation outperforms sequence similarity, alignment length, and domain architecture comparison. Neighborhood Correlation is well suited for automated, genome-scale analyses. It is easy to compute, does not require explicit knowledge of domain architecture, and classifies both single and multidomain homologs with high accuracy. Homolog predictions obtained with our method, as well as our manually curated benchmark and a web-based visualization tool for exploratory analysis of the network neighborhood structure, are available at http://www.neighborhoodcorrelation.org. Our work represents a departure from the prevailing view that the concept of homology cannot be applied to genes that have undergone domain shuffling. 
In contrast to current approaches that either focus on the homology of individual domains or consider only families with identical domain architectures, we show that homology can be rationally defined for multidomain families with diverse architectures by considering the genomic context of the genes that encode them. Our study demonstrates the utility of mining network structure for evolutionary information, suggesting this is a fertile approach for investigating evolutionary processes in the post-genomic era.
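The abstract does not give the formula for Neighborhood Correlation, so the following is only a minimal sketch of the general idea, assuming the score for a pair of sequences is the Pearson correlation of their similarity profiles against every sequence in the data set. The function names, the score dictionary format, and the toy scores are hypothetical and are not taken from the paper.

```python
import math

def pair_score(scores, a, b):
    """Look up a symmetric similarity score; absent pairs count as 0 (assumption)."""
    return scores.get((a, b), scores.get((b, a), 0.0))

def neighborhood_correlation(scores, x, y, universe):
    """Pearson correlation of the similarity profiles of x and y over `universe`."""
    px = [pair_score(scores, x, s) for s in universe]
    py = [pair_score(scores, y, s) for s in universe]
    n = len(universe)
    mx, my = sum(px) / n, sum(py) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(px, py))
    vx = sum((a - mx) ** 2 for a in px)
    vy = sum((b - my) ** 2 for b in py)
    if vx == 0 or vy == 0:
        return 0.0  # a flat profile carries no neighborhood signal
    return cov / math.sqrt(vx * vy)

# Toy data (invented): family a1..a3, family b1..b2, and one shared-domain hit a3-b1.
toy_universe = ["a1", "a2", "a3", "b1", "b2"]
toy_scores = {(s, s): 100.0 for s in toy_universe}  # self-similarity
toy_scores.update({
    ("a1", "a2"): 100.0, ("a1", "a3"): 100.0, ("a2", "a3"): 100.0,
    ("b1", "b2"): 100.0,
    ("a3", "b1"): 80.0,  # similarity caused only by an inserted domain
})

same_family = neighborhood_correlation(toy_scores, "a1", "a2", toy_universe)
domain_only = neighborhood_correlation(toy_scores, "a3", "b1", toy_universe)
print(same_family > domain_only)
```

In this toy network the pair a3 and b1 has a high direct score but few shared neighbors, so its profile correlation falls well below that of two members of the same family. This is the kind of pattern the abstract attributes to domain shuffling as opposed to gene duplication.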

Directory of Open Access Journals

Family classification without domain chaining

Author: Björklund
Bolten
Crabtree
D. Durand
Demuth
Enright
Fitch
Heger
Heinicke
Huynen
J. M. Joseph
Krause
Paccanaro
Rahmann
Sasson
Song
Tatusov
Wittkop
Wu
Publication venue: Oxford University Press

Motivation: Classification of gene and protein sequences into homologous families, i.e. sets of sequences that share common ancestry, is an essential step in comparative genomic analyses. This is typically achieved by construction of a sequence homology network, followed by clustering to identify dense subgraphs corresponding to families. Accurate classification of single domain families is now within reach due to major algorithmic advances in remote homology detection and graph clustering. However, classification of multidomain families remains a significant challenge. The presence of the same domain in sequences that do not share common ancestry introduces false edges in the homology network that link unrelated families and stymie clustering algorithms.
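The effect of a false edge can be illustrated with a small sketch. It uses connected components as a deliberately simple stand-in for the graph clustering step; the abstract does not say which clustering algorithm is used, and the sequence names and edges below are invented.

```python
def connected_components(nodes, edges):
    """Group nodes into clusters with union-find (single-linkage clustering)."""
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent links to the root, compressing the path on the way.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return sorted(groups.values(), key=lambda g: sorted(g))

# Two hypothetical families with only true homology edges.
nodes = ["k1", "k2", "k3", "p1", "p2"]
true_edges = [("k1", "k2"), ("k2", "k3"), ("k1", "k3"), ("p1", "p2")]

# One false edge: k3 and p1 share a domain but not common ancestry.
false_edge = ("k3", "p1")

clean = connected_components(nodes, true_edges)
noisy = connected_components(nodes, true_edges + [false_edge])
print(len(clean), len(noisy))
```

With only the true edges the two families come out as separate clusters; a single shared-domain edge is enough to merge them. Density-based clustering methods are less brittle than connected components, but the same kind of edge still blurs the boundary between families.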

Evolutionary and molecular foundations of multiple contemporary functions of the nitroreductase superfamily.

Author: Akiva Eyal
Babbitt Patricia C
Copp Janine N
Tokuriki Nobuhiko
Publication venue: eScholarship, University of California
Publication date: 01/11/2017

Insight regarding how diverse enzymatic functions and reactions have evolved from ancestral scaffolds is fundamental to understanding chemical and evolutionary biology, and for the exploitation of enzymes for biotechnology. We undertook an extensive computational analysis using a unique and comprehensive combination of tools that include large-scale phylogenetic reconstruction to determine the sequence, structural, and functional relationships of the functionally diverse flavin mononucleotide-dependent nitroreductase (NTR) superfamily (>24,000 sequences from all domains of life, 54 structures, and >10 enzymatic functions). Our results suggest an evolutionary model in which contemporary subgroups of the superfamily have diverged in a radial manner from a minimal flavin-binding scaffold. We identified the structural design principle for this divergence: insertions at key positions in the minimal scaffold that, combined with the fixation of key residues, have led to functional specialization. These results will aid future efforts to delineate the emergence of functional diversity in enzyme superfamilies, provide clues for functional inference for superfamily members of unknown function, and facilitate rational redesign of the NTR scaffold.

eScholarship - University of California

Protein comparison at the domain architecture level

Author: AK Bjorklund
Byungwook Lee
C Chothia
C Vogel
CP Ponting
Doheon Lee
H Tordai
JH Fong
K Lin
LY Geer
M Balestre
M Punta
MK Basu
N Song
P Glenisson
S Yu
SF Altschul
V Hollich
Publication venue: BioMed Central
Publication date: 03/12/2009

Springer - Publisher Connector

Extensive Gene Remodeling in the Viral World: New Evidence for Nongradual Evolution in the Mobilome Network

Author: Bapteste Eric
Colson Philippe
Jachiet Pierre-Alain
Lopez Philippe
Publication venue: Oxford University Press (OUP)
Publication date: 07/08/2014

Complex nongradual evolutionary processes such as gene remodeling are difficult to model, to visualize, and to investigate systematically. Despite these challenges, the creation of composite (or mosaic) genes by combination of genetic segments from unrelated gene families was established as an important adaptive phenomenon in eukaryotic genomes. In contrast, almost no general studies have been conducted to quantify composite genes in viruses. Although viral genome mosaicism has been well described, the extent of gene mosaicism and its rules of emergence remain largely unexplored. Applying methods from graph theory to inclusive similarity networks, and using data from more than 3,000 complete viral genomes, we provide the first demonstration that composite genes in viruses are 1) functionally biased, 2) involved in key aspects of the arms race between cells and viruses, and 3) classifiable into two distinct types in all viral classes. Beyond the quantification of the widespread recombination of genes among different viruses of the same class, we also report a striking sharing of genetic information between viruses of different classes and with different nucleic acid types. This latter discovery provides novel evidence for the existence of a large and complex mobilome network, which appears partly bound by the sharing of genetic information and by the formation of composite genes between mobile entities with different genetic material. Considering that there are around 10^31 viruses on the planet, gene remodeling appears to be a hugely significant way of generating and moving novel sequences between different kinds of organisms on Earth.
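The abstract does not describe how composite genes are detected in the similarity network. As a rough illustration only, the sketch below assumes one simple graph heuristic: a gene is a composite candidate when it is similar to two other genes that show no similarity to each other. The function name and the toy edges are hypothetical, and a real analysis would also need to check that the two matches cover different regions of the candidate gene.

```python
from itertools import combinations

def composite_candidates(edges):
    """Map each gene to the pairs of its neighbors that are not linked to each other."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    candidates = {}
    for gene, neighbors in adjacency.items():
        unlinked = [
            (a, b)
            for a, b in combinations(sorted(neighbors), 2)
            if b not in adjacency[a]
        ]
        if unlinked:
            candidates[gene] = unlinked
    return candidates

# Toy network (invented): C resembles both family A and family B,
# while the A and B genes show no similarity to each other.
toy_edges = [
    ("A1", "A2"), ("B1", "B2"),
    ("A1", "C"), ("A2", "C"),
    ("B1", "C"), ("B2", "C"),
]

found = composite_candidates(toy_edges)
print(sorted(found))
```

Only C is reported here, because every other gene's neighbors are all linked to one another.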

HAL AMU

Hal-Diderot

Toward community standards in the quest for orthologs

Author: Altenhoff Adrian
Apweiler Rolf
Ashburner Michael
Blake Judith
Boeckmann Brigitte
Bridge Alan
Bruford Elspeth
Cherry Mike
Conte Matthieu
Dannie Durand
Datta Ruchira
Dessimoz Christophe
Domelevo Entfellner Jean-Baka
Ebersberger Ingo
Gabaldón Toni
Galperin Michael
Herrero Javier
Joseph Jacob
Koestler Tina
Kriventseva Evgenia
Lecompte Odile
Leunissen Jack
Lewis Suzanna
Linard Benjamin
Livstone Michael S.
Lu Hui-Chun
Martin Maria
Mazumder Raja
Messina David
Miele Vincent
Muffato Matthieu
Perrière Guy
Punta Marco
Roos David
Roos David S.
Rouard Mathieu
Schmitt Thomas
Schreiber Fabian
Silva Alan
Sjölander Kimmen
Sonnhammer Erik
Sonnhammer Erik L. L.
Stanley Eleanor
Szklarczyk Radek
Thomas Paul
Uchiyama Ikuo
Van Bel Michiel
Vandepoele Klaas
Vilella Albert J.
Yates Andrew
Zdobnov Evgeny
Škunca Nives
Publication date: 02/08/2017

The identification of orthologs (gene pairs descended from a common ancestor through speciation, rather than duplication) has emerged as an essential component of many bioinformatics applications, ranging from the annotation of new genomes to experimental target prioritization. Yet, the development and application of orthology inference methods is hampered by the lack of consensus on source proteomes, file formats and benchmarks. The second 'Quest for Orthologs' meeting brought together stakeholders from various communities to address these challenges. We report on achievements and outcomes of this meeting, focusing on topics of particular relevance to the research community at large. The Quest for Orthologs consortium is an open community that welcomes contributions from all researchers interested in orthology research and applications.

Toward community standards in the quest for orthologs

Author: Dessimoz C.
Gabaldon T.
Herrero J.
Quest for Ortholog Consortium
Roos D.S.
Sonnhammer E.
Publication venue: Oxford University Press (OUP)
Publication date: 10/06/2014

Origin and evolution of the Notch signalling pathway: an overview from eukaryotic genomes

Author: Borchiellini Carole
Brunet Frédéric
Degnan Bernard M
Ereskovsky Alexander V
Gazave Eve
Lapébie Pascal
Renard Emmanuelle
Richards Gemma S
Vervoort Michel
Publication venue: BioMed Central
Publication date: 01/01/2009

Background. Of the 20 or so signal transduction pathways that orchestrate cell-cell interactions in metazoans, seven are involved during development. One of these is the Notch signalling pathway, which regulates cellular identity, proliferation, differentiation and apoptosis via the developmental processes of lateral inhibition and boundary induction. In light of this essential role played in metazoan development, we surveyed a wide range of eukaryotic genomes to determine the origin and evolution of the components and auxiliary factors that compose and modulate this pathway. Results. We searched for 22 components of the Notch pathway in 35 different species that represent 8 major clades of eukaryotes, performed phylogenetic analyses and compared the domain compositions of the two fundamental molecules: the receptor Notch and its ligands Delta/Jagged. We confirm that a Notch pathway, with true receptors and ligands, is specific to the Metazoa. This study also sheds light on the deep ancestry of a number of genes involved in this pathway, while other members are revealed to have a more recent origin. The origin of several components can be accounted for by the shuffling of pre-existing protein domains, or via lateral gene transfer. In addition, certain domains have appeared de novo more recently, and can be considered metazoan synapomorphies. Conclusion. The Notch signalling pathway emerged in Metazoa via a diversity of molecular mechanisms, incorporating both novel and ancient protein domains during eukaryote evolution. Thus, a functional Notch signalling pathway was probably present in Urmetazoa.

HAL-ENS-LYON

HAL AMU

Springer - Publisher Connector

HAL-INSU

ProdInra

Hal-Diderot

University of Queensland eSpace

Linking Fold, Function and Phylogeny: A Comparative Genomics View on Protein (Domain) Evolution

Author: Bagowski Christoph P
te Velthuis Aartjan J.W
Publication venue: Bentham Science Publishers Ltd.
Publication date: 01/01/2008

Domains are the building blocks of all globular proteins and present one of the most useful levels at which protein function can be understood. Through recombination and duplication of a limited set of domains, proteomes evolved and the collection of protein superfamilies in an organism formed. As such, the presence of a shared domain can be regarded as an indicator of similar function and evolutionary history, but it does not necessarily imply it, since convergent evolution may give rise to similar gene functions as well as architectures.

Oxford University Research Archive

Evolution at the Subgene Level: Domain Rearrangements in the Drosophila Phylogeny

Author: Kellis Manolis
Rasmussen Matthew D.
Wu Yi-Chieh
Publication venue: Oxford University Press (OUP)
Publication date: 01/09/2011

Supplementary sections 1–13, tables S1–S10, and figures S1–S9 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/). Although the possibility of gene evolution by domain rearrangements has long been appreciated, current methods for reconstructing and systematically analyzing gene family evolution are limited to events such as duplication, loss, and sometimes, horizontal transfer. However, within the Drosophila clade, we find domain rearrangements occur in 35.9% of gene families, and thus, any comprehensive study of gene evolution in these species will need to account for such events. Here, we present a new computational model and algorithm for reconstructing gene evolution at the domain level. We develop a method for detecting homologous domains between genes and present a phylogenetic algorithm for reconstructing maximum parsimony evolutionary histories that include domain generation, duplication, loss, merge (fusion), and split (fission) events. Using this method, we find that genes involved in fusion and fission are enriched in signaling and development, suggesting that domain rearrangements and reuse may be crucial in these processes. We also find that fusion is more abundant than fission, and that fusion and fission events occur predominantly alongside duplication, with 92.5% and 34.3% of fusion and fission events, respectively, retaining ancestral architectures in the duplicated copies. We provide a catalog of ∼9,000 genes that undergo domain rearrangement across nine sequenced species, along with possible mechanisms for their formation. These results dramatically expand on evolution at the subgene level and offer several insights into how new genes and functions arise between species. Funding: National Science Foundation (U.S.) (Graduate Research Fellowship); National Science Foundation (U.S.) (CAREER award NSF 0644282).