
988 research outputs found

The human phylome: a large scale phylogenetic study on the human genome evolution

Author: Huerta Cepas Jaime
Publication date: 01/01/2008

Unpublished doctoral thesis. Universidad Autónoma de Madrid, Facultad de Ciencias, Departamento de Biología Molecular. Defense date: 07-11-200

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Biblos-e Archivo

MetaPhOrs: orthology and paralogy predictions from multiple phylogenetic evidence using a consistency-based confidence score

Author: Jaime Huerta-Cepas
Leszek P. Pryszcz
Toni Gabaldón
Publication venue: Oxford University Press
Publication date: 01/01/2011

Reliable prediction of orthology is central to comparative genomics. Approaches based on phylogenetic analyses closely resemble the original definition of orthology and paralogy and are known to be highly accurate. However, the large computational cost associated with these analyses is a limiting factor that often prevents their use at genomic scales. Recently, several projects have addressed the reconstruction of large collections of high-quality phylogenetic trees from which orthology and paralogy relationships can be inferred. This provides us with the opportunity to infer the evolutionary relationships of genes from multiple, independent phylogenetic trees. Using such a strategy, we combine phylogenetic information derived from different databases to predict orthology and paralogy relationships for 4.1 million proteins in 829 fully sequenced genomes. We show that the number of independent sources from which a prediction is made, as well as the level of consistency across predictions, can be used as reliable confidence scores. A webserver has been developed to easily access these data (http://orthology.phylomedb.org), which provides users with a global repository of phylogeny-based orthology and paralogy predictions.
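The two confidence signals named in this abstract (number of independent sources, and consistency across their predictions) can be illustrated with a minimal sketch. It is not the MetaPhOrs implementation: the function name, the "ortholog"/"paralog" labels, and the tie-breaking rule are assumptions made for illustration only.

```python
from collections import Counter


def consistency_score(calls):
    """Summarise per-tree calls for one gene pair (hypothetical helper).

    `calls` holds one label per independent tree, each either
    "ortholog" or "paralog". Returns a tuple of
    (majority_call, consistency, n_sources).
    """
    if not calls:
        raise ValueError("at least one tree-based call is required")
    counts = Counter(calls)
    # Pick the most frequent label; ties go to the alphabetically
    # later label, which is an arbitrary choice made for this sketch.
    majority_call, support = max(counts.items(), key=lambda kv: (kv[1], kv[0]))
    # Consistency is the fraction of trees agreeing with the majority call.
    return majority_call, support / len(calls), len(calls)


# Toy input: four trees, three of which call the pair orthologous.
result = consistency_score(["ortholog", "ortholog", "paralog", "ortholog"])
```

Under these assumptions, a pair supported by many trees with a consistency close to 1 would be treated as a more reliable prediction than one supported by a single tree.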

Crossref

PubMed Central

UPF Digital Repository

ETE: a python Environment for Tree Exploration

Author: Dopazo Joaquín
Gabaldón Toni
Huerta-Cepas Jaime
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Many bioinformatics analyses, ranging from gene clustering to phylogenetics, produce hierarchical trees as their main result. These are used to represent the relationships among different biological entities, thus facilitating their analysis and interpretation. A number of standalone programs are available that focus on tree visualization or that perform specific analyses on them. However, such applications are rarely suitable for large-scale surveys, in which a higher level of automation is required. Currently, many genome-wide analyses rely on tree-like data representation and hence there is a growing need for scalable tools to handle tree structures at large scale. Results: Here we present the Environment for Tree Exploration (ETE), a Python programming toolkit that assists in the automated manipulation, analysis and visualization of hierarchical trees. ETE libraries provide a broad set of tree handling options as well as specific methods to analyze phylogenetic and clustering trees. Among other features, ETE allows for the independent analysis of tree partitions, supports the extended newick format, provides an integrated node annotation system and allows trees to be linked to external data such as multiple sequence alignments or numerical arrays. In addition, ETE implements a number of built-in analytical tools, including phylogeny-based orthology prediction and cluster validation techniques. Finally, ETE's programmable tree drawing engine can be used to automate the graphical rendering of trees with customized node-specific visualizations. Conclusions: ETE provides a complete set of methods to manipulate tree data structures that extends current functionality in other bioinformatic toolkits of a more general purpose. ETE is free software and can be downloaded from http://ete.cgenomics.org.
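To make the idea of programmatic tree handling concrete, the following standard-library sketch parses a simple newick string and lists its leaves. It does not use or reproduce the ETE API; `parse_newick` and `leaf_names` are hypothetical helpers, and quoted labels, comments and the extended newick annotations mentioned in the abstract are not handled.

```python
def parse_newick(text):
    """Parse a simple newick string into nested dicts (illustrative only).

    Supports node names and optional ':length' suffixes.
    """
    text = text.strip()
    if text.endswith(";"):
        text = text[:-1]
    pos = 0

    def node():
        nonlocal pos
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "("
            children.append(node())
            while pos < len(text) and text[pos] == ",":
                pos += 1  # consume ","
                children.append(node())
            if pos >= len(text) or text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ")"
        # Read the (possibly empty) node name.
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        name = text[start:pos]
        # Read the optional branch length after ":".
        length = None
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            length = float(text[start:pos])
        return {"name": name, "length": length, "children": children}

    root = node()
    if pos != len(text):
        raise ValueError(f"unexpected text at position {pos}")
    return root


def leaf_names(node):
    """Return leaf names in left-to-right order."""
    if not node["children"]:
        return [node["name"]]
    names = []
    for child in node["children"]:
        names.extend(leaf_names(child))
    return names


tree = parse_newick("((human:0.1,chimp:0.1)primates:0.3,mouse:0.5);")
leaves = leaf_names(tree)
```

Representing each node as a plain dict keeps the sketch short; a dedicated toolkit would typically expose node objects with traversal and annotation methods instead.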

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

PhylomeDB: a database for genome-wide collections of gene phylogenies

Author: A. Bueno
J. Dopazo
J. Huerta-Cepas
T. Gabaldon
Publication venue: Oxford University Press

The complete collection of evolutionary histories of all genes in a genome, also known as the phylome, constitutes a valuable source of information. The reconstruction of phylomes was previously prevented by large demands on time and computing power, but is now feasible thanks to recent developments in computers and algorithms. To provide a publicly available repository of complete phylomes that allows researchers to access and store large-scale phylogenomic analyses, we have developed PhylomeDB. PhylomeDB is a database of complete phylomes derived for different genomes within a specific taxonomic range. All phylomes in the database are built using a high-quality phylogenetic pipeline that includes evolutionary model testing and alignment trimming phases. For each genome, PhylomeDB provides the alignments, phylogenetic trees and tree-based orthology predictions for every single encoded protein. The current version of PhylomeDB includes the phylomes of human, the yeast Saccharomyces cerevisiae and the bacterium Escherichia coli, comprising a total of 32 289 seed sequences with their corresponding alignments and 172 324 phylogenetic trees. PhylomeDB can be publicly accessed at http://phylomedb.bioinfo.cipf.e

Crossref

PubMed Central

PhylomeDB v3.0: an expanding repository of genome-wide collections of trees, alignments and phylogeny-based orthology and paralogy predictions

Author: D. Kormes
I. Denisov
J. Huerta-Cepas
L. P. Pryszcz
M. Marcet-Houben
S. Capella-Gutierrez
T. Gabaldon
Publication venue: Oxford University Press

The growing availability of complete genomic sequences from diverse species has brought about the need to scale up phylogenomic analyses, including the reconstruction of large collections of phylogenetic trees. Here, we present the third version of PhylomeDB (http://phylomeDB.org), a public database for genome-wide collections of gene phylogenies (phylomes). Currently, PhylomeDB is the largest phylogenetic repository and hosts 17 phylomes, comprising 416 093 trees and 165 840 alignments. It is also a major source for phylogeny-based orthology and paralogy predictions, covering about 5 million proteins in 717 fully sequenced genomes. For each protein-coding gene in a seed genome, the database provides original and processed alignments, phylogenetic trees derived from various methods and phylogeny-based predictions of orthology and paralogy relationships. The new version of PhylomeDB has been extended with novel data access and visualization features, including the possibility of programmatic access. Available seed species include model organisms such as human, yeast, Escherichia coli or Arabidopsis thaliana, but also alternative model species such as the human pathogen Candida albicans, or the pea aphid Acyrtosiphon pisum. Finally, PhylomeDB is currently being used by several genome sequencing projects that couple the genome annotation process with the reconstruction of the corresponding phylome, a strategy that provides relevant evolutionary insights.

Crossref

PubMed Central

The Newick utilities: high-throughput phylogenetic tree processing in the Unix shell

Author: E. M. Zdobnov
T. Junier
Publication venue: Oxford University Press
Publication date: 01/01/2010

Summary: We present a suite of Unix shell programs for processing any number of phylogenetic trees of any size. They perform frequently used tree operations without requiring user interaction. They also allow tree drawing as scalable vector graphics (SVG), suitable for high-quality presentations and further editing, and as ASCII graphics for command-line inspection. As an example we include an implementation of bootscanning, a procedure for finding recombination breakpoints in viral genomes.

CiteSeerX

Crossref

PubMed Central

Archive ouverte UNIGE

NG-meta-profiler: fast processing of metagenomes using NGLess, a domain-specific language

Author: Alves R.
Bork P.
Coelho L.P.
Freitas A.T.
Huerta-Cepas J.
Monteiro P.
Publication venue: BioMed Central
Publication date: 01/01/2019

Background: Shotgun metagenomes contain a sample of all the genomic material in an environment, allowing for the characterization of a microbial community. In order to understand these communities, bioinformatics methods are crucial. A common first step in processing metagenomes is to compute abundance estimates of different taxonomic or functional groups from the raw sequencing data. Given the breadth of the field, computational solutions need to be flexible and extensible, enabling the combination of different tools into a larger pipeline. Results: We present NGLess and NG-meta-profiler. NGLess is a domain-specific language for describing next-generation sequence processing pipelines. It was developed with the goal of enabling user-friendly computational reproducibility. It provides built-in support for many common operations on sequencing data and can be extended with external tools through configuration files. Using this framework, we developed NG-meta-profiler, a fast profiler for metagenomes which performs sequence preprocessing, mapping to bundled databases, filtering of the mapping results, and profiling (taxonomic and functional). It is significantly faster than either MOCAT2 or htseq-count and (as it builds on NGLess) its results are perfectly reproducible. Conclusions: NG-meta-profiler is a high-performance solution for metagenomics processing built on NGLess. It can be used as-is to execute standard analyses or serve as the starting point for customization in a perfectly reproducible fashion. NGLess and NG-meta-profiler are open source software (under the liberal MIT license) and can be downloaded from https://ngless.embl.de or installed through bioconda.

Heidelberger Dokumentenserver

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

MDC Repository

Fast genome-wide functional annotation through orthology assignment by eggNOG-mapper

Author: Bork P.
Forslund K.
Huerta-Cepas J.
Jensen L.J.
Szklarczyk D.
von Mering C.
Publication venue: Cold Spring Harbor Laboratory
Publication date: 22/09/2016

Orthology assignment is ideally suited for functional inference. However, because predicting orthology is computationally intensive at large scale, and most pipelines are relatively inaccessible, less precise homology-based functional transfer is still the default for (meta-)genome annotation. We therefore developed eggNOG-mapper, a tool for functional annotation of large sets of sequences based on fast orthology assignments using precomputed clusters and phylogenies from eggNOG. To validate our method, we benchmarked Gene Ontology predictions against two widely used homology-based approaches: BLAST and InterProScan. Compared to BLAST, eggNOG-mapper reduced the rate of false positive assignments by 7%, and increased by 19% the ratio of curated terms recovered over all terms assigned per protein. Compared to InterProScan, eggNOG-mapper achieved similar proteome coverage and precision, while predicting on average 32 more terms per protein and increasing by 26% the rate of curated terms recovered over total term assignments per protein. Through strict orthology assignments, eggNOG-mapper further renders more specific annotations than possible from domain similarity only (e.g. predicting gene family names). eggNOG-mapper runs ~15x faster than BLAST and at least 2.5x faster than InterProScan. The tool is available standalone or as an online service at http://eggnog-mapper.embl.de.
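One of the benchmark quantities in this abstract, the ratio of curated terms recovered over all terms assigned per protein, can be sketched as follows. This is one plausible reading of that phrase and not the authors' benchmarking code: the function names, the toy protein and term identifiers, and the choice to skip proteins without predictions are all assumptions, and relationships between terms in the ontology are ignored.

```python
def recovered_ratio(predicted, curated):
    """Fraction of a protein's predicted terms that are also curated.

    Returns None when nothing was predicted, so that such proteins
    can be left out of the average (an assumption of this sketch).
    """
    if not predicted:
        return None
    return len(predicted & curated) / len(predicted)


def mean_recovered_ratio(predictions, curation):
    """Average the per-protein ratio over proteins with predictions."""
    ratios = []
    for protein, predicted in predictions.items():
        ratio = recovered_ratio(set(predicted), set(curation.get(protein, ())))
        if ratio is not None:
            ratios.append(ratio)
    return sum(ratios) / len(ratios) if ratios else None


# Toy, made-up identifiers used only to exercise the functions.
predictions = {
    "p1": {"t1", "t2", "t3", "t4"},
    "p2": {"t1", "t5"},
    "p3": set(),
}
curation = {
    "p1": {"t1", "t2"},
    "p2": {"t5", "t6"},
}
score = mean_recovered_ratio(predictions, curation)
```

Comparing this value between two annotation tools on the same protein set would give a relative difference of the kind the abstract reports, under the stated assumptions.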

MDC Repository

proGenomes: a resource for consistent functional and taxonomic annotations of prokaryotic genomes

Author: Bork P.
Forslund K.
Huerta-Cepas J.
Letunic I.
Li S.S.
Mende D.R.
Sunagawa S.
Publication venue: Oxford University Press (OUP)
Publication date: 24/10/2016

The availability of microbial genomes has opened many new avenues of research within microbiology. This has been driven primarily by comparative genomics approaches, which rely on accurate and consistent characterization of genomic sequences. It is nevertheless difficult to obtain consistent taxonomic and integrated functional annotations for defined prokaryotic clades. Thus, we developed proGenomes, a resource that provides user-friendly access to currently 25 038 high-quality genomes whose sequences and consistent annotations can be retrieved individually or by taxonomic clade. These genomes are assigned to 5306 consistent and accurate taxonomic species clusters based on previously established methodology. proGenomes also contains functional information for almost 80 million protein-coding genes, including a comprehensive set of general annotations and more focused annotations for carbohydrate-active enzymes and antibiotic resistance genes. Additionally, broad habitat information is provided for many genomes. All genomes and associated information can be downloaded by user-selected clade or multiple habitat-specific sets of representative genomes. We expect that the availability of high-quality genomes with comprehensive functional annotations will promote advances in clinical microbial genomics, functional evolution and other subfields of microbiology. proGenomes is available at http://progenomes.embl.de

Repository for Publications and Research Data

PubMed Central

UNSWorks

MDC Repository

Online-Publikations-Server der Universität Würzburg

Phylemon: a suite of web tools for molecular evolution, phylogenetics and phylogenomics

Author: Arbiza Leonardo
Dopazo Hernán
Dopazo Joaquín
Gabaldón Toni
Huerta-Cepas Jaime
Medina Ignacio
Tárraga Joaquín
Publication venue: Oxford University Press
Publication date: 01/01/2007

Phylemon is an online platform for phylogenetic and evolutionary analyses of molecular sequence data. It has been developed as a web server that integrates a suite of different tools selected among the most popular stand-alone programs in phylogenetic and evolutionary analysis. It has been conceived as a natural response to the increasing demand for data analysis from the many experimental scientists wishing to add molecular evolution and phylogenetics insight to their research. Tools included in Phylemon cover a wide yet selected range of programs: from the most basic for multiple sequence alignment to elaborate statistical methods of phylogenetic reconstruction, including methods for evolutionary rates analyses and molecular adaptation. Phylemon has several features that differentiate it from other resources: (i) it offers an integrated environment that enables the direct concatenation of evolutionary analyses, stores results and handles the required data format conversions, (ii) once an outfile is produced, Phylemon suggests the next possible analyses, thus guiding the user and facilitating the integration of multi-step analyses, and (iii) users can define and save complete pipelines for specific phylogenetic analyses to be automatically used on many genes in subsequent sessions or on multiple genes in a single session (phylogenomics). The Phylemon web server is available at http://phylemon.bioinfo.cipf.es.

CiteSeerX

Crossref

PubMed Central