GreenPhylDB: A Gene Family Database for plant functional Genomics
With the increasing number of genomes being sequenced, a major objective is to transfer accurate annotation from characterised proteins to uncharacterised sequences. Consequently, comparative genomics has become a common and efficient strategy in functional genomics. The release of various annotated plant genomes, such as _O. sativa_ and _A. thaliana_, has made it possible to set up comprehensive lists of gene families defined by automated methods. However, as with gene sequences, manual curation of gene families remains an important requirement. GreenPhylDB comprises protein sequences from 12 fully sequenced plant species, grouped into homeomorphic families using similarity-based methods. Clusters are finally processed by phylogenetic analysis to infer orthologs and paralogs, which will be particularly helpful for studying genome evolution. Beforehand, each cluster has to be curated (i.e. properly named and classified) using different sources of information. A web interface for the curation of plant gene families was developed for that purpose. This interface, accessible on GreenPhylDB (http://greenphyl.cirad.fr), centralizes external references (e.g. InterPro, KEGG, Swiss-Prot, PIRSF, PubMed) related to all gene members of the clusters and shows statistics and automatic analyses. We believe that this synthetic view of the data available for a gene cluster, combined with basic guidelines, is an efficient way to provide a reliable method for gene family annotation.
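The abstract above does not name the similarity-based method used, so the following is only a minimal sketch of one generic way to group proteins into clusters from pairwise similarity: single-linkage clustering with a union-find structure. The protein identifiers, the normalized scores, and the threshold are all hypothetical, and this is not presented as the GreenPhylDB procedure.

```python
def cluster_by_similarity(pairs, threshold):
    """Single-linkage clustering of proteins from (a, b, score) triples.

    Two proteins end up in the same cluster when a chain of pairs with
    score >= threshold connects them. Illustrative only.
    """
    parent = {}

    def find(x):
        # Register unseen proteins as their own root, then walk to the root
        # while shortening the path (path halving).
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in pairs:
        root_a, root_b = find(a), find(b)
        if score >= threshold and root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for protein in parent:
        clusters.setdefault(find(protein), set()).add(protein)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical similarity scores between five proteins.
pairs = [
    ("p1", "p2", 0.90),
    ("p2", "p3", 0.80),
    ("p4", "p5", 0.95),
    ("p3", "p4", 0.20),  # below threshold: does not join the two groups
]
clusters = cluster_by_similarity(pairs, threshold=0.5)
```

Single linkage is the simplest choice and tends to chain loosely related sequences together; graph-based clustering methods are commonly preferred for real proteomes.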
Phylogenomics of plant genomes: a methodology for genome-wide searches for orthologs in plants
Background: Gene ortholog identification is now a major objective for mining the increasing amount of sequence data generated by complete or partial genome sequencing projects. Comparative and functional genomics urgently need a method for ortholog detection to reduce gene function inference and to aid in the identification of conserved or divergent genetic pathways between several species. As gene functions change during evolution, reconstructing the evolutionary history of genes should be a more accurate way to differentiate orthologs from paralogs. Phylogenomics takes into account phylogenetic information from high-throughput genome annotation and is the most straightforward way to infer orthologs. However, procedures for automatic detection of orthologs are still scarce and suffer from several limitations. Results: We developed a procedure for ortholog prediction between Oryza sativa and Arabidopsis thaliana. Firstly, we established an efficient method to cluster A. thaliana and O. sativa full proteomes into gene families. Then, we developed an optimized phylogenomics pipeline for ortholog inference. We validated the full procedure using test sets of orthologs and paralogs to demonstrate that our method outperforms pairwise methods for ortholog predictions. Conclusion: Our procedure achieved a high level of accuracy in predicting ortholog and paralog relationships. Phylogenomic predictions for all validated gene families in both species were easily achieved and we can conclude that our methodology outperforms similarly based methods.
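To illustrate how a gene tree can separate orthologs from paralogs, here is a minimal sketch of the species-overlap rule, a widely used heuristic that is not claimed to be the pipeline described in the abstract above. It assumes a rooted gene tree written as nested tuples and leaves labeled `SPECIES|gene`; the species codes and gene names are hypothetical.

```python
from itertools import combinations, product


def species_of(leaf):
    """Extract the species code from a 'SPECIES|gene' leaf label."""
    return leaf.split("|", 1)[0]


def classify_pairs(tree):
    """Label every pair of leaves as 'ortholog' or 'paralog'.

    At each internal node, if two child subtrees share a species, the node
    is treated as a duplication and cross-subtree pairs are paralogs;
    otherwise it is treated as a speciation and the pairs are orthologs.
    """
    relations = {}

    def visit(node):
        if isinstance(node, str):  # leaf
            return [node]
        child_leaves = [visit(child) for child in node]
        for left, right in combinations(child_leaves, 2):
            left_species = {species_of(x) for x in left}
            right_species = {species_of(x) for x in right}
            kind = "paralog" if left_species & right_species else "ortholog"
            for a, b in product(left, right):
                relations[frozenset((a, b))] = kind
        return [leaf for leaves in child_leaves for leaf in leaves]

    visit(tree)
    return relations


# Hypothetical gene family: a duplication predating the split of two
# species, with a single-copy outgroup gene.
example_tree = (
    (("ARATH|g1", "ORYSA|g1"), ("ARATH|g2", "ORYSA|g2")),
    "PHYPA|g0",
)
relations = classify_pairs(example_tree)
```

In this toy tree, `ARATH|g1` and `ORYSA|g1` are labeled orthologs, while `ARATH|g1` and `ORYSA|g2` are labeled paralogs because their last common ancestor node is a duplication.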
A look at trails through the pangenome visualization jungle
High-throughput sequencing technologies enabled the production of multiple reference genome sequences for a single species. Comparisons of such sequences showed that there are structural variations between individuals from the same species such as Copy Number Variations (CNV) and Presence Absence Variations (PAV) that can have a significant impact on phenotypic variation in plants and could be suitable for breeding improved crop varieties. Thus, a single reference genome is insufficient to capture all variations.
Pangenomics is an integrative approach that aims to assess such genomic variations, and more, within a group of closely related individuals. Its definition can focus on the whole repertoire of genes within a group or can include blocks of genomic sequences shared between species. We introduce here a new visualization tool based on a linear representation: the PANgenome Analyzer with CHromosomal Exploration (PANACHE). It is a web-based application that enables its users to explore a pangenomic reference divided into multiple panchromosomes.
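As a small illustration of presence/absence variation, the sketch below splits genes into those present in every genome and those present in only some. The matrix layout, gene names, and accession names are hypothetical, and the "core" versus "variable" labels follow common pangenomics usage rather than anything stated in the abstract above; this is not PANACHE's data model.

```python
def split_core_variable(pav_matrix):
    """Split genes by presence across genomes.

    pav_matrix maps gene -> {genome: bool}. A gene is 'core' when present
    in every genome and 'variable' when present in at least one but not
    all. Genes absent everywhere are ignored.
    """
    core, variable = [], []
    for gene, presence in pav_matrix.items():
        if all(presence.values()):
            core.append(gene)
        elif any(presence.values()):
            variable.append(gene)
    return core, variable


# Hypothetical presence/absence calls for three genes in three accessions.
pav_matrix = {
    "geneA": {"acc1": True, "acc2": True, "acc3": True},
    "geneB": {"acc1": True, "acc2": False, "acc3": True},
    "geneC": {"acc1": False, "acc2": False, "acc3": True},
}
core, variable = split_core_variable(pav_matrix)
```

Here `geneA` is core, while `geneB` and `geneC` are variable, which is the kind of difference a single reference genome cannot capture.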
GreenPhylDB v5: a comparative pangenomic database for plant genomes
Comparative genomics is the analysis of genomic relationships among different species and serves as a significant base for evolutionary and functional genomic studies. GreenPhylDB (https://www.greenphyl.org) is a database designed to facilitate the exploration of gene families and homologous relationships among plant genomes, including staple crops critically important for global food security. GreenPhylDB has been available since 2007, after the release of the Arabidopsis thaliana and Oryza sativa genomes, and has undergone multiple releases. With the number of plant genomes currently available, it is becoming challenging to select a single reference for comparative genomics studies, yet there is still a lack of databases that take advantage of several genomes per species for orthology detection. GreenPhylDB v5 introduces the concept of comparative pangenomics by harnessing multiple genome sequences per species. We created 19 pangenes and processed them together with other species that still rely on one genome. In total, 46 plant species were considered to build gene families and predict their homologous relationships through phylogeny-based analyses. In addition, since the previous publication, we have revamped the website and included a new set of original tools, including protein-domain combination searches, tree topology searches, and a section for users to store their own results in order to support community curation efforts.
Ten simple rules for developing visualization tools in genomics
Our following 10 simple rules are dedicated to biologists and bioinformaticians who, while already being at the crossroads of many fields, want to venture further into the land of Data Visualization (“datavis” or “dataviz” for short). They combine tips and advice that we would have wanted when we first started our own journeys, gathered from our experiences in building genomic and/or datavis tools, and the time spent with related communities. Additionally, they address current challenges in computational biology and the needs of the community.
GreenPhylDB v2.0: comparative and functional genomics in plants
GreenPhylDB is a database designed for comparative and functional genomics based on complete genomes. Version 2 now contains sixteen full genomes of members of the Plantae kingdom, ranging from algae to angiosperms, automatically clustered into gene families. Gene families are manually annotated and then analyzed phylogenetically in order to elucidate orthologous and paralogous relationships. The database offers various lists of gene families, including plant-, phylum- and species-specific gene families. For each gene cluster or gene family, easy access to gene composition, protein domains, publications, external links and orthologous gene predictions is provided. Web interfaces have been further developed to improve navigation through information related to gene families. New analysis tools are also available, such as a gene family ontology browser that facilitates exploration. GreenPhylDB is a component of the South Green Bioinformatics Platform (http://southgreen.cirad.fr/) and is accessible at http://greenphyl.cirad.fr. It enables comparative genomics in a broad taxonomic context to enhance the understanding of evolutionary processes and thus tends to speed up gene discovery.
The Generation Challenge Programme comparative plant stress-responsive gene catalogue
The Generation Challenge Programme (GCP; www.generationcp.org) has developed an online resource documenting stress-responsive genes comparatively across plant species. This public resource is a compendium of protein families, phylogenetic trees, multiple sequence alignments (MSA) and associated experimental evidence. The central objective of this resource is to elucidate orthologous and paralogous relationships between plant genes that may be involved in response to environmental stress, mainly abiotic stresses such as water deficit (‘drought’). The web-based graphical user interface (GUI) of the resource includes query and visualization tools that allow diverse searches and browsing of the underlying project database. The web interface can be accessed at http://dayhoff.generationcp.org.