
42 research outputs found

Matching curated genome databases: a non trivial task

Author: Barba Matthieu
Descorps-Declère Stéphane
Labedan Bernard
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract

Background: Curated databases of completely sequenced genomes have been designed independently at the NCBI (RefSeq) and EBI (Genome Reviews) to cope with the non-standard annotation found in the version of the sequenced genome published by the databanks GenBank/EMBL/DDBJ. These curation efforts were expected to review the annotations and to improve their pertinence when they are used to annotate newly released genome sequences by homology to previously annotated genomes. However, we observed that such an uncoordinated effort has two unwanted consequences. First, it is not trivial to map the protein identifiers of the same sequence in both databases. Second, the two reannotated versions of the same genome differ at the level of their structural annotation.

Results: Here, we propose CorBank, a program devised to cross-reference protein identifiers whatever the level of identity found between their matching sequences. Approximately 98% of the 1,983,258 amino acid sequences match, allowing instantaneous retrieval of their respective cross-references. CorBank also detects any differences between the independently curated versions of the same genome. We found that the RefSeq and Genome Reviews versions match perfectly for only 50 of the 641 complete genomes we analyzed. In all other cases there are differences at the level of the coding sequence (CDS), in the total number of CDS in the respective versions of the same genome, or both. CorBank is freely accessible at http://www.corbank.u-psud.fr. The CorBank site also publishes updated, exhaustive results obtained by comparing the RefSeq and Genome Reviews versions of each genome. Accordingly, this web site allows easy searching of cross-references between RefSeq, Genome Reviews, and UniProt, for either a single CDS or a whole replicon.

Conclusion: CorBank is very efficient at rapidly detecting the numerous differences between the RefSeq and Genome Reviews versions of the same curated genome. Although such differences are acceptable as reflecting different views, we suggest that curators of both genome databases could help reduce further divergence by agreeing on a minimal dialogue and attempting to publish the point of view of the other database whenever it is technically possible.
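The simplest case of the cross-referencing described in this abstract, identical sequences carrying different identifiers in two databases, can be sketched as follows. This is an illustrative sketch only, not CorBank's actual method: the abstract says CorBank also handles sequences that match at lower identity, which this example does not attempt. The function names and the identifiers in the demo are hypothetical.

```python
import hashlib
from collections import defaultdict


def sequence_key(seq: str) -> str:
    """Normalise an amino acid sequence and return a stable digest of it."""
    # Drop whitespace, ignore case and any trailing stop symbol ("*").
    cleaned = "".join(seq.split()).upper().rstrip("*")
    return hashlib.sha256(cleaned.encode("ascii")).hexdigest()


def cross_reference(db_a: dict, db_b: dict):
    """Pair identifiers from two databases whose sequences are identical.

    Both arguments map identifier -> amino acid sequence.
    Returns (pairs, only_in_a, only_in_b).
    """
    # Index database B by sequence digest; several IDs may share a sequence.
    index_b = defaultdict(list)
    for ident, seq in db_b.items():
        index_b[sequence_key(seq)].append(ident)

    pairs, only_in_a, matched_b = [], [], set()
    for ident, seq in db_a.items():
        hits = index_b.get(sequence_key(seq), [])
        if not hits:
            only_in_a.append(ident)
        for hit in hits:
            pairs.append((ident, hit))
            matched_b.add(hit)

    only_in_b = [ident for ident in db_b if ident not in matched_b]
    return pairs, only_in_a, only_in_b


if __name__ == "__main__":
    # Hypothetical identifiers and toy sequences, for illustration only.
    refseq_like = {"A1": "MKT", "A2": "MAA"}
    reviews_like = {"B1": "mkt*", "B2": "MGG"}
    print(cross_reference(refseq_like, reviews_like))
```

The two "only in" lists give a first view of where two curated versions of a genome disagree, either in the sequence of a CDS or in the number of CDS. Deciding which unmatched entries correspond to each other would need a similarity search, which is outside this sketch.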

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Homology modeling of complex structural RNAs

Author: Barba Matthieu
Denise Alain
Ponty Yann
Rinaudo Philippe
Wang Wei
Publication venue: HAL CCSD
Publication date: 01/06/2016

Aligning macromolecules such as proteins, DNAs and RNAs in order to reveal, or conversely exploit, their functional homology is a classic challenge in bioinformatics, with far-reaching applications in structure modelling and genome annotation. In the specific context of complex RNAs, featuring pseudoknots, multiple interactions and non-canonical base pairs, multiple algorithmic solutions and tools have been proposed for the structure/sequence alignment problem. However, such tools are seldom used in practice, due in part to their extreme computational demands and to their inability to support general types of structures. Recently, a general parameterized algorithm based on tree decomposition of the query structure has been designed by Rinaudo et al. We present an implementation of the algorithm within a tool named LiCoRNA. We compare it against state-of-the-art algorithms. We show that it both gracefully specializes into a practical algorithm for simple classes of pseudoknots and offers a general solution for complex pseudoknots, which are explicitly out of reach of competing software.

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-CEA

HAL-Polytechnique

HAL-Rennes 1

VEuPathDB: the eukaryotic pathogen, vector and host bioinformatics resource center in 2023.

Author: Alvarez-Jarreta Jorge
Amos Beatrice
Aurrecoechea Cristina
Bah Saikou
Barba Matthieu
Barreto Ana
Basenko Evelina Y
Belnap Robert
Blevins Ann
Brestelli John
Brown Stuart
Böhme Ulrike
Callan Danielle
Campbell Lahcen I
Christophides George K
Crouch Kathryn
Davison Helen R
DeBarry Jeremy D
Demko Richard
Doherty Ryan
Duan Yikun
Dundore Walter
Dyer Sarah
Falke Dave
Fischer Steve
Gajria Bindu
Galdi Daniel
Giraldo-Calderón Gloria I
Harb Omar S
Harper Elizabeth
Helb Danica
Howington Connor
Hu Sufen
Humphrey Jay
Iodice John
Jones Andrew
Judkins John
Kelly Sarah A
Kissinger Jessica C
Kittur Nupur
Kwon Dae Kun
Lamoureux Kristopher
Li Wei
Lodha Disha
MacCallum Robert M
Maslen Gareth
McDowell Mary Ann
Myers Jeremy
Nural Mustafa Veysi
Roos David S
Rund Samuel SC
Shanmugasundram Achchuthan
Sitnik Vasily
Spruill Drew
Starns David
Tomko Sheena Shah
Wang Haiming
Warrenfeltz Susanne
Wieck Robert
Wilkinson Paul A
Zheng Jie
Publication venue: Oxford University Press (OUP)
Publication date: 11/11/2023

The Eukaryotic Pathogen, Vector and Host Informatics Resource (VEuPathDB, https://veupathdb.org) is a Bioinformatics Resource Center funded by the National Institutes of Health with additional funding from the Wellcome Trust. VEuPathDB supports >600 organisms that comprise invertebrate vectors, eukaryotic pathogens (protists and fungi) and relevant free-living or non-pathogenic species or hosts. Since 2004, VEuPathDB has analyzed omics data from the public domain using contemporary bioinformatic workflows, including orthology predictions via OrthoMCL, and integrated the analysis results with analysis tools, visualizations, and advanced search capabilities. The unique data mining platform coupled with >3000 pre-analyzed data sets facilitates the exploration of pertinent omics data in support of hypothesis-driven research. Comparisons are easily made across data sets, data types and organisms. A Galaxy workspace offers the opportunity for the analysis of private large-scale datasets and for porting to VEuPathDB for comparisons with integrated data. The MapVEu tool provides a platform for exploration of spatially resolved data such as vector surveillance and insecticide resistance monitoring. To address the growing body of omics data and advances in laboratory techniques, VEuPathDB has added several new data types, searches and features, improved the Galaxy workspace environment, redesigned the MapVEu interface and updated the infrastructure to accommodate these changes.

University of Liverpool Repository

Enlighten

VEuPathDB: the eukaryotic pathogen, vector and host bioinformatics resource center

Author: Amos Beatrice
Aurrecoechea Cristina
Barba Matthieu
Barreto Ana
Basenko Evelina Y
Bazant Wojciech
Belnap Robert
Blevins Ann S
Bohme Ulrike
Brestelli John
Brunk Brian P
Caddick Mark
Callan Danielle
Campbell Lahcen
Christensen Mikkel B
Christophides George K
Crouch Kathryn
Davis Kristina
DeBarry Jeremy
Doherty Ryan
Duan Yikun
Dunn Michael
Falke Dave
Fisher Steve
Flicek Paul
Fox Brett
Gajria Bindu
Giraldo-Calderon Gloria I
Harb Omar S
Harper Elizabeth
Hertz-Fowler Christiane
Hickman Mark J
Howington Connor
Hu Sufen
Humphrey Jay
Iodice John
Jones Andrew
Judkins John
Kelly Sarah A
Kissinger Jessica C
Kwon Dae Kun
Lamoureux Kristopher
Lawson Daniel
Li Wei
Lies Kallie
Lodha Disha
Long Jamie
MacCallum Robert M
Maslen Gareth
McDowell Mary Ann
Nabrzyski Jaroslaw
Roos David S
Rund Samuel SC
Schulman Stephanie Wever
Shanmugasundram Achchuthan
Sitnik Vasily
Spruill Drew
Starns David
Stoeckert Christian J
Tomko Sheena Shah
Wang Haiming
Warrenfeltz Susanne
Wieck Robert
Wilkinson Paul A
Xu Lin
Zheng Jie
Publication venue: Oxford University Press (OUP)
Publication date: 28/10/2021

The Eukaryotic Pathogen, Vector and Host Informatics Resource (VEuPathDB, https://veupathdb.org) represents the 2019 merger of VectorBase with the EuPathDB projects. As a Bioinformatics Resource Center funded by the National Institutes of Health, with additional support from the Wellcome Trust, VEuPathDB supports >500 organisms comprising invertebrate vectors, eukaryotic pathogens (protists and fungi) and relevant free-living or non-pathogenic species or hosts. Designed to empower researchers with access to Omics data and bioinformatic analyses, VEuPathDB projects integrate >1700 pre-analysed datasets (and associated metadata) with advanced search capabilities, visualizations, and analysis tools in a graphic interface. Diverse data types are analysed with standardized workflows including an in-house OrthoMCL algorithm for predicting orthology. Comparisons are easily made across datasets, data types and organisms in this unique data mining platform. A new site-wide search facilitates access for both experienced and novice users. Upgraded infrastructure and workflows support numerous updates to the web interface, tools, searches and strategies, and to the Galaxy workspace where users can privately analyse their own data. Forthcoming upgrades include cloud-ready application architecture, expanded support for the Galaxy workspace, tools for interrogating host-pathogen interactions, improved interactions with affiliated databases (ClinEpiDB, MicrobiomeDB) and other scientific resources, and increased interoperability with the Bacterial & Viral BRC.

University of Liverpool Repository

PubMed Central

Enlighten

Modules réactionnels : un nouveau concept pour étudier l'évolution des voies métaboliques

Author: Barba Matthieu
Publication venue: HAL CCSD
Publication date: 16/12/2011

I designed a methodology to annotate enzyme superfamilies, explain their history and describe them in the context of the evolution of metabolic pathways. Three superfamilies were studied. (1) Cyclic amidohydrolases, including DHOases (dihydroorotases, third step of pyrimidine biosynthesis), for which I proposed a new classification. The phylogenetic tree also includes dihydropyrimidinases (DHPases) and allantoinases (ALNases), which catalyze similar reactions in other pathways (pyrimidine and purine degradation, respectively). (2) The DHODase superfamily (DHODases follow DHOases) shows a phylogeny similar to that of the DHOases and also includes enzymes from other pathways, in particular DHPDases (which follow DHPases). This led to the concept of the reaction module, i.e. a conserved series of similar reactions in different metabolic pathways. This concept was used to study (3) the carbamoyltransferases (TCases), which include ATCases (which precede DHOases). I first identified a new kind of TCase, potentially involved in purine degradation, and proposed a new role for it in the light of reaction modules (linked with ALNase). In these three superfamilies I also found three groups of unidentified paralogs that remarkably share the same genetic context, called “Yge”, which would form a reaction module belonging to an unidentified pathway. Applied to various pathways, the concept of reaction modules may thus reflect the ancestral metabolic pathways of which they would be the basic elements.
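The reaction-module idea in this abstract, the same succession of homologous enzymes recurring in different pathways, can be illustrated with a small sketch. It is a toy model under stated assumptions, not the method used in the thesis. Each pathway is reduced to an ordered list of superfamily labels, the step order follows the abstract's own "before"/"after" wording, and the label `DHODase-like` and the function name `shared_modules` are invented here for illustration.

```python
def shared_modules(pathways: dict, length: int = 2) -> dict:
    """Find runs of consecutive superfamily labels shared between pathways.

    `pathways` maps a pathway name to its ordered list of superfamily labels.
    Returns a dict mapping each run (a tuple of `length` labels) to the set of
    pathway names containing it, keeping only runs found in two or more.
    """
    seen = {}
    for name, steps in pathways.items():
        # Slide a window of `length` consecutive steps along the pathway.
        for i in range(len(steps) - length + 1):
            run = tuple(steps[i:i + length])
            seen.setdefault(run, set()).add(name)
    return {run: names for run, names in seen.items() if len(names) >= 2}


# Hypothetical encoding of the three pathways mentioned in the abstract.
PATHWAYS = {
    # ATCase, then DHOase, then DHODase
    "pyrimidine biosynthesis": ["TCase", "cyclic amidohydrolase", "DHODase-like"],
    # DHPase, then DHPDase (order as worded in the abstract)
    "pyrimidine degradation": ["cyclic amidohydrolase", "DHODase-like"],
    # newly described TCase, linked with ALNase
    "purine degradation": ["TCase", "cyclic amidohydrolase"],
}

if __name__ == "__main__":
    for run, names in shared_modules(PATHWAYS).items():
        print(" -> ".join(run), "in", sorted(names))
```

Under this encoding, two candidate modules appear: a TCase followed by a cyclic amidohydrolase, and a cyclic amidohydrolase followed by a DHODase-like enzyme. A real analysis would rest on phylogenies and genetic context rather than label matching.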

Thèses en Ligne

HAL Descartes

Hal-Diderot

Reaction modules : a new concept to study the evolution of metabolic pathways

Author: Barba Matthieu
Publication venue
Publication date: 16/12/2011

I designed a methodology to annotate enzyme superfamilies, explain their history and describe them in the context of the evolution of metabolic pathways. Three superfamilies were studied. (1) Cyclic amidohydrolases, including DHOases (dihydroorotases, third step of pyrimidine biosynthesis), for which I proposed a new classification. The phylogenetic tree also includes dihydropyrimidinases (DHPases) and allantoinases (ALNases), which catalyze similar reactions in other pathways (pyrimidine and purine degradation, respectively). (2) The DHODase superfamily (DHODases follow DHOases) shows a phylogeny similar to that of the DHOases and also includes enzymes from other pathways, in particular DHPDases (which follow DHPases). This led to the concept of the reaction module, i.e. a conserved series of similar reactions in different metabolic pathways. This concept was used to study (3) the carbamoyltransferases (TCases), which include ATCases (which precede DHOases). I first identified a new kind of TCase, potentially involved in purine degradation, and proposed a new role for it in the light of reaction modules (linked with ALNase). In these three superfamilies I also found three groups of unidentified paralogs that remarkably share the same genetic context, called “Yge”, which would form a reaction module belonging to an unidentified pathway. Applied to various pathways, the concept of reaction modules may thus reflect the ancestral metabolic pathways of which they would be the basic elements.

Theses.fr

Evolution of cyclic amidohydrolases: A highly diversified superfamily

Author: Barba Matthieu
Glansdorff Nicolas
Labedan Bernard
Publication venue: Springer Science and Business Media LLC
Publication date: 27/08/2013

Dihydroorotases are universal proteins catalyzing the third step of pyrimidine biosynthesis. These zinc metalloenzymes belong to the superfamily of cyclic amidohydrolases, which also comprises enzymes involved in the degradation of purines (allantoinases), pyrimidines (dihydropyrimidinases) or hydantoins (hydantoinases). The evolutionary relationships between these mechanistically related enzymes were estimated after designing a method to build an accurate multiple sequence alignment. The amino acid sequences of proteins that have been crystallized were used to build a seed alignment. All the remaining homologues were progressively added by aligning their HMM profiles to the seed HMM profile, which yielded a reliable phylogeny of the superfamily. This helped us to propose a new evolutionary classification of dihydroorotases into three major types, while at the same time disentangling an important part of the history of their complex structure-function relationships. Although differing in their substrate specificity, allantoinases, hydantoinases and dihydropyrimidinases are found to be phylogenetically closer to DHOase Type I than the three DHOase types are to each other. This suggests that the primordial cyclic amidohydrolase was a multifunctional, highly evolvable generalist, with high conformational diversity allowing for promiscuous activities. Successive gene duplications then resolved the primordial substrate ambiguity into various substrate specificities. The present-day superfamily of cyclic amidohydrolases is the result of the progressive divergence of these ancestral paralogous copies by descent with modification. © 2013 Springer Science+Business Media New York.

Crossref

DI-fusion

Identifying reaction modules in metabolic pathways: bioinformatic deduction and experimental validation of a new putative route in purine catabolism

Author: Barba Matthieu
Dutoit Raphaël
Labedan Bernard
Legrain Christianne
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

Enzymes belonging to mechanistically diverse superfamilies often display similar catalytic mechanisms. We previously observed such an association in the case of the cyclic amidohydrolase superfamily whose members play a role in related steps of purine and pyrimidine metabolic pathways. To establish a possible link between enzyme homology and chemical similarity, we investigated further the neighbouring steps in the respective pathways.

HAL-CentraleSupelec

Crossref

Springer - Publisher Connector

INRIA a CCSD electronic archive server