13 research outputs found

    Taxonbridge: an R package to create custom taxonomies based on the NCBI and GBIF taxonomies

    Biological taxonomies establish conventions by which researchers can catalogue and systematically compare their work using nomenclature such as species binomial names and reference identifiers. The ideal taxonomy is unambiguous and exhaustive; however, no such single taxonomy exists, partly due to continuous changes and contributions made to existing taxonomies. The degree to which a taxonomy is useful furthermore depends on context provided by such variables as the taxonomic neighbourhood of a species (e.g., selecting arthropod or vertebrate species) or the geological time frame of the study (e.g., selecting extinct versus extant species). Collating the most relevant taxonomic information from multiple taxonomies is hampered by arbitrarily defined identifiers, ambiguity in scientific names, as well as duplicated and erroneous entries. The goal of taxonbridge is to provide tools for merging the Global Biodiversity Information Facility (GBIF) Backbone Taxonomy and the United States National Center for Biotechnology Information (NCBI) Taxonomy in order to create consistent, deduplicated and disambiguated custom taxonomies that reference both extant and extinct species
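    The merging idea described in this abstract can be illustrated with a minimal Python sketch. It is not the taxonbridge implementation (taxonbridge is an R package); the function names, the record layout of (identifier, scientific name) pairs and the identifiers in the usage note are all hypothetical. The sketch links two taxonomies by normalized scientific name, collapses duplicated entries, and drops names that are ambiguous in either source.

```python
from collections import defaultdict


def index_by_name(records):
    """Map each normalized scientific name to the set of identifiers using it.

    `records` is an iterable of (identifier, scientific_name) pairs; this
    layout is an assumption made for illustration only.
    """
    index = defaultdict(set)
    for identifier, name in records:
        # Normalizing case and whitespace collapses trivially duplicated entries.
        index[name.strip().lower()].add(identifier)
    return index


def merge_taxonomies(gbif_records, ncbi_records):
    """Return {name: (gbif_id, ncbi_id)} for names that are unambiguous in both sources."""
    gbif = index_by_name(gbif_records)
    ncbi = index_by_name(ncbi_records)
    merged = {}
    for name in gbif.keys() & ncbi.keys():
        # A name attached to several identifiers in one source is ambiguous and is skipped.
        if len(gbif[name]) == 1 and len(ncbi[name]) == 1:
            merged[name] = (next(iter(gbif[name])), next(iter(ncbi[name])))
    return merged
```

    For example, with the placeholder records `[("g1", "Homo sapiens"), ("g2", "Morus"), ("g3", "Morus")]` and `[("n1", "Homo sapiens"), ("n2", "Morus")]`, only "homo sapiens" is linked, because "Morus" maps to two identifiers in the first source.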

    The SIB Swiss Institute of Bioinformatics' resources: focus on curated databases

    The SIB Swiss Institute of Bioinformatics (www.isb-sib.ch) provides world-class bioinformatics databases, software tools, services and training to the international life science community in academia and industry. These solutions allow life scientists to turn the exponentially growing amount of data into knowledge. Here, we provide an overview of SIB's resources and competence areas, with a strong focus on curated databases and SIB's most popular and widely used resources. In particular, SIB's Bioinformatics resource portal ExPASy features over 150 resources, including UniProtKB/Swiss-Prot, ENZYME, PROSITE, neXtProt, STRING, UniCarbKB, SugarBindDB, SwissRegulon, EPD, arrayMap, Bgee, SWISS-MODEL Repository, OMA, OrthoDB and other databases, which are briefly described in this article

    Bioinformatic analysis of BCL-2 proteins and development of the dedicated knowledge database, BCL2DB

    BCL-2 proteins play an essential role in the life-or-death decision of animal cells. They control the induction of apoptosis (programmed cell death) through the mitochondrial pathway via regulators with opposing functions: anti- or pro-apoptotic. Proteins containing one or more Bcl-2 homology domains (BH1-4) are systematically classified in this family. Through bioinformatic and phylogenetic analysis, we revisited the criteria for including proteins in the BCL-2 group and proposed a new classification that takes structural and evolutionary data into account. This new nomenclature distinguishes a first group of homologous proteins (derived from a common ancestor), which share a 3D structural fold similar to that of Bcl-2 and often (but not necessarily) have one or more BH motifs, from a fast-expanding conglomerate of proteins with no apparent phylogenetic relationship that share only a short region of sequence similarity corresponding to the BH3 motif. Based on these results, we built a process based on HMM profiles to identify proteins belonging to the BCL-2 protein group. Our automated process i) retrieves nucleotide and protein sequences on a monthly basis, ii) annotates them and iii) integrates this information into BCL2DB ("The BCL-2 Database"). This resource can be accessed via a web interface (http://bcl2db.ibcp.fr), which allows researchers to extract data and perform sequence analyses.

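    Analyse bioinformatique des protéines BCL-2 et développement de la base de connaissance dédiée, BCL2DB [Bioinformatic analysis of BCL-2 proteins and development of the dedicated knowledge database, BCL2DB]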

    BCL-2 proteins play an essential role in the life-or-death decision of animal cells. They control the induction of apoptosis (programmed cell death) through the mitochondrial pathway via regulators with opposing functions: anti- or pro-apoptotic. Proteins containing one or more Bcl-2 homology domains (BH1-4) are systematically classified in this family. Through bioinformatic and phylogenetic analysis, we revisited the criteria for including proteins in the BCL-2 group and proposed a new classification that takes structural and evolutionary data into account. This new nomenclature distinguishes a first group of homologous proteins (derived from a common ancestor), which share a 3D structural fold similar to that of Bcl-2 and often (but not necessarily) have one or more BH motifs, from a fast-expanding conglomerate of proteins with no apparent phylogenetic relationship that share only a short region of sequence similarity corresponding to the BH3 motif. Based on these results, we built a process based on HMM profiles to identify proteins belonging to the BCL-2 protein group. Our automated process i) retrieves nucleotide and protein sequences on a monthly basis, ii) annotates them and iii) integrates this information into BCL2DB ("The BCL-2 Database"). This resource can be accessed via a web interface (http://bcl2db.ibcp.fr), which allows researchers to extract data and perform sequence analyses.

    The Expression Comparison Tool in Bgee

    We present Expression Comparison, a tool to compare expression patterns between species. It uses curated annotations of homology between anatomical structures, such as organs or tissues. Expression calls are based on the curated transcriptome data integrated within the Bgee database. Gene homology can be of any type, from user input. The results are presented according to conservation of pattern, as well as rank of expression per species. Expression Comparison is freely available on the Bgee website: https://bgee.org

    Triage by ranking to support the curation of protein interactions

    Today, molecular biology databases are the cornerstone of knowledge sharing for life and health sciences. The curation and maintenance of these resources are labour intensive. Although text mining is gaining impetus among curators, its integration into curation workflows has not yet been widely adopted. The Swiss Institute of Bioinformatics Text Mining and CALIPHO groups joined forces to design a new curation support system named neXtA5. In this report, we explore the integration of novel triage services to support the curation of two types of biological data: protein–protein interactions (PPIs) and post-translational modifications (PTMs). The recognition of PPIs and PTMs poses a special challenge, as it requires the identification not only of biological entities (proteins or residues) but also of particular relationships (e.g. binding or position). These relationships cannot be described with onto-terminological descriptors such as the Gene Ontology for molecular functions, which makes the triage task more challenging. Prioritizing papers for these tasks thus requires the development of different approaches. In this report, we propose a new method to prioritize articles containing information specific to PPIs and PTMs. The new resources (RESTful APIs, semantically annotated MEDLINE library) enrich the neXtA5 platform. We tuned the article prioritization model on a set of 100 proteins previously annotated by the CALIPHO group. The effectiveness of the triage service was tested with a dataset of 200 annotated proteins. We defined two sets of descriptors to support automatic triage: the first set to enrich for papers with PPI data, and the second for PTMs. All occurrences of these descriptors were marked up in MEDLINE and indexed, thus constituting a semantically annotated version of MEDLINE. These annotations were then used to estimate the relevance of a particular article with respect to the chosen annotation type. This relevance score was combined with a local vector-space search engine to generate a ranked list of PMIDs. We also evaluated a query refinement strategy, which adds specific keywords (such as ‘binds’ or ‘interacts’) to the original query. Compared to PubMed, the search effectiveness of the neXtA5 triage service is improved by 190% for the prioritization of papers with PPI information and by 260% for papers with PTM information. Combining advanced retrieval and query refinement strategies with automatically enriched MEDLINE contents effectively improves triage in complex curation tasks such as the curation of PPIs and PTMs.
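    The combination of a descriptor-based relevance estimate with a search-engine score, as described in this abstract, might be sketched as follows. This is not the neXtA5 scoring model, which the abstract does not specify: the linear interpolation, the `weight` parameter, the `Article` fields and the PMIDs in the usage note are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Article:
    pmid: str
    search_score: float   # hypothetical vector-space retrieval score in [0, 1]
    descriptor_hits: int  # hypothetical count of PPI or PTM descriptors annotated in the record


def rank_articles(articles, weight=0.5):
    """Return PMIDs ordered by an interpolation of retrieval score and descriptor relevance.

    The descriptor relevance of an article is its hit count divided by the
    highest hit count in the candidate set (an assumed normalization).
    """
    max_hits = max((a.descriptor_hits for a in articles), default=0)

    def combined(article):
        relevance = article.descriptor_hits / max_hits if max_hits else 0.0
        return (1 - weight) * article.search_score + weight * relevance

    return [a.pmid for a in sorted(articles, key=combined, reverse=True)]
```

    With `weight=0` the ranking falls back to the retrieval score alone, which gives a simple baseline for judging how much the descriptor annotations change the order.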

    A new bioinformatics tool to help assess the significance of BRCA1 variants

    Background: Germline pathogenic variants in the breast cancer type 1 susceptibility gene BRCA1 are associated with a 60% lifetime risk for breast and ovarian cancer. This overall risk estimate covers all BRCA1 variants, although not all variants confer the same risk of developing disease. In cancer patients, loss of BRCA1 function in tumor tissue has been associated with an increased sensitivity to platinum agents and to poly-(ADP-ribose) polymerase (PARP) inhibitors. For the clinical management of both at-risk individuals and cancer patients, it is important that each identified genetic variant be associated with a clinical significance. Unfortunately, for the vast majority of variants, the clinical impact is unknown. Results from studies assessing the impact of variants on protein function may provide crucial insight. Results and conclusion: We have collected, curated, and structured the molecular and cellular phenotypic impact of 3654 distinct BRCA1 variants. The data was modeled in triple format, using the variant as the subject, the studied function as the object, and a predicate describing the relation between the two. Each annotation is supported by fully traceable evidence. The data was captured using standard ontologies to ensure consistency and to enhance searchability and interoperability. We have assessed the extent to which functional defects at the molecular and cellular levels correlate with the clinical interpretation of variants by ClinVar submitters. Approximately 30% of the ClinVar BRCA1 missense variants have some molecular or cellular assay available in the literature. Pathogenic variants (as assigned by ClinVar) have at least some significant functional defect in 94% of testable cases. Of the ClinVar benign variants for which the neXtProt Cancer variant portal has data, 77% show either no or mild experimental functional defects. While this does not provide evidence for the clinical interpretation of variants, it may provide some guidance for variants of unknown significance in the absence of more reliable data. The neXtProt Cancer variant portal (https://www.nextprot.org/portals/breast-cancer) contains over 6300 observations at the molecular and/or cellular level for BRCA1 variants.
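    The triple format mentioned in this abstract (variant as subject, predicate, studied function as object, plus supporting evidence) can be pictured with a small Python sketch. The field names, predicates, variant labels and evidence strings below are hypothetical placeholders, not entries or vocabulary from the neXtProt portal.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    subject: str    # the variant (placeholder label)
    predicate: str  # relation between variant and function
    obj: str        # the studied function
    evidence: str   # traceable source of the observation (placeholder)


# Hypothetical example data, for illustration only.
ANNOTATIONS = [
    Annotation("variant-1", "decreases", "function-A", "evidence-1"),
    Annotation("variant-1", "has-no-impact-on", "function-B", "evidence-2"),
    Annotation("variant-2", "decreases", "function-A", "evidence-3"),
]


def functions_with(annotations, variant, predicate):
    """List the functions linked to `variant` through `predicate`."""
    return [a.obj for a in annotations
            if a.subject == variant and a.predicate == predicate]


def variants_affecting(annotations, function, predicate="decreases"):
    """List the variants linked to `function` through `predicate`, in input order."""
    return [a.subject for a in annotations
            if a.obj == function and a.predicate == predicate]
```

    Keeping the evidence on every triple is what makes each observation traceable: a query can return the supporting source together with the variant and the function.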

    The neXtProt knowledgebase in 2020: data, tools and usability improvements

    The neXtProt knowledgebase (https://www.nextprot.org) is an integrative resource providing both data on human proteins and the tools to explore them. In order to provide comprehensive and up-to-date data, we evaluate and add new data sets. We describe the incorporation of three new data sets that provide expression, function, protein-protein binary interaction, post-translational modification (PTM) and variant information. New SPARQL query examples illustrating uses of the new data were added. neXtProt has continued to develop tools for proteomics. We have improved the peptide uniqueness checker and have implemented a new protein digestion tool. Together, these tools make it possible to determine which proteases can be used to identify trypsin-resistant proteins by mass spectrometry. In terms of usability, we have finished revamping our web interface and completely rewritten our API. Our SPARQL endpoint now supports federated queries. All the neXtProt data are available via our user interface, API, SPARQL endpoint and FTP site, including the new PEFF 1.0 format files. Finally, the data on our FTP site is now licensed under CC BY 4.0 to promote its reuse.
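    The idea behind pairing a digestion tool with a peptide uniqueness check can be sketched in Python. This is not the neXtProt implementation: it assumes the commonly used trypsin rule (cleavage after K or R unless the next residue is P), ignores missed cleavages and sequence variants, and uses made-up sequences and accession labels.

```python
from collections import defaultdict


def trypsin_digest(sequence):
    """Split a protein sequence after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Trailing peptide that does not end in K or R.
        peptides.append(sequence[start:])
    return peptides


def unique_peptides(proteins):
    """Return {peptide: accession} for peptides produced by exactly one protein.

    `proteins` maps a (hypothetical) accession to its amino-acid sequence.
    """
    owners = defaultdict(set)
    for accession, sequence in proteins.items():
        for peptide in trypsin_digest(sequence):
            owners[peptide].add(accession)
    return {pep: next(iter(accs)) for pep, accs in owners.items() if len(accs) == 1}
```

    A protein that yields no unique peptide under one protease rule is a candidate for digestion with a different protease, which is the kind of question the abstract says the two tools answer together.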

    The neXtProt knowledgebase on human proteins: 2017 update

    The neXtProt human protein knowledgebase (https://www.nextprot.org) continues to add new content and tools, with a focus on proteomics and genetic variation data. neXtProt now has proteomics data for over 85% of human proteins, as well as new tools tailored to the proteomics community. Moreover, the neXtProt release 2016-08-25 includes over 8000 phenotypic observations for over 4000 variations in a number of genes involved in hereditary cancers and channelopathies. These changes are presented in the current neXtProt update. All of the neXtProt data are available via our user interface and FTP site. We also provide API access and a SPARQL endpoint for more technical applications.