17,468 research outputs found

Gene and protein nomenclature in public databases

Author: AA Morgan
AS Schwartz
D Hanisch
E Adar
E Brill
H Liu
H Yu
JT Chang
K Fundel
Katrin Fundel
L Chen
L Hirschman
M Szugat
M Weeber
O Tuason
Ralf Zimmer
T Ono
V Hatzivassiloglou
Y Tsuruoka
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Frequently, several alternative names are in use for biological objects such as genes and proteins. Applications like manual literature search, automated text-mining, named entity identification, gene/protein annotation, and linking of knowledge from different information sources require knowledge of all names in use for a given gene or protein. Various organism-specific or general public databases aim at organizing knowledge about genes and proteins. These databases can be used for deriving gene and protein name dictionaries. So far, little is known about the differences between databases in terms of size, ambiguities and overlap. RESULTS: We compiled five gene and protein name dictionaries for each of the five model organisms (yeast, fly, mouse, rat, and human) from different organism-specific and general public databases. We analyzed the degree of ambiguity of gene and protein names within and between dictionaries and with respect to a lexicon of common English words and domain-related non-gene terms, and we compared the data sources in terms of the size of the extracted dictionaries and the overlap of synonyms between them. The study shows that the number of genes/proteins and synonyms covered in individual databases varies significantly for a given organism, and that the degree of ambiguity of synonyms varies significantly between different organisms. Furthermore, it shows that, despite considerable efforts of co-curation, the overlap of synonyms in different data sources is rather moderate and that the degree of ambiguity of gene names with common English words and domain-related non-gene terms varies depending on the considered organism. CONCLUSION: These results indicate that combining the data contained in different databases allows the generation of gene and protein name dictionaries that contain significantly more names in use than dictionaries obtained from individual data sources. Furthermore, curation of combined dictionaries considerably increases size and decreases ambiguity. The entries of the curated synonym dictionary are available for manual querying, editing, and PubMed- or Google-search via the ProThesaurus-wiki. For automated querying via custom software, we offer a web service and an exemplary client application.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Open Access LMU

Closing the circle : current state and perspectives of circular RNA databases

Author: Vandesompele Jo
Volders Pieter-Jan
Vromman Marieke
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2021

Circular RNAs (circRNAs) are covalently closed RNA molecules that have been linked to various diseases, including cancer. However, a precise function and working mechanism are lacking for the large majority. Following many different experimental and computational approaches to identify circRNAs, multiple circRNA databases were developed as well. Unfortunately, there are several major issues with the current circRNA databases, which substantially hamper progress in the field. First, as the overlap in content is limited, a true reference set of circRNAs is lacking. This results from the low abundance and highly specific expression of circRNAs, and from varying sequencing methods, data-analysis pipelines, and circRNA detection tools. A second major issue is the use of ambiguous nomenclature. Redundant or even conflicting names for circRNAs across different databases thus contribute to the reproducibility crisis. Third, circRNA databases, in essence, rely on the position of the circRNA back-splice junction, whereas alternative splicing could result in circRNAs with different lengths and sequences. To uniquely identify a circRNA molecule, the full circular sequence is required. Fourth, circRNA databases annotate circRNAs' microRNA binding and protein-coding potential, but these annotations are generally based on presumed circRNA sequences. Finally, several databases are not regularly updated, contain incomplete data or suffer from connectivity issues. In this review, we present a comprehensive overview of the current circRNA databases and their content, features, and usability. In addition to discussing the current issues regarding circRNA databases, we offer important suggestions to streamline further research in this growing field.

Ghent University Academic Bibliography

The Alliance of Genome Resources: Building a Modern Data Ecosystem for Model Organism Databases.

Author: Alliance of Genome Resources Consortium
Publication venue: eScholarship, University of California
Publication date: 01/12/2019

Model organisms are essential experimental platforms for discovering gene functions, defining protein and genetic networks, uncovering functional consequences of human genome variation, and for modeling human disease. For decades, researchers who use model organisms have relied on Model Organism Databases (MODs) and the Gene Ontology Consortium (GOC) for expertly curated annotations, and for access to integrated genomic and biological information obtained from the scientific literature and public data archives. Through the development and enforcement of data and semantic standards, these genome resources provide rapid access to the collected knowledge of model organisms in human-readable and computation-ready formats that would otherwise require countless hours for individual researchers to assemble on their own. Since their inception, the MODs for the predominant biomedical model organisms [Mus sp. (laboratory mouse), Saccharomyces cerevisiae, Drosophila melanogaster, Caenorhabditis elegans, Danio rerio, and Rattus norvegicus] along with the GOC have operated as a network of independent, highly collaborative genome resources. In 2016, these six MODs and the GOC joined forces as the Alliance of Genome Resources (the Alliance). By implementing shared programmatic access methods and data-specific web pages with a unified "look and feel," the Alliance is tackling barriers that have limited the ability of researchers to easily compare common data types and annotations across model organisms. To adapt to the rapidly changing landscape for evaluating and funding core data resources, the Alliance is building a modern, extensible, and operationally efficient "knowledge commons" for model organisms using shared, modular infrastructure.

eScholarship - University of California

The Chlamydomonas genome project: A decade on

Author: Aksoy M
Blaby IK
Blaby-Haas CE
Dutcher S
Goodstein D
Grimwood J
Grossman A
Harris EH
Hom EFY
King S
Lopez D
Merchant SS
Porter M
Prochnik S
Schmutz J
Stanke M
Tourasse N
Umen J
Vallon O
Witman GB
Publication venue: eScholarship, University of California
Publication date: 01/10/2014

The green alga Chlamydomonas reinhardtii is a popular unicellular organism for studying photosynthesis, cilia biogenesis, and micronutrient homeostasis. Ten years since its genome project was initiated, an iterative process of improvements to the genome and gene predictions has propelled this organism to the forefront of the omics era. Housed at Phytozome, the plant genomics portal of the Joint Genome Institute (JGI), the most up-to-date genomic data include a genome arranged on chromosomes and high-quality gene models with alternative splice forms supported by an abundance of whole transcriptome sequencing (RNA-Seq) data. We present here the past, present, and future of Chlamydomonas genomics. Specifically, we detail progress on genome assembly and gene model refinement, discuss resources for gene annotations, functional predictions, and locus ID mapping between versions and, importantly, outline a standardized framework for naming genes.

PubMed Central

eScholarship - University of California

miRBase Tracker : keeping track of microRNA annotation changes

Author: Anckaert Jasper
Beckers Anneleen
Lefever Steve
Mestdagh Pieter
Ongenaert Maté
Rihani Ali
Van Goethem Alan
Van Peer Gert
Vandesompele Jo
Volders Pieter-Jan
Zeka Fjoralba
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2014

Since 2002, information on individual microRNAs (miRNAs), such as reference names and sequences, has been stored in miRBase, the reference database for miRNA annotation. As a result of progressive insights into the miRNome and its complexity, miRBase underwent addition and deletion of miRNA records, changes in annotated miRNA sequences and adoption of more complex naming schemes over time. Unfortunately, miRBase does not allow straightforward assessment of these ongoing miRNA annotation changes, which has resulted in substantial ambiguity regarding miRNA identity and sequence in public literature, in target prediction databases and in content on various commercially available analytical platforms. As a result, correct interpretation, comparison and integration of miRNA study results are compromised, which we demonstrate here by assessing the impact of ignoring sequence annotation changes. To address this problem, we developed miRBase Tracker (www.mirbasetracker.org), an easy-to-use online database that keeps track of all historical and current miRNA annotation present in the miRBase database. Three basic functionalities allow researchers to keep their miRNA annotation up-to-date, reannotate analytical miRNA platforms and link published results with outdated annotation to the latest miRBase release. We expect miRBase Tracker to increase the transparency and annotation accuracy in the field of miRNA research. Database URL: www.mirbasetracker.org

Crossref

Ghent University Academic Bibliography

PubMed Central

EcoCyc: fusing model organism databases with systems biology.

Author: Bonavides-Martínez César
Collado-Vides Julio
Fulcher Carol
Gama-Castro Socorro
Gunsalus Robert P
Huerta Araceli M
Karp Peter D
Keseler Ingrid M
Kothari Anamika
Krummenacker Markus
Latendresse Mario
Mackie Amanda
Muñiz-Rascado Luis
Ong Quang
Paley Suzanne
Paulsen Ian
Peralta-Gil Martin
Santos-Zavaleta Alberto
Schröder Imke
Shearer Alexander G
Subhraveti Pallavi
Travers Mike
Weerasinghe Deepika
Weiss Verena
Publication venue: eScholarship, University of California
Publication date: 07/11/2012

EcoCyc (http://EcoCyc.org) is a model organism database built on the genome sequence of Escherichia coli K-12 MG1655. Expert manual curation of the functions of individual E. coli gene products in EcoCyc has been based on information found in the experimental literature for E. coli K-12-derived strains. Updates to EcoCyc content continue to improve the comprehensive picture of E. coli biology. The utility of EcoCyc is enhanced by new tools available on the EcoCyc web site, and the development of EcoCyc as a teaching tool is increasing the impact of the knowledge collected in EcoCyc.

PubMed Central

eScholarship - University of California

Macquarie University ResearchOnline

Finding needles in haystacks: linking scientific names, reference specimens and molecular data for Fungi

Author: Abarenkov K
Aime MC
Ariyawansa HA
Bidartondo M
Boekhout T
Buyck B
Cai Q
Cardinali G
Chen J
Crespo A
Crous PW
Damm U
De Beer ZW
Dentinger BTM
Dieguez Uribeondo J
Divakar PK
Duenas M
Duong V
Feau N
Federhen S
Fliegerova K
Garcia MA
Ge Z-W
Griffith G
Groenewald JZ
Groenewald M
Grube M
Gryzenhout M
Gueidan C
Guo L
Hambleton S
Hamelin R
Hansen K
Hofstetter V
Hong S-B
Houbraken J
Hughes K
Hyde KD
Inderbitzin P
Irinyi L
Johnston PR
Karunarathna SC
Kirk PM
Koljalg U
Kovacs GM
Kraichak E
Krizsan K
Kurtzman CP
Larsson K-H
Leavitt S
Letcher PM
Liimatainen K
Liu J-K
Lodge DJ
Luangsa-ard JJ
Lumbsch HT
Maharachchikumbura SSN
Manamgoda D
Martin MP
Meyer W
Miller AN
Minnis AM
Moncalvo J-M
Mule G
Nakasone KK
Nilsson RH
Niskanen T
Olariaga I
Papp T
Petkovits T
Pino-Bodas R
Powell MJ
Raja HA
Redecker D
Robbertse B
Robert V
Sarmiento-Ramirez JM
Schoch CL
Seifert KA
Shrestha B
Stenroos S
Stielow B
Subbarao KV
Suh S-O
Tanaka K
Tedersoo L
Teresa Telleria M
Udayanga D
Untereiner WA
Vagvoelgyi C
Visagie C
Voigt K
Walker DM
Weir BS
Weiss M
Wijayawardene NN
Wingfield MJ
Xu JP
Yang ZL
Zhang N
Zhuang W-Y
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2014

DNA phylogenetic comparisons have shown that morphology-based species recognition often underestimates fungal diversity. Therefore, the need for accurate DNA sequence data, tied to both correct taxonomic names and clearly annotated specimen data, has never been greater. Furthermore, the growing number of molecular ecology and microbiome projects using high-throughput sequencing requires fast and effective methods for en masse species assignments. In this article, we focus on selecting and re-annotating a set of marker reference sequences that represent each currently accepted order of Fungi. The particular focus is on sequences from the internal transcribed spacer region in the nuclear ribosomal cistron, derived from type specimens and/or ex-type cultures. Re-annotated and verified sequences were deposited in a curated public database at the National Center for Biotechnology Information (NCBI), namely the RefSeq Targeted Loci (RTL) database, and will be visible during routine sequence similarity searches with NR_prefixed accession numbers. A set of standards and protocols is proposed to improve the data quality of new sequences, and we suggest how type and other reference sequences can be used to improve identification of Fungi.

Shared Research Repository

Wageningen University & Research Publications

Spiral - Imperial College Digital Repository

IUPHAR-DB: An Expert-Curated, Peer-Reviewed Database of Receptors and Ion Channels

Author: Anthony J. Harmar
Edward M. Rosser
Joanna L. Sharman
Martin Jones
NC - IUPHAR
Rebecca Hills
Stuart D. Greenhill
Valerie A. Hale
Publication date: 07/05/2009

The International Union of Basic and Clinical Pharmacology database (IUPHAR-DB) integrates peer-reviewed pharmacological, chemical, genetic, functional and anatomical information on the 354 non-sensory G protein-coupled receptors (GPCRs), 71 ligand-gated ion channel subunits and 141 voltage-gated ion channel subunits encoded by the human, rat and mouse genomes. These genes represent the targets of about a third of currently approved drugs and are a major focus of drug discovery and development programs in the pharmaceutical industry. Individual gene pages provide a comprehensive description of the genes and their functions, with information on protein structure, ligands, expression patterns, signaling mechanisms, functional assays and biologically important receptor variants (e.g. single nucleotide polymorphisms and splice variants). The phenotypes resulting from altered gene expression (e.g. in genetically altered animals) and genetic mutations are described. Links are provided to bioinformatics resources such as NCBI RefSeq, OMIM, PubChem, human, rat and mouse genome databases. Recent developments include the addition of ligand-centered pages summarising information about unique ligand molecules in IUPHAR-DB. IUPHAR-DB represents a novel approach to biocuration because most data are provided through manual curation of published literature by a network of over 60 expert subcommittees coordinated by NC-IUPHAR. Data are referenced to the primary literature and linked to PubMed. The data are checked to ensure accuracy and consistency by the curators, added to the production server using custom-built submission tools and peer-reviewed by NC-IUPHAR, before being transferred to the public database. Data are reviewed and updated regularly (at least biennially). Other website features include comprehensive database search tools, online and downloadable gene lists and links to recent publications of interest to the field, such as reports on receptor-ligand pairings. 
The database is freely available at http://www.iuphar-db.org. Curators can be reached at curators [at] iuphar-db.org. We thank the British Pharmacological Society, UNESCO (through the ICSU Grants Programme), Incyte, GlaxoSmithKline, Novartis, Servier and Wyeth for their support.

Nature Precedings