    FAIR principles and the IEDB: short-term improvements and a long-term vision of OBO-foundry mediated machine-actionable interoperability.

    The Immune Epitope Database (IEDB), at www.iedb.org, has the mission to make published experimental data relating to the recognition of immune epitopes easily available to the scientific public. By presenting curated data in a searchable database, we have liberated it from the tables and figures of journal articles, making it more accessible and usable by immunologists. Recently, the principles of Findability, Accessibility, Interoperability and Reusability have been formulated as goals that data repositories should meet to enhance the usefulness of their data holdings. We here examine how the IEDB complies with these principles and identify broad areas of success, but also areas for improvement. We describe short-term improvements to the IEDB that are being implemented now, as well as a long-term vision of true 'machine-actionable interoperability', which we believe will require community agreement on standardization of knowledge representation that can be built on top of the shared use of ontologies.

    Finding needles in haystacks: linking scientific names, reference specimens and molecular data for Fungi

    DNA phylogenetic comparisons have shown that morphology-based species recognition often underestimates fungal diversity. Therefore, the need for accurate DNA sequence data, tied to both correct taxonomic names and clearly annotated specimen data, has never been greater. Furthermore, the growing number of molecular ecology and microbiome projects using high-throughput sequencing require fast and effective methods for en masse species assignments. In this article, we focus on selecting and re-annotating a set of marker reference sequences that represent each currently accepted order of Fungi. The particular focus is on sequences from the internal transcribed spacer region in the nuclear ribosomal cistron, derived from type specimens and/or ex-type cultures. Re-annotated and verified sequences were deposited in a curated public database at the National Center for Biotechnology Information (NCBI), namely the RefSeq Targeted Loci (RTL) database, and will be visible during routine sequence similarity searches with NR_-prefixed accession numbers. A set of standards and protocols is proposed to improve the data quality of new sequences, and we suggest how type and other reference sequences can be used to improve identification of Fungi.
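
    Once a match to one of these re-annotated records is found, the underlying sequence can be retrieved programmatically. A minimal sketch using the standard NCBI E-utilities efetch endpoint; the accession shown is a placeholder and is not taken from the article:

# Minimal sketch: retrieve a RefSeq Targeted Loci ITS record in FASTA format
# via NCBI E-utilities. The accession below is a placeholder; substitute any
# NR_-prefixed accession returned by a similarity search against the RTL set.
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def fetch_refseq_its(accession: str) -> str:
    """Fetch a nucleotide record (e.g. an NR_-prefixed ITS reference) as FASTA text."""
    params = urlencode({
        "db": "nucleotide",
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
    })
    with urlopen(f"{EUTILS}?{params}") as response:
        return response.read().decode()

if __name__ == "__main__":
    print(fetch_refseq_its("NR_111007"))  # placeholder accession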

    Improved ontology for eukaryotic single-exon coding sequences in biological databases

    Efficient extraction of knowledge from biological data requires the development of structured vocabularies to unambiguously define biological terms. This paper proposes descriptions and definitions to disambiguate the term 'single-exon gene'. Eukaryotic Single-Exon Genes (SEGs) have been defined as genes that do not have introns in their protein coding sequences. They have been studied not only to determine their origin and evolution but also because their expression has been linked to several types of human cancer and neurological/developmental disorders, and many exhibit tissue-specific transcription. Unfortunately, the term 'SEGs' is rife with ambiguity, leading to biological misinterpretations. In the classic definition, no distinction is made between SEGs that harbor introns in their untranslated regions (UTRs) versus those without. This distinction is important to make because the presence of introns in UTRs affects transcriptional regulation and post-transcriptional processing of the mRNA. In addition, recent whole-transcriptome shotgun sequencing has led to the discovery of many examples of single-exon mRNAs that arise from alternative splicing of multi-exon genes; these single-exon isoforms are being confused with SEGs despite their clearly different origin. The increasing expansion of RNA-seq datasets makes it imperative to distinguish the different SEG types before annotation errors become indelibly propagated in biological databases. This paper develops a structured vocabulary for their disambiguation, allowing a major reassessment of their evolutionary trajectories, regulation, RNA processing and transport, and provides the opportunity to improve the detection of gene associations with disorders including cancers, neurological and developmental diseases.
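
    The distinctions described above can be made operational from ordinary gene-model coordinates. The following sketch is illustrative only; the field names, input structure and three-way classification labels are assumptions made for the example, not the paper's formal vocabulary:

# Illustrative sketch (not the paper's vocabulary definitions): classify a
# transcript model using exon and CDS coordinates.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Transcript:
    exons: List[Tuple[int, int]]      # genomic (start, end) pairs, sorted
    cds: Tuple[int, int]              # genomic span of the coding sequence
    gene_has_multiexon_isoform: bool  # True if the parent gene has other, multi-exon isoforms

def classify(t: Transcript) -> str:
    if len(t.exons) == 1:
        # A single exon at the transcript level: either a true intronless gene
        # or a single-exon isoform produced by alternative splicing.
        return ("single-exon isoform of a multi-exon gene"
                if t.gene_has_multiexon_isoform
                else "intronless gene (no introns in CDS or UTRs)")
    # Multiple exons: decide whether any intron interrupts the coding sequence.
    cds_start, cds_end = t.cds
    cds_exons = [e for e in t.exons if e[1] >= cds_start and e[0] <= cds_end]
    if len(cds_exons) == 1:
        return "single-exon CDS with intron(s) confined to the UTRs"
    return "multi-exon coding sequence"

# Example: two exons, CDS entirely within the second exon -> UTR-only intron
t = Transcript(exons=[(100, 200), (300, 900)], cds=(350, 800),
               gene_has_multiexon_isoform=False)
print(classify(t))  # single-exon CDS with intron(s) confined to the UTRs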

    Southern African Treatment Resistance Network (SATuRN) RegaDB HIV drug resistance and clinical management database: supporting patient management, surveillance and research in southern Africa

    Substantial amounts of data have been generated from patient management and academic exercises designed to better understand the human immunodeficiency virus (HIV) epidemic and design interventions to control it. A number of specialized databases have been designed to manage huge data sets from HIV cohort, vaccine, host genomic and drug resistance studies. Besides databases from cohort studies, most of the online databases contain limited curated data and are thus sequence repositories. HIV drug resistance has been shown to have a great potential to derail the progress made thus far through antiretroviral therapy. Thus, a lot of resources have been invested in generating drug resistance data for patient management and surveillance purposes. Unfortunately, most of the data currently available relate to subtype B even though >60% of the epidemic is caused by HIV-1 subtype C. A consortium of clinicians, scientists, public health experts and policy makers working in southern Africa came together and formed a network, the Southern African Treatment and Resistance Network (SATuRN), with the aim of increasing curated HIV-1 subtype C and tuberculosis drug resistance data. This article describes the HIV-1 data curation process using the SATuRN Rega database. The data curation is a manual and time-consuming process done by clinical, laboratory and data curation specialists. Access to the highly curated data sets is through applications that are reviewed by the SATuRN executive committee. Examples of research outputs from the analysis of the curated data include trends in the level of transmitted drug resistance in South Africa, analysis of the levels of acquired resistance among patients failing therapy and factors associated with the absence of genotypic evidence of drug resistance among patients failing therapy. All these studies have been important for informing first- and second-line therapy. The database is free, password-protected, open source and available at www.bioafrica.net.

    Benchmarking database systems for Genomic Selection implementation

    Motivation: With high-throughput genotyping systems now available, it has become feasible to fully integrate genotyping information into breeding programs. To make use of this information effectively requires DNA extraction facilities and marker production facilities that can efficiently deploy the desired set of markers across samples with a rapid turnaround time that allows for selection before crosses need to be made. In reality, breeders often have a short window of time to make decisions by the time they are able to collect all their phenotyping data and receive corresponding genotyping data. This presents a challenge to organize information and utilize it in downstream analyses to support decisions made by breeders. In order to implement genomic selection routinely as part of breeding programs, one would need an efficient genotyping data storage system. We selected and benchmarked six popular open-source data storage systems, including relational database management systems and columnar storage systems. Results: We found that data extraction times are greatly influenced by the orientation in which genotype data is stored in a system. HDF5 consistently performed best, in part because it can more efficiently work with both orientations of the allele matrix.
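
    The orientation effect can be seen directly with HDF5. A minimal sketch using h5py, with assumed dataset names, matrix sizes and chunk layouts (none of which are taken from the paper):

# Minimal sketch: store an allele matrix in both orientations with h5py and
# time a per-sample extraction against a per-marker extraction. The chunk
# layout and dataset names are illustrative assumptions.
import time
import numpy as np
import h5py

n_samples, n_markers = 1_000, 50_000
genotypes = np.random.randint(0, 3, size=(n_samples, n_markers), dtype=np.int8)

with h5py.File("alleles.h5", "w") as f:
    # samples-by-markers: contiguous rows make "all markers for one sample" cheap
    f.create_dataset("sample_major", data=genotypes, chunks=(1, n_markers))
    # markers-by-samples: contiguous rows make "one marker across all samples" cheap
    f.create_dataset("marker_major", data=genotypes.T, chunks=(1, n_samples))

with h5py.File("alleles.h5", "r") as f:
    t0 = time.perf_counter()
    _ = f["sample_major"][42, :]   # reads a single chunk
    t1 = time.perf_counter()
    _ = f["marker_major"][:, 42]   # touches every chunk in the dataset
    t2 = time.perf_counter()
    print(f"row read: {t1 - t0:.4f}s, cross-chunk column read: {t2 - t1:.4f}s")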

    Using The Barton Libraries Dataset As An RDF benchmark

    This report describes the Barton Libraries RDF dataset and Longwell query benchmark that we use for our recent VLDB paper on Scalable Semantic Web Data Management Using Vertical Partitioning.
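
    The core idea of vertical partitioning is that each RDF predicate gets its own two-column (subject, object) table rather than one wide triples table, so a query over a single property only scans that property's table. A minimal sketch, with invented sample triples and an invented table-naming scheme:

# Minimal sketch of vertical partitioning: one (subject, object) table per
# RDF predicate. Sample triples and naming convention are illustrative only.
import re
import sqlite3

triples = [
    ("urn:book1", "dc:title", "Semantic Web Data Management"),
    ("urn:book1", "dc:creator", "A. Author"),
    ("urn:book2", "dc:title", "Vertical Partitioning in Practice"),
]

def table_name(predicate: str) -> str:
    # Derive a safe SQL table name from the predicate URI/QName.
    return "p_" + re.sub(r"\W", "_", predicate)

conn = sqlite3.connect(":memory:")
for subj, pred, obj in triples:
    tbl = table_name(pred)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {tbl} (subject TEXT, object TEXT)")
    conn.execute(f"INSERT INTO {tbl} VALUES (?, ?)", (subj, obj))

# Property-bound lookup now touches only the dc:title partition.
for row in conn.execute("SELECT subject, object FROM p_dc_title ORDER BY subject"):
    print(row)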

    MisPred: a resource for identification of erroneous protein sequences in public databases

    Correct prediction of the structure of protein-coding genes of higher eukaryotes is still a difficult task; therefore, public databases are heavily contaminated with mispredicted sequences. The high rate of misprediction has serious consequences because it significantly affects the conclusions that may be drawn from genome-scale sequence analyses of eukaryotic genomes. Here we present the MisPred database and computational pipeline that provide efficient means for the identification of erroneous sequences in public databases. The MisPred database contains a collection of abnormal, incomplete and mispredicted protein sequences from 19 metazoan species identified as erroneous by MisPred quality control tools in the UniProtKB/Swiss-Prot, UniProtKB/TrEMBL, NCBI/RefSeq and EnsEMBL databases. Major releases of the database are automatically generated and updated regularly. The database (http://www.mispred.com) is easily accessible through a simple web interface coupled to a powerful query engine and a standard web service. The content is completely or partially downloadable in a variety of formats.
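
    As a rough illustration of the kind of automated screening such a pipeline performs, the sketch below flags sequences that look incomplete or abnormal. These simple checks are stand-ins invented for the example, not MisPred's actual quality-control rules:

# Illustrative proxies only, not MisPred's rules: flag protein sequences that
# look incomplete or abnormal before they enter downstream analyses.
VALID_AA = set("ACDEFGHIKLMNPQRSTVWY")

def quality_flags(seq: str) -> list:
    seq = seq.upper().rstrip("*")
    flags = []
    if not seq.startswith("M"):
        flags.append("missing initiator methionine (possible N-terminal truncation)")
    if "*" in seq:
        flags.append("internal stop codon (possible misprediction or pseudogene)")
    if set(seq) - VALID_AA - {"*"}:
        flags.append("non-standard residues (ambiguous or placeholder characters)")
    if len(seq) < 50:
        flags.append("unusually short sequence (possible fragment)")
    return flags

print(quality_flags("MKTAYIAKQR"))   # short, but starts with M
print(quality_flags("KTAYIAKQ*LL"))  # truncated start and internal stop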

    Firebird Database Backup by Serialized Database Table Dump

    This paper presents a simple data dump and load utility for Firebird databases which mimics mysqldump in MySQL. This utility, fb_dump and fb_load, for dumping and loading respectively, retrieves each database table using kinterbasdb and serializes the data using the marshal module. This utility has two advantages over the standard Firebird database backup utility, gbak. Firstly, it is able to back up and restore single database tables, which might help to recover corrupted databases. Secondly, the output is in text-coded format (from the marshal module), making it more resilient than a compressed text backup, as in the case of using gbak.
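
    The approach can be sketched with a DB-API cursor and the marshal module. The code below is illustrative only, not the fb_dump source: connection details are placeholders, and it assumes the columns hold simple types (strings, numbers) that marshal can serialize:

# Rough sketch of the described approach: pull one table through a DB-API
# cursor and serialize the rows with marshal. Credentials are placeholders.
import marshal
import kinterbasdb  # legacy Firebird driver named in the abstract

def dump_table(dsn, user, password, table, outfile):
    conn = kinterbasdb.connect(dsn=dsn, user=user, password=password)
    try:
        cur = conn.cursor()
        cur.execute("SELECT * FROM %s" % table)     # table name assumed trusted
        rows = [tuple(r) for r in cur.fetchall()]   # simple column types assumed
        with open(outfile, "wb") as fh:
            marshal.dump(rows, fh)
    finally:
        conn.close()

def load_dump(infile):
    with open(infile, "rb") as fh:
        return marshal.load(fh)

# Example with placeholder credentials:
# dump_table("localhost:/data/employee.fdb", "SYSDBA", "masterkey",
#            "EMPLOYEE", "employee.dump")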