885 research outputs found

Annotating genes and genomes with DNA sequences extracted from biomedical articles

Author: Aerts
Anderson
Benson
Casey M. Bergman
Cock
Colosimo
Dowell
Fulp
Garcia-Remesal
Gerner
Gibson
Gray
Hakenberg
Holley
Hubbard
Karolchik
Kent
Kersey
Krallinger
Maglott
Martin Gerner
Maximilian Haeussler
Morgan
Rhead
Roberts
Semon
Shtatland
The FlyBase Consortium
Vandesompele
Visel
Weiss
Wren
Yoshida
Publication venue: Oxford University Press
Publication date: 01/01/2011

Motivation: Increasing rates of publication and DNA sequencing make the problem of finding relevant articles for a particular gene or genomic region more challenging than ever. Existing text-mining approaches focus on finding gene names or identifiers in English text. These are often not unique and do not identify the exact genomic location of a study

CiteSeerX

Crossref

PubMed Central

The University of Manchester - Institutional Repository

BBP: Brucella genome annotation with literature mining and curation

Author: He Yongqun
Xiang Zuoshuang
Zheng Wenjie
Publication venue: BioMed Central
Publication date: 01/07/2006

BACKGROUND: Brucella species are Gram-negative, facultative intracellular bacteria that cause brucellosis in humans and animals. Sequences of four Brucella genomes have been published, and various Brucella gene and genome data and analysis resources exist. A web gateway to integrate these resources will greatly facilitate Brucella research. Brucella genome data in current databases is largely derived from computational analysis without experimental validation typically found in peer-reviewed publications. It is partially due to the lack of a literature mining and curation system able to efficiently incorporate the large amount of literature data into genome annotation. It is further hypothesized that literature-based Brucella gene annotation would increase understanding of complicated Brucella pathogenesis mechanisms. RESULTS: The Brucella Bioinformatics Portal (BBP) is developed to integrate existing Brucella genome data and analysis tools with literature mining and curation. The BBP InterBru database and Brucella Genome Browser allow users to search and analyze genes of 4 currently available Brucella genomes and link to more than 20 existing databases and analysis programs. Brucella literature publications in PubMed are extracted and can be searched by a TextPresso-powered natural language processing method, a MeSH browser, a keywords search, and an automatic literature update service. To efficiently annotate Brucella genes using the large amount of literature publications, a literature mining and curation system coined Limix is developed to integrate computational literature mining methods with a PubSearch-powered manual curation and management system. The Limix system is used to quickly find and confirm 107 Brucella gene mutations including 75 genes shown to be essential for Brucella virulence. The 75 genes are further clustered using COG. In addition, 62 Brucella genetic interactions are extracted from literature publications. 
These results make possible a more comprehensive investigation of Brucella pathogenesis. Other BBP features include a publication email alert service, a Brucella researchers' contact database, and a discussion forum. CONCLUSION: BBP is a gateway for Brucella researchers to search, analyze, and curate Brucella genome data originating from public databases and literature. Brucella gene mutations and genetic interactions are annotated using Limix, leading to a better understanding of Brucella pathogenesis

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Deep Blue Documents at the University of Michigan

Text-mining assisted regulatory annotation

Author: Aerts Stein
Bergman Casey M.
Griffith Obi L.
Haeussler Maximilian
Haussler Maximilian
Hulpiau Paco
Jones Steven J M
Montgomery Stephen B.
van Vooren Steven
Publication venue: BioMed Central
Publication date: 01/01/2008

Text-mining technologies can be integrated with genome annotation systems, increasing the availability of annotated cis-regulatory data

Lirias

Crossref

Springer - Publisher Connector

Ghent University Academic Bibliography

PubMed Central

The University of Manchester - Institutional Repository

ProdInra

Translational web robots for pathogen genome analysis

Author: A Kahvejian
AC McHardy
C Hyland
D Parks
G Mariscal
J Shon
JW Huss
M Haeussler
OG Pybus
PS Dehal
SM Leach
T Davidsen
T Oinn
V Sintchenko
VM Markowitz
Y Kano
Publication venue: BioMed Central
Publication date: 01/01/2011

4 page(s)

Crossref

Springer - Publisher Connector

PubMed Central

Macquarie University ResearchOnline

No wisdom in the crowd: genome annotation at the time of big data - current status and future prospects

Author: Danchin Antoine
Publication date: 01/01/2018

Science and engineering rely on the accumulation and dissemination of knowledge to make discoveries and create new designs. Discovery-driven genome research rests on knowledge passed on via gene annotations. In response to the deluge of sequencing big data, standard annotation practice employs automated procedures that rely on majority rules. We argue this hinders progress through the generation and propagation of errors, leading investigators into blind alleys. More subtly, this inductive process discourages the discovery of novelty, which remains essential in biological research and reflects the nature of biology itself. Annotation systems, rather than being repositories of facts, should be tools that support multiple modes of inference. By combining deduction, induction and abduction, investigators can generate hypotheses when accurate knowledge is extracted from model databases. A key stance is to depart from ‘the sequence tells the structure tells the function’ fallacy, placing function first. We illustrate our approach with examples of critical or unexpected pathways, using MicroScope to demonstrate how tools can be implemented following the principles we advocate. We end with a challenge to the reader

PhilPapers

Kino: A Generic Document Management System for Biologists Using SA-REST and Faceted Search

Publication venue: Institute of Electrical and Electronics Engineers (IEEE)

Crossref

DES-mutation: system for exploring links of mutations and diseases

Author: AlSaieedi Ahdab
Bajic Vladimir B
Bin Raies Arwa
Bokhari Ameerah
Essack Magbubah
Kordopati Vasiliki
Li Yu
Radovanovic Aleksandar
Razali Rozaimi
Salhi Adil
Tifratene Faroug
Uludag Mahmut
Van Neste Christophe
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2018

During cellular division DNA replicates and this process is the basis for passing genetic information to the next generation. However, the DNA copy process sometimes produces a copy that is not perfect, that is, one with mutations. The collection of all such mutations in the DNA copy of an organism makes it unique and determines the organism's phenotype. However, mutations are often the cause of diseases. Thus, it is useful to have the capability to explore links between mutations and disease. We approached this problem by analyzing a vast amount of published information linking mutations to disease states. Based on such information, we developed the DES-Mutation knowledgebase which allows for exploration of not only mutation-disease links, but also links between mutations and concepts from 27 topic-specific dictionaries such as human genes/proteins, toxins, pathogens, etc. This allows for a more detailed insight into mutation-disease links and context. On a sample of 600 mutation-disease associations predicted and curated, our system achieves precision of 72.83%. To demonstrate the utility of DES-Mutation, we provide case studies related to known or potentially novel information involving disease mutations. To our knowledge, this is the first mutation-disease knowledgebase dedicated to the exploration of this topic through text-mining and data-mining of different mutation types and their associations with terms from multiple thematic dictionaries

Ghent University Academic Bibliography

Directory of Open Access Journals

Archivsystem Ask23

A framework for organizing cancer-related variations from existing databases, publications and NGS data using a High-performance Integrated Virtual Environment (HIVE)

Author: Chrichton Daniel J.
Mazumder Raja
Pan Yang
Shamsaddini Amirhossein
Simonyan Vahan
Smith Krista
Wu Tsung-Jung
Publication venue: Health Sciences Research Commons
Publication date: 01/01/2014

Years of sequence feature curation by UniProtKB/Swiss-Prot, PIR-PSD, NCBI-CDD, RefSeq and other database biocurators has led to a rich repository of information on functional sites of genes and proteins. This information along with variation-related annotation can be used to scan human short sequence reads from next-generation sequencing (NGS) pipelines for presence of non-synonymous single-nucleotide variations (nsSNVs) that affect functional sites. This and similar workflows are becoming more important because thousands of NGS data sets are being made available through projects such as The Cancer Genome Atlas (TCGA), and researchers want to evaluate their biomarkers in genomic data. BioMuta, an integrated sequence feature database, provides a framework for automated and manual curation and integration of cancer-related sequence features so that they can be used in NGS analysis pipelines. Sequence feature information in BioMuta is collected from the Catalogue of Somatic Mutations in Cancer (COSMIC), ClinVar, UniProtKB and through biocuration of information available from publications. Additionally, nsSNVs identified through automated analysis of NGS data from TCGA are also included in the database. Because of the petabytes of data and information present in NGS primary repositories, a platform HIVE (High-performance Integrated Virtual Environment) for storing, analyzing, computing and curating NGS data and associated metadata has been developed. Using HIVE, 31 979 nsSNVs were identified in TCGA-derived NGS data from breast cancer patients. All variations identified through this process are stored in a Curated Short Read archive, and the nsSNVs from the tumor samples are included in BioMuta. Currently, BioMuta has 26 cancer types with 13 896 small-scale and 308 986 large-scale study-derived variations. Integration of variation data allows identifications of novel or common nsSNVs that can be prioritized in validation studies

PubMed Central

George Washington University: Health Sciences Research Commons (HSRC)

Semantic-enabled Hybrid Genetic Disease Diagnostics in Next-Generation Sequenced Data

Author: Wołk Krzysztof
Zawadzka-Gosk Emilia
Publication venue: AGHU University of Science and Technology Press
Publication date: 22/05/2018

Next Generation Sequencing (NGS) is a genome sequencing technology used in genetics for disease diagnosis. NGS provides a list of all mutations in a genome, so identifying the one that causes a disease is not trivial. A number of applications for variant prioritization have been developed, but the data they provide are a suggestion rather than a diagnosis; moreover, they suffer from issues such as identifying a nonpathogenic variant as the causal one or being unable to identify the causal gene. These issues inspired us to create a strategy for variant prioritization that uses Exomiser and OmimExplorer result sets, improved by semantic analysis of abstracts and articles freely available from the PubMed and PubMed Central databases. For a wider scope of scientific articles, the Google Scholar repository will be used. The described approach makes it possible to present the latest and most accurate information about potentially pathogenic variants

Computer Science Journal (AGH University of Science and Technology, Krakow)