A literature search tool for intelligent extraction of disease-associated genes
Objective: To extract disorder-associated genes from the scientific literature in PubMed with greater sensitivity for literature-based support than existing methods. Methods: We developed a PubMed query to retrieve disorder-related, original research articles. Then we applied a rule-based text-mining algorithm with keyword matching to extract target disorders, genes with significant results, and the type of study described by the article. Results: We compared our resulting candidate disorder genes and supporting references with existing databases. We demonstrated that our candidate gene set covers nearly all genes in manually curated databases, and that the references supporting the disorder–gene link are more extensive and accurate than other general purpose gene-to-disorder association databases. Conclusions: We implemented a novel publication search tool to find target articles, specifically focused on links between disorders and genotypes. Through comparison against gold-standard manually updated gene–disorder databases and comparison with automated databases of similar functionality we show that our tool can search through the entirety of PubMed to extract the main gene findings for human diseases rapidly and accurately
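The rule-based keyword-matching step described in this abstract might look roughly like the sketch below. It is an illustration only: the gene list, study-type keywords, and significance cues are hypothetical placeholders, since the abstract does not publish the tool's actual rules.

```python
import re

# Hypothetical vocabularies: the abstract does not list the real keywords.
GENE_SYMBOLS = {"SHANK3", "MECP2", "BRCA1"}
STUDY_KEYWORDS = {
    "genome-wide association": "GWAS",
    "case-control": "case-control study",
    "linkage": "linkage study",
}
SIGNIFICANCE_CUES = ("significant", "associated with")


def extract_findings(abstract: str) -> dict:
    """Collect gene symbols from sentences that contain a significance cue,
    plus the study types whose keywords appear anywhere in the abstract."""
    sentences = re.split(r"(?<=[.!?])\s+", abstract)
    genes = set()
    for sentence in sentences:
        lowered = sentence.lower()
        if any(cue in lowered for cue in SIGNIFICANCE_CUES):
            # Candidate symbols: short runs of capitals and digits.
            tokens = set(re.findall(r"\b[A-Z][A-Z0-9]{1,9}\b", sentence))
            genes |= tokens & GENE_SYMBOLS
    lowered_abstract = abstract.lower()
    study_types = sorted(
        label for keyword, label in STUDY_KEYWORDS.items()
        if keyword in lowered_abstract
    )
    return {"genes": sorted(genes), "study_types": study_types}


example = (
    "We performed a case-control study of autism. "
    "SHANK3 was significantly associated with the disorder. "
    "MECP2 showed no effect."
)
findings = extract_findings(example)
```

In this toy example only the gene in the sentence carrying a significance cue is kept, which mirrors the abstract's focus on "genes with significant results".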
Text-mining and information-retrieval services for molecular biology
Text-mining in molecular biology - defined as the automatic extraction of information about genes, proteins and their functional relationships from text documents - has emerged as a hybrid discipline on the edges of the fields of information science, bioinformatics and computational linguistics. A range of text-mining applications have been developed recently that will improve access to knowledge for biologists and database annotators
Using Neural Networks for Relation Extraction from Biomedical Literature
Using different sources of information to support the automated extraction of
relations between biomedical concepts contributes to the development of our
understanding of biological systems. The primary comprehensive source of these
relations is biomedical literature. Several relation extraction approaches have
been proposed to identify relations between concepts in biomedical literature,
notably those using neural network algorithms. The use of multichannel architectures
composed of multiple data representations, as in deep neural networks, is
leading to state-of-the-art results. The right combination of data
representations can eventually lead us to even higher evaluation scores in
relation extraction tasks. Thus, biomedical ontologies play a fundamental role
by providing semantic and ancestry information about an entity. The
incorporation of biomedical ontologies has already been shown to enhance
previous state-of-the-art results.
Comment: Artificial Neural Networks book (Springer) - Chapter 1
A Query Integrator and Manager for the Query Web
We introduce two concepts: the Query Web as a layer of interconnected queries over the document web and the semantic web, and a Query Web Integrator and Manager (QI) that enables the Query Web to evolve. QI permits users to write, save and reuse queries over any web accessible source, including other queries saved in other installations of QI. The saved queries may be in any language (e.g. SPARQL, XQuery); the only condition for interconnection is that the queries return their results in some form of XML. This condition allows queries to chain off each other, and to be written in whatever language is appropriate for the task. We illustrate the potential use of QI for several biomedical use cases, including ontology view generation using a combination of graph-based and logical approaches, value set generation for clinical data management, image annotation using terminology obtained from an ontology web service, ontology-driven brain imaging data integration, small-scale clinical data integration, and wider-scale clinical data integration. Such use cases illustrate the current range of applications of QI and lead us to speculate about the potential evolution from smaller groups of interconnected queries into a larger query network that layers over the document and semantic web. The resulting Query Web could greatly aid researchers and others who now have to manually navigate through multiple information sources in order to answer specific questions
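The chaining condition described above (every query returns XML, so one query can consume another's result) can be sketched as follows. This is a toy illustration, not QI's interface: the two query functions and the `chain` helper are hypothetical, and real saved queries would be written in languages such as SPARQL or XQuery.

```python
import xml.etree.ElementTree as ET


def query_genes(_input_xml=None):
    """Hypothetical saved query: returns a fixed XML result set."""
    return "<results><gene id='g1'/><gene id='g2'/></results>"


def query_count(input_xml):
    """Hypothetical saved query that consumes another query's XML output."""
    root = ET.fromstring(input_xml)
    count = len(root.findall("gene"))
    return f"<results><count>{count}</count></results>"


def chain(*queries):
    """Run queries in order, passing each XML result to the next query."""
    xml = None
    for query in queries:
        xml = query(xml)
    return xml


chained_result = chain(query_genes, query_count)
```

Because each step only agrees on XML as the exchange format, the individual queries remain free to use whatever language suits their task.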
A cDNA Microarray Gene Expression Data Classifier for Clinical Diagnostics Based on Graph Theory
Despite great advances in discovering cancer molecular profiles, the proper application of microarray technology to routine clinical diagnostics is still a challenge. Current practices in the classification of microarray data show two main limitations: the reliability of the training data sets used to build the classifiers, and the classifiers' performance, especially when the sample to be classified does not belong to any of the available classes. In this case, state-of-the-art algorithms usually produce a high rate of false positives that, in real diagnostic applications, is unacceptable. To address this problem, this paper presents a new cDNA microarray data classification algorithm that is based on graph theory and is able to overcome most of the limitations of known classification methodologies. The classifier works by analyzing gene expression data organized in an innovative data structure based on graphs, where vertices correspond to genes and edges to gene expression relationships. To demonstrate the novelty of the proposed approach, the authors present an experimental performance comparison between the proposed classifier and several state-of-the-art classification algorithms
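A graph of this kind, with genes as vertices and expression relationships as edges, could be built along the lines of the sketch below. The abstract does not say how an "expression relationship" is defined, so the rule used here (an edge when the absolute Pearson correlation across samples reaches a threshold) is purely an assumption, as are the function names and the toy data.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def build_expression_graph(expression, threshold=0.9):
    """Return an adjacency map: gene -> set of genes with |r| >= threshold.

    Assumed edge rule; the paper's actual relationship is not specified.
    """
    graph = {gene: set() for gene in expression}
    for gene_a, gene_b in combinations(expression, 2):
        r = pearson(expression[gene_a], expression[gene_b])
        if abs(r) >= threshold:
            graph[gene_a].add(gene_b)
            graph[gene_b].add(gene_a)
    return graph


# Toy expression profiles (one value per sample); not real data.
expression = {
    "gene_A": [1, 2, 3, 4],
    "gene_B": [2, 4, 6, 8],
    "gene_C": [1, -1, 1, -1],
}
graph = build_expression_graph(expression)
```

Here `gene_A` and `gene_B` vary together and become connected, while `gene_C` stays isolated.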
A Semantic Framework Supporting Multilayer Networks Analysis for Rare Diseases
Understanding the role played by genetic variations in diseases, exploring genomic variants, and discovering disease-associated loci are among the most pressing challenges of genomic medicine. A huge and ever-increasing amount of information is available to researchers to address these challenges. Unfortunately, it is stored in fragmented ontologies and databases, which use heterogeneous formats and poorly integrated schemas. To overcome these limitations, the authors propose a linked data approach, based on the formalism of multilayer networks, that can integrate and harmonize biomedical information from multiple sources into a single dense network covering different aspects of Neuroendocrine Neoplasms (NENs). The proposed integration schema consists of three interconnected layers representing, respectively, information on the disease, on the affected genes, and on the related biological processes and molecular functions. An easy-to-use client-server application was also developed to browse and search for information on the model supporting multilayer network analysis
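The three-layer schema described above can be pictured with a small in-memory structure such as the one sketched here. It is a simplified assumption rather than the authors' linked data model: nodes are `(layer, id)` tuples, all identifiers are placeholders, and the class and function names are hypothetical.

```python
from collections import defaultdict

# Layer names follow the three layers named in the abstract.
DISEASE, GENE, FUNCTION = "disease", "gene", "function"


class MultilayerNetwork:
    """Undirected network whose nodes are (layer, identifier) tuples."""

    def __init__(self):
        self._adjacency = defaultdict(set)

    def add_edge(self, node_a, node_b):
        self._adjacency[node_a].add(node_b)
        self._adjacency[node_b].add(node_a)

    def neighbors(self, node, layer):
        """Neighbors of `node` that live in the given layer."""
        return {n for n in self._adjacency.get(node, set()) if n[0] == layer}


def functions_for_disease(network, disease_node):
    """Walk disease -> gene -> function across the three layers."""
    functions = set()
    for gene in network.neighbors(disease_node, GENE):
        functions |= network.neighbors(gene, FUNCTION)
    return functions


# Placeholder content; not real disease, gene, or process annotations.
network = MultilayerNetwork()
network.add_edge((DISEASE, "disease_1"), (GENE, "gene_A"))
network.add_edge((DISEASE, "disease_1"), (GENE, "gene_B"))
network.add_edge((GENE, "gene_A"), (FUNCTION, "process_X"))
network.add_edge((GENE, "gene_B"), (FUNCTION, "process_Y"))
network.add_edge((GENE, "gene_C"), (FUNCTION, "process_Z"))
linked_functions = functions_for_disease(network, (DISEASE, "disease_1"))
```

A query like `functions_for_disease` crosses all three layers in one pass, which is the kind of integrated lookup that fragmented sources make difficult.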
Collaborative text-annotation resource for disease-centered relation extraction from biomedical text
Agglomerating results from studies of individual biological components has shown the potential to produce
biomedical discovery and the promise of therapeutic development. Such knowledge integration
could be tremendously facilitated by automated text mining for relation extraction in the biomedical
literature. Relation extraction systems cannot be developed without substantial datasets annotated
with ground truth for benchmarking and training. The creation of such datasets is hampered by the
absence of a resource for launching a distributed annotation effort, as well as by the lack of a standardized
annotation schema. We have developed an annotation schema and an annotation tool which can
be widely adopted so that the resulting annotated corpora from a multitude of disease studies could be
assembled into a unified benchmark dataset. The contribution of this paper is threefold. First, we provide
an overview of available benchmark corpora and derive a simple annotation schema for specific
binary relation extraction problems such as protein–protein and gene–disease relation extraction.
Second, we present BioNotate: an open source annotation resource for the distributed creation of a
large corpus. Third, we present and make available the results of a pilot annotation effort of the autism
disease network.
Funding: P08-TIC-4299 of J. A., Sevilla and TIN2006-13177 of DGICT, Madrid; Milton foundation; National Science Foundation under Grant No. 054348
Thinking PubMed: an innovative system for mental health domain
Information regarding mental illness is dispersed over various resources, but even within a specific resource, such as PubMed, it is difficult to link this information, to share it, and to find specific information when needed. Specific and targeted searches are very difficult with current search engines, as they look for the specific string of letters within the text rather than its meaning. In this paper we present Thinking PubMed as a system that results from the synergy of ontology and data mining technologies and performs intelligent information searches using the domain ontology. Furthermore, Thinking PubMed analyzes and links the retrieved information, and extracts hidden patterns and knowledge using data mining algorithms. This is a new generation of information-seeking tools where the ontology and data mining work in concert to increase the value of the available information