723 research outputs found
Using a shallow linguistic kernel for drug-drug interaction extraction
A drug–drug interaction (DDI) occurs when one drug influences the level or activity of another drug. Information Extraction (IE) techniques can give health care professionals a way to reduce the time spent reviewing the literature for potential drug–drug interactions. Nevertheless, no approach has been proposed for the problem of extracting DDIs from biomedical texts. In this article, we study whether a machine learning-based method is appropriate for DDI extraction in biomedical texts and whether its results are superior to those obtained from our previously proposed pattern-based approach [1]. The method proposed here for DDI extraction is based on a supervised machine learning technique, more specifically, the shallow linguistic kernel proposed in Giuliano et al. (2006) [2]. Since no benchmark corpus was available to evaluate our approach to DDI extraction, we created the first such corpus, DrugDDI, annotated with 3169 DDIs. We performed several experiments varying the configuration parameters of the shallow linguistic kernel. The model that maximizes the F-measure was evaluated on the test data of the DrugDDI corpus, achieving a precision of 51.03%, a recall of 72.82% and an F-measure of 60.01%.
To the best of our knowledge, this work has proposed the first full solution for the automatic extraction of DDIs from biomedical texts. Our study confirms that the shallow linguistic kernel outperforms our previous pattern-based approach. Additionally, it is our hope that the DrugDDI corpus will allow researchers to explore new solutions to the DDI extraction problem. This study was funded by the Projects MA2VICMR (S2009/TIC-1542) and MULTIMEDICA (TIN2010-20644-C03-01).
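The reported precision, recall and F-measure can be cross-checked with a few lines of Python. This is a minimal sketch that assumes the abstract's F-measure is the balanced F1 score (the harmonic mean of precision and recall); the abstract does not say which weighting was used, and the function name `f_measure` is hypothetical.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (balanced F1 when beta == 1)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Figures quoted in the abstract for the DrugDDI test data, in percent.
precision, recall = 51.03, 72.82

# Assumption: the abstract reports the balanced F1 score.
f1 = f_measure(precision, recall)
print(f"F-measure: {f1:.2f}%")  # F-measure: 60.01%
```

Under that assumption the result agrees with the 60.01% quoted in the abstract.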
Heterogeneous biomedical database integration using a hybrid strategy: a p53 cancer research database.
Complex problems in life science research give rise to multidisciplinary collaboration, and hence, to the need for heterogeneous database integration. The tumor suppressor p53 is mutated in close to 50% of human cancers, and a small drug-like molecule with the ability to restore native function to cancerous p53 mutants is a long-held medical goal of cancer treatment. The Cancer Research DataBase (CRDB) was designed in support of a project to find such small molecules. As a cancer informatics project, the CRDB involved small molecule data, computational docking results, functional assays, and protein structure data. As an example of the hybrid strategy for data integration, it combined the mediation and data warehousing approaches. This paper uses the CRDB to illustrate the hybrid strategy as a viable approach to heterogeneous data integration in biomedicine, and provides a design method for those considering similar systems. More efficient data sharing implies increased productivity, and, hopefully, improved chances of success in cancer research. (Code and database schemas are freely downloadable, http://www.igb.uci.edu/research/research.html.)
Predicting functional associations from metabolism using bi-partite network algorithms
Background: Metabolic reconstructions contain detailed information about metabolic enzymes and their reactants and products. These networks can be used to infer functional associations between metabolic enzymes. Many methods are based on the number of metabolites shared by two enzymes, or the shortest path between two enzymes. Metabolite sharing can miss associations between non-consecutive enzymes in a serial pathway, and shortest-path algorithms are sensitive to high-degree metabolites such as water and ATP that create connections between enzymes with little functional similarity.
Results: We present new, fast methods to infer functional associations in metabolic networks. A local method, the degree-corrected Poisson score, is based only on the metabolites shared by two enzymes, but uses the known metabolite degree distribution. A global method, based on graph diffusion kernels, predicts associations between enzymes that do not share metabolites. Both methods are robust to high-degree metabolites. They out-perform previous methods in predicting shared Gene Ontology (GO) annotations and in predicting experimentally observed synthetic lethal genetic interactions. Including cellular compartment information improves GO annotation predictions but degrades synthetic lethal interaction prediction. These new methods perform nearly as well as computationally demanding methods based on flux balance analysis.
Conclusions: We present fast, accurate methods to predict functional associations from metabolic networks. Biological significance is demonstrated by identifying enzymes whose strong metabolic correlations are missed by conventional annotations in GO, most often enzymes involved in transport vs. synthesis of the same metabolite or other enzyme pairs that share a metabolite but are separated by conventional pathway boundaries. More generally, the methods described here may be valuable for analyzing other types of networks with long-tailed degree distributions and high-degree hubs.
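The weaknesses of the two baseline measures named in the Background can be illustrated on a toy network. The sketch below is not the paper's degree-corrected Poisson score or diffusion kernel; it only implements the baselines (shared-metabolite count and shortest path) on a hypothetical four-enzyme network, where `E1`, `E2`, `E3` form a serial pathway and `E4` is unrelated apart from also using ATP. All names and data are invented for illustration.

```python
from collections import deque

# Hypothetical toy network: enzyme -> metabolites it consumes or produces.
# E1 -> E2 -> E3 form a serial pathway (A -> B -> C -> D); E4 is unrelated
# but shares the high-degree metabolite ATP with E1.
enzyme_metabolites = {
    "E1": {"A", "B", "ATP"},
    "E2": {"B", "C"},
    "E3": {"C", "D"},
    "E4": {"X", "Y", "ATP"},
}


def shared_metabolites(e1: str, e2: str) -> int:
    """Baseline 1: number of metabolites the two enzymes have in common."""
    return len(enzyme_metabolites[e1] & enzyme_metabolites[e2])


def enzyme_distance(start: str, goal: str):
    """Baseline 2: breadth-first shortest path, counted in enzyme-to-enzyme hops.

    Two enzymes are adjacent if they share at least one metabolite.
    Returns None when the goal cannot be reached.
    """
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        enzyme, dist = queue.popleft()
        if enzyme == goal:
            return dist
        for other in enzyme_metabolites:
            if other not in seen and shared_metabolites(enzyme, other) > 0:
                seen.add(other)
                queue.append((other, dist + 1))
    return None


for pair in [("E1", "E2"), ("E1", "E3"), ("E1", "E4")]:
    print(pair, shared_metabolites(*pair), enzyme_distance(*pair))
```

In this toy case the shared-metabolite count scores the non-consecutive pathway pair `E1`/`E3` as zero, while the shortest-path measure places the unrelated `E4` as close to `E1` as its true pathway neighbor `E2`, purely because of ATP.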
Biomedical Event Extraction with Machine Learning
Biomedical natural language processing (BioNLP) is a subfield of natural
language processing, an area of computational linguistics concerned
with developing programs that work with natural language: written texts and
speech. Biomedical relation extraction concerns the detection of
semantic relations such as protein-protein interactions (PPI) from scientific
texts. The aim is to enhance information retrieval by detecting relations
between concepts, not just individual concepts as with a keyword search.
In recent years, events have been proposed as a more detailed alternative for
simple pairwise PPI relations. Events provide a systematic, structural
representation for annotating the content of natural language texts. Events are
characterized by annotated trigger words, directed and typed arguments and the
ability to nest other events. For example, the sentence "Protein A causes
protein B to bind protein C" can be annotated with the nested event structure
CAUSE(A, BIND(B, C)). Converted to such formal representations, the
information of natural language texts can be used by computational
applications. Biomedical event annotations were introduced by the BioInfer and
GENIA corpora, and event extraction was popularized by the BioNLP'09 Shared Task
on Event Extraction.
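The nested structure CAUSE(A, BIND(B, C)) can be sketched as a small recursive data type. This is an illustrative representation only, not the TEES graph format or any corpus schema; the class names `Entity` and `Event`, the `render` helper, and the role labels "Cause" and "Theme" are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Entity:
    """A named entity mentioned in the text, e.g. a protein."""
    name: str


@dataclass
class Event:
    """An event with a trigger word and typed, ordered arguments.

    Each argument is a (role, target) pair; the target may itself be an
    Event, which is what allows events to nest.
    """
    event_type: str
    trigger: str
    arguments: List[Tuple[str, Union[Entity, "Event"]]] = field(default_factory=list)


def render(node: Union[Entity, Event]) -> str:
    """Serialize an entity or event in the TYPE(arg, ...) notation."""
    if isinstance(node, Entity):
        return node.name
    inner = ", ".join(render(target) for _role, target in node.arguments)
    return f"{node.event_type}({inner})"


# "Protein A causes protein B to bind protein C"
a, b, c = Entity("A"), Entity("B"), Entity("C")
bind = Event("BIND", trigger="bind", arguments=[("Theme", b), ("Theme", c)])
cause = Event("CAUSE", trigger="causes", arguments=[("Cause", a), ("Theme", bind)])

print(render(cause))  # CAUSE(A, BIND(B, C))
```

Keeping the role label on each argument is what makes the arguments directed and typed, while allowing an `Event` as a target gives the nesting described above.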
In this thesis we present a method for automated event extraction, implemented
as the Turku Event Extraction System (TEES). A unified graph format is defined
for representing event annotations and the problem of extracting complex event
structures is decomposed into a number of independent classification tasks.
These classification tasks are solved using SVM and RLS classifiers, utilizing
rich feature representations built from full dependency parsing. Building on
earlier work on pairwise relation extraction and using a generalized graph
representation, the resulting TEES system is capable of detecting binary
relations as well as complex event structures.
We show that this event extraction system performs well,
reaching first place in the BioNLP'09 Shared Task on Event Extraction. Subsequently,
TEES has achieved several first ranks in the BioNLP'11 and BioNLP'13 Shared
Tasks, and has shown competitive performance in the binary relation Drug-Drug
Interaction Extraction 2011 and 2013 shared tasks.
The Turku Event Extraction System is published as a freely available open-source
project, documenting the research in detail as well as making the method
available for practical applications. In particular, in this thesis we
describe the application of the event extraction method to PubMed-scale text
mining, showing that the developed approach not only performs well, but
is also generalizable and applicable to large-scale real-world text mining projects.
Finally, we discuss related literature, summarize the contributions of the work
and present some thoughts on future directions for biomedical event extraction.
This thesis includes and builds on six original research publications. The first
of these introduces the analysis of dependency parses that led to the
development of TEES. The entries in the three BioNLP Shared Tasks, as well as
in the DDIExtraction 2011 task are covered in four publications, and the sixth
one demonstrates the application of the system to PubMed-scale text mining.
- …