Search CORE

723 research outputs found

Using a shallow linguistic kernel for drug-drug interaction extraction

Author: Martínez Fernández Paloma
Pablo-Sánchez César de
Segura-Bedmar Isabel
Publication venue: 'Elsevier BV'
Publication date: 01/01/2011
Field of study

A drug–drug interaction (DDI) occurs when one drug influences the level or activity of another drug. Information Extraction (IE) techniques can provide health care professionals with an interesting way to reduce time spent reviewing the literature for potential drug–drug interactions. Nevertheless, no approach has been proposed to the problem of extracting DDIs in biomedical texts. In this article, we study whether a machine learning-based method is appropriate for DDI extraction in biomedical texts and whether the results provided are superior to those obtained from our previously proposed pattern-based approach [1]. The method proposed here for DDI extraction is based on a supervised machine learning technique, more specifically, the shallow linguistic kernel proposed in Giuliano et al. (2006) [2]. Since no benchmark corpus was available to evaluate our approach to DDI extraction, we created the first such corpus, DrugDDI, annotated with 3169 DDIs. We performed several experiments varying the configuration parameters of the shallow linguistic kernel. The model that maximizes the F-measure was evaluated on the test data of the DrugDDI corpus, achieving a precision of 51.03%, a recall of 72.82% and an F-measure of 60.01%. To the best of our knowledge, this work has proposed the first full solution for the automatic extraction of DDIs from biomedical texts. Our study confirms that the shallow linguistic kernel outperforms our previous pattern-based approach. Additionally, it is our hope that the DrugDDI corpus will allow researchers to explore new solutions to the DDI extraction problem.This study was funded by the Projects MA2VICMR (S2009/TIC-1542) and MULTIMEDICA (TIN2010-20644-C03-01).Publicad

Universidad Carlos III de Madrid e-Archivo

Heterogeneous biomedical database integration using a hybrid strategy: a p53 cancer research database.

Author: Bichutskiy Vadim Y
Brachmann Rainer K
Colman Richard
Lathrop Richard H
Publication venue: eScholarship, University of California
Publication date: 01/01/2006
Field of study

Complex problems in life science research give rise to multidisciplinary collaboration, and hence, to the need for heterogeneous database integration. The tumor suppressor p53 is mutated in close to 50% of human cancers, and a small drug-like molecule with the ability to restore native function to cancerous p53 mutants is a long-held medical goal of cancer treatment. The Cancer Research DataBase (CRDB) was designed in support of a project to find such small molecules. As a cancer informatics project, the CRDB involved small molecule data, computational docking results, functional assays, and protein structure data. As an example of the hybrid strategy for data integration, it combined the mediation and data warehousing approaches. This paper uses the CRDB to illustrate the hybrid strategy as a viable approach to heterogeneous data integration in biomedicine, and provides a design method for those considering similar systems. More efficient data sharing implies increased productivity, and, hopefully, improved chances of success in cancer research. (Code and database schemas are freely downloadable, http://www.igb.uci.edu/research/research.html.)

Directory of Open Access Journals

eScholarship - University of California

Predicting functional associations from metabolism using bi-partite network algorithms

Author: B Veeramani
Balaji Veeramani
C Stark
D Segrè
DS Lee
J Ihmels
J Schellenberger
Joel S Bader
K Tsuda
NC Duarte
ND Price
O Rokhlenko
P Kharchenko
P Resnik
PW Lord
R Harrison
RA Notebaart
SA Becker
SA Becker
T Sing
Y Qi
YHY Benjamini
Publication venue: BioMed Central
Publication date: 01/01/2010
Field of study

Abstract Background Metabolic reconstructions contain detailed information about metabolic enzymes and their reactants and products. These networks can be used to infer functional associations between metabolic enzymes. Many methods are based on the number of metabolites shared by two enzymes, or the shortest path between two enzymes. Metabolite sharing can miss associations between non-consecutive enzymes in a serial pathway, and shortest-path algorithms are sensitive to high-degree metabolites such as water and ATP that create connections between enzymes with little functional similarity. Results We present new, fast methods to infer functional associations in metabolic networks. A local method, the degree-corrected Poisson score, is based only on the metabolites shared by two enzymes, but uses the known metabolite degree distribution. A global method, based on graph diffusion kernels, predicts associations between enzymes that do not share metabolites. Both methods are robust to high-degree metabolites. They out-perform previous methods in predicting shared Gene Ontology (GO) annotations and in predicting experimentally observed synthetic lethal genetic interactions. Including cellular compartment information improves GO annotation predictions but degrades synthetic lethal interaction prediction. These new methods perform nearly as well as computationally demanding methods based on flux balance analysis. Conclusions We present fast, accurate methods to predict functional associations from metabolic networks. Biological significance is demonstrated by identifying enzymes whose strong metabolic correlations are missed by conventional annotations in GO, most often enzymes involved in transport vs. synthesis of the same metabolite or other enzyme pairs that share a metabolite but are separated by conventional pathway boundaries. More generally, the methods described here may be valuable for analyzing other types of networks with long-tailed degree distributions and high-degree hubs.</p

Springer - Publisher Connector

Directory of Open Access Journals

Biomedical Event Extraction with Machine Learning

Author: Björne Jari
Publication venue: TUCS Dissertations
Publication date: 28/10/2022
Field of study

Biomedical natural language processing (BioNLP) is a subfield of natural language processing, an area of computational linguistics concerned with developing programs that work with natural language: written texts and speech. Biomedical relation extraction concerns the detection of semantic relations such as protein--protein interactions (PPI) from scientific texts. The aim is to enhance information retrieval by detecting relations between concepts, not just individual concepts as with a keyword search. In recent years, events have been proposed as a more detailed alternative for simple pairwise PPI relations. Events provide a systematic, structural representation for annotating the content of natural language texts. Events are characterized by annotated trigger words, directed and typed arguments and the ability to nest other events. For example, the sentence ``Protein A causes protein B to bind protein C'' can be annotated with the nested event structure CAUSE(A, BIND(B, C)). Converted to such formal representations, the information of natural language texts can be used by computational applications. Biomedical event annotations were introduced by the BioInfer and GENIA corpora, and event extraction was popularized by the BioNLP'09 Shared Task on Event Extraction. In this thesis we present a method for automated event extraction, implemented as the Turku Event Extraction System (TEES). A unified graph format is defined for representing event annotations and the problem of extracting complex event structures is decomposed into a number of independent classification tasks. These classification tasks are solved using SVM and RLS classifiers, utilizing rich feature representations built from full dependency parsing.  Building on earlier work on pairwise relation extraction and using a generalized graph representation, the resulting TEES system is capable of detecting binary relations as well as complex event structures. We show that this event extraction system has good performance, reaching the first place in the BioNLP'09 Shared Task on Event Extraction. Subsequently, TEES has achieved several first ranks in the BioNLP'11 and BioNLP'13 Shared Tasks, as well as shown competitive performance in the binary relation Drug-Drug Interaction Extraction 2011 and 2013 shared tasks. The Turku Event Extraction System is published as a freely available open-source project, documenting the research in detail as well as making the method available for practical applications. In particular, in this thesis we describe the application of the event extraction method to PubMed-scale text mining, showing how the developed approach not only shows good performance, but is generalizable and applicable to large-scale real-world text mining projects. Finally, we discuss related literature, summarize the contributions of the work and present some thoughts on future directions for biomedical event extraction. This thesis includes and builds on six original research publications. The first of these introduces the analysis of dependency parses that leads to development of TEES. The entries in the three BioNLP Shared Tasks, as well as in the DDIExtraction 2011 task are covered in four publications, and the sixth one demonstrates the application of the system to PubMed-scale text mining.</p