Search CORE

3 research outputs found

Semi-automatic conversion of BioProp semantic annotation to PASBio annotation

Author: AL Berger
B Santorini
C Warner
Chi-Hsin Huang
D Dowty
E Charniak
H-J Dai
Hong-Jie Dai
KB Cohen
M Collins
M Palmer
O Babko-Malaya
PK Shah
R Hoernig
RA Hudson
Richard Tzong-Han Tsai
RT-H Tsai
S Pradhan
T Wattarujeekrit
V Punyakanok
W-C Chou
Wen-Lian Hsu
X Carreras
Y Kogan
Y Tateisi
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Semantic role labeling (SRL) is an important text analysis technique. In SRL, sentences are represented by one or more predicate-argument structures (PAS). Each PAS is composed of a predicate (verb) and several arguments (noun phrases, adverbial phrases, etc.) with different semantic roles, including main arguments (agent or patient) as well as adjunct arguments (time, manner, or location). PropBank is the most widely used PAS corpus and annotation format in the newswire domain. In the biomedical field, however, more detailed and restrictive PAS annotation formats such as PASBio are popular. Unfortunately, due to the lack of an annotated PASBio corpus, no publicly available machine-learning (ML) SRL systems based on PASBio have been developed. In previous work, we constructed a biomedical corpus based on the PropBank standard called BioProp, on which we developed an ML-based SRL system, BIOSMILE. In this paper, we aim to build a system to convert BIOSMILE's BioProp annotation output to PASBio annotation. Our system consists of BIOSMILE in combination with a BioProp-PASBio rule-based converter and an additional semi-automatic rule generator. Results: Our first experiment evaluated our rule-based converter's performance independently of BIOSMILE's performance. The converter achieved an F-score of 85.29%. The second experiment evaluated the combined system (BIOSMILE + rule-based converter). The system achieved an F-score of 69.08% for PASBio's 29 verbs. Conclusion: Our approach allows PAS conversion between BioProp and PASBio annotation using BIOSMILE alongside our newly developed semi-automatic rule generator and rule-based converter. Our system can match the performance of other state-of-the-art domain-specific ML-based SRL systems and can be easily customized for PASBio application development.
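The rule-based conversion step described in this abstract can be pictured with a minimal sketch. It assumes a per-verb lookup table from source labels to target labels; the table contents, the label names, and the function name `convert` are hypothetical illustrations, not the rules or code published with the paper.

```python
# Hypothetical per-verb rule table: (verb, source label) -> target label.
# These entries are illustrative assumptions, not the paper's actual rules.
RULES = {
    ("express", "Arg1"): "Arg1",
    ("express", "ArgM-LOC"): "Arg3",
}


def convert(verb, arguments, rules=RULES):
    """Relabel each (label, text) argument of one predicate.

    A label is rewritten only when a rule exists for this verb and label;
    otherwise it is passed through unchanged.
    """
    return [(rules.get((verb, label), label), text) for label, text in arguments]


if __name__ == "__main__":
    source = [("Arg1", "the gene"), ("ArgM-LOC", "in liver cells")]
    for label, text in convert("express", source):
        print(label, text)
```

A table keyed on the verb keeps each verb's rules independent, which suits an annotation scheme with verb-specific frames; whether the actual converter is organized this way is not stated in the abstract.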

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Event extraction from biomedical texts using trimmed dependency graphs

Author: Buyko Ekaterina
Publication date: 03/11/2012

This thesis explores the automatic extraction of information from biomedical publications. Such techniques are urgently needed because the biosciences are publishing continually increasing numbers of texts. The focus of this work is on events. Information about events is currently manually curated from the literature by biocurators. Biocuration, however, is time-consuming and costly, so automatic methods are needed for information extraction from the literature. This thesis is dedicated to modeling, implementing and evaluating an advanced event extraction approach based on the analysis of syntactic dependency graphs. This work presents the proposed event extraction approach and its implementation, the JReX (Jena Relation eXtraction) system. This system was used by the University of Jena (JULIE Lab) team in the "BioNLP 2009 Shared Task on Event Extraction" competition and was ranked second among 24 competing teams. Thereafter JReX was the highest scorer on the worldwide shared U-Compare event extraction server, outperforming the competing systems from the challenge. This success was made possible, among other things, by extensive research on event extraction solutions carried out during this thesis, e.g., exploring the effects of syntactic and semantic processing procedures on solving the event extraction task. The evaluations executed on standard and community-wide accepted competition data were complemented by real-life evaluation of large-scale biomedical database reconstruction. This work showed that considerable parts of manually curated databases can be automatically re-created with the help of the event extraction approach developed. Successful re-creation was possible for parts of RegulonDB, the world's largest database for E. coli. In summary, the event extraction approach justified, developed and implemented in this thesis meets the needs of a large community of human curators and thus helps in the acquisition of new knowledge in the biosciences.

Digitale Bibliothek Thüringen

Nominalization and Alternations in Biomedical Language

Author: Adam Meyers
BarbaraH Partee
Ben Goertzel
Beth Levin
Carol Friedman
CharlesJ Fillmore
Christiane Fellbaum
DeborahA Dahl
Douglas Biber
George Dunham
George Hripcsak
Gondy Leroy
James Pustejovsky
Jin-Dong Kim
JM Ko
John Lehrberger
Jonathan Schuman
K. Bretonnel Cohen
Karin Verspoor
KBretonnel Cohen
Laurie Bauer
Lawrence Hunter
Leroy Gondy
Lynette Hirschman
M Narayanaswamy
Malka Rappaport-Hovav
Maria Koptjevskaja-Tamm
Martha Palmer
MartinF Porter
Michael Johnston
Naomi Sager
ParantuK Shah
PhilipV Ogren
Pierre Zweigenbaum
Ralph Grishman
Randolph Quirk
Richard Kittredge
Richard Tzong-Han Tsai
Robert P. Futrelle
RobertB Lees
Ron Artstein
Sameer Pradhan
Seth Kulick
T Ono
Thomas Herbst
Thomas Roeper
ThomasC Rindflesch
TimothyW Finin
Tony McEnery
Tuangthong Wattarujeekrit
Wen-Chi Chou
X Yuan
Yacov Kogan
Yuka Tateisi
Zellig Harris
Zheng Ping Jiang
ZZ Hu
Publication venue: Public Library of Science
Publication date: 01/01/2008

Background: This paper presents data on alternations in the argument structure of common domain-specific verbs and their associated verbal nominalizations in the PennBioIE corpus. Alternation is the term in theoretical linguistics for variations in the surface syntactic form of verbs, e.g. the different forms of "stimulate" in "FSH stimulates follicular development" and "follicular development is stimulated by FSH". The data is used to assess the implications of alternations for biomedical text mining systems and to test the fit of the sublanguage model to biomedical texts. Methodology/Principal Findings: We examined 1,872 tokens of the ten most common domain-specific verbs or their zero-related nouns in the PennBioIE corpus and labelled them for the presence or absence of three alternations. We then annotated the arguments of 746 tokens of the nominalizations related to these verbs and counted alternations related to the presence or absence of arguments and to the syntactic position of non-absent arguments. We found that alternations are quite common both for verbs and for nominalizations. We also found a previously undescribed alternation involving an adjectival present participle. Conclusions/Significance: We found that even in this semantically restricted domain, alternations are quite common, and alternations involving nominalizations are exceptionally diverse. Nonetheless, the sublanguage model applies to biomedica

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central