Abstract

Background

Event extraction following the GENIA Event corpus and BioNLP shared task models has been a considerable focus of recent work in biomedical information extraction. This work includes efforts to apply event extraction methods to the entire PubMed literature database. That scope reaches far beyond the narrow subdomains of biomedicine for which annotated resources for developing extraction methods are available.

Results

In the present study, we aim to estimate what share of all statements of gene/protein associations in PubMed the existing resources for event extraction can cover. We base our analysis on a recently released corpus that covers the entire PubMed and is automatically annotated for gene/protein entities and syntactic analyses. To identify likely statements of gene/protein associations, we use named entity co-occurrence, shortest dependency paths and an unlexicalized classifier. We then manually analyze a set of high-frequency/high-likelihood association statements with reference to the GENIA ontology.

Conclusions

We present a first estimate of the overall coverage of gene/protein associations provided by existing resources for event extraction. Our results suggest that for event-type associations this coverage may be over 90%. We also identify several biologically significant associations of genes and proteins that these resources do not address, which suggests directions for further extending extraction coverage.
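The candidate-identification steps named in the Results (entity co-occurrence, shortest dependency paths, unlexicalized path features) can be illustrated with a minimal Python sketch. It assumes a hypothetical sentence representation: a list of POS tags, typed dependencies as (head, dependent, label) triples, and the token indices of gene/protein mention heads. The path notation (`ENTITY`, `<label`, `>label`) and all function names are illustrative assumptions. This is not the authors' implementation, and the classifier itself is not shown.

```python
from collections import Counter, deque
from itertools import combinations

# Hypothetical input format (an assumption, not the corpus format):
#   pos_tags:     list of POS tags, one per token
#   edges:        typed dependencies as (head_index, dependent_index, label)
#   entity_heads: token indices of gene/protein mention heads


def entity_pairs(entity_heads):
    """Co-occurrence: all unordered pairs of mentions in one sentence."""
    return list(combinations(sorted(entity_heads), 2))


def shortest_dependency_path(n_tokens, edges, start, end):
    """Breadth-first search over the dependency graph, ignoring edge direction.

    Returns a list of (source, label, direction, target) steps, where direction
    is "up" (dependent to head) or "down" (head to dependent), or None if the
    two tokens are not connected.
    """
    adjacency = {i: [] for i in range(n_tokens)}
    for head, dep, label in edges:
        adjacency[head].append((dep, label, "down"))
        adjacency[dep].append((head, label, "up"))

    previous = {start: None}  # token -> (predecessor, label, direction)
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            break
        for neighbor, label, direction in adjacency[node]:
            if neighbor not in previous:
                previous[neighbor] = (node, label, direction)
                queue.append(neighbor)

    if end not in previous:
        return None

    # Walk back from the end token to rebuild the path in order.
    steps = []
    node = end
    while previous[node] is not None:
        source, label, direction = previous[node]
        steps.append((source, label, direction, node))
        node = source
    steps.reverse()
    return steps


def unlexicalize(path, pos_tags, end):
    """Render a path without word forms: mentions become ENTITY, internal
    tokens become their POS tag, "<label" marks a step up to a head and
    ">label" a step down to a dependent."""
    parts = ["ENTITY"]
    for _source, label, direction, target in path:
        parts.append(("<" if direction == "up" else ">") + label)
        parts.append("ENTITY" if target == end else pos_tags[target])
    return " ".join(parts)


def count_path_patterns(sentences):
    """Count unlexicalized shortest paths over all co-occurring mention pairs."""
    counts = Counter()
    for pos_tags, edges, entity_heads in sentences:
        for a, b in entity_pairs(entity_heads):
            path = shortest_dependency_path(len(pos_tags), edges, a, b)
            if path is not None:
                counts[unlexicalize(path, pos_tags, b)] += 1
    return counts


# Usage: "p53 activates MDM2 ." with both proteins marked as mentions.
sentence = (["NN", "VBZ", "NN", "."],
            [(1, 0, "nsubj"), (1, 2, "dobj"), (1, 3, "punct")],
            {0, 2})
# Prints a Counter holding one pattern, 'ENTITY <nsubj VBZ >dobj ENTITY', seen once.
print(count_path_patterns([sentence]))
```

Because word forms are dropped, sentences such as "p53 activates MDM2" and "X binds Y" map to the same pattern. Counting patterns over many sentences is one way to surface high-frequency candidate statements for manual review. The study also ranked candidates by classifier likelihood, which this sketch does not model.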

Ohta, Tomoko

Pyysalo, Sampo

Tsujii, Jun’ichi

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

English

An analysis of gene/protein associations at PubMed scale

Collier, N

Hahn, U

Pyysalo, S

Rebholz-Schuhmann, D

Rinaldi, F

ZORA
