307 research outputs found

A UIMA wrapper for the NCBO annotator

Author: Baumgartner
C. Jonquet
C. Roeder
Hunter
K. Verspoor
L. Hunter
N. H. Shah
W. A. Baumgartner
Publication venue: Oxford University Press
Publication date: 01/01/2010

Summary: The Unstructured Information Management Architecture (UIMA) framework and web services are emerging as useful tools for integrating biomedical text mining tools. This note describes our work, which wraps the National Center for Biomedical Ontology (NCBO) Annotator, an ontology-based annotation service, to make it available as a component in UIMA workflows.

Crossref

PubMed Central

HAL Descartes

University of Melbourne Institutional Repository

Themes in biomedical natural language processing: BioNLP08

Author: K Verspoor
A Airola
A Roberts
P Corbett
Y Sasaki
X Wang
M Stevenson
Y Tsuruoka
V Vincze
H Kilicoglu
A Neveol
Publication venue: BioMed Central
Publication date: 01/01/1991

Crossref

Springer - Publisher Connector

PubMed Central

Edinburgh Research Explorer

Institute of Mathematics AS CR, v. v. i.

Annotating patient clinical records with syntactic chunks and named entities: the Harvey corpus

Author: A Roberts
A Shah
Aleksandar Savkov
B Efron
G Hripcsak
G Savova
J Cohen
J Foster
J-W Fan
Jackie Cassell
John Carroll
K Verspoor
KH Krippendorff
LK Tanabe
M Bada
MP Marcus
Rob Koeling
S Abney
W Sun
Ö Uzuner
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

The free text notes typed by physicians during patient consultations contain valuable information for the study of disease and treatment. These notes are difficult to process with existing natural language analysis tools since they are highly telegraphic (omitting many words), and contain many spelling mistakes, inconsistencies in punctuation, and non-standard word order. To support information extraction and classification tasks over such text, we describe a de-identified corpus of free text notes, a shallow syntactic and named entity annotation scheme for this kind of text, and an approach to training domain specialists with no linguistic background to annotate the text. Finally, we present a statistical chunking system for such clinical text with a stable learning rate and good accuracy, indicating that the manual annotation is consistent and that the annotation scheme is tractable for machine learning.

Crossref

Springer - Publisher Connector

PubMed Central

Sussex Research Online

The textual characteristics of traditional and Open Access scientific journals are similar

Author: A Knebel
A Swan
C Blaschke
D Biber
D Ferrucci
DP Corney
G Eysenbach
K Bretonnel Cohen
K Curran
K Verspoor
Karin Verspoor
KB Cohen
L Tanabe
Lawrence Hunter
M Krallinger
M Palmer
MP Marcus
P Rayson
PK Shah
S Kullback
T Dunning
W Hersh
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Recent years have seen an increased amount of natural language processing (NLP) work on full text biomedical journal publications. Much of this work is done with Open Access journal articles. Such work assumes that Open Access articles are representative of biomedical publications in general and that methods developed for analysis of Open Access full text publications will generalize to the biomedical literature as a whole. If this assumption is wrong, the cost to the community will be large, including not just wasted resources, but also flawed science. This paper examines that assumption. Results: We collected two sets of documents, one consisting only of Open Access publications and the other consisting only of traditional journal publications. We examined them for differences in surface linguistic structures that have obvious consequences for the ease or difficulty of natural language processing and for differences in semantic content as reflected in lexical items. Regarding surface linguistic structures, we examined the incidence of conjunctions, negation, passives, and pronominal anaphora, and found that the two collections did not differ. We also examined the distribution of sentence lengths and found that both collections were characterized by the same mode. Regarding lexical items, we found that the Kullback-Leibler divergence between the two collections was low, and was lower than the divergence between either collection and a reference corpus. Where small differences did exist, log likelihood analysis showed that they were primarily in the area of formatting and in specific named entities. Conclusion: We did not find structural or semantic differences between the Open Access and traditional journal collections.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

University of Melbourne Institutional Repository

Large-scale protein-protein post-translational modification extraction with distant supervision and confidence calibrated BioBERT

Author: Davis M.J.
Elangovan A.
Li Y.
Pires D.E.V.
Verspoor K.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2022

Protein-protein interactions (PPIs) are critical to normal cellular function and are related to many disease pathways. A range of protein functions are mediated and regulated by protein interactions through post-translational modifications (PTM). However, only 4% of PPIs are annotated with PTMs in biological knowledge databases such as IntAct, mainly performed through manual curation, which is neither time- nor cost-effective. Here we aim to facilitate annotation by extracting PPIs along with their pairwise PTM from the literature, using distantly supervised training data and deep learning to aid human curation. Method: We use the IntAct PPI database to create a distant supervised dataset annotated with interacting protein pairs, their corresponding PTM type, and associated abstracts from the PubMed database. We train an ensemble of BioBERT models, dubbed PPI-BioBERT-x10, to improve confidence calibration. We extend the ensemble average confidence approach with confidence variation to counteract the effects of class imbalance and extract high confidence predictions. Results and conclusion: The PPI-BioBERT-x10 model evaluated on the test set resulted in a modest F1-micro of 41.3 (P = 58.1, R = 32.1). However, by combining high confidence and low variation to identify high quality predictions, tuning the predictions for precision, we retained 19% of the test predictions with 100% precision. We evaluated PPI-BioBERT-x10 on 18 million PubMed abstracts and extracted 1.6 million (546507 unique PTM-PPI triplets) PTM-PPI predictions, and filter [Formula: see text] (4584 unique) high confidence predictions. Of the 5700, human evaluation on a small randomly sampled subset shows that the precision drops to 33.7% despite confidence calibration and highlights the challenges of generalisability beyond the test set even with confidence calibration. We circumvent the problem by only including predictions associated with multiple papers, improving the precision to 58.8%.
In this work, we highlight the benefits and challenges of deep learning-based text mining in practice, and the need for increased emphasis on confidence calibration to facilitate human curation efforts.
Aparna Elangovan, Yuan Li, Douglas E. V. Pires, Melissa J. Davis, and Karin Verspoor

arXiv.org e-Print Archive

Adelaide Research & Scholarship

PubMed Central

University of Melbourne Institutional Repository

SCRIPDB: a portal for easy access to syntheses, chemicals and reactions in patents

Author: A. Heifets
Corey
Guha
Hattori
I. Jurisica
Law
Pirok
Podolyan
Sheridan
Southall
Stewart
Tanaka
Thangaraj
Verspoor
Wang
Publication venue: Oxford University Press

The patent literature is a rich catalog of biologically relevant chemicals; many public and commercial molecular databases contain the structures disclosed in patent claims. However, patents are an equally rich source of metadata about bioactive molecules, including mechanism of action, disease class, homologous experimental series, structural alternatives, or the synthetic pathways used to produce molecules of interest. Unfortunately, this metadata is discarded when chemical structures are deposited separately in databases. SCRIPDB is a chemical structure database designed to make this metadata accessible. SCRIPDB provides the full original patent text, reactions and relationships described within any individual patent, in addition to the molecular files common to structural databases. We discuss how such information is valuable in medical text mining, chemical image analysis, reaction extraction and in silico pharmaceutical lead optimization. SCRIPDB may be searched by exact chemical structure, substructure or molecular similarity and the results may be restricted to patents describing synthetic routes. SCRIPDB is available at http://dcv.uhnres.utoronto.ca/SCRIPDB

Crossref

PubMed Central

Gender equality and girls education: Investigating frameworks, disjunctures and meanings of quality education

Author: Aikman S
Arnot M
ASPBAE and UNGEI
Bandyopadhyay M
Barber L
Budlender D
Chaudhury N
Croft A
Geeves R
Herz B
Hickling-Hudson A
Lewin KM
Lewis M
Marshall H
Mitchell C
Morrell R
Ramachandran V
Rao N
Sieder R
Stromquist N
Tomasevski K
UNESCO
UNICEF
UNICEF and UNGEI
Unterhalter E
Verspoor A
Wood J
Publication venue: SAGE Publications
Publication date: 08/11/2012

The article draws on qualitative educational research across a diversity of low-income countries to examine the gendered inequalities in education as complex, multi-faceted and situated rather than a series of barriers to be overcome through linear input–output processes focused on isolated dimensions of quality. It argues that frameworks for thinking about educational quality often result in analyses of gender inequalities that are fragmented and incomplete. However, by considering education quality more broadly as a terrain of quality, it investigates questions of educational transitions, teacher supply and community participation, and develops understandings of how education is experienced by learners and teachers in their gendered lives and their teaching practices. By taking an approach based on theories of human development, the article identifies dynamics of power underpinning gender inequalities in the literature, played out in diverse contexts and influenced by social, cultural and historical contexts. The review and discussion indicate that attaining gender equitable quality education requires recognition and understanding of the ways in which inequalities intersect and interrelate in order to seek out multi-faceted strategies that address not only different dimensions of girls’ and women’s lives, but understand gendered relationships and structurally entrenched inequalities between women and men, girls and boys.

Crossref

University of East Anglia digital repository

Motivation for or from bilingual education? A comparative study of learner views in the Netherlands

Author: Azarnoosh M.
Baetens Beardsmore H.
Banegas D. L.
Boone H. N.
Cohen J.
Coleman L.
Coyle D.
de Graaff R.
Deci E. L.
Dörnyei Z.
Friedman H. H.
Gajo L.
Gardner R. C.
Koster A.
Lasagabaster D.
Maljers A.
Markus H.
Mearns T.
Nuffield
Rumlich D.
Somers T.
Sylvén L. K.
Ting Y. L. T.
Ushioda E.
Verspoor M.
Weenink D.
Publication venue: Informa UK Limited
Publication date: 01/01/2017

Teaching and Teacher Learning (ICLON

Crossref

Edinburgh Research Explorer

Leiden University Scholary Publications

Can sexual selection drive female life histories? A comparative study on Galliform birds

Sexual selection is an important driver of many of the most spectacular morphological traits that we find in the animal kingdom (for example see Andersson, 1994). As such, sexual selection is most often emphasized as

CiteSeerX

Crossref

Edinburgh Research Explorer