53 research outputs found

Tagging Scientific Publications using Wikipedia and Natural Language Processing Tools. Comparison on the ArXiv Dataset

Author: A Joorabchi
G Spanakis
J Laherrère
JS Justeson
K Barker
M Porter
MA Montemurro
P Wang
R Agrawal
S Rose
ZK Zhang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 03/11/2014

In this work, we compare two simple methods of tagging scientific publications with labels that reflect their content. The first set of labels is drawn from Wikipedia; the second is constructed from the noun phrases occurring in the analyzed corpus. We examine the statistical properties and the effectiveness of both approaches on a dataset consisting of abstracts from 0.7 million scientific documents deposited in the ArXiv preprint collection. We believe that the resulting tags can later be applied as useful document features in various machine learning tasks (document similarity, clustering, topic modelling, etc.).

arXiv.org e-Print Archive

Crossref

Using terminology extraction techniques for improving traceability from formal models to textual requirements

Author: B. Daille
F. Shipman
J. Euzenat
J. S. Justeson
K. W. Church
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/06/2000

This article deals with traceability in software engineering. More precisely, we concentrate on the role of terminological knowledge in the mapping between (informal) textual requirements and (formal) object models. We show that terminological knowledge facilitates the production of traceability links, provided that language processing technologies allow the required terminological resources to be built semi-automatically. The presented system is one step towards incremental formalization from textual knowledge.

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

HAL-Rennes 1

ASCOT: a text mining-based web-service for efficient search and assisted creation of clinical trials

Author: AR Aronson
B de Bruijn
B Long
C Weng
CG Parker
GY Chung
H Nakagawa
H Paek
I Korkontzelos
Ioannis Korkontzelos
J Justeson
J Thomas
KP Yee
KT Frantzi
ME Hernandez
P Fazi
R Xu
S Ananiadou
S Kiritchenko
S Osinski
Sophia Ananiadou
SW Tu
T Mu
Tingting Mu
Y Kano
Y Tsuruoka
Publication venue: BioMed Central
Publication date: 01/01/2012

Clinical trials are mandatory protocols describing medical research on humans and are among the most valuable sources of medical practice evidence. Searching for trials relevant to a given query is laborious due to the immense number of existing protocols. Beyond search, writing a new trial involves composing detailed eligibility criteria, which can be time-consuming, especially for new researchers. In this paper we present ASCOT, an efficient search application customised for clinical trials. ASCOT uses text mining and data mining methods to enrich clinical trials with metadata, which in turn serve as effective tools for narrowing down a search. In addition, ASCOT integrates a component for recommending eligibility criteria based on a set of selected protocols.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

The University of Manchester - Institutional Repository

Relation Extraction for Open and Closed Domain Question Answering

Author: B Magnini
D Lin
D McCarthy
D Ravichandran
Fundel K, Küffner R, Zimmer R
G Bouma
J Justeson
L Braun
M Stevenson
O Etzioni
R Snow
S Padó
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

Crossref

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen

Benchmarking Ontologies: Bigger or Better?

Author: A Faatz
A Gangemi
A Gomez-Perez
A Gómez-Pérez
A Mädche
A Rzhetsky
A Spooner
Andrey Rzhetsky
Anna Divoli
AR Aronson
AT McCray
B Smith
BA Kipfer
C Brewster
C Laird
C Rosse
CE Lipscomb
CJ Bult
CL Smith
D Lin
D Maynard
DL Cook
E Riloff
FB Rogers
G Jurasinski
G Miller
I Scholastic
I Sim
Ilya Mayzus
J Brank
J Devlin
J Evermann
J Yu
JA Blake
James A. Evans
JC Park
JI Rodale
JR Firth
JS Justeson
K Dellschaft
K Toutanova
K Verspoor
K. Bretonnel Cohen
KB Cohen
Lixia Yao
LM Spencer
M Ashburner
M Grüninger
M Minsky
M Missikoff
M Sabou
N Guarino
O Bodenreider
P Buitelaar
P Cimiano
PD Karp
R Cornet
R Navigli
S Hyun
S Kiritchenko
S Schulz
S York
S Zhang
SH Brown
TR Gruber
U Hahn
V Walden
W Ceusters
Y Sure
Z Harris
Publication venue: Public Library of Science
Publication date: 01/01/2011

A scientific ontology is a formal representation of knowledge within a domain, typically including central concepts, their properties, and relations. With the rise of computers and high-throughput data collection, ontologies have become essential to data mining and sharing across communities in the biomedical sciences. Powerful approaches exist for testing the internal consistency of an ontology, but not for assessing the fidelity of its domain representation. We introduce a family of metrics that describe the breadth and depth with which an ontology represents its knowledge domain. We then test these metrics using (1) four of the most common medical ontologies with respect to a corpus of medical documents and (2) seven of the most popular English thesauri with respect to three corpora that sample language from medicine, news, and novels. Here we show that our approach captures the quality of ontological representation and guides efforts to narrow the breach between ontology and collective discourse within a domain. Our results also demonstrate key features of medical ontologies, English thesauri, and discourse from different domains. Medical ontologies have a small intersection, as do English thesauri. Moreover, dialects characteristic of distinct domains vary strikingly as many of the same words are used quite differently in medicine, news, and novels. As ontologies are intended to mirror the state of knowledge, our methods to tighten the fit between ontology and domain will increase their relevance for new areas of biomedical science and improve the accuracy and power of inferences computed across them

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

KneeTex: an ontology–driven system for information extraction from MRI reports

Author: A Clauset
A Coden
A Guermazi
AR Aronson
BW Mamlin
C Finch
C Friedman
C Funk
C Rosse
C Wenham
C Wu
CP Langlotz
D Crockford
D Ferrucci
DC Pompan
E Herrett
E Pessis
ES Burnside
FP Luyten
FW Roemer
G Hripcsak
G Salton
GK Savova
Health & Social Care Information Centre
I Spasić
J Cowie
J Day-Richter
J Geertzen
JL Fleiss
JPA Ioannidis
JR Landis
JS Justeson
K Button
K Rae
KS Button
LM Christensen
M Grover
M Yetisgen-Yildiz
MK Javaid
National Center for Biomedical Ontology
O Bodenreider
P Stenetorp
P Whetzel
PA Dang
R Artstein
R Yan
Radiological Society of North America
RAE Clayton
RS Crowley
S Konan
SH Brown
SK Mohanty
T Adamusiak
The Royal College of Radiologists
UMLS
WE Winkler
WR Hersh
WW Cohen
Y Tsuruoka
Y-S Chen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 07/09/2015

Background. In the realm of knee pathology, magnetic resonance imaging (MRI) has the advantage of visualising all structures within the knee joint, which makes it a valuable tool for increasing diagnostic accuracy and planning surgical treatments. Therefore, clinical narratives found in MRI reports convey valuable diagnostic information. A range of studies have proven the feasibility of natural language processing for information extraction from clinical narratives. However, no study has focused specifically on MRI reports in relation to knee pathology, possibly due to the complexity of knee anatomy and a wide range of conditions that may be associated with different anatomical entities. In this paper we describe KneeTex, an information extraction system that operates in this domain. Methods. As an ontology–driven information extraction system, KneeTex makes active use of an ontology to strongly guide and constrain text analysis. We used automatic term recognition to facilitate the development of a domain–specific ontology with sufficient detail and coverage for text mining applications. In combination with the ontology, high regularity of the sublanguage used in knee MRI reports allowed us to model its processing by a set of sophisticated lexico–semantic rules with minimal syntactic analysis. The main processing steps involve named entity recognition combined with coordination, enumeration, ambiguity and co–reference resolution, followed by text segmentation. Ontology–based semantic typing is then used to drive the template filling process. Results. We adopted an existing ontology, TRAK (Taxonomy for RehAbilitation of Knee conditions), for use within KneeTex. The original TRAK ontology expanded from 1,292 concepts, 1,720 synonyms and 518 relationship instances to 1,621 concepts, 2,550 synonyms and 560 relationship instances. This provided KneeTex with a very fine–grained lexico–semantic knowledge base, which is highly attuned to the given sublanguage. Information extraction results were evaluated on a test set of 100 MRI reports. A gold standard consisted of 1,259 filled template records with the following slots: finding, finding qualifier, negation, certainty, anatomy and anatomy qualifier. KneeTex extracted information with precision of 98.00%, recall of 97.63% and F–measure of 97.81%, the values of which are in line with human–like performance. Conclusions. KneeTex is an open–source, stand–alone application for information extraction from narrative reports that describe an MRI scan of the knee. Given an MRI report as input, the system outputs the corresponding clinical findings in the form of JavaScript Object Notation objects. The extracted information is mapped onto TRAK, an ontology that formally models knowledge relevant for the rehabilitation of knee conditions. As a result, formally structured and coded information allows for complex searches to be conducted efficiently over the original MRI reports, thereby effectively supporting epidemiologic studies of knee conditions.
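
The three evaluation figures quoted in the KneeTex abstract can be cross-checked with a few lines of arithmetic. The sketch below assumes the reported F–measure is the balanced F1 score (the harmonic mean of precision and recall); the abstract does not state which variant was used, and the function name `f_measure` is purely illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the abstract's "F-measure" is F1; other weightings
    (F0.5, F2, ...) would give slightly different values.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract (as fractions).
reported_precision = 0.9800
reported_recall = 0.9763

f1_percent = round(f_measure(reported_precision, reported_recall) * 100, 2)
print(f"F-measure: {f1_percent}%")
```

Under that assumption the computed value agrees with the 97.81% given in the abstract.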

Crossref

Online Research @ Cardiff

Springer - Publisher Connector

PubMed Central

A lexico-syntactic analysis of antonym co-occurrence in spoken English

Author: Justeson J. S.
Publication venue: 'Walter de Gruyter GmbH'

Crossref

Astronomical Implications of Maya Hieroglyphic Notations at Xultun

Author: Justeson J.
Lounsbury F.
Meeus J.
Teeple J.
Publication venue: 'SAGE Publications'

Crossref

Identifying Newly-coined Terms which are to be Important in Special Domains

Author: H. MIMA
H. NAKAGAWA
J. S. JUSTESON
K. KAGEURA
Keita TSUJI
Publication venue: 'Japan Society of Information and Knowledge'
Publication date: 01/01/2005

Crossref

Towards the Automatic Learning of Idiomatic Prepositional Phrases

Author: C.D. Manning
I. Mel’cuk
J. Graña Gil
J.S. Justeson
L. Degand
T. Dunning
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2005

Crossref