
    Assessing the contribution of shallow and deep knowledge sources for word sense disambiguation

    Corpus-based techniques have proved to be very beneficial in the development of efficient and accurate approaches to word sense disambiguation (WSD), despite the fact that they generally represent relatively shallow knowledge. It has always been thought, however, that WSD could also benefit from deeper knowledge sources. We describe a novel approach to WSD that uses inductive logic programming to learn theories from first-order logic representations, allowing corpus-based evidence to be combined with any kind of background knowledge. This approach has been shown to be effective over several disambiguation tasks using a combination of deep and shallow knowledge sources. It is important to understand the contribution of the various knowledge sources used in such a system. This paper investigates the contribution of nine knowledge sources to the performance of the disambiguation models produced for the SemEval-2007 English lexical sample task. The outcome of this analysis will assist future work on WSD in concentrating on the most useful knowledge sources.
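
    Readers who want to experiment with this kind of knowledge-source analysis could start from a leave-one-out ablation, a common way to estimate each source's contribution to accuracy. The Python sketch below is illustrative only: evaluate, build_model, and the knowledge-source handling are hypothetical stand-ins, not the paper's inductive logic programming system.

        # Hedged sketch: leave-one-out ablation over WSD knowledge sources.
        # All names here are illustrative placeholders.

        def evaluate(model, test_set):
            # Accuracy of a trained model on (instance, sense) pairs.
            correct = sum(1 for inst, sense in test_set if model.predict(inst) == sense)
            return correct / len(test_set)

        def ablation_study(train_set, test_set, knowledge_sources, build_model):
            # build_model(train_set, sources) is assumed to return a WSD
            # model trained using only the given knowledge sources.
            baseline = evaluate(build_model(train_set, knowledge_sources), test_set)
            contributions = {}
            for src in knowledge_sources:
                reduced = [s for s in knowledge_sources if s != src]
                acc = evaluate(build_model(train_set, reduced), test_set)
                contributions[src] = baseline - acc  # accuracy drop without src
            return baseline, contributions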

    Magnetic resonance diffusion tensor microimaging reveals a role for Bcl-x in brain development and homeostasis

    A new technique based on diffusion tensor imaging and computational neuroanatomy was developed to efficiently and quantitatively characterize the three-dimensional morphology of the developing brain. The technique was used to analyze the phenotype of conditional Bcl-x knock-out mice, in which the bcl-x gene was deleted specifically in neurons of the cerebral cortex and hippocampus beginning at embryonic day 13.5, as cells became postmitotic. Affected brain regions and associated axonal tracts showed severe atrophy in adult Bcl-x-deficient mice. Longitudinal studies revealed that these phenotypes are established by regressive processes occurring primarily during the first postnatal week, whereas neurogenesis and migration showed no obvious abnormality during embryonic stages. Specific families of white matter tracts that formed normally during embryonic stages underwent dramatic degeneration postnatally. Thus, this technique serves as a powerful tool for efficiently localizing the temporal and spatial manifestation of morphological phenotypes.
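
    For context, diffusion tensor imaging summarizes water diffusion at each voxel as a 3x3 symmetric tensor, from which scalar maps such as fractional anisotropy (FA) are derived; tract atrophy and degeneration show up in such maps. The sketch below computes the standard FA formula with NumPy; it is a minimal illustration of the per-voxel quantity such analyses build on, not the authors' pipeline.

        import numpy as np

        def fractional_anisotropy(D):
            # Standard FA of a 3x3 symmetric diffusion tensor:
            # FA = sqrt(3/2) * ||lambda - mean(lambda)||_2 / ||lambda||_2,
            # where lambda are the tensor's eigenvalues.
            eigvals = np.linalg.eigvalsh(D)
            mean = eigvals.mean()
            num = np.sqrt(((eigvals - mean) ** 2).sum())
            den = np.sqrt((eigvals ** 2).sum())
            return np.sqrt(1.5) * num / den if den > 0 else 0.0

        # Example: a prolate tensor, as in a coherent white matter tract.
        D = np.diag([1.7e-3, 0.3e-3, 0.3e-3])      # units: mm^2/s
        print(round(fractional_anisotropy(D), 2))  # ~0.8, strongly anisotropic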

    Collocation analysis for UMLS knowledge-based word sense disambiguation

    BACKGROUND: The effectiveness of knowledge-based word sense disambiguation (WSD) approaches depends in part on the information available in the reference knowledge resource. Off the shelf, these resources are not optimized for WSD and might lack terms needed to model the context properly. In addition, they might include noisy terms which contribute to false positives in the disambiguation results.

    METHODS: We analyzed collocation types which could improve the performance of knowledge-based disambiguation methods. Collocations are obtained by extracting candidate collocations from MEDLINE and then assigning them to one of the senses of an ambiguous word. We performed this assignment either using semantic group profiles or a knowledge-based disambiguation method. In addition to collocations, we used second-order features from a previously implemented approach. Specifically, we measured the effect of these collocations in two knowledge-based WSD methods. The first method, AEC, uses knowledge from the UMLS to collect examples from MEDLINE, which are used to train a Naïve Bayes classifier. The second method, MRD, builds a profile for each candidate sense based on the UMLS and compares the profile to the context of the ambiguous word. We used two WSD test sets whose disambiguation cases are mapped to UMLS concepts. The first, the NLM WSD set, was developed manually by several domain experts and contains words with high-frequency occurrence in MEDLINE. The second, the MSH WSD set, was developed automatically using the MeSH indexing in MEDLINE; it contains a larger set of words and covers a larger number of UMLS semantic types.

    RESULTS: The results indicate an improvement after the use of collocations, although the approaches perform differently depending on the data set. In the NLM WSD set, the improvement is larger for the MRD disambiguation method using second-order features, while assignment of collocations to a candidate sense based on UMLS semantic group profiles is more effective in the AEC method. In the MSH WSD set, the increase in performance is modest for all the methods. Collocations combined with the MRD disambiguation method have the best performance; the MRD method with second-order features shows an insignificant change, and the AEC method gives a modest improvement. Assignment of collocations to a candidate sense based on knowledge-based methods performs better.

    CONCLUSIONS: Collocations improve the performance of knowledge-based disambiguation methods, although results vary depending on the test set and method used. Generally, the AEC method is sensitive to query drift; using AEC, just a few selected terms provide a large improvement in disambiguation performance. The MRD method handles noisy terms better but requires a larger set of terms to improve performance.
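
    To make the MRD idea concrete, the sketch below scores each candidate sense by the cosine overlap between its term profile and the context of the ambiguous word, returning the best-matching sense. The tokenization and profiles are placeholders; in MRD the profiles would be derived from the UMLS, possibly extended with the collocations described above.

        from collections import Counter
        from math import sqrt

        def cosine(a: Counter, b: Counter) -> float:
            # Cosine similarity between two term-frequency vectors.
            dot = sum(a[t] * b[t] for t in set(a) & set(b))
            norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
            return dot / norm if norm else 0.0

        def disambiguate(context_tokens, sense_profiles):
            # Pick the sense whose profile best overlaps the context.
            # sense_profiles maps sense id -> Counter of profile terms.
            ctx = Counter(context_tokens)
            return max(sense_profiles, key=lambda s: cosine(ctx, sense_profiles[s]))

        # Illustrative use for the ambiguous term "cold":
        profiles = {
            "common_cold": Counter(["virus", "symptom", "respiratory", "infection"]),
            "cold_temperature": Counter(["temperature", "exposure", "hypothermia"]),
        }
        tokens = "patient reported respiratory symptom after viral infection".split()
        print(disambiguate(tokens, profiles))  # -> common_cold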

    eGIFT: Mining Gene Information from the Literature

    BACKGROUND: With the biomedical literature continually expanding, searching PubMed for information about specific genes becomes increasingly difficult. Not only can thousands of results be returned, but gene name ambiguity leads to many irrelevant hits. As a result, it is difficult for life scientists and gene curators to rapidly get an overall picture of a specific gene from documents that mention its names and synonyms.

    RESULTS: In this paper, we present eGIFT (http://biotm.cis.udel.edu/eGIFT), a web-based tool that associates informative terms, called iTerms, and sentences containing them, with genes. To associate iTerms with a gene, eGIFT ranks iTerms about the gene based on a score which compares the frequency of occurrence of a term in the gene's literature to its frequency of occurrence in documents about genes in general. To retrieve a gene's documents (Medline abstracts), eGIFT considers all gene names, aliases, and synonyms. Since many gene names can be ambiguous, eGIFT applies a disambiguation step to remove matches that do not correspond to the gene. An additional filtering process is applied to retain those abstracts that focus on the gene rather than mention it in passing. eGIFT's information for a gene is pre-computed, and users can search for genes by name or by EntrezGene identifier. iTerms are grouped into different categories to facilitate quick inspection, and eGIFT links each iTerm to sentences mentioning it, letting users see the relation between the iTerm and the gene. We evaluated the precision and recall of eGIFT's iTerms for 40 genes; between 88% and 94% of the iTerms were marked as salient by our evaluators, and 94% of the UniProtKB keywords for these genes were also identified by eGIFT as iTerms.

    CONCLUSIONS: Our evaluations suggest that iTerms capture highly relevant aspects of genes. Furthermore, by showing sentences containing these terms, eGIFT can provide a quick description of a specific gene. eGIFT helps not only life scientists survey results of high-throughput experiments, but also annotators find articles describing gene aspects and functions.
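
    The core of eGIFT's ranking, as described above, is a comparison between a term's frequency in a gene's documents and its frequency across gene literature in general. A minimal sketch of such a ratio-style score follows; the log-ratio formula and the add-one smoothing are assumptions for illustration, not eGIFT's published implementation.

        from math import log

        def iterm_score(term, gene_docs, background_docs):
            # Log-ratio of a term's document frequency in a gene's literature
            # versus a background collection of gene-related documents.
            # Add-one smoothing avoids division by zero; the real eGIFT
            # score may differ in detail.
            df_gene = sum(1 for d in gene_docs if term in d)
            df_bg = sum(1 for d in background_docs if term in d)
            p_gene = (df_gene + 1) / (len(gene_docs) + 1)
            p_bg = (df_bg + 1) / (len(background_docs) + 1)
            return log(p_gene / p_bg)

        # Terms scoring high appear unusually often in this gene's abstracts.
        gene_docs = [{"apoptosis", "caspase"}, {"apoptosis", "mitochondria"}]
        background = [{"expression"}, {"protein"}, {"apoptosis"}, {"binding"}]
        print(round(iterm_score("apoptosis", gene_docs, background), 2))  # ~0.92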

    Quantitative approaches to content analysis: identifying conceptual drift across publication outlets

    Unstructured text data, such as emails, blogs, contracts, academic publications, organizational documents, transcribed interviews, and even tweets, are important sources of data in Information Systems research. Various forms of qualitative analysis of the content of these data exist and have revealed important insights. Yet, to date, these analyses have been hampered by the limitations of human coding of large data sets and by bias due to human interpretation. In this paper, we compare and combine two quantitative analysis techniques to demonstrate the capabilities of computational analysis for content analysis of unstructured text. Specifically, we seek to demonstrate how two quantitative analytic methods, viz., Latent Semantic Analysis and data mining, can aid researchers in revealing core content topic areas in large (or small) data sets, and in visualizing how these concepts evolve, migrate, converge, or diverge over time. We exemplify the complementary application of these techniques through an examination of a 25-year sample of abstracts from selected journals in the Information Systems, Management, and Accounting disciplines. Through this work, we explore the capabilities of two computational techniques, and show how these techniques can be used to gather insights from a large corpus of unstructured text.
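
    As an illustration of the Latent Semantic Analysis half of this approach, the sketch below reduces a TF-IDF document-term matrix with a truncated SVD and prints each abstract's loadings on the latent concepts. It is a minimal sketch assuming scikit-learn is available; the three toy abstracts stand in for a real 25-year journal sample.

        # Minimal LSA sketch; the corpus is a placeholder.
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.decomposition import TruncatedSVD

        abstracts = [
            "user adoption of information systems in organizations",
            "accounting standards and financial reporting quality",
            "technology acceptance and user behavior over time",
        ]

        tfidf = TfidfVectorizer(stop_words="english")
        X = tfidf.fit_transform(abstracts)           # documents x terms

        lsa = TruncatedSVD(n_components=2, random_state=0)
        doc_topics = lsa.fit_transform(X)            # documents x latent concepts

        # Comparing rows of doc_topics across publication years would show
        # how concept loadings drift between outlets and over time.
        for doc, loadings in zip(abstracts, doc_topics):
            print(loadings.round(2), doc[:40])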