Search CORE

59 research outputs found

Preparation of name and address data for record linkage using hidden Markov models

Author: A McCallum
A Rigo
AP Dempster
B Aldelberg
C Barrett
D Carnall
D Freitag
DL Ellsworth
DM Bikel
G van Rossum
GD Forney
Justin Xi Zhu
K Seymore
Kim Lim
L Gill
L Rabiner
LJ Cook
LL Roos
LR Rabiner
MatchWare Technologies
ME Califf
MJ Khoury
National Center for Biotechnology Information
New South Wales Department of Health
P Armitage
P Christen
P-S Laplace
Peter Christen
Public Health Division
S Soderland
SE Levinson
SF Altschul
Tim Churches
TR Leek
V Borkar
WE Winkler
Publication venue: BioMed Central
Publication date: 01/12/2002

BACKGROUND: Record linkage refers to the process of joining records that relate to the same entity or event in one or more data collections. In the absence of a shared, unique key, record linkage involves the comparison of ensembles of partially-identifying, non-unique data items between pairs of records. Data items with variable formats, such as names and addresses, need to be transformed and normalised in order to validly carry out these comparisons. Traditionally, deterministic rule-based data processing systems have been used to carry out this pre-processing, which is commonly referred to as "standardisation". This paper describes an alternative approach to standardisation, using a combination of lexicon-based tokenisation and probabilistic hidden Markov models (HMMs). METHODS: HMMs were trained to standardise typical Australian name and address data drawn from a range of health data collections. The accuracy of the results was compared to that produced by rule-based systems. RESULTS: Training of HMMs was found to be quick and did not require any specialised skills. For addresses, HMMs produced equal or better standardisation accuracy than a widely-used rule-based system. However, accuracy was worse when used with simpler name data. Possible reasons for this poorer performance are discussed. CONCLUSION: Lexicon-based tokenisation and HMMs provide a viable and effort-effective alternative to rule-based systems for pre-processing more complex variably formatted data such as addresses. Further work is required to improve the performance of this approach with simpler data such as names. Software which implements the methods described in this paper is freely available under an open source license for other researchers to use and improve.

ANU Digital Collections

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

The Australian National University

Association of eGFR-Related Loci Identified by GWAS with Incident CKD and ESRD

Author: AS Levey
K Matsushita
SE Baumeister
A Meguid El Nahas
AI Adler
CY Hsu
M Kastarinen
E Ritz
J Coresh
CS Fox
SG Satko
G Genovese
WH Kao
JB Kopp
A Köttgen
JC Chambers
RC Ma
A Shimazaki
MG Pezzolesi
A Alkhalaf
BI Freedman
B He
D Zhang
M Liu
HE Wheeler
FL Brancati
LD Bash
F Kronenberg
E Pillebout
A Viau
H Schmid
Z Al-Aly
LS Dalrymple
R Agarwal
E Borthwick
KJ Kelly
JP van Kuijk
MT James
C Ronco
WC Winkelmayer
MM Ward
P Soderland
JF Mann
CJ Willer
R DerSimonian
WJ Gauderman
AD Johnson
Publication venue: Public Library of Science
Publication date: 01/01/2007

Family studies suggest a genetic component to the etiology of chronic kidney disease (CKD) and end-stage renal disease (ESRD). Previously, we identified 16 loci for eGFR in genome-wide association studies, but the associations of these single nucleotide polymorphisms (SNPs) with incident CKD or ESRD are unknown. We thus investigated the association of these loci with incident CKD in 26,308 individuals of European ancestry free of CKD at baseline, drawn from eight population-based cohorts followed for a median of 7.2 years (including 2,122 incident CKD cases defined as eGFR < 60 ml/min/1.73 m² at follow-up), and with ESRD in four case-control studies in subjects of European ancestry (3,775 cases, 4,577 controls). SNPs at 11 of the 16 loci (UMOD, PRKAG2, ANXA9, DAB2, SHROOM3, DACH1, STC1, SLC34A1, ALMS1/NAT8, UBE2Q2, and GCKR) were associated with incident CKD; p-values ranged from p = 4.1 × 10⁻⁹ in UMOD to p = 0.03 in GCKR. After adjusting for baseline eGFR, six of these loci remained significantly associated with incident CKD (UMOD, PRKAG2, ANXA9, DAB2, DACH1, and STC1). SNPs in UMOD (OR = 0.92, p = 0.04) and GCKR (OR = 0.93, p = 0.03) were nominally associated with ESRD. In summary, the majority of eGFR-related loci are either associated or show a strong trend towards association with incident CKD, but have modest associations with ESRD in individuals of European descent. Additional work is required to characterize the association of genetic determinants of CKD and ESRD at different stages of disease progression.

Crossref

DRO Deakin Research Online

UNIL IRIS | Institutional Research Information System

Directory of Open Access Journals

PubMed Central

Open Access LMU (Ludwig-Maximilians-Univ. München)

Erasmus University Digital Repository

Online-Publikations-Server der Universität Würzburg

Enriching a biomedical event corpus with meta-knowledge annotation

Author: A de Waard
A Rzhetsky
AM Cohen
AS Yeh
B Medlock
F Lisacek
H Kilicoglu
H Langer
H Shatkay
J Cohen
J Ding
J Kim
John McNaught
JT Kim
K Hirohata
K Hyland
K Oda
KB Cohen
L Hoye
L McKnight
M Ashburner
M Liakata
M Light
ME Califf
O Sanchez-Graillet
P Ruch
P Thompson
P Zweigenbaum
Paul Thompson
R Bunescu
R Nawaz
Raheel Nawaz
S Ananiadou
S Soderland
S Teufel
Sophia Ananiadou
V Rizomilioti
V Vincze
VL Rubin
WJ Wilbur
Y Miyao
Y Mizuta
Á Sándor
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2011

Background: Biomedical papers contain rich information about entities, facts and events of biological relevance. To discover these automatically, we use text mining techniques, which rely on annotated corpora for training. In order to extract protein-protein interactions, genotype-phenotype/gene-disease associations, etc., we rely on event corpora that are annotated with classified, structured representations of important facts and findings contained within text. These provide an important resource for the training of domain-specific information extraction (IE) systems, to facilitate semantic-based searching of documents. Correct interpretation of these events is not possible without additional information, e.g., does an event describe a fact, a hypothesis, an experimental result or an analysis of results? How confident is the author about the validity of her analyses? These and other types of information, which we collectively term meta-knowledge, can be derived from the context of the event. Results: We have designed an annotation scheme for meta-knowledge enrichment of biomedical event corpora. The scheme is multi-dimensional, in that each event is annotated for 5 different aspects of meta-knowledge that can be derived from the textual context of the event. Textual clues used to determine the values are also annotated. The scheme is intended to be general enough to allow integration with different types of bio-event annotation, whilst being detailed enough to capture important subtleties in the nature of the meta-knowledge expressed in the text. We report here on both the main features of the annotation scheme and its application to the GENIA event corpus (1000 abstracts with 36,858 events). High levels of inter-annotator agreement have been achieved, falling in the range of 0.84-0.93 Kappa. Conclusion: By augmenting event annotations with meta-knowledge, more sophisticated IE systems can be trained, which allow interpretative information to be specified as part of the search criteria. This can assist in a number of important tasks, e.g., finding new experimental knowledge to facilitate database curation, enabling textual inference to detect entailments and contradictions, etc. To our knowledge, our scheme is unique within the field with regard to the diversity of meta-knowledge aspects annotated for each event. © 2011 Thompson et al.; licensee BioMed Central Ltd.

Crossref

E-space: Manchester Metropolitan University's Research Repository

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

The University of Manchester - Institutional Repository

Automated Bilingual Linking of Wordnet Senses

Author: A Copestake
A Irvine
M Piasecki
P Pęzik
S Soderland
Publication venue: Springer Science and Business Media LLC

Crossref

Adaptive Information Extraction: Core Technologies For Information Agents

Author: B. Thomas
C. Hsu
J. R. Quinlan
N. Kushmerick
P. Clark
S. Muggleton
S. Soderland
Publication venue: Springer
Publication date: 01/01/2002

This paper gives a state-of-the-art overview of machine learning approaches for information extraction from documents, based on finite-state techniques and on relational learning methods related to inductive logic programming.

CiteSeerX

Crossref

Activation of porcine hepatic microvascular sinusoidal endothelial cells in pig-to-human liver xenotransplantation

Author: A Di Carlo
AJ Tector
C Soderland
JI Tchervenkov
JS Barkun
M Tan
P Metrakos
S Liu
Publication venue: Elsevier BV
Publication date: 01/02/2001

Crossref

Sporadic late onset nemaline myopathy responsive to IVIg and immunotherapy

Author: Amiram Katz
Anthony A. Amato
Carl A. Soderland
H. Royden Jones
Margherita Milone
Miruna Segarceanu
Nathan P. Young
Publication venue: Wiley
Publication date: 01/01/2009

Crossref

Learning Node Selecting Tree Transducer from Completely Annotated Examples

Author: B. Chidlovskii
C. de la Higuera
C. Kermorvant
E. Gold
F. Coste
F. Neven
K. Lang
K.J. Lang
M. Gruhe
P. Garcia
S. Soderland
Publication venue: Springer Verlag
Publication date: 01/01/2004

A basic problem in Web information extraction is to find appropriate queries for informative nodes in trees. We propose to learn queries for nodes in trees automatically from examples. We introduce node selecting tree transducers (NSTTs) and show how to induce deterministic NSTTs in polynomial time from completely annotated examples. We have implemented learning algorithms for NSTTs, started applying them to Web information extraction, and present first experimental results.

CiteSeerX

HAL - Lille 3

Crossref

INRIA a CCSD electronic archive server

Using the exposome to address gene–environment interactions in kidney disease

Author: A Lim
AC Cheung
AP Grollman
CP Wild
LH Lash
MM Niedzwiecki
P Soderland
R Kazancioglu
R Vermeulen
X Xu
Publication venue: Springer Science and Business Media LLC

Crossref

A Framework for Collocation Error Correction in Web Pages and Text Documents

Author: Alan Varghese
Aparna S. Varde
P. Deane
Eileen Fitzpatrick
Jing Peng
M. Marneffe
A. M. Pradhan
M.A. Ramos
S. Soderland
Publication venue: Association for Computing Machinery (ACM)

Crossref