
174 research outputs found

A Bayesian Approach to Graphical Record Linkage and Deduplication

Author: Fienberg SE
Hall R
Steorts RC
Publication date: 01/10/2016

© 2016 American Statistical Association. We propose an unsupervised approach for linking records across arbitrarily many files, while simultaneously detecting duplicate records within files. Our key innovation involves the representation of the pattern of links between records as a bipartite graph, in which records are directly linked to latent true individuals, and only indirectly linked to other records. This flexible representation of the linkage structure naturally allows us to estimate the attributes of the unique observable people in the population, calculate transitive linkage probabilities across records (and represent this visually), and propagate the uncertainty of record linkage into later analyses. Our method makes it particularly easy to integrate record linkage with post-processing procedures such as logistic regression, capture–recapture, etc. Our linkage structure lends itself to an efficient, linear-time, hybrid Markov chain Monte Carlo algorithm, which overcomes many obstacles encountered by previous record linkage approaches, despite the high-dimensional parameter space. We illustrate our method using longitudinal data from the National Long Term Care Survey and with data from the Italian Survey on Household and Wealth, where we assess the accuracy of our method and show it to be better in terms of error rates and empirical scalability than other approaches in the literature. Supplementary materials for this article are available online.

DukeSpace

SMERED: A Bayesian Approach to Graphical Record Linkage and De-duplication

Author: Fienberg SE
Hall R
Steorts RC

We propose a novel unsupervised approach for linking records across arbitrarily many files, while simultaneously detecting duplicate records within files. Our key innovation is to represent the pattern of links between records as a bipartite graph, in which records are directly linked to latent true individuals, and only indirectly linked to other records. This flexible new representation of the linkage structure naturally allows us to estimate the attributes of the unique observable people in the population, calculate k-way posterior probabilities of matches across records, and propagate the uncertainty of record linkage into later analyses. Our linkage structure lends itself to an efficient, linear-time, hybrid Markov chain Monte Carlo algorithm, which overcomes many obstacles encountered by previously proposed methods of record linkage, despite the high dimensional parameter space. We assess our results on real and simulated data.

DukeSpace

Answering two biological questions with a latent class model via MCMC applied to capture-recapture data.

Author: A Agresti
A Biggeri
AD Sokal
B Lindsay
BJ Castledine
CJ Schwarz
D Madigan
EI George
F Bartolucci
G Bruno
JN Darroch
L Sanathanan
L Tardella
L Tierney
PH Peskun
PJ Green
PJ Smith
PSF Yip
R King
RM Cormack
S Basu
SE Fienberg
ZE Schnabel
Publication venue: Norwell, MA
Publication date: 01/01/2003

Crossref

Archivio istituzionale della ricerca - Università di Macerata

Geometry of Goodness-of-Fit Testing in High Dimensional Low Sample Size Modelling

Author: A Agresti
C Morris
CJ Geyer
F Critchley
F Nielsen
GP Steck
K Anaya-Izquierdo
L Holst
M Liu
OE Barndorff-Nielsen
S-I Amari
SE Fienberg
SL Lauritzen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 15/11/2015

We introduce a new approach to goodness-of-fit testing in the high dimensional, sparse extended multinomial context. The paper takes a computational information geometric approach, extending classical higher order asymptotic theory. We show why the Wald – equivalently, the Pearson X² and score statistics – are unworkable in this context, but that the deviance has a simple, accurate and tractable sampling distribution even for moderate sample sizes. Issues of uniformity of asymptotic approximations across model space are discussed. A variety of important applications and extensions are noted.

Crossref

Open Research Online (The Open University)

The interplay of microscopic and mesoscopic structure in complex networks

Author: A Davis
A Hamosh
AL Barabási
AL Fred
authors Various
B Karrer
BL Chen
C Kemp
C Song
CJ Honey
D MacKay
David Saad
EM Airoldi
G Bianconi
H Jeong
JA Dunne
JJ Daudin
Jörg Reichardt
K Nowicki
KI Goh
LC Freeman
M Girvan
M Morup
M Newman
M Reigl
MS Handcock
Olaf Sporns
P Doreian
P Holland
PD Hoff
PJ Bickel
PN Krivitsky
PW Holland
R Guimera
R Guimerà
R Milo
R Sharan
Roberto Alamino
S Fortunato
S Wasserman
SE Fienberg
TA Snijders
Y Artzy-Randrup
YJ Wang
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 20/12/2010

Not all nodes in a network are created equal. Differences and similarities exist at both individual node and group levels. Disentangling single node from group properties is crucial for network modeling and structural inference. Based on unbiased generative probabilistic exponential random graph models and employing distributive message passing techniques, we present an efficient algorithm that allows one to separate the contributions of individual nodes and groups of nodes to the network structure. This leads to improved detection accuracy of latent class structure in real world data sets compared to models that focus on group structure alone. Furthermore, the inclusion of hitherto neglected group specific effects in models used to assess the statistical significance of small subgraph (motif) distributions in networks may be sufficient to explain most of the observed statistics. We show the predictive power of such generative models in forecasting putative gene-disease associations in the Online Mendelian Inheritance in Man (OMIM) database. The approach is suitable for both directed and undirected uni-partite as well as for bipartite networks

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Aston Publications Explorer

Online-Publikations-Server der Universität Würzburg

A Semantic Reasoning Method Towards Ontological Model for Automated Learning Analysis

Author: A Lumpe
Davide Barbieri
Diogo R. Ferreira
J Lehmann
Joshua B. Tenenbaum
K Okoye
LH Thom
M Yarandi
SE Fienberg
TH Nguyen
TL Griffiths
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

Semantic reasoning can help solve the problem of regulating the evolving and static measures of knowledge at theoretical and technological levels. The technique has been proven to enhance the capability of process models by making inferences, retaining and applying what they have learned, and discovering new processes. This paper proposes a semantic rule-based approach for discovering learners' interaction patterns within a learning knowledge base and then responding with decisions based on adaptive rules centred on captured user profiles. The method applies semantic rules and description logic queries to build an ontology model capable of automatically computing the various learning activities within a Learning Knowledge-Base, and of checking the consistency of learning object/data types. The approach is grounded in inductive and deductive logic descriptions that allow the use of a Reasoner to check that all definitions within the learning model are consistent and to recognise which concepts fit within each defined class. Inductive reasoning is applied in practice to discover sets of inferred learner categories, while the deductive approach is used to prove and enhance the discovered rules and logic expressions. Thus, this work applies effective reasoning methods to make inferences over a Learning Process Knowledge-Base, leading to automated discovery of learning patterns/behaviour.

UEL Research Repository at University of East London

Crossref

Birmingham City University Open Access Repository

BCU Open Access

Web Queries as a Source for Syndromic Surveillance

Author: Anette Hulth
Annika Linde
D Das
E Andersson
E Rolland
F Mostashari
G Smith
Gustaf Rydevik
HA Johnson
J Ginsberg
Joel Mark Montgomery
JS Lombardo
KH Bork
L Eriksson
L Josseran
P Armitage
PM Polgreen
R Heffernan
R Wehrens
SE Fienberg
V Jormanainen
WR Hogan
WW Chapman
X Zeng
Publication venue: Public Library of Science
Publication date: 06/02/2009

In the field of syndromic surveillance, various sources are exploited for outbreak detection, monitoring and prediction. This paper describes a study on queries submitted to a medical web site, with influenza as a case study. The hypothesis of the work was that queries on influenza and influenza-like illness would provide a basis for the estimation of the timing of the peak and the intensity of the yearly influenza outbreaks that would be as good as the existing laboratory and sentinel surveillance. We calculated the occurrence of various queries related to influenza from search logs submitted to a Swedish medical web site for two influenza seasons. These figures were subsequently used to generate two models, one to estimate the number of laboratory verified influenza cases and one to estimate the proportion of patients with influenza-like illness reported by selected General Practitioners in Sweden. We applied an approach designed for highly correlated data, partial least squares regression. In our work, we found that certain web queries on influenza follow the same pattern as that obtained by the two other surveillance systems for influenza epidemics, and that they have equal power for the estimation of the influenza burden in society. Web queries give a unique access to ill individuals who are not (yet) seeking care. This paper shows the potential of web queries as an accurate, cheap and labour extensive source for syndromic surveillance

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Tamkang University Institutional Repository

Long-term declines in ADLs, IADLs, and mobility among older Medicare beneficiaries

Background: Most prior studies have focused on short-term (≤ 2 years) functional declines. But those studies cannot address aging effects inasmuch as all participants have aged the same amount. Therefore, the authors studied the extent of long-term functional decline in older Medicare beneficiaries who were followed for varying time lengths, and the authors also identified the risk factors associated with those declines.
Methods: The analytic sample included 5,871 self- or proxy-respondents who had complete baseline and follow-up survey data that could be linked to their Medicare claims for 1993-2007. Functional status was assessed using activities of daily living (ADLs), instrumental ADLs (IADLs), and mobility limitations, with declines defined as the development of two or more new difficulties. Multiple logistic regression analysis was used to focus on the associations involving respondent status, health lifestyle, continuity of care, managed care status, health shocks, and terminal drop.
Results: The average amount of time between the first and final interviews was 8.0 years. Declines were observed for 36.6% on ADL abilities, 32.3% on IADL abilities, and 30.9% on mobility abilities. Functional decline was more likely to occur when proxy-reports were used, and the effects of baseline function on decline were reduced when proxy-reports were used. Engaging in vigorous physical activity consistently and substantially protected against functional decline, whereas obesity, cigarette smoking, and alcohol consumption were only associated with mobility declines. Post-baseline hospitalizations were the most robust predictors of functional decline, exhibiting a dose-response effect such that the greater the average annual number of hospital episodes, the greater the likelihood of functional status decline. Participants whose final interview preceded their death by one year or less had substantially greater odds of functional status decline.
Conclusions: Both the additive and interactive (with functional status) effects of respondent status should be taken into consideration whenever proxy-reports are used. Encouraging exercise could broadly reduce the risk of functional decline across all three outcomes, although interventions encouraging weight reduction and smoking cessation would only affect mobility declines. Reducing hospitalization and re-hospitalization rates could also broadly reduce the risk of functional decline across all three outcomes.

Crossref

Directory of Open Access Journals

PubMed Central

The Role of Non-Epistemic Values in Engineering Models

Author: D Steel
E McMullin
I Poel van de
M Dorato
M Suárez
Martin Peterson
R Rudner
SE Fienberg
Sven Diekmann
TS Kuhn
WE Walker
Publication venue: 'Springer Science and Business Media LLC'

Crossref

The Hierarchical Age-Period-Cohort model: Why does it find the results that it finds?

Author: A Bell
Andrew Bell
B Pelzer
DJ Spiegelhalter
E Suzuki
EN Reither
J Rasbash
Kelvyn Jones
L Chauvel
L Linek
L Luo
M Grotenhuis Te
R Dassonneville
SE Fienberg
WJ Browne
Y Yang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

It is claimed the hierarchical-age–period–cohort (HAPC) model solves the age–period–cohort (APC) identification problem. However, this is debateable; simulations show situations where the model produces incorrect results, countered by proponents of the model arguing those simulations are not relevant to real-life scenarios. This paper moves beyond questioning whether the HAPC model works, to why it produces the results it does. We argue HAPC estimates are the result not of the distinctive substantive APC processes occurring in the dataset, but are primarily an artefact of the data structure—that is, the way the data has been collected. Were the data collected differently, the results produced would be different. This is illustrated both with simulations and real data, the latter by taking a variety of samples from the National Health Interview Survey (NHIS) data used by Reither et al. (Soc Sci Med 69(10):1439–1448, 2009) in their HAPC study of obesity. When a sample based on a small range of cohorts is taken, such that the period range is much greater than the cohort range, the results produced are very different to those produced when cohort groups span a much wider range than periods, as is structurally the case with repeated cross-sectional data. The paper also addresses the latest defence of the HAPC model by its proponents (Reither et al. in Soc Sci Med 145:125–128, 2015a). The results lend further support to the view that the HAPC model is not able to accurately discern APC effects, and should be used with caution when there appear to be period or cohort near-linear trends

Crossref

Springer - Publisher Connector

White Rose Research Online

Explore Bristol Research