
    Generating name-like vectors for testing large-scale entity resolution

    Entity resolution (ER), the problem of identifying and linking records that belong to the same real-world entities in structured and unstructured data, is a primary task in data integration. Accurate and efficient ER has a major practical impact on applications across commercial, security and scientific domains. Recently, scalable ER techniques have received enormous attention with the increasing need to combine large-scale datasets. The shortage of training and ground-truth data impedes the development and testing of ER algorithms: good public datasets, especially those containing personal information, are restricted in this area and are usually small. Because of privacy and confidentiality concerns, testing algorithms or techniques with real datasets is challenging in ER research. Simulation is one technique for generating synthetic datasets with characteristics similar to those of real data for testing algorithms, but many existing simulation tools for ER lack support for generating large-scale data and suffer from problems of complexity, scalability, and the limitations of resampling. In our work, we propose a simple, inexpensive, and fast synthetic data generation tool. In its first stage, the tool generates only entity names, which are commonly used as identification keys in ER algorithms. We avoid detail-level simulation of entity names by using a simple vector representation that delivers simplicity and efficiency. In this paper, we discuss how to simulate simple vectors that approximate the properties of entity names, and we describe the overall construction of the tool based on data analysis of a namespace containing entity names collected from a real-world environment.
    Samudra Herath, Matthew Roughan and Gary Glonek
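
    As a rough illustration of the vector idea described above (and not the authors' tool), synthetic names can be represented as character q-gram count vectors whose lengths and q-gram frequencies are sampled to match an observed namespace. The sketch below is hypothetical in all of its names and parameters.

        # Hypothetical sketch: "name-like" vectors as q-gram count vectors whose
        # lengths and q-gram frequencies roughly match an observed namespace.
        # This illustrates the general idea only, not the authors' tool.
        import numpy as np

        rng = np.random.default_rng(0)

        def namespace_statistics(names, q=2):
            """Collect the q-gram vocabulary, q-gram frequencies and name lengths."""
            grams, lengths = {}, []
            for name in names:
                name = name.lower()
                lengths.append(len(name))
                for i in range(len(name) - q + 1):
                    g = name[i:i + q]
                    grams[g] = grams.get(g, 0) + 1
            vocab = sorted(grams)
            freq = np.array([grams[g] for g in vocab], dtype=float)
            return vocab, freq / freq.sum(), np.array(lengths)

        def sample_name_vectors(n, vocab, gram_probs, lengths, q=2):
            """Sample n q-gram count vectors that roughly mimic real names."""
            out = np.zeros((n, len(vocab)))
            for row in out:
                k = max(1, int(rng.choice(lengths)) - q + 1)  # q-grams per name
                idx = rng.choice(len(vocab), size=k, p=gram_probs)
                np.add.at(row, idx, 1)
            return out

        # toy usage on a tiny hypothetical namespace
        vocab, probs, lengths = namespace_statistics(["smith", "nguyen", "garcia", "muller"])
        print(sample_name_vectors(5, vocab, probs, lengths))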

    An experimental evaluation of a loop versus a reference design for two-channel microarrays

    Motivation: Despite theoretical arguments that so-called "loop designs" for two-channel DNA microarray experiments are more efficient, biologists continue to use "reference designs". We describe two sets of microarray experiments with RNA from two different biological systems (TPA-stimulated mammalian cells and Streptomyces coelicolor). In each case, both a loop and a reference design were performed using the same RNA preparations, with the aim of studying their relative efficiency. Results: The results of these experiments show that (1) the loop design attains a much higher precision than the reference design, and (2) multiplicative spot effects are a large source of variability; if they are not accounted for in the mathematical model, for example by taking log-ratios or including spot effects, the model will perform poorly. The first result is reinforced by a simulation study. Practical recommendations are given on how simple loop designs can be extended to more realistic experimental designs and how standard statistical methods allow the experimentalist to use and interpret the results from loop designs in practice.
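
    The second result can be made concrete with a toy simulation: a per-spot multiplicative effect enters both channels, so raw channel differences are dominated by spot-to-spot variability while log-ratios cancel the common factor. The sketch below uses made-up effect sizes and is not the paper's simulation study.

        # Toy illustration (not the paper's study): multiplicative spot effects
        # inflate raw channel differences but cancel on the log-ratio scale.
        import numpy as np

        rng = np.random.default_rng(1)
        n_spots = 1000
        true_log2_fold_change = 1.0                         # assumed treatment effect
        spot_effect = rng.lognormal(0.0, 0.8, n_spots)      # multiplicative, per spot
        base = rng.lognormal(8.0, 1.0, n_spots)             # spot-specific abundance

        # the spot effect multiplies both channels of the same spot
        red = spot_effect * base * 2 ** true_log2_fold_change * rng.lognormal(0, 0.1, n_spots)
        green = spot_effect * base * rng.lognormal(0, 0.1, n_spots)

        raw_diff = red - green                              # dominated by spot variability
        log_ratio = np.log2(red) - np.log2(green)           # spot effect cancels

        print("sd of raw difference:", round(raw_diff.std(), 1))
        print("sd of log-ratio     :", round(log_ratio.std(), 3))
        print("mean log-ratio (true value 1.0):", round(log_ratio.mean(), 3))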

    A comparison of survival models for prediction of eight-year revision risk following total knee and hip arthroplasty

    Background: There is increasing interest in the development and use of clinical prediction models, but a lack of evidence-supported guidance on the merits of different modelling approaches. This is especially true for time-to-event outcomes, where few studies have compared the many modelling approaches available. This study compares prediction accuracy and variable importance measures for four modelling approaches in prediction of time to revision surgery following total knee arthroplasty (TKA) and total hip arthroplasty (THA). Methods: The study included 321,945 TKA and 151,113 THA procedures performed between 1 January 2003 and 31 December 2017. The accuracy of the Cox model, Weibull parametric model, flexible parametric model, and random survival forest was compared, with patient age, sex, comorbidities, and prosthesis characteristics considered as predictors. Prediction accuracy was assessed using the Index of Prediction Accuracy (IPA), c-index, and smoothed calibration curves. Variable importance rankings from the Cox model and random survival forest were also compared. Results: Overall, the Cox and flexible parametric survival models performed best for prediction of both TKA revision (integrated IPA 0.056 (95% CI [0.054, 0.057]) compared to 0.054 (95% CI [0.053, 0.056]) for the Weibull parametric model) and THA revision (0.029 (95% CI [0.027, 0.030]) compared to 0.027 (95% CI [0.025, 0.028]) for the random survival forest). The c-index showed broadly similar discrimination between all modelling approaches. Models were generally well calibrated, but the random survival forest underfitted the predicted risk of TKA revision compared to the regression approaches. The most important predictors of revision were similar in the Cox model and random survival forest for TKA (age, opioid use, and patella resurfacing) and THA (femoral cement, depression, and opioid use). Conclusion: The Cox and flexible parametric models had superior overall performance, although all approaches performed similarly. Notably, this study showed no benefit of a tuned random survival forest over regression models in this setting.
    Alana R. Cuthbert, Lynne C. Giles, Gary Glonek, Lisa M. Kalisch Ellett, and Nicole L. Pratt
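
    The kind of head-to-head comparison described above can be sketched with off-the-shelf survival libraries. The code below uses synthetic data and hypothetical predictors to illustrate the workflow only (Cox model versus random survival forest, compared by apparent c-index); it is not the study's data, models, or tuning, and it assumes the lifelines and scikit-survival packages are available.

        # Minimal sketch on synthetic data: fit a Cox model and a random survival
        # forest and compare apparent c-index. Effect sizes are made up.
        import numpy as np
        import pandas as pd
        from lifelines import CoxPHFitter
        from sksurv.ensemble import RandomSurvivalForest
        from sksurv.util import Surv

        rng = np.random.default_rng(2)
        n = 2000
        X = pd.DataFrame({
            "age": rng.normal(68, 9, n),
            "female": rng.integers(0, 2, n),
            "opioid_use": rng.integers(0, 2, n),
        })
        # synthetic times to revision from an exponential model with assumed effects
        hazard = 0.01 * np.exp(0.02 * (X["age"] - 68) + 0.3 * X["opioid_use"])
        time = rng.exponential(1 / hazard)
        event = time < 8.0                       # administrative censoring at 8 years
        time = np.minimum(time, 8.0)

        # Cox proportional hazards model (lifelines)
        df = X.assign(time=time, event=event.astype(int))
        cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
        print("Cox c-index:", round(cph.concordance_index_, 3))

        # random survival forest (scikit-survival)
        y = Surv.from_arrays(event=event, time=time)
        rsf = RandomSurvivalForest(n_estimators=100, min_samples_leaf=20, random_state=0)
        rsf.fit(X, y)
        print("RSF c-index:", round(rsf.score(X, y), 3))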

    Alignment of time course gene expression data and the classification of developmentally driven genes with hidden Markov models

    BACKGROUND: We consider data from a time course microarray experiment conducted on grapevines over the development cycle of the grape berries at two different vineyards in South Australia. Although the underlying biological process of berry development is the same at both vineyards, there are differences in the timing of development due to local conditions. We aim to align the data from the two vineyards to enable an integrated analysis of the gene expression, and to use the alignment of the expression profiles to classify likely developmental function. RESULTS: We present a novel alignment method based on hidden Markov models (HMMs) and use the method to align the motivating grapevine data. We show that our alignment method is robust against subsets of profiles that are not suitable for alignment, investigate alignment diagnostics under the model, and demonstrate the classification of developmentally driven genes. CONCLUSIONS: The classification of developmentally driven genes both validates that the alignment we obtain is meaningful and gives new evidence that can be used to identify the role of genes with unknown function. Using our alignment methodology, we find at least 1279 grapevine probe sets with no currently annotated function that are likely to be controlled in a developmental manner.
    Sean Robinson, Garique Glonek, Inge Koch, Mark Thomas, and Christopher Davies
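
    To convey what aligning two developmental time courses means, the sketch below warps one synthetic profile onto another using plain dynamic time warping. This is a deliberately simple stand-in for the alignment idea, not the hidden Markov model method developed in the paper.

        # Generic alignment illustration via dynamic time warping on synthetic
        # profiles; a stand-in for the idea of alignment, not the paper's HMM method.
        import numpy as np

        def dtw_path(a, b):
            """Return the optimal warping path between two 1-D profiles."""
            n, m = len(a), len(b)
            cost = np.full((n + 1, m + 1), np.inf)
            cost[0, 0] = 0.0
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    d = (a[i - 1] - b[j - 1]) ** 2
                    cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
            path, i, j = [], n, m                 # trace the path back from the end
            while i > 0 and j > 0:
                path.append((i - 1, j - 1))
                step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
                if step == 0:
                    i, j = i - 1, j - 1
                elif step == 1:
                    i -= 1
                else:
                    j -= 1
            return path[::-1]

        # vineyard B develops later than vineyard A: same profile, shifted in time
        t = np.linspace(0, 1, 20)
        profile_a = np.sin(2 * np.pi * t)
        profile_b = np.sin(2 * np.pi * (t - 0.15))
        print(dtw_path(profile_a, profile_b)[:5])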

    Binary Models for Marginal Independence

    Log-linear models are a classical tool for the analysis of contingency tables. In particular, the subclass of graphical log-linear models provides a general framework for modelling conditional independences. However, with the exception of special structures, marginal independence hypotheses cannot be accommodated by these traditional models. Focusing on binary variables, we present a model class that provides a framework for modelling marginal independences in contingency tables. The approach taken is graphical and draws on analogies to multivariate Gaussian models for marginal independence. For the graphical model representation we use bi-directed graphs, which are in the tradition of path diagrams. We show how the models can be parameterized in a simple fashion, and how maximum likelihood estimation can be performed using a version of the Iterated Conditional Fitting algorithm. Finally, we consider combining these models with symmetry restrictions.
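
    As an illustrative contrast not drawn from the paper, the same missing edge encodes different independence statements in undirected and bi-directed graphs; for a three-variable path the two readings are:

        % Illustrative only: missing edge between X1 and X3 in two graph types.
        \[
        \text{undirected } X_1 - X_2 - X_3:\qquad X_1 \perp\!\!\!\perp X_3 \mid X_2,
        \]
        \[
        \text{bi-directed } X_1 \leftrightarrow X_2 \leftrightarrow X_3:\qquad X_1 \perp\!\!\!\perp X_3,
        \]
        \[
        \text{i.e.}\quad \Pr(X_1 = x_1,\, X_3 = x_3) = \Pr(X_1 = x_1)\,\Pr(X_3 = x_3)
        \quad\text{for all } x_1, x_3 \in \{0, 1\}.
        \]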

    A constrained polynomial regression procedure for estimating the local False Discovery Rate

    Background: In the context of genomic association studies, for which a large number of statistical tests are performed simultaneously, the local False Discovery Rate (lFDR), which quantifies the evidence of a specific gene association with a clinical or biological variable of interest, is a relevant criterion for taking into account the multiple testing problem. The lFDR not only allows an inference to be made for each gene through its specific value, but also provides an estimate of Benjamini-Hochberg's False Discovery Rate (FDR) for subsets of genes. Results: In the framework of estimating procedures without any distributional assumption under the alternative hypothesis, a new and efficient procedure for estimating the lFDR is described. The results of a simulation study indicated good performance for the proposed estimator in comparison with four published ones. The five procedures were applied to real datasets. Conclusion: A novel and efficient procedure for estimating the lFDR was developed and evaluated.
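
    The quantity being estimated can be written, in the standard two-group model, as lFDR(z) = pi0 * f0(z) / f(z). In the toy sketch below the mixture density f is estimated with a kernel density estimate as a crude stand-in for the paper's constrained polynomial procedure, and pi0 is simply fixed at its simulated value.

        # Toy two-group sketch of the local FDR: lfdr(z) = pi0 * f0(z) / f(z).
        # KDE replaces the paper's constrained polynomial fit; pi0 is assumed known.
        import numpy as np
        from scipy.stats import norm, gaussian_kde

        rng = np.random.default_rng(3)
        m, pi0 = 10000, 0.9
        null = rng.standard_normal(int(m * pi0))          # z-scores under H0
        alt = rng.normal(2.5, 1.0, m - int(m * pi0))      # z-scores under H1
        z = np.concatenate([null, alt])

        f = gaussian_kde(z)                               # estimated mixture density f(z)
        f0 = norm.pdf                                     # theoretical null N(0, 1)

        def lfdr(zv, pi0=pi0):
            """Local false discovery rate at z-values zv, clipped to [0, 1]."""
            return np.clip(pi0 * f0(zv) / f(zv), 0.0, 1.0)

        grid = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
        print(np.round(lfdr(grid), 3))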

    The Role of Extramembranous Cytoplasmic Termini in Assembly and Stability of the Tetrameric K+-Channel KcsA

    The membrane-active alcohol 2,2,2-trifluoroethanol has proven to be an attractive tool for investigating the intrinsic stability of integral membrane protein complexes, with the K+-channel KcsA serving as a suitable and representative ion channel. In the present study, the roles of both the cytoplasmic N and C termini in the assembly and stability of the KcsA channel were determined. The N terminus (residues 1–18) slightly increased tetramer stability via electrostatic interactions in the presence of 30 mol.% acidic phosphatidylglycerol (PG) in a phosphatidylcholine lipid bilayer. Furthermore, the N terminus was found to be potentially required for efficient channel (re)assembly. In contrast, truncation of the C terminus (residues 125–160) greatly facilitated channel reversibility from either a partially or a completely unfolded state, and this domain was substantially involved in stabilizing the tetramer in either the presence or the absence of PG in the lipid bilayer. These studies provide new insights into how extramembranous parts play their crucial roles in the assembly and stability of integral membrane protein complexes.

    Lower age at menarche affects survival in older Australian women: results from the Australian Longitudinal Study of Ageing

    Background: While menarche indicates the beginning of a woman's reproductive life, relatively little is known about the association between age at menarche and subsequent morbidity and mortality. We aimed to examine the effect of lower age at menarche on all-cause mortality in older Australian women over 15 years of follow-up. Methods: Data were drawn from the Australian Longitudinal Study of Ageing (n = 1,031 women aged 65-103 years). We estimated the hazard ratio (HR) associated with lower age at menarche using Cox proportional hazards models, adjusting for a broad range of reproductive, demographic, health and lifestyle covariates. Results: During the follow-up period, 673 women (65%) died (an average of 7.3 years (SD 4.1) of follow-up for decedents). Women with menses onset < 12 years of age (10.7%; n = 106) had an increased hazard of death over the follow-up period (adjusted HR 1.28; 95% CI 0.99-1.65) compared with women who began menstruating aged ≥ 12 years (89.3%; n = 883). However, when age at menarche was considered as a continuous variable, the adjusted HRs associated with the linear and quadratic terms for age at menarche were not statistically significant at the 5% level (linear HR 0.76; 95% CI 0.56-1.04; quadratic HR 1.01; 95% CI 1.00-1.02). Conclusion: Women with lower age at menarche may have reduced survival into old age. These results lend support to the known associations between earlier menarche and risk of metabolic disease in early adulthood. Strategies to minimise earlier menarche, such as promoting healthy weights and minimising family dysfunction during childhood, may also have positive longer-term effects on survival in later life.
    Lynne C Giles, Gary FV Glonek, Vivienne M Moore, Michael J Davies and Mary A Luszcz
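
    The two Cox model specifications compared in the Results can be written, with illustrative notation not taken from the paper (t = age at menarche, Z = adjustment covariates, u = time on study), as:

        % Illustrative notation only, not taken from the paper.
        \[
        \text{binary:}\quad h(u \mid t, Z) = h_0(u)\exp\{\beta_1 \mathbf{1}(t < 12) + \gamma^{\top} Z\},
        \]
        \[
        \text{continuous:}\quad h(u \mid t, Z) = h_0(u)\exp\{\beta_1 t + \beta_2 t^{2} + \gamma^{\top} Z\},
        \]
        % the reported hazard ratios correspond to exp(beta) for each term.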