A review of spline function procedures in R

Author: Abrahamowicz Michal
Matthias Schmid
Perperoglou Aris
Sauerbrei Willi
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/03/2019

Background: With progress on both the theoretical and the computational fronts, the use of spline modelling has become an established tool in statistical regression analysis. An important issue in spline modelling is the availability of user-friendly, well-documented software packages. Following the idea of the STRengthening Analytical Thinking for Observational Studies initiative to provide users with guidance documents on the application of statistical methods in observational research, the aim of this article is to provide an overview of the most widely used spline-based techniques and their implementation in R. Methods: In this work, we focus on the R Language for Statistical Computing, which has become a hugely popular statistics software. We identified a set of packages that include functions for spline modelling within a regression framework. Using simulated and real data, we provide an introduction to spline modelling and an overview of the most popular spline functions. Results: We present a series of simple scenarios of univariate data, where different basis functions are used to identify the correct functional form of an independent variable. Even in simple data, using routines from different packages would lead to different results. Conclusions: This work illustrates challenges that an analyst faces when working with data. Most differences can be attributed to the choice of hyper-parameters rather than the basis used. In fact, an experienced user will know how to obtain a reasonable outcome, regardless of the type of spline used. However, many analysts do not have sufficient knowledge to use these powerful tools adequately and will need more guidance.
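To make the notion of a spline "basis function" concrete, the following is a minimal sketch of one common choice, the B-spline basis, evaluated with the Cox-de Boor recursion. It is written in plain Python rather than R and is not taken from the article or from any of the packages it reviews; the function name `bspline_basis` and the knot vector in the usage note are illustrative assumptions.

```python
def bspline_basis(x, knots, degree):
    """Evaluate every B-spline basis function of the given degree at x.

    Uses the Cox-de Boor recursion. `knots` must be a non-decreasing list.
    Returns a list of len(knots) - degree - 1 values.
    """
    # Degree 0: indicator functions of the half-open knot intervals.
    basis = [
        1.0 if knots[i] <= x < knots[i + 1] else 0.0
        for i in range(len(knots) - 1)
    ]
    # Raise the degree one step at a time.
    for d in range(1, degree + 1):
        raised = []
        for i in range(len(knots) - d - 1):
            left_den = knots[i + d] - knots[i]
            right_den = knots[i + d + 1] - knots[i + 1]
            # Terms with a zero-width knot span are defined as zero.
            left = (x - knots[i]) / left_den * basis[i] if left_den > 0 else 0.0
            right = (
                (knots[i + d + 1] - x) / right_den * basis[i + 1]
                if right_den > 0
                else 0.0
            )
            raised.append(left + right)
        basis = raised
    return basis
```

With a clamped cubic knot vector such as `[0, 0, 0, 0, 0.25, 0.5, 0.75, 1, 1, 1, 1]`, the call `bspline_basis(0.3, knots, 3)` returns seven non-negative values that sum to one. Because the intervals are half-open, the right boundary `x = 1` needs separate handling in this sketch. The number and placement of the knots are among the hyper-parameters whose choice the abstract identifies as the main source of differences between results.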

University of Essex Research Repository

Directory of Open Access Journals

Modeling retail browsing sessions and wearables data

Author: Citi Luca
Lausen KB
Medellin-Gasque Rolando
Mullen Anthony
Nordmark Henrik
Perperoglou Aris
Publication venue: KIT Scientific Publishing

The advent of wearable non-invasive sensors for the consumer market has made it cost-effective to conduct studies that integrate physiological measures such as heart rate into data analysis research. In this paper we investigate the predictive value of heart rate measurements from a commercial wrist wearable device in the context of e-commerce. We look into a dataset comprising browser logs and wearables data from 28 individuals in a field experiment over a period of ten days. We are particularly interested in finding predictors for starting a retail session, such as the heart rate at the beginning of a web browsing session. We describe the preprocessing tasks applied to the dataset and the logistic regression and survival analysis models used to estimate the probability of starting a retail browsing session. Preliminary results show that heart rate has significant predictive value for starting a retail session if we consider individual increases and decreases in heart rate together with the time of day.

University of Essex Research Repository

Ensemble of Optimal Trees, Random Forest and Random Projection Ensemble Classification

Author: Adler Werner
Gul Asma
Khan Zardad
Lausen Berthold
Mahmoud Osama
Miftahuddin Miftahuddin
Perperoglou Aris
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/03/2020

The predictive performance of a random forest ensemble is highly associated with the strength of individual trees and their diversity. An ensemble of a small number of accurate and diverse trees, if prediction accuracy is not compromised, will also reduce the computational burden. We investigate the idea of integrating trees that are accurate and diverse. For this purpose, we utilize out-of-bag observations as a validation sample from the training bootstrap samples, to choose the best trees based on their individual performance and then assess these trees for diversity using the Brier score on an independent validation sample. Starting from the first best tree, a tree is selected for the final ensemble if its addition to the forest reduces the error of the trees that have already been added. Our approach does not use an implicit dimension reduction for each tree, as random projection ensemble classification does. A total of 35 benchmark problems on classification and regression are used to assess the performance of the proposed method and compare it with random forest, random projection ensemble, node harvest, support vector machine, kNN and classification and regression tree (CART). We compute unexplained variances or classification error rates for all the methods on the corresponding data sets. Our experiments reveal that the size of the ensemble is reduced significantly and better results are obtained in most of the cases. Results of a simulation study are also given where four tree style scenarios are considered to generate data sets with several structures.
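The selection step described in the abstract can be sketched as a greedy loop over candidate models. This is a simplified illustration under stated assumptions, not the authors' implementation: each candidate is represented only by its predicted class-1 probabilities on a validation sample, the candidates are assumed to be already ranked best-first by individual performance, and the names `brier_score`, `average_predictions` and `select_ensemble` are hypothetical.

```python
def brier_score(probs, labels):
    """Mean squared difference between predicted probabilities and 0/1 labels."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


def average_predictions(members):
    """Average the per-observation probabilities of several models."""
    n_obs = len(members[0])
    return [sum(m[i] for m in members) / len(members) for i in range(n_obs)]


def select_ensemble(ranked_predictions, labels):
    """Greedy forward selection of ensemble members.

    ranked_predictions: one list of validation probabilities per candidate
    model, ordered best-first (assumed to be ranked beforehand).
    Returns the indices of the kept models and the final ensemble Brier score.
    """
    kept = [0]  # always start from the best individual model
    best = brier_score(ranked_predictions[0], labels)
    for idx in range(1, len(ranked_predictions)):
        trial = [ranked_predictions[j] for j in kept] + [ranked_predictions[idx]]
        score = brier_score(average_predictions(trial), labels)
        if score < best:  # keep the candidate only if the ensemble improves
            kept.append(idx)
            best = score
    return kept, best
```

For example, with validation labels `[1, 0, 1, 0]` and three candidates predicting `[0.9, 0.1, 0.6, 0.4]`, `[0.6, 0.4, 0.9, 0.2]` and `[0.4, 0.6, 0.5, 0.5]`, the second candidate is kept because averaging it with the first lowers the Brier score, while the third is rejected because it raises it.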

University of Essex Research Repository

Long-term safety of paclitaxel drug-coated balloon-only angioplasty for de novo coronary artery disease: the SPARTAN DCB study

Author: Eccleshall Simon C.
Gilbert Tim
Gunawardena Tharusha
Maart Clint
Merinopoulos Ioannis
Perperoglou Aris
Richardson Paul
Ryding Alisdair
Sarev Toomas
Sawh Chris
Sreekumar Sulfi
Vassiliou Vassilios S.
Wickramarachchi Upul
Wistow Trevor
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 17/08/2020

Objectives: We aimed to investigate long-term survival of paclitaxel DCB for percutaneous coronary intervention (PCI). Background: Safety concerns have been raised over the use of paclitaxel devices for peripheral artery disease recently, following a meta-analysis suggesting increased late mortality. With regard to drug-coated balloon (DCB) angioplasty for coronary artery intervention, however, there is limited data to date regarding possible late mortality relating to paclitaxel. Methods: We compared all-cause mortality of patients treated with paclitaxel DCB to those with non-paclitaxel second-generation drug-eluting stents (DES) for stable, de novo coronary artery disease from 1st January 2011 to 31st December 2018. To have homogeneous groups allowing data on safety to be interpreted accurately, we excluded patients with previous PCI and patients treated with a combination of both DCB and DES in subsequent PCIs. Data were analysed with Kaplan–Meier curves and Cox regression statistical models. Results: We present 1517 patients; 429 treated with paclitaxel DCB and 1088 treated with DES. On univariate analysis, age, hypercholesterolaemia, hypertension, peripheral vascular disease, prior myocardial infarction, heart failure, smoking, atrial fibrillation, decreasing estimated glomerular filtration rate (eGFR) [and renal failure (eGFR < 45)] were associated with worse survival. DCB intervention showed a non-significant trend towards better prognosis compared to DES (p = 0.08). On multivariable analysis, age, decreasing eGFR and smoking were associated with worse prognosis. Conclusion: We found no evidence of late mortality associated with DCB angioplasty compared with non-paclitaxel second-generation DES in up to 5 years of follow-up. DCB is a safe option for the treatment of de novo coronary artery disease.
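The Kaplan–Meier curves mentioned in the Methods are built from the product-limit estimator, which can be sketched in a few lines. This is a generic illustration, not the study's analysis code: the function name `kaplan_meier` is hypothetical, the data in the usage note are made up, and the Cox regression part of the analysis is not shown.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times:  follow-up time for each patient
    events: 1 if death was observed at that time, 0 if the patient was censored
    Returns a list of (time, survival probability) pairs, one per distinct
    time at which at least one death occurred.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after t.
        at_risk -= leaving
    return curve
```

Calling `kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 0, 1])` on these invented follow-up times yields steps at times 1, 2 and 4. Censored patients contribute to the number at risk until they drop out, which is what lets the estimator use incomplete follow-up.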

Spiral - Imperial College Digital Repository

University of East Anglia digital repository

Relationship of cell proliferation (Ki-67) to (99m)Tc-(V)DMSA uptake in breast cancer

Author: Ambela Constantina
Christodoulidou Julie K
Koutsikos John
Lazaris Dimitrios
Louvrou Androniki N
Melissinou Maria J
Papantoniou Vassilios J
Perperoglou Aris
Sotiropoulou Maria G
Souvatzoglou Michael A
Tsiouris Spyridon
Valotassiou Varvara J
Zerva Cherry J
Publication venue: BioMed Central
Publication date: 01/01/2003

INTRODUCTION: The aim of the present study was to identify the relationships between the uptake of radiotracers – namely pentavalent dimercaptosuccinic acid [(V)DMSA] and sestamibi (MIBI) – and the following parameters in primary breast cancer: steroid receptor concentrations (i.e. estrogen receptor [ER] and progesterone receptor [PR]), Ki-67 expression, tumor size, tumor grade, age, and levels of expression of p53 and c-erbB-2. In addition, by multivariate regression analysis, we further isolated those factors with independent associations with (V)DMSA and/or MIBI uptake in primary breast cancer. METHODS: Thirty-four patients with histologically confirmed breast carcinoma underwent preoperative scintimammography with technetium-99m ((99m)Tc)-(V)DMSA and/or (99m)Tc-MIBI in consecutive sessions 10 and 60 min after administration of 925–1110 MBq of each radiotracer. The tumor-to-background ratio was calculated and correlated with the presence of ER, PR, Ki-67, tumor size, tumor grade, p53, and c-erbB-2. ER, PR, p53, and c-erbB-2 were determined immunohistochemically. The analysis included tumor-to-background ratio of (V)DMSA and MIBI uptake as dependent and all of the other parameters as independent variables. RESULTS: Correlation was positive between Ki-67 and (V)DMSA (r = 0.37 at 10 min, P = 0.038; r = 0.42 at 60 min, P = 0.018) and inverse between PR and (V)DMSA uptake (r = -0.46 at 10 min, P = 0.010; r = -0.51 at 60 min, P = 0.003). Multivariate regression analysis demonstrated a positive correlation between Ki-67 and (V)DMSA at 60 min (P = 0.045). Ki-67 was not significantly correlated with MIBI uptake, whereas tumor size was positively correlated with MIBI uptake at 60 min both in univariate (r = 0.45, P = 0.027) and multivariate analysis (P = 0.024). Negative correlations were observed between (V)DMSA uptake and ER, as well as between ER/PR and MIBI uptake, but these were not significant. 
CONCLUSION: Ki-67 appears to represent the major independent factor affecting (V)DMSA uptake in breast cancer. Tumor size was the only independent parameter influencing MIBI uptake in breast cancer. (V)DMSA appears to have an advantage over MIBI in that it can be used to visualize tumors with intense proliferative activity, and thus it can identify those tumors that are more aggressive.

University of Essex Research Repository

Springer - Publisher Connector

PubMed Central

Atrial fibrillation in embolic stroke of undetermined source: Role of advanced imaging of left atrial function

Author: Bhalraam U.
Chattopadhyay Rahul
Chousou Panagiota Anna
Khadjooi Kayvan
Murherjee Trisha
Perperoglou Aris
Potter John
Pugh Peter John
Ring Liam
Tsampasian Vasiliki
Vassiliou Vassilios S.
Warburton Elizabeth A.
Publication date: 11/07/2023

Background: Atrial fibrillation (AF) is detected in over 30% of patients following an embolic stroke of undetermined source (ESUS) when monitored with an implantable loop recorder (ILR). Identifying AF in ESUS survivors has significant therapeutic implications, and AF risk is essential to guide screening with long-term monitoring. The present study aimed to establish the role of left atrial (LA) function in subsequent AF identification and develop a risk model for AF in ESUS. Methods: We conducted a single-centre retrospective case-control study including all patients with ESUS referred to our institution for ILR implantation from December 2009 to September 2019. We recorded clinical variables at baseline and analyzed transthoracic echocardiograms in sinus rhythm. Univariate and multivariable analyses were performed to identify variables associated with AF. Lasso regression analysis was used to develop a risk prediction model for AF. The risk model was internally validated using bootstrapping. Results: Three hundred and twenty-three patients with ESUS underwent ILR implantation. In the ESUS population, 293 had a stroke, whereas 30 had suffered a TIA as adjudicated by a senior stroke physician. AF of any duration was detected in 47.1%. Mean follow-up was 710 days. Following lasso regression with backward elimination, we combined increasing lateral PA (the time interval from the beginning of the P wave on surface electrocardiogram to the beginning of the A’ wave on pulsed wave tissue Doppler of the lateral mitral annulus) (OR 1.011), increasing age (OR 1.035), higher diastolic blood pressure (DBP) (OR 1.027) and abnormal LA reservoir strain (OR 0.973) into a new PADS score. The probability of identifying AF can be estimated using the formula: Model discrimination was good (AUC 0.72). The PADS score was internally validated using bootstrapping with 1000 samples of 150 patients, showing consistent results with an AUC of 0.73.
Conclusions: The novel PADS score can identify the risk of AF on prolonged monitoring with ILR following ESUS and should be considered a dedicated risk-stratification tool for decision-making regarding the screening strategy for AF in stroke.
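The odds ratios quoted in the Results can be read as multiplicative effects on the odds of AF. The sketch below shows only that arithmetic. It is not the PADS formula, which this listing does not reproduce: no intercept is available, so absolute probabilities cannot be computed from it. The sketch also assumes, without confirmation from the abstract, that each odds ratio refers to a one-unit increase in its predictor; the dictionary keys and function names are hypothetical.

```python
import math

# Odds ratios as reported in the abstract. The unit of each predictor is an
# assumption here (the abstract does not state it).
ODDS_RATIOS = {
    "lateral_pa": 1.011,
    "age": 1.035,
    "dbp": 1.027,
    "la_reservoir_strain": 0.973,
}


def log_odds_coefficients(odds_ratios):
    """Convert odds ratios to logistic-regression coefficients: beta = ln(OR)."""
    return {name: math.log(value) for name, value in odds_ratios.items()}


def odds_multiplier(odds_ratios, deltas):
    """Factor by which the odds change when predictors change by `deltas`.

    deltas maps a predictor name to its change in units; predictors that are
    not listed are treated as unchanged.
    """
    coefs = log_odds_coefficients(odds_ratios)
    return math.exp(sum(coefs[name] * delta for name, delta in deltas.items()))
```

Under these assumptions, `odds_multiplier(ODDS_RATIOS, {"age": 10})` equals `1.035 ** 10`, so a ten-unit increase in age would multiply the odds of AF by roughly 1.4 with the other predictors held fixed.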

University of East Anglia digital repository

Midwall fibrosis and long-term outcome in patients with aortic stenosis

Author: Ali Aamir
Alpendurada Francisco
Baksi John
Chin Calvin
Dweck Marc
Jabbour Andrew
Joshi Sanjiv
Kilner Philip
Mohiaddin Raad
Murigu Timothy
Newby David
Nyktari Eva
Pennell Dudley
Perperoglou Aris
Prasad Sanjay
Raphael Claire
Vassiliou Vass
Wage Ricardo
Publication venue: 'Elsevier BV'
Publication date: 27/03/2014

University of Essex Research Repository

Elsevier - Publisher Connector

Crossref

Delays in Leniency Application: Is There Really a Race to the Enforcer's Door?

This paper studies cartels’ strategic behavior in delaying leniency applications, a take-up decision that has been ignored in the previous literature. Using European Commission decisions issued over a 16-year span, we show, contrary to common beliefs and the existing literature, that conspirators often apply for leniency long after a cartel collapses. We estimate hazard and probit models to study the determinants of leniency-application delays. Statistical tests find that delays are symmetrically affected by antitrust policies and macroeconomic fluctuations. Our results shed light on the design of enforcement programs against cartels and other forms of conspiracy.

Crossref

Open Access LMU

Tilburg University Repository

Ensemble of a subset of kNN classifiers

Author: A Karatzoglou
Aris Perperoglou
Asma Gul
Berthold Lausen
C Müssel
D Mease
DF Nettleton
E Bauer
EW Steyerberg
J Hernández-Orallo
J Kruppa
L Breiman
L Lausser
Miftahuddin Miftahuddin
O Mahmoud
Osama Mahmoud
P Hall
P Melville
R Barandela
R Maclin
RJ Samworth
S Li
T Cover
T Hothorn
T Khoshgoftaar
Werner Adler
Z Liu
Zardad Khan
ZH Zhou
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

Combining multiple classifiers, known as ensemble methods, can give substantial improvement in the prediction performance of learning algorithms, especially in the presence of non-informative features in the data sets. We propose an ensemble of a subset of kNN classifiers, ESkNN, for classification tasks in two steps. Firstly, we choose classifiers based upon their individual performance using the out-of-sample accuracy. The selected classifiers are then combined sequentially, starting from the best model, and assessed for collective performance on a validation data set. We use benchmark data sets with their original and some added non-informative features for the evaluation of our method. The results are compared with usual kNN, bagged kNN, random kNN, multiple feature subset method, random forest and support vector machines. Our experimental comparisons on benchmark classification problems and simulated data sets reveal that the proposed ensemble gives better classification performance than the usual kNN and its ensembles, and performs comparably to random forest and support vector machines.

University of Essex Research Repository

Crossref

Springer - Publisher Connector

Explore Bristol Research

A feature selection method for classification within functional genomics experiments based on the proportional overlapping score

Author: A Kikuchi
A Statnikov
A Ultsch
Andrew Harrison
Aris Perperoglou
Asma Gul
B Lausen
Berthold Lausen
C Cortes
C Ding
C Ma
C Müssel
C Zou
D Apiletti
DA Notterman
R Díaz‐Uriarte
S Alvarez de Andrés
DG Altman
E Baralis
GJ Gordon
H Peng
H‐C Liu
J Fan
J Lu
K‐H Chen
L Breiman
L Lausser
M Dramiński
M Marczyk
Metodi V Metodiev
N De Jay
Osama Mahmoud
P Alhopuro
P Laiho
RN Jorissen
RS Croner
S Chiaretti
S Michiels
T Cover
T Jirapech‐Umpai
TR Golub
VG Tusher
W Talloen
Y Saeys
Y Su
Zardad Khan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

Background: Microarray technology, as well as other functional genomics experiments, allows simultaneous measurements of thousands of genes within each sample. Both the prediction accuracy and interpretability of a classifier could be enhanced by performing the classification based only on selected discriminative genes. We propose a statistical method for selecting genes based on overlapping analysis of expression data across classes. This method results in a novel measure, called proportional overlapping score (POS), of a feature's relevance to a classification task. Results: We apply POS, along with four widely used gene selection methods, to several benchmark gene expression datasets. The experimental results of classification error rates computed using the Random Forest, k Nearest Neighbor and Support Vector Machine classifiers show that POS achieves a better performance. Conclusions: A novel gene selection method, POS, is proposed. POS analyzes the expression overlap across classes, taking into account the proportions of overlapping samples. It robustly defines a mask for each gene that allows it to minimize the effect of expression outliers. The constructed masks, along with a novel gene score, are exploited to produce the selected subset of genes.

University of Essex Research Repository

Crossref

Springer - Publisher Connector

PubMed Central

Explore Bristol Research