
    A special case of reduced rank models for identification and modelling of time varying effects in survival analysis

    Flexible survival models are needed when modelling data from long-term follow-up studies. In many cases, the assumption of proportionality imposed by a Cox model will not be valid. Instead, a model that can identify time-varying effects of fixed covariates can be used. Although there are several approaches that deal with this problem, it is not always straightforward to choose which covariates should be modelled as having time-varying effects and which should not. At the same time, it is up to the researcher to define appropriate time functions that describe the dynamic pattern of the effects. In this work, we suggest a model that can deal with both fixed and time-varying effects and uses simple hypothesis tests to distinguish which covariates have dynamic effects. The model is an extension of the parsimonious reduced rank model of rank 1. As such, the number of parameters is kept low, and thus a flexible set of time functions, such as B-splines, can be used. The basic theory is illustrated along with an efficient fitting algorithm. The proposed method is applied to a dataset of breast cancer patients and compared with a multivariate fractional polynomials approach for modelling time-varying effects. Copyright © 2016 John Wiley & Sons, Ltd.
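
    A minimal sketch of testing for a time-varying covariate effect in R, using the standard survival package rather than the paper's reduced-rank estimator; the data frame df and the columns time, status and x are hypothetical placeholders.

        library(survival)

        # Fit under proportional hazards, then test that assumption
        fit_ph <- coxph(Surv(time, status) ~ x, data = df)
        cox.zph(fit_ph)  # score test of proportionality per covariate

        # Let the effect of x vary with (log) time via the tt() mechanism
        fit_tv <- coxph(Surv(time, status) ~ x + tt(x), data = df,
                        tt = function(x, t, ...) x * log(t))
        summary(fit_tv)  # the tt(x) coefficient captures the dynamic part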

    Modelling long term survival with non-proportional hazards

    In this work I consider models for survival data when the assumption of proportionality does not hold. The thesis consists of an Introduction, five papers, a Discussion and an Appendix. The Introduction presents technical information about the Cox model and introduces the ideas behind the extensions of the model proposed later on. In Chapter 2, reduced-rank methods for modelling non-proportional hazards are presented, while Chapter 3 presents an algorithm for estimating Cox models with time-varying effects of the covariates. The next chapter deals with the gamma frailty (Burr) model and discusses alternative models with time-dependent frailties. In Chapter 5, models with time-varying effects of the covariates, frailty models and cure rate models are considered. The usefulness of each of these models is discussed and their results are compared. The sixth chapter of the thesis discusses ways of dealing with overdispersion when using generalized linear models. The Discussion is about future directions of the research presented in this thesis. Finally, there is an Appendix about the use of coxvc, a package written in R for fitting Cox models with time-varying effects of the covariates.

    Two dimensional smoothing via an optimised Whittaker smoother

    Background: In many applications where moderate to large datasets are used, plotting relationships between pairs of variables can be problematic. A large number of observations will produce a scatter-plot that is difficult to investigate due to a high concentration of points on a simple graph. In this article we review the Whittaker smoother for enhancing scatter-plots and smoothing data in two dimensions. To optimise the behaviour of the smoother, an algorithm is introduced which is easy to programme and computationally efficient. Results: The methods are illustrated using a simple dataset and simulations in two dimensions. Additionally, a noisy mammography image is analysed. When smoothing scatter-plots, the Whittaker smoother is a valuable tool that produces enhanced images that are not distorted by the large number of points. The method is also useful for sharpening patterns or removing noise in distorted images. Conclusion: The Whittaker smoother can be a valuable tool for producing better visualisations of big data or filtering distorted images. The suggested optimisation method is easy to programme and can be applied at low computational cost.
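
    A minimal sketch of a one-dimensional Whittaker smoother in base R, assuming the standard penalised-least-squares formulation with a second-order difference penalty; the series y is simulated. The two-dimensional version described in the article applies the same penalty along the rows and columns of an image.

        # Whittaker smoother: minimise ||y - z||^2 + lambda * ||D z||^2,
        # solved directly as z = (I + lambda D'D)^{-1} y
        whittaker <- function(y, lambda = 100, d = 2) {
          n <- length(y)
          D <- diff(diag(n), differences = d)  # d-th order difference matrix
          solve(diag(n) + lambda * crossprod(D), y)
        }

        y <- sin(seq(0, 4 * pi, length.out = 200)) + rnorm(200, sd = 0.3)
        z <- whittaker(y, lambda = 500)
        plot(y, col = "grey"); lines(z, lwd = 2)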

    An Ensemble of Optimal Trees for Classification and Regression (OTE)

    Predictive performance of a random forest ensemble is highly associated with the strength of the individual trees and their diversity. An ensemble of a small number of accurate and diverse trees, if prediction accuracy is not compromised, will also reduce the computational burden. We investigate the idea of integrating trees that are accurate and diverse. For this purpose, we utilize out-of-bag observations as a validation sample from the training bootstrap samples to choose the best trees based on their individual performance, and then assess these trees for diversity using the Brier score. Starting from the first best tree, a tree is selected for the final ensemble if its addition to the forest reduces the error of the trees that have already been added. A total of 35 benchmark problems on classification and regression are used to assess the performance of the proposed method and compare it with kNN, tree, random forest, node harvest and support vector machine. We compute unexplained variances and classification error rates for all the methods on the corresponding data sets. Our experiments reveal that the size of the ensemble is reduced significantly and better results are obtained in most of the cases. For further verification, a simulation study is also given where four tree-style scenarios are considered to generate data sets with several structures.
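
    A rough sketch of the tree-selection idea in R, not the authors' implementation: grow a forest with the randomForest package, rank trees by individual accuracy, then greedily keep a tree only if it lowers the ensemble's validation error. A held-out sample and squared error stand in for the out-of-bag/Brier-score machinery of the paper; X_train, y_train, X_valid and y_valid are hypothetical.

        library(randomForest)

        # Grow a forest and extract each tree's prediction on validation data
        fit  <- randomForest(x = X_train, y = y_train, ntree = 500)
        pred <- predict(fit, X_valid, predict.all = TRUE)$individual  # n x ntree

        # Rank trees by individual validation error
        ord <- order(colMeans((pred - y_valid)^2))

        # Greedy forward selection: keep a tree only if it lowers ensemble error
        keep <- ord[1]
        for (k in ord[-1]) {
          old <- mean((rowMeans(pred[, keep, drop = FALSE]) - y_valid)^2)
          new <- mean((rowMeans(pred[, c(keep, k)]) - y_valid)^2)
          if (new < old) keep <- c(keep, k)
        }
        length(keep)  # size of the reduced ensemble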

    Modeling retail browsing sessions and wearables data

    The advent of wearable non-invasive sensors for the consumer market has made it cost-effective to conduct studies that integrate physiological measures such as heart rate into data analysis research. In this paper we investigate the predictive value of heart rate measurements from a commercial wrist-worn wearable device in the context of e-commerce. We examine a dataset comprising browser logs and wearables data from 28 individuals in a field experiment over a period of ten days. We are particularly interested in finding predictors of starting a retail session, such as the heart rate at the beginning of a web browsing session. We describe the preprocessing tasks applied to the dataset, and logistic regression and survival analysis models used to estimate the probability of starting a retail browsing session. Preliminary results show that heart rate has significant predictive value for starting a retail session when individual increases and decreases in heart rate and the time of day are taken into account.
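
    A minimal sketch of the kind of logistic regression described, assuming a hypothetical data frame sessions with a binary retail indicator, heart rate at session start (hr_start) and an hour-of-day column.

        # Probability of a retail session from heart rate and time of day
        sessions$hour <- factor(sessions$hour_of_day)
        fit <- glm(retail ~ hr_start + hour, data = sessions, family = binomial)
        summary(fit)
        head(predict(fit, type = "response"))  # per-session probabilities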

    A review of spline function procedures in R

    Background: With progress on both the theoretical and the computational fronts, the use of spline modelling has become an established tool in statistical regression analysis. An important issue in spline modelling is the availability of user-friendly, well-documented software packages. Following the idea of the STRengthening Analytical Thinking for Observational Studies initiative to provide users with guidance documents on the application of statistical methods in observational research, the aim of this article is to provide an overview of the most widely used spline-based techniques and their implementation in R. Methods: In this work, we focus on the R Language for Statistical Computing, which has become a hugely popular statistical software environment. We identified a set of packages that include functions for spline modelling within a regression framework. Using simulated and real data, we provide an introduction to spline modelling and an overview of the most popular spline functions. Results: We present a series of simple scenarios of univariate data, where different basis functions are used to identify the correct functional form of an independent variable. Even in simple data, using routines from different packages can lead to different results. Conclusions: This work illustrates the challenges that an analyst faces when working with data. Most differences can be attributed to the choice of hyper-parameters rather than to the basis used. In fact, an experienced user will know how to obtain a reasonable outcome regardless of the type of spline used. However, many analysts do not have sufficient knowledge to use these powerful tools adequately and will need more guidance.
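
    A brief sketch of fitting the same univariate relationship with several of the spline routines reviewed; x and y are simulated, and the hyper-parameters (df, knots, penalty) are the choices that typically drive the differences noted in the Results.

        library(splines)
        library(mgcv)

        x <- runif(300)
        y <- sin(2 * pi * x) + rnorm(300, sd = 0.3)

        f1 <- lm(y ~ bs(x, df = 5))   # B-spline basis with fixed df
        f2 <- lm(y ~ ns(x, df = 5))   # natural cubic spline basis
        f3 <- gam(y ~ s(x))           # penalised spline, smoothness estimated
        f4 <- smooth.spline(x, y)     # classic smoothing spline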

    Atrial fibrillation in embolic stroke of undetermined source: role of advanced imaging of left atrial function

    © The Author(s) 2023. Published by Oxford University Press on behalf of the European Society of Cardiology. AIMS: Atrial fibrillation (AF) is detected in over 30% of patients following an embolic stroke of undetermined source (ESUS) when monitored with an implantable loop recorder (ILR). Identifying AF in ESUS survivors has significant therapeutic implications, and assessment of AF risk is essential to guide screening with long-term monitoring. The present study aimed to establish the role of left atrial (LA) function in subsequent AF identification and to develop a risk model for AF in ESUS. METHODS AND RESULTS: We conducted a single-centre retrospective case-control study including all patients with ESUS referred to our institution for ILR implantation from December 2009 to September 2019. We recorded clinical variables at baseline and analysed transthoracic echocardiograms in sinus rhythm. Univariate and multivariable analyses were performed to identify variables associated with AF. Lasso regression analysis was used to develop a risk prediction model for AF. The risk model was internally validated using bootstrapping. Three hundred and twenty-three patients with ESUS underwent ILR implantation. In the ESUS population, 293 had a stroke, whereas 30 had suffered a transient ischaemic attack, as adjudicated by a senior stroke physician. Atrial fibrillation of any duration was detected in 47.1%. The mean follow-up was 710 days. Following lasso regression with backwards elimination, we combined increasing lateral PA (the time interval from the beginning of the P wave on the surface electrocardiogram to the beginning of the A′ wave on pulsed-wave tissue Doppler of the lateral mitral annulus) [odds ratio (OR) 1.011], increasing Age (OR 1.035), higher Diastolic blood pressure (OR 1.027), and abnormal LA reservoir Strain (OR 0.973) into a new PADS score, from which the probability of identifying AF can be estimated. Model discrimination was good [area under the curve (AUC) 0.72]. The PADS score was internally validated using bootstrapping with 1000 samples of 150 patients, showing consistent results with an AUC of 0.73. CONCLUSION: The novel PADS score can identify the risk of AF on prolonged monitoring with ILR following ESUS and should be considered a dedicated risk stratification tool for decision-making regarding the screening strategy for AF in stroke.

    One-third of patients with a type of stroke called embolic stroke of undetermined source (ESUS) also have a heart condition called atrial fibrillation (AF), which increases their risk of having another stroke. However, we do not know why some patients with ESUS develop AF. To find out, we studied 323 patients with ESUS and used a special device, an implantable loop recorder, to monitor their heart rhythm continuously for up to 3 years. We also looked at their medical history, performed a heart ultrasound, and identified some factors that increase the risk of identifying AF in the future. Factors associated with future AF include older age, higher diastolic blood pressure, and problems with the co-ordination and function of the upper left chamber of the heart, called the left atrium. Based on these factors, we created a new scoring system, the PADS score, which identifies patients at higher risk of developing AF better than current scoring systems. This can potentially help doctors provide more targeted and effective treatment to these patients, ultimately aiming to reduce their risk of having another stroke.
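
    A hedged sketch of the model-building steps described (lasso logistic regression, AUC, bootstrap), not the published PADS formula; the predictor matrix X and 0/1 outcome af are hypothetical, and the bootstrap below resamples predictions only, whereas a full internal validation would refit the model in each resample.

        library(glmnet)
        library(pROC)

        # Lasso-penalised logistic model and apparent discrimination
        cvfit <- cv.glmnet(X, af, family = "binomial", alpha = 1)
        prob  <- as.numeric(predict(cvfit, newx = X, s = "lambda.min",
                                    type = "response"))
        auc(roc(af, prob))

        # Simplified bootstrap of the AUC
        aucs <- replicate(1000, {
          i <- sample(length(af), replace = TRUE)
          as.numeric(auc(roc(af[i], prob[i])))
        })
        quantile(aucs, c(0.025, 0.975))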

    State of the art in selection of variables and functional forms in multivariable analysis-outstanding issues

    Background: How to select variables and identify functional forms for continuous variables is a key concern when creating a multivariable model. Ad hoc ‘traditional’ approaches to variable selection have been in use for at least 50 years. Similarly, methods for determining functional forms for continuous variables were first suggested many years ago. More recently, many alternative approaches to address these two challenges have been proposed, but knowledge of their properties and meaningful comparisons between them are scarce. Before a state of the art can be defined and evidence-supported guidance provided to researchers who have only a basic level of statistical knowledge, many outstanding issues in multivariable modelling remain to be addressed. Our main aims are to identify and illustrate such gaps in the literature and present them at a moderate technical level to the wide community of practitioners, researchers and students of statistics. Methods: We briefly discuss general issues in building descriptive regression models, strategies for variable selection, different ways of choosing functional forms for continuous variables and methods for combining the selection of variables and functions. We discuss two examples, taken from the medical literature, to illustrate problems in the practice of modelling. Results: Our overview revealed that there is not yet enough evidence on which to base recommendations for the selection of variables and functional forms in multivariable analysis. Such evidence may come from comparisons between alternative methods. In particular, we highlight seven important topics that require further investigation and make suggestions for the direction of further research. Conclusions: Selection of variables and of functional forms are important topics in multivariable analysis. To define a state of the art and to provide evidence-supported guidance to researchers who have only a basic level of statistical knowledge, further comparative research is required.
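
    A minimal sketch contrasting two of the strategies discussed, using base R and mgcv: AIC-based backward elimination versus a flexible spline form for a continuous predictor; the data frame df and its columns are hypothetical, and neither approach is endorsed by the paper.

        # Traditional backward elimination by AIC
        full <- glm(y ~ x1 + x2 + x3 + x4, data = df, family = binomial)
        step(full, direction = "backward")

        # Flexible functional form for a continuous predictor instead
        library(mgcv)
        gam(y ~ s(x1) + x2 + x3 + x4, data = df, family = binomial)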

    Ensemble of Optimal Trees, Random Forest and Random Projection Ensemble Classification

    The predictive performance of a random forest ensemble is highly associated with the strength of the individual trees and their diversity. An ensemble of a small number of accurate and diverse trees, if prediction accuracy is not compromised, will also reduce the computational burden. We investigate the idea of integrating trees that are accurate and diverse. For this purpose, we utilize out-of-bag observations as a validation sample from the training bootstrap samples to choose the best trees based on their individual performance, and then assess these trees for diversity using the Brier score on an independent validation sample. Starting from the first best tree, a tree is selected for the final ensemble if its addition to the forest reduces the error of the trees that have already been added. Unlike random projection ensemble classification, our approach does not use an implicit dimension reduction for each tree. A total of 35 benchmark problems on classification and regression are used to assess the performance of the proposed method and compare it with random forest, random projection ensemble, node harvest, support vector machine, kNN and classification and regression tree (CART). We compute unexplained variances or classification error rates for all the methods on the corresponding data sets. Our experiments reveal that the size of the ensemble is reduced significantly and better results are obtained in most of the cases. Results of a simulation study are also given where four tree-style scenarios are considered to generate data sets with several structures.

    Long-term safety of paclitaxel drug-coated balloon-only angioplasty for de novo coronary artery disease: the SPARTAN DCB study

    Objectives: We aimed to investigate long-term survival after paclitaxel drug-coated balloon (DCB) angioplasty for percutaneous coronary intervention (PCI). Background: Safety concerns have recently been raised over the use of paclitaxel devices for peripheral artery disease, following a meta-analysis suggesting increased late mortality. With regard to DCB angioplasty for coronary artery intervention, however, there are limited data to date regarding possible late mortality related to paclitaxel. Methods: We compared all-cause mortality of patients treated with paclitaxel DCB to those treated with non-paclitaxel second-generation drug-eluting stents (DES) for stable, de novo coronary artery disease from 1st January 2011 to 31st December 2018. To obtain homogeneous groups allowing data on safety to be interpreted accurately, we excluded patients with previous PCI and patients treated with a combination of both DCB and DES in subsequent PCIs. Data were analysed with Kaplan–Meier curves and Cox regression models. Results: We present 1517 patients; 429 treated with paclitaxel DCB and 1088 treated with DES. On univariate analysis, age, hypercholesterolaemia, hypertension, peripheral vascular disease, prior myocardial infarction, heart failure, smoking, atrial fibrillation, and decreasing estimated glomerular filtration rate (eGFR) [and renal failure (eGFR < 45)] were associated with worse survival. DCB intervention showed a non-significant trend towards better prognosis compared to DES (p = 0.08). On multivariable analysis, age, decreasing eGFR and smoking were associated with worse prognosis. Conclusion: We found no evidence of late mortality associated with DCB angioplasty compared with non-paclitaxel second-generation DES in up to 5 years of follow-up. DCB is a safe option for the treatment of de novo coronary artery disease.
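
    A minimal sketch of the survival comparison described, using the survival package: Kaplan–Meier curves, a log-rank test and a multivariable Cox model; the data frame d and its columns (time, dead, device, age, egfr, smoker) are hypothetical placeholders.

        library(survival)

        # Kaplan-Meier curves and log-rank test by device group (DCB vs DES)
        km <- survfit(Surv(time, dead) ~ device, data = d)
        plot(km)
        survdiff(Surv(time, dead) ~ device, data = d)

        # Multivariable Cox model for adjusted hazard ratios
        cox <- coxph(Surv(time, dead) ~ device + age + egfr + smoker, data = d)
        summary(cox)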