32 research outputs found

    Modelling long term survival with non-proportional hazards

    In this work I consider models for survival data when the assumption of proportionality does not hold. The thesis consists of an Introduction, five papers, a Discussion and an Appendix. The Introduction presents technical information about the Cox model and introduces the ideas behind the extensions of the model proposed later on. In Chapter 2, reduced-rank methods for modelling non-proportional hazards are presented, while Chapter 3 presents an algorithm for estimating Cox models with time-varying effects of the covariates. The next chapter deals with the gamma frailty (Burr) model and discusses alternative models with time-dependent frailties. In Chapter 5, models with time-varying effects of the covariates, frailty models and cure rate models are considered. The usefulness of each of these models is discussed and their results are compared. The sixth chapter of the thesis discusses ways of dealing with overdispersion when using generalized linear models. The Discussion is about future directions of the research presented in this thesis. Finally, there is an Appendix about the use of coxvc, a package written in R for fitting Cox models with time-varying effects of the covariates. ZonMW project (ZON 912.02.015).

    Two dimensional smoothing via an optimised Whittaker smoother

    Background: In many applications where moderate to large datasets are used, plotting relationships between pairs of variables can be problematic. A large number of observations will produce a scatter-plot which is difficult to investigate due to a high concentration of points on a simple graph. In this article we review the Whittaker smoother for enhancing scatter-plots and smoothing data in two dimensions. To optimise the behaviour of the smoother an algorithm is introduced, which is easy to programme and computationally efficient. Results: The methods are illustrated using a simple dataset and simulations in two dimensions. Additionally, a noisy mammogram is analysed. When smoothing scatter-plots, the Whittaker smoother is a valuable tool that produces enhanced images that are not distorted by the large number of points. The method is also useful for sharpening patterns or removing noise in distorted images. Conclusion: The Whittaker smoother can be a valuable tool for producing better visualisations of big data or filtering distorted images. The suggested optimisation method is easy to programme and can be applied with low computational cost.
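
    For orientation, the basic Whittaker smoother mentioned in this abstract can be sketched as a penalised least-squares problem: the smooth z minimises |y - z|^2 + lam * |D z|^2, where D is a difference matrix. The sketch below is a generic illustration, not the authors' optimised algorithm. The function names, the default penalty lam, the dense matrices, and the choice to smooth a 2D grid one axis at a time are all assumptions made for this example.

```python
import numpy as np


def whittaker_smooth(y, lam=10.0, order=2):
    """Whittaker smoother along axis 0 (hypothetical helper, dense algebra).

    Solves (I + lam * D'D) z = y, where D takes `order`-th differences.
    `y` may be 1D (one series) or 2D (each column is smoothed separately).
    """
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    # Difference matrix of shape (n - order, n), built by differencing the identity.
    D = np.diff(np.eye(n), n=order, axis=0)
    A = np.eye(n) + lam * (D.T @ D)
    return np.linalg.solve(A, y)


def whittaker_smooth_2d(Y, lam_axis0=10.0, lam_axis1=10.0, order=2):
    """Smooth a 2D grid by applying the 1D smoother along each axis in turn.

    Treating the two directions separately is an assumption of this sketch.
    """
    Y = np.asarray(Y, dtype=float)
    Z = whittaker_smooth(Y, lam_axis0, order)      # smooth down the columns
    Z = whittaker_smooth(Z.T, lam_axis1, order).T  # smooth along the rows
    return Z
```

    Larger values of lam give a smoother result; with a second-order penalty, straight lines pass through unchanged. For large grids a sparse solver would be the natural replacement for the dense solve used here.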

    An Ensemble of Optimal Trees for Classification and Regression (OTE)

    Predictive performance of a random forest ensemble is highly associated with the strength of individual trees and their diversity. An ensemble of a small number of accurate and diverse trees, if prediction accuracy is not compromised, will also reduce the computational burden. We investigate the idea of integrating trees that are accurate and diverse. For this purpose, we utilize out-of-bag observations from the training bootstrap samples as a validation sample to choose the best trees based on their individual performance, and then assess these trees for diversity using the Brier score. Starting from the first best tree, a tree is selected for the final ensemble if its addition to the forest reduces the error of the trees that have already been added. A total of 35 benchmark problems on classification and regression are used to assess the performance of the proposed method and compare it with kNN, tree, random forest, node harvest and support vector machine. We compute unexplained variances and classification error rates for all the methods on the corresponding data sets. Our experiments reveal that the size of the ensemble is reduced significantly and better results are obtained in most of the cases. For further verification, a simulation study is also given where four tree-style scenarios are considered to generate data sets with several structures.
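
    The selection step described in this abstract (rank trees individually, then add a tree only if the ensemble error drops) can be illustrated with a small greedy loop. This is a rough sketch under stated assumptions, not the authors' OTE implementation: it assumes binary classification, takes a precomputed array of per-tree class-1 probabilities on a validation set, and uses the Brier score for both the ranking and the acceptance test. The names select_trees, tree_probs and top_fraction are hypothetical.

```python
import numpy as np


def brier_score(prob, y):
    """Mean squared difference between predicted probabilities and 0/1 labels."""
    return float(np.mean((prob - y) ** 2))


def select_trees(tree_probs, y_val, top_fraction=0.2):
    """Greedy ensemble selection (illustrative sketch only).

    tree_probs: array of shape (n_trees, n_val), each row one tree's P(class 1).
    y_val:      array of shape (n_val,) with 0/1 labels.
    Returns the indices of the selected trees and the final ensemble Brier score.
    """
    tree_probs = np.asarray(tree_probs, dtype=float)
    y_val = np.asarray(y_val, dtype=float)

    # Step 1: rank trees by individual error and keep the best fraction.
    individual = np.array([brier_score(p, y_val) for p in tree_probs])
    order = np.argsort(individual)
    n_keep = max(1, int(np.ceil(top_fraction * len(order))))
    candidates = order[:n_keep]

    # Step 2: start from the best tree; accept a tree only if the averaged
    # ensemble prediction has a strictly lower Brier score with it included.
    selected = [int(candidates[0])]
    running_sum = tree_probs[candidates[0]].copy()
    best = brier_score(running_sum, y_val)
    for idx in candidates[1:]:
        trial = (running_sum + tree_probs[idx]) / (len(selected) + 1)
        score = brier_score(trial, y_val)
        if score < best:
            selected.append(int(idx))
            running_sum += tree_probs[idx]
            best = score
    return selected, best
```

    In practice the rows of tree_probs would come from the individual trees of a fitted forest evaluated on held-out data; how the validation data are obtained is left outside this sketch.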

    Atrial fibrillation in embolic stroke of undetermined source: role of advanced imaging of left atrial function

    © The Author(s) 2023. Published by Oxford University Press on behalf of the European Society of Cardiology. AIMS: Atrial fibrillation (AF) is detected in over 30% of patients following an embolic stroke of undetermined source (ESUS) when monitored with an implantable loop recorder (ILR). Identifying AF in ESUS survivors has significant therapeutic implications, and AF risk is essential to guide screening with long-term monitoring. The present study aimed to establish the role of left atrial (LA) function in subsequent AF identification and develop a risk model for AF in ESUS. METHODS AND RESULTS: We conducted a single-centre retrospective case-control study including all patients with ESUS referred to our institution for ILR implantation from December 2009 to September 2019. We recorded clinical variables at baseline and analysed transthoracic echocardiograms in sinus rhythm. Univariate and multivariable analyses were performed to inform variables associated with AF. Lasso regression analysis was used to develop a risk prediction model for AF. The risk model was internally validated using bootstrapping. Three hundred and twenty-three patients with ESUS underwent ILR implantation. In the ESUS population, 293 had a stroke, whereas 30 had suffered a transient ischaemic attack as adjudicated by a senior stroke physician. Atrial fibrillation of any duration was detected in 47.1%. The mean follow-up was 710 days. Following lasso regression with backwards elimination, we combined increasing lateral PA (the time interval from the beginning of the P wave on the surface electrocardiogram to the beginning of the A' wave on pulsed wave tissue Doppler of the lateral mitral annulus) [odds ratio (OR) 1.011], increasing Age (OR 1.035), higher Diastolic blood pressure (OR 1.027), and abnormal LA reservoir Strain (OR 0.973) into a new PADS score. The probability of identifying AF can be estimated using the formula. 
Model discrimination was good [area under the curve (AUC) 0.72]. The PADS score was internally validated using bootstrapping with 1000 samples of 150 patients showing consistent results with an AUC of 0.73. CONCLUSION: The novel PADS score can identify the risk of AF on prolonged monitoring with ILR following ESUS and should be considered a dedicated risk stratification tool for decision-making regarding the screening strategy for AF in stroke. One-third of patients with a type of stroke called embolic stroke of undetermined source (ESUS) also have a heart condition called atrial fibrillation (AF), which increases their risk of having another stroke. However, we do not know why some patients with ESUS develop AF. To figure this out, we studied 323 patients with ESUS and used a special device, an implantable loop recorder, to monitor their heart rhythm continuously for up to 3 years. We also looked at their medical history, performed a heart ultrasound, and identified some factors that increase the risk of identifying AF in the future. Factors associated with future AF include older age, higher diastolic blood pressure, and problems with the co-ordination and function of the upper left chamber of the heart called the left atrium. Based on these factors, we created a new scoring system, the PADS score, that can identify patients who are at higher risk of developing AF better than the current scoring systems. This can potentially help doctors provide more targeted and effective treatment to these patients, ultimately aiming to reduce their risk of having another stroke.

    State of the art in selection of variables and functional forms in multivariable analysis-outstanding issues

    Background: How to select variables and identify functional forms for continuous variables is a key concern when creating a multivariable model. Ad hoc ‘traditional’ approaches to variable selection have been in use for at least 50 years. Similarly, methods for determining functional forms for continuous variables were first suggested many years ago. More recently, many alternative approaches to address these two challenges have been proposed, but knowledge of their properties and meaningful comparisons between them are scarce. To define a state of the art and to provide evidence-supported guidance to researchers who have only a basic level of statistical knowledge, many outstanding issues in multivariable modelling remain. Our main aims are to identify and illustrate such gaps in the literature and present them at a moderate technical level to the wide community of practitioners, researchers and students of statistics. Methods: We briefly discuss general issues in building descriptive regression models, strategies for variable selection, different ways of choosing functional forms for continuous variables and methods for combining the selection of variables and functions. We discuss two examples, taken from the medical literature, to illustrate problems in the practice of modelling. Results: Our overview revealed that there is not yet enough evidence on which to base recommendations for the selection of variables and functional forms in multivariable analysis. Such evidence may come from comparisons between alternative methods. In particular, we highlight seven important topics that require further investigation and make suggestions for the direction of further research. Conclusions: Selection of variables and of functional forms are important topics in multivariable analysis. To define a state of the art and to provide evidence-supported guidance to researchers who have only a basic level of statistical knowledge, further comparative research is required

    Relationship of cell proliferation (Ki-67) to (99m)Tc-(V)DMSA uptake in breast cancer

    INTRODUCTION: The aim of the present study was to identify the relationships between the uptake of radiotracers – namely pentavalent dimercaptosuccinic acid [(V)DMSA] and sestamibi (MIBI) – and the following parameters in primary breast cancer: steroid receptor concentrations (i.e. estrogen receptor [ER] and progesterone receptor [PR]), Ki-67 expression, tumor size, tumor grade, age, and levels of expression of p53 and c-erbB-2. In addition, by multivariate regression analysis, we further isolated those factors with independent associations with (V)DMSA and/or MIBI uptake in primary breast cancer. METHODS: Thirty-four patients with histologically confirmed breast carcinoma underwent preoperative scintimammography with technetium-99m ((99m)Tc)-(V)DMSA and/or (99m)Tc-MIBI in consecutive sessions 10 and 60 min after administration of 925–1110 MBq of each radiotracer. The tumor-to-background ratio was calculated and correlated with the presence of ER, PR, Ki-67, tumor size, tumor grade, p53, and c-erbB-2. ER, PR, p53, and c-erbB-2 were determined immunohistochemically. The analysis included tumor-to-background ratio of (V)DMSA and MIBI uptake as dependent and all of the other parameters as independent variables. RESULTS: Correlation was positive between Ki-67 and (V)DMSA (r = 0.37 at 10 min, P = 0.038; r = 0.42 at 60 min, P = 0.018) and inverse between PR and (V)DMSA uptake (r = -0.46 at 10 min, P = 0.010; r = -0.51 at 60 min, P = 0.003). Multivariate regression analysis demonstrated a positive correlation between Ki-67 and (V)DMSA at 60 min (P = 0.045). Ki-67 was not significantly correlated with MIBI uptake, whereas tumor size was positively correlated with MIBI uptake at 60 min both in univariate (r = 0.45, P = 0.027) and multivariate analysis (P = 0.024). Negative correlations were observed between (V)DMSA uptake and ER, as well as between ER/PR and MIBI uptake, but these were not significant. 
CONCLUSION: Ki-67 appears to represent the major independent factor affecting (V)DMSA uptake in breast cancer. Tumor size was the only independent parameter influencing MIBI uptake in breast cancer. (V)DMSA appears to have an advantage over MIBI in that it can be used to visualize tumors with intense proliferative activity, and thus it can identify those tumors that are more aggressive

    Atrial fibrillation in embolic stroke of undetermined source: Role of advanced imaging of left atrial function

    Background: Atrial fibrillation (AF) is detected in over 30% of patients following an embolic stroke of undetermined source (ESUS) when monitored with an implantable loop recorder (ILR). Identifying AF in ESUS survivors has significant therapeutic implications and AF risk is essential to guide screening with long-term monitoring. The present study aimed to establish the role of Left Atrial (LA) function in subsequent AF identification and develop a risk model for AF in ESUS. Methods: We conducted a single-centre retrospective case-control study including all patients with ESUS referred to our institution for ILR implantation from December 2009 to September 2019. We recorded clinical variables at baseline and analyzed transthoracic echocardiograms in sinus rhythm. Univariate and multivariable analyses were performed to inform variables associated with AF. Lasso regression analysis was used to develop a risk prediction model for AF. The risk model was internally validated using bootstrapping. Results: Three hundred and twenty-three patients with ESUS underwent ILR implantation. In the ESUS population, 293 had a stroke, whereas 30 had suffered a TIA as adjudicated by a senior stroke physician. AF of any duration was detected in 47.1%. Mean follow-up was 710 days. Following lasso regression with backward elimination, we combined increasing lateral PA (the time interval from the beginning of p wave on surface electrocardiogram to the beginning of A’ wave on pulsed wave tissue Doppler of the lateral mitral annulus) (OR 1.011), increasing Age (OR 1.035), higher diastolic blood pressure (DBP) (OR 1.027) and abnormal LA reservoir Strain (OR 0.973) into a new PADS score. The probability of identifying AF can be estimated using the formula: Model discrimination was good (AUC 0.72). The PADS score was internally validated using bootstrapping with 1000 samples of 150 patients showing consistent results with an AUC of 0.73. 
Conclusions: The novel PADS score can identify the risk of AF on prolonged monitoring with ILR following ESUS and should be considered a dedicated risk-stratification tool for decision-making regarding the screening strategy for AF in stroke

    Ensemble of a subset of kNN classifiers

    Combining multiple classifiers, known as ensemble methods, can give substantial improvement in the prediction performance of learning algorithms, especially in the presence of non-informative features in the data sets. We propose an ensemble of a subset of kNN classifiers, ESkNN, for classification tasks in two steps. Firstly, we choose classifiers based upon their individual performance using the out-of-sample accuracy. The selected classifiers are then combined sequentially starting from the best model and assessed for collective performance on a validation data set. We use benchmark data sets with their original and some added non-informative features for the evaluation of our method. The results are compared with usual kNN, bagged kNN, random kNN, multiple feature subset method, random forest and support vector machines. Our experimental comparisons on benchmark classification problems and simulated data sets reveal that the proposed ensemble gives better classification performance than the usual kNN and its ensembles, and performs comparably to random forest and support vector machines.

    A feature selection method for classification within functional genomics experiments based on the proportional overlapping score

    Background: Microarray technology, as well as other functional genomics experiments, allows simultaneous measurements of thousands of genes within each sample. Both the prediction accuracy and interpretability of a classifier could be enhanced by performing the classification based only on selected discriminative genes. We propose a statistical method for selecting genes based on overlapping analysis of expression data across classes. This method results in a novel measure, called proportional overlapping score (POS), of a feature's relevance to a classification task. Results: We apply POS, along with four widely used gene selection methods, to several benchmark gene expression datasets. The experimental results of classification error rates computed using the Random Forest, k Nearest Neighbor and Support Vector Machine classifiers show that POS achieves a better performance. Conclusions: A novel gene selection method, POS, is proposed. POS analyzes the expression overlap across classes, taking into account the proportions of overlapping samples. It robustly defines a mask for each gene that allows it to minimize the effect of expression outliers. The constructed masks, along with a novel gene score, are exploited to produce the selected subset of genes.