Does the Missing Data Imputation Method Affect the Composition and Performance of Prognostic Models?
Background: We already showed the superiority of imputation of missing data (via Multivariable Imputation via
Chained Equations (MICE) method) over exclusion of them; however, the methodology of MICE is complicated.
Furthermore, easier imputation methods are available. The aim of this study was to compare them in terms of
model composition and performance.
Methods: Three hundred and ten breast cancer patients were recruited. Four approaches were applied to
impute missing data. First, we adopted an ad hoc method in which missing data for each variable were replaced
by the median of observed values. Then three likelihood-based approaches were used. In regression imputation,
the variable with missing data was regressed on the rest of the variables, and the regression equation
was used to fill in the missing data. The Expectation-Maximization (E-M) algorithm was implemented, in which missing
data and regression parameters were estimated iteratively until the regression parameters converged. Finally,
the MICE method was applied. The resulting models were compared in terms of the variables that contributed significantly
to the multifactorial analysis, sensitivity and specificity.
Results: All candidate variables contributed significantly to the MICE model. However, grade of disease lost its
effect in the other three models. The MICE model showed the best performance, followed by the E-M model.
Conclusion: Final models differed across imputation methods in both composition and performance.
Therefore, modern imputation methods are recommended to recover the information
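
The two simplest approaches named in the Methods above, median imputation and regression imputation, can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes a single predictor (the study regressed on the rest of the variables), and the names `tumour_size` and `marker` and their values are hypothetical.

```python
from statistics import median

def impute_median(values):
    """Replace None entries with the median of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def impute_regression(x, y):
    """Fill None entries of y using a least-squares line fitted to complete (x, y) pairs.

    Simplification: one predictor only, and no random residual is added.
    """
    pairs = [(a, b) for a, b in zip(x, y) if b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [intercept + slope * a if b is None else b for a, b in zip(x, y)]

# Hypothetical toy data: one value of `marker` is missing.
tumour_size = [1.0, 2.0, 3.0, 4.0]
marker = [2.0, 4.0, None, 8.0]
print(impute_median(marker))
print(impute_regression(tumour_size, marker))
```

Median imputation ignores the other variables entirely, whereas regression imputation borrows information from them, which is one reason the resulting models can differ.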
Impact of Imputation of Missing Data on Estimation of Survival Rates: An Example in Breast Cancer
Background: Multifactorial regression models are frequently used in medicine to
estimate survival rate of patients across risk groups. However, their results are not
generalisable, if in the development of models assumptions required are not
satisfied. Missing data is a common problem in pathology. The aim of this paper
is to address the danger of exclusion of cases with missing data, and to highlight
the importance of imputation of missing data before development of multifactorial
models.
Methods: This study was performed on 310 breast cancer patients diagnosed in
Shiraz (Southern Iran). Performing a complete-case Cox regression model, a
prognostic index was calculated so as to categorise the patients into 3 risk groups.
Then, applying the Multivariate Imputation via Chained Equations (MICE) method,
missing data were imputed 10 times. Using imputed data sets, modelling was
performed to assign patients into risk groups. Estimated actuarial Overall Survival
(OS) rates corresponding to analysis of complete-case and imputed data sets
were compared.
Results: Cases with at least one missing datum had significantly better
survival. Estimates derived from complete-case data, relative to those from
imputed data sets, underestimated the OS rate in all risk groups. In addition,
confidence intervals were wider, indicating a loss of precision due to attrition in
sample size and power.
Conclusion: The results highlight the danger of excluding cases with missing data.
Imputation of missing data avoids biased estimates, increases the precision of
estimates, and improves generalisability of results to other similar populations.
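
The abstract does not say how results from the 10 imputed data sets were combined. A common choice is Rubin's rules, sketched below under that assumption; the function name `pool_rubin` and the numbers in the example are hypothetical.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and their variances with Rubin's rules.

    Returns (pooled_estimate, total_variance), where
    total = within + (1 + 1/m) * between.
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)        # average sampling variance
    between = variance(estimates)   # sample variance across imputations (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total

# Hypothetical survival-rate estimates from three imputed data sets.
estimate, total_var = pool_rubin([0.80, 0.82, 0.84], [0.01, 0.01, 0.01])
print(estimate, total_var)
```

The between-imputation term is what carries the extra uncertainty caused by the missing values into the final confidence interval.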
Survival Models in Breast Cancer Patients
Background: Breast cancer is the most prevalent malignancy among Iranian women. Five- and ten-year survival
is one of the indicators used to evaluate the quality of care after surgery. In this study, we used several
survival models to determine risk factors, survival times and life expectancies of different types of surgery.
Methods: This study was performed on 310 patients who underwent surgery during a ten-year period. Logistic
regression and Cox regression models were used to analyze the factors leading to death. The Kaplan-Meier
method (non-parametric) was used to estimate the survival rate. The log-rank test was used to compare survival
in different groups. To compare life expectancy of different types of surgery, we used the actuarial life table
method.
Results: Logistic regression showed that stage, grade, age and history of benign malignancy had a significant
relationship with death. The log-rank test showed a significant difference in survival between patients
with different stages, ages and histories of benign tumors. The Cox regression model demonstrated that
stage, grade, age and benign problems were the major risk factors. The actuarial life table model showed that the life
expectancy for all patients was 10.03 years. In early-stage breast cancer, life expectancy after mastectomy
and lumpectomy was 8.99 and 8.35 years, respectively; the difference was not significant.
Conclusion: It can be concluded that higher stage, grade and age and a history of benign tumor were the most
important risk factors for mortality in breast cancer patients. This study showed no
significant difference between the life expectancies after mastectomy and lumpectomy
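
The Kaplan-Meier method mentioned in the Methods above can be sketched in a few lines. This is a minimal illustration with hypothetical follow-up data, not the analysis used in the study; it assumes `events` is coded 1 for death and 0 for censoring.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct death time.

    At each death time the running survival is multiplied by
    (1 - deaths / number_at_risk); censored cases only shrink the risk set.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all observations that share the same time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve

# Hypothetical follow-up times in years; the second patient is censored.
print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
```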
Tamoxifen resistance in early breast cancer: statistical modelling of tissue markers to improve risk prediction
BACKGROUND: For over two decades, the Nottingham Prognostic Index (NPI) has been used in the United Kingdom to calculate risk
scores and inform management about breast cancer patients. It is derived using just three clinical variables – nodal involvement,
tumour size and grade. New scientific methods now make cost-effective measurement of many biological characteristics of tumour
tissue from breast cancer biopsy samples possible. However, the number of potential explanatory variables to be considered presents
a statistical challenge. The aim of this study was to investigate whether in ER+ tamoxifen-treated breast cancer patients, biological
variables can add value to NPI predictors, to provide improved prognostic stratification in terms of overall recurrence-free survival
(RFS) and also in terms of remaining recurrence free while on tamoxifen treatment (RFoT). A particular goal was to enable the
discrimination of patients with a very low risk of recurrence.
METHODS: Tissue samples of 401 cases were analysed by microarray technology, providing biomarker data for 72 variables in total,
from AKT, BAD, HER, MTOR, PgR, MAPK and RAS families. Only biomarkers screened as potentially informative (i.e., exhibiting
univariate association with recurrence) were offered to the multivariate model. The multiple imputation method was used to deal
with missing values, and bootstrap sampling was used to assess internal validity and refine the model.
RESULTS: Neither the RFS nor RFoT models derived included Grade, but both had better predictive and discrimination ability than NPI.
A slight difference was observed between models in terms of biomarkers included, and, in particular, the RFoT model alone included
HER2. The estimated 7-year RFS rates in the lowest-risk groups by RFS and RFoT models were 95 and 97%, respectively, whereas
the corresponding rate for the lowest-risk group of NPI was 89%.
CONCLUSION: The findings demonstrate considerable potential for improved prognostic modelling by incorporation of biological
variables into risk prediction. In particular, the ability to identify a low-risk group with minimal risk of recurrence is likely to have clinical
appeal. With larger data sets and longer follow-up, this modelling approach has the potential to enhance an understanding of the
interplay of biological characteristics, treatment and cancer recurrence.
British Journal of Cancer (2010) 102
Assessment of Internal Validity of Prognostic Models through Bootstrapping and Multiple Imputation of Missing Data
Background: Prognostic models have clinical appeal as aids to therapeutic decision making. Two main practical challenges in the development of such models are the assessment of model validity and the imputation of missing data. This study highlights the importance of imputing missing data and of applying the bootstrap technique in the development, simplification, and assessment of internal validity of a prognostic model.
Methods: Overall, 310 breast cancer patients were recruited. Missing data were imputed 10 times. Then, to deal with the sensitivity of the model to small changes in the data (internal validity), 100 bootstrap samples were drawn from each of the 10 imputed data sets, giving 1000 samples. A Cox regression model was fitted to each of the 1000 samples. Only variables retained in more than 50% of samples were used in the development of the final model.
Results: Four variables remained significant in more than 50% (i.e. 500 samples) of bootstrap samples: tumour size (91%), tumour grade (64%), history of benign breast disease (77%), and age at diagnosis (59%). Tumour size was the strongest predictor, with an inclusion frequency exceeding 90%. Number of deliveries was correlated with age at diagnosis (r=0.35, P<0.001). These two variables together remained significant in more than 90% of samples.
Conclusion: We addressed two important methodological issues using a cohort of breast cancer patients. The algorithm combines multiple imputation of missing data with bootstrapping and has the potential to be applied in all kinds of regression modelling exercises so as to address the internal validity of models.
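
The resampling scheme described above (bootstrap samples drawn from each imputed data set, with variables kept when selected in more than 50% of samples) can be sketched as follows. This is an illustration only: the `select` callable stands in for the Cox model fitting and significance testing, which are not reproduced here, and all names are hypothetical.

```python
import random
from collections import Counter

def inclusion_frequencies(imputed_sets, select, n_boot=100, seed=0):
    """Fraction of bootstrap samples in which each variable is selected.

    imputed_sets: list of data sets (each a list of rows).
    select: hypothetical callable mapping a sample to a set of variable names;
            in the study this role is played by fitting a Cox model.
    """
    rng = random.Random(seed)
    counts = Counter()
    total = 0
    for rows in imputed_sets:
        for _ in range(n_boot):
            sample = [rng.choice(rows) for _ in rows]  # resample with replacement
            counts.update(select(sample))
            total += 1
    return {name: c / total for name, c in counts.items()}

def retained(frequencies, threshold=0.5):
    """Variables whose inclusion frequency exceeds the threshold."""
    return sorted(name for name, f in frequencies.items() if f > threshold)

# Toy selector (an assumption for the demo): always keeps "size".
toy_sets = [[1, 2, 3]] * 10            # 10 imputed data sets
freqs = inclusion_frequencies(toy_sets, lambda sample: {"size"})
print(freqs, retained(freqs))
```

With 10 imputed data sets and `n_boot=100`, the loop visits 1000 samples, matching the count given in the Methods.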
Prevention of Disease Complications Through Diagnostic Models: How to Tackle the Problem of Missing Data?
Background: Diagnostic models are frequently used to assess the role of risk factors in disease complications, and therefore to avoid them. Missing data are an issue that challenges model building. The aim of this study was to develop a diagnostic model to predict death in HIV/AIDS patients when missing data exist.
Methods: HIV patients (n=1460) referred to the Voluntary Counseling and Testing Center (VCT) of Shiraz, southern Iran, during 2004-2009 were recruited. The univariate association between each variable and death was assessed. Only variables with univariate P < 0.25 were offered to the multifactorial models. First, patients with missing data on candidate variables were deleted (C-C model). Then, applying Multivariable Imputation via Chained Equations (MICE), missing data were imputed. Logistic regression was fitted to the C-C and imputed data sets (MICE model). Models were compared in terms of the number of variables retained in the final model, the width of confidence intervals, and discrimination ability.
Results: About 22% of data were lost in the C-C model. The number of variables retained in the C-C and MICE models was 2 and 6, respectively. Confidence intervals (C.I.) for the C-C model were wider than those for the MICE model. The MICE model showed greater discrimination ability than the C-C model (70% versus 64%).
Conclusion: The C-C analysis resulted in loss of power and wide C.I.'s. Once missing data were imputed, more variables reached the significance level and C.I.'s were narrower. Therefore, we recommend the application of the imputation method for handling missing data.
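
The abstract reports discrimination ability as a percentage without naming the measure. Assuming it refers to the area under the ROC curve (the c-statistic), it can be computed as the share of case/non-case pairs in which the case receives the higher predicted risk. The sketch below uses hypothetical predictions.

```python
def c_statistic(scores, outcomes):
    """Concordance (area under the ROC curve) for binary outcomes.

    scores: predicted risks; outcomes: 1 for the event (death), 0 otherwise.
    Ties between a case and a non-case count as half a concordant pair.
    """
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    non_cases = [s for s, y in zip(scores, outcomes) if y == 0]
    concordant = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                concordant += 1.0
            elif c == n:
                concordant += 0.5
    return concordant / (len(cases) * len(non_cases))

# Hypothetical predicted risks for four patients, two of whom died.
print(c_statistic([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of cases from non-cases.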
On the use of fractional polynomial models to assess preventive aspect of variables: An example in prevention of mortality following HIV infection
Background: Identification of disease risk factors can help in the prevention of diseases. In assessing the predictive value of continuous variables, a routine procedure is to categorize the factors. This leads to an inability to detect nonlinear relationships, if they exist. Multivariate fractional polynomial (MFP) modeling is a flexible method to reveal nonlinear associations. We aim to demonstrate the impact of the choice of risk function on the significance of variables.
Methods: We selected 6508 HIV-infected persons registered in the Australia National HIV Registry between 1980 and 2003 to assess the predictors associated with the risk of death after HIV infection prior to AIDS. First, CD4 count as a categorical factor, with three other categorical variables (age, sex, and HIV exposure category), was entered into the Cox regression model. Second, CD4 count as a continuous variable, along with the other categorical variables, was entered into the fractional polynomial (FP) model.
Results: Both the Cox and FP models showed that age ≥ 40 years and hemophiliac patients were significantly associated with increased risk of death. In the categorized model, the CD4 variable did not reach the significance level. However, this variable was highly significant in the MFP model. The FP model showed slightly better performance in terms of discrimination ability and goodness of fit.
Conclusions: The FP model is a flexible method for detecting the predictive effect of continuous variables. This method enhances the ability to assess the predictive ability of variables and improves model performance.
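
To illustrate what a fractional polynomial does to a continuous variable such as CD4 count, the sketch below applies the conventional set of candidate powers, where power 0 is read as the natural logarithm. The abstract does not state which powers were considered or selected, so the power set and the example value are assumptions.

```python
import math

# Conventional candidate powers for fractional polynomials (assumed here).
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)

def fp_transform(x, power):
    """First-degree fractional polynomial term: x**power, or ln(x) when power is 0."""
    if x <= 0:
        raise ValueError("fractional polynomials require positive x")
    return math.log(x) if power == 0 else x ** power

# Hypothetical CD4 count, shown under every candidate power.
cd4 = 400.0
for p in FP_POWERS:
    print(p, fp_transform(cd4, p))
```

In practice each candidate term would be entered into the regression model in turn and the best-fitting power kept, which lets the model capture a nonlinear effect without cutting the variable into categories.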
Comparison of conventional risk factors in middle-aged versus elderly diabetic and nondiabetic patients with myocardial infarction: prediction with decision-analytic model
BACKGROUND:
We sought to predict occurrence of myocardial infarction (MI) by means of a classification and regression tree (CART) model by conventional risk factors in middle-aged versus elderly (age ≥65 years) diabetic and nondiabetic patients from the Modares Heart Study.
METHOD:
A total of 469 patients were randomly selected and categorized into two groups according to clinical diabetes status. Group I consisted of 238 diabetic patients and group II consisted of 231 nondiabetic patients. Our population was MI positive. The outcome investigated was diabetes mellitus. We used a decision-analytic model to predict the diagnosis of patients with suspected MI.
RESULTS:
We constructed 4 predictive patterns using 12 input variables and 1 output variable and compared them in terms of their sensitivity, specificity and risk. The differences among patterns were due to the inclusion of predictor variables. The CART model suggested different variables among hypertension, mean cell volume, fasting blood sugar, cholesterol, triglyceride and uric acid concentration for middle-aged and elderly patients at high risk for MI. Levels of biochemical measurements were identified as the best risk cutoff points. In evaluating the precision of different patterns, sensitivity and specificity were 47.9-84.0% and 56.3-93.0%, respectively.
CONCLUSIONS:
The CART model is capable of representing clinical data in an interpretable form for confirming and better predicting MI occurrence in the clinic or in hospital. The predictor variables in a pattern could affect the outcome depending on age group. Hyperglycemia, hypertension, hyperlipidemia and hyperuricemia were serious predictors for occurrence of MI in diabetics
Can we Replace Arterial Blood Gas Analysis by Pulse Oximetry in Neonates with Respiratory Distress Syndrome, who are Treated According to INSURE Protocol?
Neonates with respiratory distress syndrome (RDS) who are treated according to the INSURE protocol require arterial blood gas (ABG) analysis to decide on appropriate management. We conducted this study to investigate the validity of pulse oximetry as a substitute for frequent ABG analysis in the evaluation of these patients. From a total of 193 blood samples obtained from 30 neonates <1500 grams with RDS, 7.2% were found to have one or more of the following: acidosis, hypercapnia, or hypoxemia. We found that pulse oximetry had good validity for detecting hyperoxemia, allowing appropriate management of patients without blood gas analysis. However, the validity of pulse oximetry was not good enough to detect acidosis, hypercapnia, and hypoxemia