Multiple Imputation Ensembles (MIE) for dealing with missing data
Missing data is a significant issue in many real-world datasets, yet there are no robust methods for dealing with it appropriately. In this paper, we propose a robust approach to dealing with missing data in classification problems: Multiple Imputation Ensembles (MIE). Our method integrates two approaches, multiple imputation and ensemble methods, and compares two types of ensemble: bagging and stacking. We also propose a robust experimental set-up using 20 benchmark datasets from the UCI machine learning repository. For each dataset, we introduce increasing amounts of data Missing Completely at Random. First, we use a number of single/multiple imputation methods to recover the missing values and then ensemble a number of different classifiers built on the imputed data. We assess the quality of the imputation by using dissimilarity measures. We also evaluate the MIE performance by comparing classification accuracy on the complete and imputed data. Furthermore, we use the accuracy of simple imputation as a benchmark for comparison. We find that our proposed approach, combining multiple imputation with ensemble techniques, outperforms the others, particularly as the amount of missing data increases.
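The general idea of combining multiple imputation with an ensemble can be sketched as follows. This is only an illustration under loud assumptions, not the authors' method: the imputation step is a simple random draw from the observed values in each column, the base learner is a nearest-centroid classifier, and the ensemble is a plain majority vote. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter

def impute_once(rows, rng):
    """Fill each None with a random observed value from the same column.
    Assumes every column has at least one observed value."""
    observed = [[v for v in col if v is not None] for col in zip(*rows)]
    return [[rng.choice(observed[j]) if v is None else v
             for j, v in enumerate(row)] for row in rows]

def fit_centroids(rows, labels):
    """Nearest-centroid model: one mean feature vector per class."""
    counts = Counter(labels)
    sums = {}
    for row, y in zip(rows, labels):
        acc = sums.setdefault(y, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

def predict(centroids, row):
    """Return the class whose centroid is closest (squared Euclidean distance)."""
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(row, centroids[y])))

def mie_predict(train_rows, labels, test_rows, m=5, seed=0):
    """Train one model per imputed copy of the data, then majority-vote."""
    rng = random.Random(seed)
    models = [fit_centroids(impute_once(train_rows, rng), labels) for _ in range(m)]
    return [Counter(predict(c, row) for c in models).most_common(1)[0][0]
            for row in test_rows]

# Toy data (hypothetical): two well-separated classes with a few missing values.
train = [[0, 0], [0, None], [None, 0], [0, 0], [0, 0], [0, 0],
         [10, 10], [10, None], [None, 10], [10, 10], [10, 10], [10, 10]]
labels = [0] * 6 + [1] * 6
preds = mie_predict(train, labels, [[1, 1], [9, 9]])
```

Each of the m imputed copies yields a slightly different model, and the vote averages out the imputation noise. A fuller treatment would swap in a proper multiple-imputation procedure and the bagging or stacking schemes the abstract names.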
Resource use data by patient report or hospital records: Do they agree?
Background: Economic evaluations alongside clinical trials are becoming increasingly common.
Cost data are often collected through the use of postal questionnaires; however, the accuracy of
this method is uncertain. We compared postal questionnaires with hospital records for collecting
data on physiotherapy service use.
Methods: As part of a randomised trial of orthopaedic medicine compared with orthopaedic
surgery we collected physiotherapy use data on a group of patients from retrospective postal
questionnaires and from hospital records.
Results: 315 patients were referred for physiotherapy. Hospital data on attendances were available
for 30% (n = 96), compared with 48% (n = 150) of patients completing questionnaire data (95% CI
for difference = 10% to 24%); 19% (n = 59) had data available from both sources. The two methods
produced an intraclass correlation coefficient of 0.54 (95% CI 0.31 to 0.70). However, the two
methods produced significantly different estimates of resource use, with patient self-report recalling
a mean of 1.3 extra visits (95% CI 0.4 to 2.2) compared with hospital records.
Conclusions: Using questionnaires in this study produced data on a greater number of patients
compared with examination of hospital records. However, the two data sources did differ in the
quantity of physiotherapy used and this should be taken into account in any analysis.
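The reported interval for the difference in data availability can be reproduced from the counts in the abstract. The abstract does not say which method was used, so the sketch below assumes a Wald interval for the difference between two paired proportions, since both sources refer to the same 315 patients. The variable names are illustrative.

```python
import math

n = 315                                        # patients referred for physiotherapy
hospital, questionnaire, both = 96, 150, 59    # counts reported in the abstract

only_hospital = hospital - both                # data from hospital records only
only_questionnaire = questionnaire - both      # data from questionnaire only

# Difference in paired proportions (questionnaire minus hospital records).
diff = (only_questionnaire - only_hospital) / n

# Wald standard error for a paired difference (assumed method, see lead-in).
discordant = only_questionnaire + only_hospital
se = math.sqrt(discordant - (only_questionnaire - only_hospital) ** 2 / n) / n

lo, hi = diff - 1.96 * se, diff + 1.96 * se
print(f"difference = {diff:.1%}, 95% CI {lo:.1%} to {hi:.1%}")
```

Under this assumption the interval rounds to 10% to 24%, which agrees with the figures quoted in the Results.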
Transient peak-strain matching partially recovers the age-impaired mechanoadaptive cortical bone response
Mechanoadaptation maintains bone mass and architecture; its failure underlies age-related decline in bone strength. It is unclear whether this is due to failure of osteocytes to sense strain, osteoblasts to form bone or insufficient mechanical stimulus. Mechanoadaptation can be restored to aged bone by surgical neurectomy, suggesting that changes in loading history can rescue mechanoadaptation. We use non-biased, whole-bone tibial analyses, along with characterisation of surface strains and ensuing mechanoadaptive responses in mice at a range of ages, to explore whether sufficient load magnitude can activate mechanoadaptation in aged bone. We find that younger mice adapt when imposed strains are lower than in mature and aged bone. Intriguingly, imposition of short-term, high-magnitude loading effectively primes cortical but not trabecular bone of aged mice to respond. This response was regionally matched to the highest strains measured by digital image correlation and to osteocytic mechanoactivation. These data indicate that aged bone’s loading response can be partially recovered non-invasively by transient, focal high-strain regions. Our results indicate that old murine bone does respond to load when the loading is of sufficient magnitude, and bones’ age-related adaptation failure may be due to insufficient mechanical stimulus to trigger mechanoadaptation.
Integrated multiple mediation analysis: A robustness–specificity trade-off in causal structure
Recent methodological developments in causal mediation analysis have addressed several issues regarding multiple mediators. However, these developed methods differ in their definitions of causal parameters, assumptions for identification, and interpretations of causal effects, making it unclear which method ought to be selected when investigating a given causal effect. Thus, in this study, we construct an integrated framework, which unifies all existing methodologies, as a standard for mediation analysis with multiple mediators. To clarify the relationship between existing methods, we propose four strategies for effect decomposition: two-way, partially forward, partially backward, and complete decompositions. This study reveals how the direct and indirect effects of each strategy are explicitly and correctly interpreted as path-specific effects under different causal mediation structures. In the integrated framework, we further verify the utility of the interventional analogues of direct and indirect effects, especially when natural direct and indirect effects cannot be identified or when cross-world exchangeability is invalid. Consequently, this study yields a robustness–specificity trade-off in the choice of strategies. Inverse probability weighting is considered for estimation. The four strategies are further applied to a simulation study for performance evaluation and for analyzing the Risk Evaluation of Viral Load Elevation and Associated Liver Disease/Cancer data set from Taiwan to investigate the causal effect of hepatitis C virus infection on mortality
Investigating the missing data mechanism in quality of life outcomes: a comparison of approaches
Background: Missing data is classified as missing completely at random (MCAR), missing at
random (MAR) or missing not at random (MNAR). Knowing the mechanism is useful in identifying
the most appropriate analysis. The first aim was to compare different methods for identifying this
missing data mechanism to determine if they gave consistent conclusions. The second aim was to
investigate whether the reminder-response data can be utilised to help identify the missing data mechanism.
Methods: Five clinical trial datasets that employed a reminder system at follow-up were used.
Some quality of life questionnaires were initially missing, but later recovered through reminders.
Four methods of determining the missing data mechanism were applied. Two response data
scenarios were considered. Firstly, immediate data only; secondly, all observed responses
(including reminder-response).
Results: In three of five trials the hypothesis tests found evidence against the MCAR assumption.
Logistic regression suggested MAR, but was able to use the reminder-collected data to highlight
potential MNAR data in two trials.
Conclusion: The four methods were consistent in determining the missingness mechanism. One
hypothesis test was preferred as it is applicable with intermittent missingness. Some inconsistencies between the two data scenarios were found. Ignoring the reminder data could potentially give a distorted view of the missingness mechanism. Utilising reminder data allowed the possibility of MNAR to be considered.
The Chief Scientist Office of the Scottish Government Health Directorate.
Research Training Fellowship (CZF/1/31
Accuracy of magnetic resonance studies in the detection of chondral and labral lesions in femoroacetabular impingement: systematic review and meta-analysis
Background: Several types of magnetic resonance imaging (MRI) are commonly used in imaging of femoroacetabular impingement (FAI); however, there are as yet no clear protocols or recommendations for each type. The aim of this meta-analysis is to determine the accuracy of conventional magnetic resonance imaging (cMRI), direct magnetic resonance arthrography (dMRA) and indirect magnetic resonance arthrography (iMRA) in the diagnosis of chondral and labral lesions in FAI.
Methods: A literature search was finalized on the 17th of May 2016 to collect all studies identifying the accuracy of cMRI, dMRA and iMRA in diagnosing chondral and labral lesions associated with FAI, using surgical results (arthroscopic or open) as the reference test. Pooled sensitivity and specificity with 95% confidence intervals were calculated for cMRI, dMRA and iMRA using a random-effects meta-analysis. The area under the receiver operating characteristic (ROC) curve (AUC) was also retrieved whenever possible, with AUC taken as equivalent to diagnostic accuracy.
Results: The search yielded 192 publications, which were reviewed according to the inclusion and exclusion criteria. Of these, 21 studies fulfilled the eligibility criteria for the qualitative analysis, with a total of 828 cases, and 12 studies were included in the quantitative meta-analysis. For labral lesions, the pooled sensitivity, specificity and AUC were 0.864, 0.833 and 0.88 for cMRI and 0.91, 0.58 and 0.92 for dMRA. For chondral lesions, the pooled sensitivity, specificity and AUC were 0.76, 0.72 and 0.75 for cMRI and 0.75, 0.79 and 0.83 for dMRA, while iMRA had a sensitivity of 0.722 and a specificity of 0.917.
Conclusions: The present meta-analysis showed that diagnostic test accuracy was superior for dMRA when compared with cMRI for detection of labral and chondral lesions. Diagnostic test accuracy was superior for labral lesions when compared with chondral lesions in both cMRI and dMRA. Promising results were obtained for iMRA, but further studies are still needed to fully assess its diagnostic accuracy.
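For readers less familiar with the two accuracy measures pooled above, the sketch below shows how sensitivity and specificity are computed for a single study from a 2x2 table against the surgical reference. The counts are hypothetical and are not taken from any included study. The random-effects pooling across studies is not shown.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    sensitivity = tp / (tp + fn)   # share of surgically confirmed lesions the scan detected
    specificity = tn / (tn + fp)   # share of lesion-free joints the scan correctly cleared
    return sensitivity, specificity

# Hypothetical single-study counts: 50 joints with a lesion, 50 without.
sens, spec = sensitivity_specificity(tp=45, fn=5, tn=30, fp=20)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

A test with high sensitivity but modest specificity, as in this made-up table, misses few lesions but raises more false alarms.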
Risk factors for exacerbations and pneumonia in patients with chronic obstructive pulmonary disease: a pooled analysis.
BACKGROUND: Patients with chronic obstructive pulmonary disease (COPD) are at risk of exacerbations and pneumonia; how the risk factors interact is unclear.
METHODS: This post-hoc, pooled analysis included studies of COPD patients treated with inhaled corticosteroid (ICS)/long-acting β2 agonist (LABA) combinations and comparator arms of ICS, LABA, and/or placebo. Backward elimination via Cox's proportional hazards regression modelling evaluated which combination of risk factors best predicts time to first (a) pneumonia, and (b) moderate/severe COPD exacerbation.
RESULTS: Five studies contributed: NCT01009463, NCT01017952, NCT00144911, NCT00115492, and NCT00268216. Low body mass index (BMI), exacerbation history, worsening lung function (Global Initiative for Chronic Obstructive Lung Disease [GOLD] stage), and ICS treatment were identified as factors increasing pneumonia risk. BMI was the only pneumonia risk factor influenced by ICS treatment, with ICS further increasing risk for those with BMI <25 kg/m². The modelled probability of pneumonia varied between 3 and 12% during the first year. Higher exacerbation risk was associated with a history of exacerbations, poorer lung function (GOLD stage), female sex and absence of ICS treatment. The influence of the other exacerbation risk factors was not modified by ICS treatment. Modelled probabilities of an exacerbation varied between 31 and 82% during the first year.
CONCLUSIONS: The probability of an exacerbation was considerably higher than for pneumonia. ICS reduced exacerbations but did not influence the effect of risks associated with prior exacerbation history, GOLD stage, or female sex. The only identified risk factor for ICS-induced pneumonia was BMI <25 kg/m². Analyses of this type may help the development of COPD risk equations.
Fuzzy min-max neural networks for categorical data: application to missing data imputation
The fuzzy min–max neural network classifier is a supervised learning method that combines neural networks with fuzzy systems. All input variables in the network are required to be continuously valued, and this can be a significant constraint in many real-world situations where there are not only quantitative but also categorical data. The usual way of dealing with this type of variable is to replace the categorical values with numerical ones and treat them as if they were continuously valued. But this method implicitly defines a possibly unsuitable metric for the categories. A number of different procedures have been proposed to tackle the problem. In this article, we present a new method. The procedure extends the fuzzy min–max neural network input to categorical variables by introducing new fuzzy sets, a new operation, and a new architecture. This provides for greater flexibility and wider application. The proposed method is then applied to missing data imputation in voting intention polls. The micro data of this type of poll (the set of the respondents’ individual answers to the questions) are especially suited for evaluating the method since they include a large number of numerical and categorical attributes.
Construct-level predictive validity of educational attainment and intellectual aptitude tests in medical student selection: meta-regression of six UK longitudinal studies
Background: Measures used for medical student selection should predict future performance during training. A problem for any selection study is that predictor-outcome correlations are known only in those who have been selected, whereas selectors need to know how measures would predict in the entire pool of applicants. That problem of interpretation can be solved by calculating construct-level predictive validity, an estimate of true predictor-outcome correlation across the range of applicant abilities.
Methods: Construct-level predictive validities were calculated in six cohort studies of medical student selection and training (student entry, 1972 to 2009) for a range of predictors, including A-levels, General Certificates of Secondary Education (GCSEs)/O-levels, and aptitude tests (AH5 and UK Clinical Aptitude Test (UKCAT)). Outcomes included undergraduate basic medical science and finals assessments, as well as postgraduate measures of Membership of the Royal Colleges of Physicians of the United Kingdom (MRCP(UK)) performance and entry in the Specialist Register. Construct-level predictive validity was calculated with the method of Hunter, Schmidt and Le (2006), adapted to correct for right-censorship of examination results due to grade inflation.
Results: Meta-regression analyzed 57 separate predictor-outcome correlations (POCs) and construct-level predictive validities (CLPVs). Mean CLPVs are substantially higher (.450) than mean POCs (.171). Mean CLPVs for first-year examinations were high for A-levels (.809; CI: .501 to .935), and lower for GCSEs/O-levels (.332; CI: .024 to .583) and UKCAT (mean = .245; CI: .207 to .276). A-levels had higher CLPVs for all undergraduate and postgraduate assessments than did GCSEs/O-levels and intellectual aptitude tests. CLPVs of educational attainment measures decline somewhat during training, but continue to predict postgraduate performance. Intellectual aptitude tests have lower CLPVs than A-levels or GCSEs/O-levels.
Conclusions: Educational attainment has strong CLPVs for undergraduate and postgraduate performance, accounting for perhaps 65% of true variance in first year performance. Such CLPVs justify the use of educational attainment measures in selection, but also raise a key theoretical question concerning the remaining 35% of variance (and measurement error, range restriction and right-censorship have been taken into account). Just as in astrophysics, ‘dark matter’ and ‘dark energy’ are posited to balance various theoretical equations, so medical student selection must also have its ‘dark variance’, whose nature is not yet properly characterized, but explains a third of the variation in performance during training. Some variance probably relates to factors which are unpredictable at selection, such as illness or other life events, but some is probably also associated with factors such as personality, motivation or study skills.
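The "perhaps 65%" figure is consistent with squaring the A-level CLPV for first-year examinations, since a squared correlation gives the proportion of variance explained. The abstract does not state this derivation, so the short check below is an assumption about where the number comes from.

```python
clpv_a_levels_first_year = 0.809                     # CLPV reported in the Results

variance_explained = clpv_a_levels_first_year ** 2   # squared correlation
variance_unexplained = 1 - variance_explained        # the abstract's 'dark variance'

print(f"explained = {variance_explained:.1%}, unexplained = {variance_unexplained:.1%}")
```

On this reading, about 65% of true variance is accounted for and about 35% remains, in line with the one-third figure in the Conclusions.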
Childbearing intentions in a low fertility context: the case of Romania
This paper applies the Theory of Planned Behaviour (TPB) to identify the predictors of fertility intentions in Romania, a low-fertility country. We analyse how attitudes, subjective norms and perceived behavioural control relate to the intention to have a child among childless individuals and one-child parents. Principal axis factor analysis confirms which items proposed by the Generation and Gender Survey (GGS 2005) act as valid and reliable measures of the suggested theoretical socio-psychological factors. Four parity-specific logistic regression models are applied to evaluate the relationship between the socio-psychological factors and childbearing intentions. Social pressure emerges as the most important aspect in fertility decision-making among childless individuals and one-child parents, and positive attitudes towards childbearing are a strong component in planning for a child. This paper also underlines the importance of region-specific factors when studying childbearing intentions: planning for the second child differs significantly among the development regions, which represent the cultural and socio-economic divisions of the Romanian territory.
