30 research outputs found
Robust Generalised Linear Regression Models in Genetic Studies: Assessment of Standard Techniques and Their Generalisation to Incorporate Hampel's Function
In genetic studies, the data under investigation are high-dimensional, i.e., there are many more independent variables than measured individuals. In high-dimensional data, one expects observations departing from the majority of the data (so-called outliers). Such outliers can seriously affect statistical results because the applied approaches based on maximum likelihood estimation can be strongly biased by outliers. Robust approaches account for such outliers by assigning a weight to each observation, thus controlling their impact. However, these approaches are only rarely used in genetic studies.
In this thesis, benefits and limitations of robust (generalised) linear models in comparison to the standard maximum likelihood approaches were investigated. For this purpose, an existing robust generalised linear model framework was generalised to incorporate another weighting function.
In a first set of analyses, several existing standard and robust approaches for linear, Poisson and logistic regression were compared. The focus was on model selection consistency and prediction accuracy, the influence of a single outlier and the influence of genotyping errors on estimates. The prediction accuracy was similar for (robust) linear and (robust) Poisson regression models in a real data application. Regarding model selection consistency, Poisson regression selected two or three independent variables, whereas linear regression always included the same single independent variable, which was, however, common to all regression methods. These results were complemented by the inclusion of different independent variables in the standard and robust logistic regression models in a second real data application. Within this application, it was observed that robust logistic regression better controlled the outlier influence. A simulation study revealed a decreasing influence of genotyping errors on estimates with increasing causal allele carrier frequencies. Furthermore, there was an indication of a possible benefit of robust logistic regression.
At the time of method application, the robust generalised linear model framework only provided the bounded Huber function for observation weighting. In this thesis, the re-descending Hampel function was incorporated into this framework for logistic and Poisson regression by explicit calculations for the Fisher consistency correction and for the asymptotic variance as well as by adaptation of the existing source code. In a second set of analyses, the developed approach for robust logistic regression was compared against the standard and the existing robust logistic regression methods based on simulated and real data -- both dealing with an (indirect) association analysis. In the simulation study, several populations were simulated assuming different penetrance models, minor allele frequencies, genotyping error rates and linkage between causal and marker allele locus. The analysis focused on several statistical properties: mean squared error of the estimates, statistical power and type I error rate. In the simulation study, all approaches controlled the type I error rate. Based on the investigation of these statistical properties, a method recommendation must depend on the aim of the analysis. To reach a large power for variant identification, standard logistic regression would be an adequate choice. If the goal was a small mean squared error, which is likely to avoid a strong overestimation of the effect, robust logistic regression represented a valuable alternative to the standard approach. This especially held when analysing rare variants or assuming a recessive penetrance model, both of which lead to a low probability of observing the causal genotype. If extreme outliers are expected in the data, the re-descending Hampel function should be favoured.
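As an illustration of how the two weighting functions named above differ, the following minimal Python sketch computes the observation weight w(r) = psi(r)/r for a standardised residual r. The function names and the tuning constants (c = 1.345 for Huber; a = 2, b = 4, c = 8 for Hampel) are assumptions chosen for illustration, not values taken from the thesis, and the Fisher consistency correction is not shown.

```python
def huber_weight(r, c=1.345):
    """Weight psi(r)/r for the bounded Huber function.

    c is an assumed tuning constant, used here for illustration only.
    """
    ar = abs(r)
    return 1.0 if ar <= c else c / ar


def hampel_weight(r, a=2.0, b=4.0, c=8.0):
    """Weight psi(r)/r for the three-part re-descending Hampel function.

    a < b < c are assumed tuning constants, used here for illustration only.
    """
    ar = abs(r)
    if ar <= a:
        return 1.0                              # central part: full weight
    if ar <= b:
        return a / ar                           # bounded part: same shape as Huber
    if ar <= c:
        return a * (c - ar) / ((c - b) * ar)    # re-descending part
    return 0.0                                  # beyond c: observation is ignored
```

With these assumed constants, a residual of 10 still receives a Huber weight of about 0.13 but a Hampel weight of 0, which matches the recommendation to prefer the Hampel function when extreme outliers are expected.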
The aim for future work should be the examination of statistical properties (mean squared error, statistical power, type I error rate) of robust Poisson regression and of the robust hurdle model arising by the combination of the logistic and the truncated Poisson model – both applying the Hampel function. Additionally, an inclusion of further weighting functions as well as additional distributions would be of great interest for a broader application range and the chance of a power gain for robust regression methods.
In summary, the coincidence of expected outliers and observed rare events in high-dimensional data challenges the analysis of genetic data. The results of this thesis indicate that these analyses can benefit from the application of robust logistic regression models to narrow down the winner's curse of rare and recessive susceptibility variants.
Internet-based cognitive-behavioural writing therapy for reducing post-traumatic stress after severe sepsis in patients and their spouses (REPAIR): results of a randomised-controlled trial
Objectives
To investigate the efficacy, safety and applicability of internet-based, therapist-led partner-assisted cognitive-behavioural writing therapy (iCBT) for post-traumatic stress disorder (PTSD) symptoms after intensive care for sepsis in patients and their spouses compared with a waitlist (WL) control group.
Design
Randomised-controlled, parallel group, open-label, superiority trial with concealed allocation.
Setting
Internet-based intervention in Germany; location-independent via web-portal.
Participants
Patients after intensive care for sepsis and their spouses of whom at least one had a presumptive PTSD diagnosis (PTSD-Checklist (PCL-5)≥33). Initially planned sample size: 98 dyads.
Interventions
ICBT group: 10 writing assignments over a 5-week period; WL control group: 5-week waiting period.
Primary and secondary outcome measures
Primary outcome: pre–post change in PTSD symptom severity (PCL-5). Secondary outcomes: remission of PTSD, depression, anxiety and somatisation, relationship satisfaction, health-related quality of life, premature termination of treatment. Outcome measures were applied pre and post treatment and at 3, 6 and 12 months follow-up.
Results
Twenty-five dyads representing 34 participants with a presumptive PTSD diagnosis were randomised and analysed (ITT principle). There was no evidence for a difference in PCL-5 pre–post change for iCBT compared with WL (mean difference −0.96, 95% CI (−5.88 to 3.97), p=0.703). No adverse events were reported. Participants confirmed the applicability of iCBT.
Conclusions
ICBT was applied to reduce PTSD symptoms after intensive care for sepsis, for the first time addressing both patients and their spouses. It was applicable and safe in the given population. There was no evidence for the efficacy of iCBT on PTSD symptom severity. Due to the small sample size, our findings remain preliminary but can guide further research, which is needed to determine if modified approaches to post-intensive care PTSD may be more effective.
Practical investigation of the performance of robust logistic regression to predict the genetic risk of hypertension
Logistic regression is usually applied to investigate the association between inherited genetic variants and a binary disease phenotype. A limitation of standard methods used to estimate the parameters of logistic regression models is their strong dependence on a few observations deviating from the majority of the data. We used data from the Genetic Analysis Workshop 18 to explore the possible benefit of robust logistic regression to estimate the genetic risk of hypertension. The comparison between standard and robust methods relied on the influence of departing hypertension profiles (outliers) on the estimated odds ratios, areas under the receiver operating characteristic curves, and clinical net benefit. Our results confirmed that single outliers may substantially affect the estimated genotype relative risks. The ranking of variants by probability values was different in standard and in robust logistic regression. For cutoff probabilities between 0.2 and 0.6, the clinical net benefit estimated by leave-one-out cross-validation in the investigated sample was slightly larger under robust regression, but the overall area under the receiver operating characteristic curve was larger for standard logistic regression. The potential advantage of robust statistics in the context of genetic association studies should be investigated in future analyses based on real and simulated data.
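The abstract does not state how the clinical net benefit was computed. As a minimal sketch, assuming the usual decision-curve definition (true-positive rate minus false-positive rate weighted by the odds of the cutoff probability), it could be evaluated as follows. The function name and the example data are hypothetical.

```python
def net_benefit(y_true, risk, threshold):
    """Net benefit at a cutoff probability, assuming the standard
    decision-curve definition: TP/n - FP/n * threshold / (1 - threshold).

    y_true: iterable of 0/1 outcomes; risk: predicted probabilities.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    pairs = list(zip(y_true, risk))
    n = len(pairs)
    # Subjects with predicted risk at or above the cutoff are classified as positive.
    tp = sum(1 for y, p in pairs if p >= threshold and y == 1)
    fp = sum(1 for y, p in pairs if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)
```

In a leave-one-out setting, `risk` would hold each subject's prediction from a model fitted without that subject; the function itself does not depend on how the predictions were obtained.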
Effect size estimates from umbrella designs: Handling patients with a positive test result for multiple biomarkers using random or pragmatic subtrial allocation.
Umbrella trials have been suggested to increase trial conduct efficiency when investigating different biomarker-driven experimental therapies. An overarching platform is used for patient screening and subsequent subtrial allocation according to patients' biomarker status. Two subtrial allocation schemes for patients with a positive test result for multiple biomarkers are (i) the pragmatic allocation to the eligible subtrial with the currently fewest included patients and (ii) the random allocation to one of the eligible subtrials. Obviously, the subtrials compete for such patients, who are consequently underrepresented in the subtrials. To address questions of the impact of an umbrella design in general as well as with respect to subtrial allocation and analysis method, we investigate an umbrella trial with two parallel group subtrials and discuss generalisations. First, we analytically quantify the impact of the umbrella design with random allocation on the number of patients needed to be screened, the biomarker status distribution and treatment effect estimates compared to the corresponding gold standard of an independent parallel group design. Using simulations and real data, we subsequently compare both allocation schemes and investigate weighted linear regression modelling as a possible analysis method for the umbrella design. Our results show that umbrella designs are more efficient than the gold standard. However, depending on the biomarker status distribution in the disease population, an umbrella design can introduce differences in estimated treatment effects in the presence of an interaction between treatment and biomarker status. In principle, weighted linear regression together with the random allocation scheme can address this difference, though it is difficult to assess if such an approach is applicable in practice. In any case, caution is required when using treatment effect estimates derived from umbrella designs for, e.g., future trial planning or meta-analyses.
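The two allocation schemes described above can be sketched in a few lines of Python. This is only an illustration: the function name, the dictionary of running counts, and the tie-breaking rule (the first listed subtrial wins a tie under pragmatic allocation) are assumptions not specified in the abstract.

```python
import random


def allocate(eligible, counts, scheme, rng=None):
    """Allocate one patient who is eligible for several subtrials.

    eligible: list of subtrial labels the patient tested positive for.
    counts:   dict mapping subtrial label -> patients included so far
              (updated in place).
    scheme:   "pragmatic" (fewest included patients) or "random".
    rng:      optional random.Random instance for reproducibility.
    """
    if not eligible:
        raise ValueError("patient is not eligible for any subtrial")
    if scheme == "pragmatic":
        # Assumed tie-break: min() returns the first subtrial with the lowest count.
        chosen = min(eligible, key=lambda s: counts[s])
    elif scheme == "random":
        chosen = (rng or random).choice(list(eligible))
    else:
        raise ValueError("scheme must be 'pragmatic' or 'random'")
    counts[chosen] += 1
    return chosen
```

For example, with `counts = {"A": 3, "B": 1}` a patient positive for both biomarkers goes to subtrial B under pragmatic allocation, whereas random allocation ignores the current counts.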
Robust logistic regression to narrow down the winner's curse for rare and recessive susceptibility variants [Source Code]
Logistic regression is the most common technique used for genetic case-control association studies. A disadvantage of standard maximum likelihood estimators of the genotype relative risk (GRR) is their strong dependence on outlier subjects, for example, patients diagnosed at unusually young age. Robust methods are available to constrain outlier influence, but they are scarcely used in genetic studies. This article provides a non-intimidating introduction to robust logistic regression, and investigates its benefits and limitations in genetic association studies. We applied the bounded Huber function and extended the R package ‘robustbase’ with the re-descending Hampel function to down-weight outlier influence. Computer simulations were carried out to assess the type I error rate, mean squared error (MSE) and statistical power according to major characteristics of the genetic study and investigated markers. Simulations were complemented with the analysis of real data. Both standard and robust estimation controlled type I error rates. Standard logistic regression showed the highest power but standard GRR estimates also showed the largest bias and MSE, in particular for associated rare and recessive variants. For illustration, a recessive variant with a true GRR=6.32 and a minor allele frequency=0.05 investigated in a 1000 case/1000 control study by standard logistic regression resulted in power=0.60 and MSE=16.5. The corresponding figures for Huber-based estimation were power=0.51 and MSE=0.53. Overall, Hampel- and Huber-based GRR estimates did not differ much. Robust logistic regression may represent a valuable alternative to standard maximum likelihood estimation when the focus lies on risk prediction rather than identification of susceptibility variants.
Even faster and even more accurate first-passage time densities and distributions for the Wiener diffusion model
The Wiener diffusion model with two absorbing barriers is often used to describe response times and error probabilities in two-choice decisions. Different representations exist for the density and cumulative distribution of first-passage times, all including infinite series, but with different convergence for small and large times. We present a method that controls the approximation error of the small-time representation that occurs due to finite truncation of these series. Our approach improves and simplifies related work by Navarro and Fuss (2009) and Blurton et al. (2012, both in the Journal of Mathematical Psychology).
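To make the truncation issue concrete, here is a minimal Python sketch of the small-time series for the first-passage time density at the lower barrier, in the form given by Navarro and Fuss (2009), with drift v, barrier separation a and relative starting point w. It uses a fixed, assumed number of terms (`n_terms`) and does not implement the error-controlled truncation that the paper proposes; the function name is hypothetical.

```python
import math


def wiener_density_small_time(t, v, a, w, n_terms=10):
    """First-passage time density at the lower barrier (small-time series).

    The infinite series is truncated at |k| <= n_terms, an assumed fixed
    cutoff rather than an error-controlled one.
    """
    if t <= 0.0:
        return 0.0
    u = t / a ** 2  # time rescaled to unit barrier separation
    series = sum(
        (w + 2 * k) * math.exp(-((w + 2 * k) ** 2) / (2.0 * u))
        for k in range(-n_terms, n_terms + 1)
    )
    f01 = series / math.sqrt(2.0 * math.pi * u ** 3)  # density for v = 0, a = 1
    # Transform back to general drift v and barrier separation a.
    return math.exp(-v * a * w - v ** 2 * t / 2.0) * f01 / a ** 2
```

A simple plausibility check for this sketch: with zero drift, the density integrated over time should equal the probability of absorption at the lower barrier, 1 - w.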