86 research outputs found

    A comparison study on modeling of clustered and overdispersed count data for multiple comparisons

    Data collected in many scientific fields are count data. One way to analyze such data is to compare the individual levels of the treatment factor using multiple comparisons. However, the measured individuals are often clustered, e.g. according to litter or rearing. This must be considered when estimating the parameters by a repeated measurement model. In addition, ignoring the overdispersion to which count data are prone leads to an increase of the type I error rate. We carry out simulation studies using several different data settings and compare different multiple contrast tests with parameter estimates from generalized estimating equations and generalized linear mixed models in order to observe coverage and rejection probabilities. We generate overdispersed, clustered count data in small samples, as can be observed in many biological settings. We found that generalized estimating equations outperform generalized linear mixed models if the variance-sandwich estimator is correctly specified. Furthermore, generalized linear mixed models show problems with the convergence rate under certain data settings, although some model implementations are less affected. Finally, we use an example of genetic data to demonstrate the application of the multiple contrast test and the problems of ignoring strong overdispersion.
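To make the notion of overdispersion concrete, here is a minimal sketch of a variance-to-mean check. It is not the authors' simulation code: the function name `dispersion_index` and the counts are hypothetical. Under a Poisson model the variance equals the mean, so a ratio well above 1 hints at overdispersion.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Sample variance divided by sample mean (about 1 for Poisson data)."""
    m = mean(counts)
    if m == 0:
        raise ValueError("dispersion index is undefined when the mean is zero")
    return variance(counts) / m


# Hypothetical counts from one treatment group (illustrative only).
counts = [0, 1, 0, 7, 2, 12, 0, 3, 9, 1]
index = dispersion_index(counts)
print(f"mean = {mean(counts):.2f}, dispersion index = {index:.2f}")
```

This crude check ignores clustering. With clustered data, such as littermates, the dispersion would normally be judged from the residuals of a fitted model.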

    Detection of suspicious interactions of spiking covariates in methylation data

    BACKGROUND: In methylation analyses like epigenome-wide association studies, a large number of biomarkers are tested for an association between the measured continuous outcome and different covariates. In the case of a continuous covariate like smoking pack years (SPY), a measure of lifetime exposure to tobacco toxins, a spike at zero can occur: all non-smokers generate a peak at zero, while the smoking patients are distributed over the other SPY values. A spike might also occur on the right side of the covariate distribution if a category "heavy smoker" is defined. Here, we focus on methylation data with a spike at the left or the right of the distribution of a continuous covariate. After the methylation data are generated, analysis usually consists of preprocessing, quality control, and determination of differentially methylated sites, often performed in pipeline fashion. Hence, the data are processed by a chain of methods that are available in one software package. The pipelines can distinguish between categorical covariates, i.e. for group comparisons, and continuous covariates, i.e. for linear regression. The differential methylation analysis is often done internally by a linear regression without checking its inherent assumptions. A spike in the continuous covariate is ignored and can cause biased results. RESULTS: We reanalysed five data sets, four of them freely available from ArrayExpress, that include methylation data and smoking habits reported as smoking pack years. For this purpose, we developed an algorithm to check for suspicious interactions between the values associated with the spike position and the non-spike positions of the covariate. Our algorithm helps to decide whether a suspicious interaction is present and further investigations should be carried out. This is especially important because the information on the differentially methylated sites will be used for post-hoc analyses like pathway analyses. CONCLUSIONS: We help to check the validity of the linear regression assumptions in a methylation analysis pipeline. These assumptions should also be considered for machine learning approaches. In addition, we are able to detect outliers in the continuous covariate. Therefore, using our algorithm as a preprocessing step should produce statistically more robust results in methylation analysis.

    Overview of Random Forest Methodology and Practical Guidance with Emphasis on Computational Biology and Bioinformatics

    The Random Forest (RF) algorithm by Leo Breiman has become a standard data analysis tool in bioinformatics. It has shown excellent performance in settings where the number of variables is much larger than the number of observations, can cope with complex interaction structures as well as highly correlated variables and returns measures of variable importance. This paper synthesizes ten years of RF development with emphasis on applications to bioinformatics and computational biology. Special attention is given to practical aspects such as the selection of parameters, available RF implementations, and important pitfalls and biases of RF and its variable importance measures (VIMs). The paper surveys recent developments of the methodology relevant to bioinformatics as well as some representative examples of RF applications in this context and possible directions for future research

    Risk estimation and risk prediction using machine-learning methods

    After an association between genetic variants and a phenotype has been established, further study goals comprise the classification of patients according to disease risk or the estimation of disease probability. To accomplish this, different statistical methods are required, and specifically machine-learning approaches may offer advantages over classical techniques. In this paper, we describe methods for the construction and evaluation of classification and probability estimation rules. We review the use of machine-learning approaches in this context and explain some of the machine-learning algorithms in detail. Finally, we illustrate the methodology through application to a genome-wide association analysis on rheumatoid arthritis. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1007/s00439-012-1194-y) contains supplementary material, which is available to authorized users

    Estimands in epigenome-wide association studies

    Background: In DNA methylation analyses like epigenome-wide association studies, effects in differentially methylated CpG sites are assessed. Two kinds of outcomes can be used for statistical analysis: Beta-values and M-values. M-values follow a normal distribution and help to detect differentially methylated CpG sites. As biological effect measures, however, differences of M-values are more or less meaningless. Beta-values are of more interest since they can be interpreted directly as differences in percentage of DNA methylation at a given CpG site, but they have poor statistical properties. Different frameworks have been proposed for reporting estimands in DNA methylation analysis, relying on Beta-values, M-values, or both. Results: We present and discuss four possible approaches to obtaining estimands in DNA methylation analysis. In addition, we present the usage of M-values or Beta-values in the context of bioinformatical pipelines, which often demand a predefined outcome. We show how differences in M-values depend on differences in Beta-values in two data simulations: one analysis with and one without a confounder effect. When no confounder effects are present, M-values can be used for the statistical analysis and Beta-value statistics for the reporting. If confounder effects exist, we demonstrate the deviations and correct the effects by the intercept method. Finally, we demonstrate the theoretical problem on two large human genome-wide DNA methylation datasets to verify the results. Conclusions: The usage of M-values in the analysis of DNA methylation data will produce effect estimates that cannot be biologically interpreted. The parallel usage of Beta-value statistics ignores possible confounder effects and therefore cannot be recommended. Hence, if the differences in Beta-values are the focus of the study, the intercept method is recommended. Hyper- or hypomethylated CpG sites must then be carefully evaluated. If an exploratory analysis of possible CpG sites is the aim of the study, M-values can be used for inference.
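The abstract does not spell out how the two scales are linked. A minimal sketch, assuming the commonly used definition of the M-value as the base-2 logit of the Beta-value, shows why a difference in M-values has no fixed meaning on the percentage scale. The function names and the example Beta-values are hypothetical.

```python
import math


def beta_to_m(beta: float) -> float:
    """Base-2 logit of a methylation proportion (assumed M-value definition)."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    return math.log2(beta / (1.0 - beta))


def m_to_beta(m: float) -> float:
    """Inverse transform: map an M-value back to a proportion in (0, 1)."""
    return 2.0 ** m / (2.0 ** m + 1.0)


# The same Beta-value difference (10 percentage points) at two baselines.
delta_m_low = beta_to_m(0.15) - beta_to_m(0.05)
delta_m_mid = beta_to_m(0.55) - beta_to_m(0.45)
print(f"M difference near 0.10: {delta_m_low:.3f}")
print(f"M difference near 0.50: {delta_m_mid:.3f}")
```

Because the transform is nonlinear, the two M-value differences are unequal even though the Beta-value difference is identical. That is consistent with the abstract's point that M-value differences are hard to read as biological effect sizes.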

    Change point detection for clustered expression data

    Background: To detect changes in biological processes, samples are often studied at several time points. We examined expression data measured at different developmental stages, or more broadly, historical data. Hence, the main assumption of our proposed methodology was the independence between the examined samples over time. In addition, however, the examinations were clustered at each time point by measuring littermates from relatively few mother mice at each developmental stage. As each examination was lethal, we had an independent data structure over the entire history, but a dependent data structure at a particular time point. Over the course of these historical data, we wanted to identify abrupt changes in the parameter of interest, i.e. change points. Results: In this study, we demonstrated the application of generalized hypothesis testing using a linear mixed effects model as a possible method to detect change points. The coefficients from the linear mixed model were used in multiple contrast tests and the effect estimates were visualized with their respective simultaneous confidence intervals. The latter were used to determine the change point(s). In small simulation studies, we modelled different courses with abrupt changes and compared the influence of different contrast matrices. We found two contrasts, each capable of answering a different research question in change point detection: the Sequen contrast to detect individual change points and the McDermott contrast to find change points due to overall progression. We provide the R code for direct use with provided examples. The applicability of those tests for real experimental data was shown with in-vivo data from a preclinical study. Conclusion: Simultaneous confidence intervals estimated by multiple contrast tests using the model fit from a linear mixed model were capable of determining change points in clustered expression data. The confidence intervals directly delivered interpretable effect estimates representing the strength of the potential change point. Hence, scientists can define a biologically relevant threshold of effect strength depending on their research question. We found two rarely used contrasts to be best suited for detection of a possible change point: the Sequen and McDermott contrasts.
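As an illustration of what a sequence ("Sequen") contrast does, here is a minimal sketch that builds the contrast matrix for successive differences and applies it to group means. It is not the authors' R code: the function names and the stage means are hypothetical, and it returns only point estimates. Simultaneous confidence intervals would additionally need the covariance of the model coefficients and a multivariate t quantile.

```python
def sequen_contrasts(k):
    """Return the (k-1) x k matrix comparing each group with its predecessor."""
    if k < 2:
        raise ValueError("at least two groups are required")
    rows = []
    for i in range(k - 1):
        row = [0] * k
        row[i] = -1      # earlier time point
        row[i + 1] = 1   # following time point
        rows.append(row)
    return rows


def apply_contrasts(contrasts, means):
    """Multiply the contrast matrix by the vector of group means."""
    return [sum(c * m for c, m in zip(row, means)) for row in contrasts]


# Hypothetical mean expression at four developmental stages.
stage_means = [2.0, 2.1, 4.0, 4.1]
differences = apply_contrasts(sequen_contrasts(len(stage_means)), stage_means)
print(differences)
```

In this made-up example the second successive difference is much larger than its neighbours, which is the pattern a sequence contrast would flag as a candidate change point.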

    Emergence delirium in children is not related to intraoperative burst suppression – prospective, observational electrography study

    BACKGROUND: Emergence delirium is the most frequent brain dysfunction in children recovering from general anaesthesia, though the pathophysiological background remains unclear. The present study analysed the association between emergence delirium and intraoperative Burst Suppression activity in the electroencephalogram, a period of very deep hypnosis during general anaesthesia. METHODS: In this prospective, observational cohort study at the Charité university hospital in Berlin, Germany, children aged 0.5 to 8 years undergoing planned surgery were included between September 2015 and February 2017. Intraoperative bi-frontal electroencephalograms were recorded. Occurrence and duration of Burst Suppression periods were visually analysed. Emergence delirium was assessed using the Pediatric Assessment of Emergence Delirium Score. RESULTS: Of the 97 children analysed in this study, 40 developed emergence delirium and 57 did not. Overall, 52% of the children displayed intraoperative Burst Suppression periods; however, occurrence and duration of Burst Suppression (emergence delirium group 55% / 261 ± 462 s vs. non-emergence delirium group 49% / 318 ± 531 s) did not differ significantly between the two groups. CONCLUSIONS: Our data reveal no correlation between the occurrence and duration of intraoperative Burst Suppression activity and the incidence of emergence delirium. Burst Suppression occurrence is frequent; however, it does not seem to have an unfavourable impact on cerebral function at emergence from general anaesthesia in children.

    Human airway epithelial extracellular vesicle miRNA signature is altered upon asthma development

    Background: miRNAs are master regulators of signaling pathways critically involved in asthma and are transferred between cells in extracellular vesicles (EV). We aimed to investigate whether the miRNA content of EV secreted by primary normal human bronchial epithelial cells (NHBE) is altered upon asthma development. Methods: NHBE cells were cultured at air-liquid interface and treated with interleukin (IL)-13 to induce an asthma-like phenotype. EV isolated by precipitation from basal culture medium or apical surface wash were characterized by nanoparticle tracking analysis, transmission electron microscopy, and Western blot, and EV-associated miRNAs were identified by RT-qPCR-based profiling. Significant candidates were confirmed in EVs isolated by size-exclusion chromatography from nasal lavages of children with mild-to-moderate (n = 8) or severe asthma (n = 9), and healthy controls (n = 9). Results: NHBE cells secrete EVs to the apical and basal side. 47 miRNAs were expressed in EVs and 16 thereof were significantly altered in basal EV upon IL-13 treatment. Expression of miRNAs could be confirmed in EVs from human nasal lavages. Of note, levels of miR-92b, miR-210, and miR-34a significantly correlated with lung function parameters in children (FEV1FVC%pred and FEF25-75%pred); thus, lower sEV-miRNA levels in nasal lavages were associated with airway obstruction. Subsequent ingenuity pathway analysis predicted the miRNAs to regulate Th2 polarization and dendritic cell maturation. Conclusion: Our data indicate that secretion of miRNAs in EVs from the airway epithelium, in particular miR-34a, miR-92b, and miR-210, might be involved in the early development of a Th2 response in the airways and asthma.

    A comparison of first-attempt cannulation success of peripheral venous catheter systems with and without wings and injection ports in surgical patients—a randomized trial

    Background: A peripheral venous catheter (PVC) is the most widely used device for obtaining vascular access, allowing the administration of fluids and medication. Up to 25% of adult patients and 50% of pediatric patients experience a first-attempt cannulation failure. In addition to patient and clinician characteristics, device features might affect the handling and success rates. The objective of the study was to compare the first-attempt cannulation success rate between PVCs with wings and a port access (Vasofix® Safety, B. Braun, hereafter abbreviated as VS) and those without (Introcan® Safety, B. Braun, hereafter abbreviated as IS) in an anesthesiological cohort. Methods: An open-label, multi-center, randomized trial was performed. First-attempt cannulation success rates were examined, along with relevant patient, clinician, and device characteristics, with univariate and multivariate analyses. Information on handling and adherence to use instructions was gathered, and available catheters were assessed for damage. Results: Two thousand three hundred four patients were included in the intention-to-treat analysis. The first-attempt success rate was significantly higher with winged and ported catheters (VS) than with the non-winged, non-ported design (IS) (87.5% with VS vs. 78.2% with IS; P-Chi < .001). Operators rated the handling of VS as superior (rating of "good" or "very good": 86.1% VS vs. 20.8% IS, P-Chi < .001). Reinsertion of the needle into the catheter after partial withdrawal, prior to or during the catheterization attempt, was associated with an increased risk of cannulation failure (7.909, CI 5.989-10.443, P < .001 and 23.023, CI 10.372-51.105, P < .001, respectively) and a twofold risk of catheter damage (OR 1.999, CI 1.347-2.967, P = .001). Conclusions: First-attempt cannulation success of peripheral, ported, winged catheters was higher compared to non-ported, non-winged devices. The handling of the winged and ported design was rated better by the clinicians. Needle reinsertions are related to an increase in rates of catheter damage and cannulation failure.
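The two reported first-attempt success rates can be put on other common effect scales with simple arithmetic. The sketch below uses only the percentages quoted in the abstract (87.5% for VS, 78.2% for IS). The function name is hypothetical, and because the per-arm sample sizes are not given here, no confidence intervals are attempted.

```python
def success_comparison(p_a, p_b):
    """Risk difference, relative risk and odds ratio of success, A versus B."""
    for p in (p_a, p_b):
        if not 0.0 < p < 1.0:
            raise ValueError("proportions must lie strictly between 0 and 1")
    risk_difference = p_a - p_b
    relative_risk = p_a / p_b
    odds_ratio = (p_a / (1.0 - p_a)) / (p_b / (1.0 - p_b))
    return risk_difference, relative_risk, odds_ratio


# Success proportions as reported in the abstract.
vs_rate, is_rate = 0.875, 0.782
rd, rr, odds = success_comparison(vs_rate, is_rate)
print(f"risk difference = {rd:.3f}, relative risk = {rr:.3f}, odds ratio = {odds:.3f}")
```

These are unadjusted figures derived from the two percentages alone, so they need not match any adjusted estimate from the trial's multivariate analysis.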

    Preoperative medication use and development of postoperative delirium and cognitive dysfunction

    Postoperative delirium (POD) and postoperative (neuro-)cognitive disorder (POCD) are frequent and serious complications after operations. We aimed to investigate the association between pre-operative polypharmacy and potentially inappropriate medications and the development of POD/POCD in elderly patients. This investigation is part of the European BioCog project (www.biocog.eu), a prospective multicenter observational study with elderly surgical patients. Patients with a Mini-Mental State Examination score less than or equal to 23 points were excluded. POD was assessed up to 7 days after surgery using the Nursing Delirium Screening Scale, Confusion Assessment Method (for the intensive care unit [ICU]), and a patient chart review. POCD was assessed 3 months after surgery with a neuropsychological test battery. Pre-operative long-term medication was evaluated in terms of polypharmacy (≥5 agents) and potentially inappropriate medication (defined by the PRISCUS and European list of potentially inappropriate medications [EU(7)-PIM] lists), and associations with POD and POCD were analyzed using logistic regression analysis. Eight hundred thirty-seven participants were included for analysis of POD and 562 participants for POCD. Of these, 165 patients (19.7%) fulfilled the criteria for POD and 60 (10.7%) for POCD. After adjusting for confounders, pre-operative polypharmacy and intake of potentially inappropriate medications could not be shown to be associated with the development of POD or POCD. We found no associations between pre-operative polypharmacy and potentially inappropriate medications and development of POD and POCD. Future studies should focus on the evaluation of drug interactions to determine whether patients benefit from a pre-operative adjustment.