14 research outputs found

    Detection of suspicious interactions of spiking covariates in methylation data

    BACKGROUND: In methylation analyses such as epigenome-wide association studies, a large number of biomarkers are tested for an association between the measured continuous outcome and different covariates. In the case of a continuous covariate like smoking pack years (SPY), a measure of lifetime exposure to tobacco toxins, a spike at zero can occur: all non-smokers generate a peak at zero, while the smoking patients are distributed over the other SPY values. A spike might also occur on the right side of the covariate distribution if a category "heavy smoker" is defined. Here, we focus on methylation data with a spike at the left or the right of the distribution of a continuous covariate. After the methylation data are generated, the analysis usually consists of preprocessing, quality control, and determination of differentially methylated sites, often performed in pipeline fashion, that is, as a chain of methods available in one software package. The pipelines can distinguish between categorical covariates, i.e. for group comparisons, and continuous covariates, i.e. for linear regression. The differential methylation analysis is often done internally by a linear regression without checking its inherent assumptions. A spike in the continuous covariate is ignored and can cause biased results. RESULTS: We reanalysed five data sets, four of them freely available from ArrayExpress, that include methylation data and smoking habits reported as smoking pack years. We developed an algorithm to check for suspicious interactions between the values associated with the spike position and the non-spike positions of the covariate. Our algorithm helps to decide whether a suspicious interaction is present and whether further investigations should be carried out. This is especially important because the information on the differentially methylated sites will be used for post-hoc analyses such as pathway analyses. CONCLUSIONS: We help to check the validity of the linear regression assumptions in a methylation analysis pipeline. These assumptions should also be considered for machine learning approaches. In addition, we are able to detect outliers in the continuous covariate. Using our algorithm as a preprocessing step should therefore produce more statistically robust results in methylation analysis.
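The abstract does not describe the algorithm itself, so the following is only a rough sketch of the underlying idea, not the authors' method. It assumes that a "suspicious" spike is one where the outcome observed at the spike position (for example, SPY = 0) disagrees with what a straight line fitted to the non-spike samples would predict there. The function names `fit_line` and `spike_gap` and the toy data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one covariate; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("covariate has no variation outside the spike")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def spike_gap(covariate, outcome, spike_value=0.0):
    """Observed mean outcome at the spike minus the value extrapolated
    from a line fitted to the non-spike samples only (hypothetical check)."""
    spike_y = [y for x, y in zip(covariate, outcome) if x == spike_value]
    rest = [(x, y) for x, y in zip(covariate, outcome) if x != spike_value]
    if not spike_y or len(rest) < 2:
        raise ValueError("need spike samples and at least two non-spike samples")
    slope, intercept = fit_line([x for x, _ in rest], [y for _, y in rest])
    predicted = intercept + slope * spike_value
    observed = sum(spike_y) / len(spike_y)
    return observed - predicted


# Toy data: smokers follow methylation = 0.8 - 0.005 * SPY.
spy = [0, 0, 0, 0, 10, 20, 30, 40]
consistent = [0.80, 0.80, 0.80, 0.80, 0.75, 0.70, 0.65, 0.60]
suspicious = [0.60, 0.60, 0.60, 0.60, 0.75, 0.70, 0.65, 0.60]

print("gap, consistent spike:", spike_gap(spy, consistent))
print("gap, suspicious spike:", spike_gap(spy, suspicious))
```

A gap near zero suggests the spike group lies on the same line as the other samples; a large gap is a hint that a single linear regression over all samples may be misleading. How large is "large" would need a proper test, which this sketch does not attempt.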

    Estimands in epigenome-wide association studies

    Background: In DNA methylation analyses such as epigenome-wide association studies, effects in differentially methylated CpG sites are assessed. Two kinds of outcomes can be used for statistical analysis: Beta-values and M-values. M-values follow a normal distribution and help to detect differentially methylated CpG sites. As biological effect measures, however, differences of M-values are more or less meaningless. Beta-values are of more interest since they can be interpreted directly as differences in percentage of DNA methylation at a given CpG site, but they have poor statistical properties. Different frameworks have been proposed for reporting estimands in DNA methylation analysis, relying on Beta-values, M-values, or both. Results: We present and discuss four possible approaches to obtaining estimands in DNA methylation analysis. In addition, we present the usage of M-values or Beta-values in the context of bioinformatical pipelines, which often demand a predefined outcome. We show the dependencies between differences in M-values and differences in Beta-values in two data simulations: an analysis with and an analysis without a confounder effect. Without confounder effects, M-values can be used for the statistical analysis and Beta-value statistics for the reporting. If confounder effects exist, we demonstrate the deviations and correct the effects by the intercept method. Finally, we demonstrate the theoretical problem on two large human genome-wide DNA methylation datasets to verify the results. Conclusions: The usage of M-values in the analysis of DNA methylation data will produce effect estimates that cannot be biologically interpreted. The parallel usage of Beta-value statistics ignores possible confounder effects and therefore cannot be recommended. Hence, if the differences in Beta-values are the focus of the study, the intercept method is recommendable. Hyper- or hypomethylated CpG sites must then be carefully evaluated. If an exploratory analysis of possible CpG sites is the aim of the study, M-values can be used for inference.
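To illustrate why differences in M-values are hard to interpret biologically, here is a minimal sketch. It assumes the commonly used logit relation M = log2(Beta / (1 - Beta)) without an offset term; the abstract does not state which definition the authors use, and the helper names are hypothetical.

```python
import math


def beta_to_m(beta):
    """Convert a Beta-value (methylation proportion in (0, 1)) to an M-value."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    return math.log2(beta / (1.0 - beta))


def m_to_beta(m):
    """Inverse transform: M-value back to a Beta-value."""
    return 2.0 ** m / (2.0 ** m + 1.0)


def beta_shift(beta, delta_m):
    """Change in Beta-value caused by adding delta_m on the M-value scale."""
    return m_to_beta(beta_to_m(beta) + delta_m) - beta


# The same M-value difference maps to different Beta-value differences,
# depending on the baseline methylation level.
for baseline in (0.1, 0.5, 0.9):
    print(baseline, round(beta_shift(baseline, 1.0), 4))
```

Under this assumption, a fixed difference on the M-value scale corresponds to a larger change in methylation percentage near Beta = 0.5 than near the extremes, which is one way to see why an M-value effect estimate cannot be read directly as a percentage difference.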

    Change point detection for clustered expression data

    Background: To detect changes in biological processes, samples are often studied at several time points. We examined expression data measured at different developmental stages, or more broadly, historical data. Hence, the main assumption of our proposed methodology was the independence between the examined samples over time. In addition, however, the examinations were clustered at each time point by measuring littermates from relatively few mother mice at each developmental stage. As each examination was lethal, we had an independent data structure over the entire history, but a dependent data structure at a particular time point. Over the course of these historical data, we wanted to identify abrupt changes in the parameter of interest, that is, change points. Results: In this study, we demonstrated the application of generalized hypothesis testing using a linear mixed effects model as a possible method to detect change points. The coefficients from the linear mixed model were used in multiple contrast tests, and the effect estimates were visualized with their respective simultaneous confidence intervals. The latter were used to determine the change point(s). In small simulation studies, we modelled different courses with abrupt changes and compared the influence of different contrast matrices. We found two contrasts, each capable of answering a different research question in change point detection: the Sequen contrast to detect individual change points and the McDermott contrast to find change points due to overall progression. We provide the R code for direct use with provided examples. The applicability of those tests for real experimental data was shown with in-vivo data from a preclinical study. Conclusion: Simultaneous confidence intervals estimated by multiple contrast tests using the model fit from a linear mixed model were capable of determining change points in clustered expression data. The confidence intervals directly delivered interpretable effect estimates representing the strength of the potential change point. Hence, scientists can define a biologically relevant threshold of effect strength depending on their research question. We found two rarely used contrasts best suited for detection of a possible change point: the Sequen and McDermott contrasts.
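The two contrast types can be made concrete with a small sketch. The authors provide R code; the Python below is not that code. It assumes the usual definitions, which to my understanding match the contrast types of the same names in R's multcomp package: Sequen compares each time point with its predecessor, while McDermott compares each time point with the sample-size-weighted average of all earlier time points. The function names and the toy means are hypothetical.

```python
def sequen_contrasts(k):
    """Sequen contrast matrix for k ordered groups: group i+1 minus group i."""
    rows = []
    for i in range(k - 1):
        row = [0.0] * k
        row[i] = -1.0
        row[i + 1] = 1.0
        rows.append(row)
    return rows


def mcdermott_contrasts(n):
    """McDermott contrast matrix (assumed definition): group i+1 minus the
    sample-size-weighted mean of groups 1..i. `n` holds the group sizes."""
    k = len(n)
    rows = []
    for i in range(1, k):
        total = sum(n[:i])
        row = [-n[j] / total for j in range(i)] + [1.0] + [0.0] * (k - i - 1)
        rows.append(row)
    return rows


def apply_contrasts(contrasts, means):
    """Point estimates of each contrast for a vector of group means."""
    return [sum(c * m for c, m in zip(row, means)) for row in contrasts]


# Toy course with one abrupt change between time points 2 and 3.
means = [1.0, 1.0, 3.0, 3.0]
print("Sequen:   ", apply_contrasts(sequen_contrasts(4), means))
print("McDermott:", apply_contrasts(mcdermott_contrasts([5, 5, 5, 5]), means))
```

In this toy course only one Sequen contrast is non-zero, which localizes the individual change point, whereas the McDermott contrasts stay non-zero after the change because each later time point is compared with the whole preceding history. The sketch gives point estimates only; the simultaneous confidence intervals described in the abstract require the fitted mixed model and are not reproduced here.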

    Effects of COVID-19 Pandemic on Progress Test Performance in German-Speaking Countries

    Background. The COVID-19 pandemic has been the source of many challenges for medical students worldwide. The authors examined short-term effects on the knowledge gain of medical students in German-speaking countries. Methods. The development of the knowledge gain of medical students during the pandemic was measured by comparing the outcomes of shared questions within Berlin Progress Test (PT) pairs. The PT is a formative test of 200 multiple choice questions at the graduate level, which provides feedback to students on knowledge and knowledge gain during their course of study. It is provided to about 11,000 students in Germany and Austria around the beginning of each semester. We analyzed three successive test pairs: PT36-PT41 (both conducted before the pandemic), PT37-PT42 (PT37 took place before the pandemic; PT42 was conducted from April 2020 onwards), and PT38-PT43 (PT38 was administered before the pandemic; PT43 started in November 2020). The authors used mixed-effect regression models and compared the absolute variations in the percentage of correct answers per subject. Results. The most recent test of each PT pair showed a higher mean score compared to the previous test in the same pair (PT36-PT41: 2.53 (95% CI: 1.31-3.75), PT37-PT42: 3.72 (2.57-4.88), and PT38-PT43: 5.66 (4.63-6.69)). Analogously, an increase in the share of correct answers was observed for most medical disciplines, with Epidemiology showing the most remarkable upsurge. Conclusions. Overall, PT performance improved during the pandemic, which we take as an indication that the sudden shift to online learning did not have a negative effect on the knowledge gain of students. We consider that these results may be helpful in advancing innovative approaches to medical education.

    Postoperative Anaemia Might Be a Risk Factor for Postoperative Delirium and Prolonged Hospital Stay: A Secondary Analysis of a Prospective Cohort Study

    Background: Postoperative anaemia is a frequent surgical complication and, in contrast to preoperative anaemia, has not been validated in relation to mortality, morbidity and its associated health economic effect. Postoperative anaemia can predispose to postoperative delirium through impairment of cerebral oxygenation. The aim of this secondary analysis is to investigate the association of postoperative anaemia, in accordance with the sex-specific World Health Organization definition of anaemia, with postoperative delirium and its impact on the duration of hospital stay. Methods: A secondary analysis of the prospective multicentric observational CESARO-study was conducted. A total of 800 adult patients undergoing elective surgery were enrolled from various operative disciplines across seven hospitals in Germany, ranging from university hospitals and district general hospitals to specialist clinics of minimally invasive surgery. Patients were classified as anaemic according to the World Health Organization parameters, setting the haemoglobin level cut-off below 12 g/dl for females and below 13 g/dl for males. The investigation focused on patients with acute anaemia. Patients with preoperative anaemia or a missing haemoglobin measurement were excluded from the sample set. Delirium screening was established postoperatively for at least 24 hours and up to three days, applying the validated Nursing Delirium Screening Scale. Results: The initial sample set contained 800 patients, of which 183 were suitable for analysis in the study. Ninety out of 183 (49.2%) suffered from postoperative anaemia. Ten out of 93 (10.9%) patients without postoperative anaemia developed a postoperative delirium. In the group with postoperative anaemia, 28 (38.4%) out of 90 patients suffered from postoperative delirium (odds ratio 3.949, 95% confidence interval 1.358-11.480) after adjustment for NYHA stage, severity of surgery, cutting/suture time, duration of anaesthesia, transfusion of packed red cells and sedation status with Richmond Agitation Scale after surgery. Additionally, patients who suffered from postoperative anaemia showed a significantly longer duration of hospitalisation (7.75 vs. 12.42 days, odds ratio = 1.186, 95% confidence interval 1.083-1.299, after adjustments). Conclusion: The study results reveal that postoperative anaemia is not only a frequent postsurgical complication with an incidence probability of almost 50%, but could also be associated with postoperative delirium and prolonged hospitalisation.
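As a worked example of the effect measure used above, the sketch below computes a crude (unadjusted) odds ratio with a Wald 95% confidence interval from the delirium counts given in the abstract (28 of 90 with postoperative anaemia, 10 of 93 without). This is not the authors' analysis: the reported odds ratio of 3.949 comes from a model adjusted for several covariates, so the crude value is expected to differ. The function name is hypothetical.

```python
import math


def odds_ratio_wald(exposed_cases, exposed_noncases,
                    unexposed_cases, unexposed_noncases, z=1.96):
    """Crude odds ratio from a 2x2 table with a Wald confidence interval
    computed on the log scale. Returns (odds_ratio, lower, upper)."""
    a, b, c, d = exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Wald interval")
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Counts taken from the abstract: delirium yes/no by postoperative anaemia.
crude, lower, upper = odds_ratio_wald(28, 90 - 28, 10, 93 - 10)
print(f"crude OR = {crude:.3f}, 95% CI {lower:.3f}-{upper:.3f}")
```

The crude estimate ignores NYHA stage, severity of surgery and the other adjustment variables, which is why it should be read only as an illustration of the calculation.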

    OPEN MERIT: QUEST MERIT App Open

    OPEN MERIT is an open-source software development project. The QUEST MERIT App Open is the prototype application that supports academic organizations in providing structured, quality-oriented and fair research assessments for appointments and for hiring senior scientific staff. The QUEST MERIT App Open can be adapted to different institutional requirements.

    What statistics instructors need to know about concept acquisition to make statistics stick

    The limits of my language are the limits of my mind. All I know is what I have words for (Wittgenstein). When learning something completely new, we connect the unknown term to an already existing part of our knowledge. We can only build new ideas and insights upon an existing conceptual foundation. In the field of statistics, we educators frequently find ourselves met with great confusion when teaching novices. These students, entirely unfamiliar with even basic statistics, must connect the introduced statistical terms within their personal existing networks of largely non-statistical knowledge. Lecturers, on the other hand, who are well versed in statistics, have deeply internalized the content to be taught and its relevant context. The juxtaposition of the two roles may produce amusement in a lecturer upon gaining insight into the word associations made by the statistical novices. For example, a ‘logistic regression’ does not involve the ‘shipping of goods in economically difficult times,’ though this might seem entirely reasonable and intuitive to the statistics learner. Other times, these different perspectives can lead to headaches and frustration for both learners and their lecturers. In this article, we illustrate how simple statistical terms are initially connected to a student’s pre-existing knowledge and how these associations change after completing an introductory course in applied statistics. Furthermore, we emphasize the important difference between “term”, “approach”, and “context”. Understanding this fundamental distinction may help improve the communication between the lecturer and the learner. We offer a collection of practical tools for instructors to help promote students’ conceptual understanding in a supportive, mutually beneficial learning environment.

    Discovering unknown response patterns in progress test data to improve the estimation of student performance

    Background: The Progress Test Medizin (PTM) is a 200-question formative test that is administered to approximately 11,000 students at medical universities (Germany, Austria, Switzerland) each term. Students receive feedback on their knowledge (development) mostly in comparison to their own cohort. In this study, we use the data of the PTM to find groups with similar response patterns. Methods: We performed k-means clustering with a dataset of 5,444 students, selected cluster number k = 5, and answers as features. Subsequently, the data was passed to XGBoost with the cluster assignment as target, enabling the identification of cluster-relevant questions for each cluster with SHAP. Clusters were examined by total scores, response patterns, and confidence level. Relevant questions were evaluated for difficulty index, discriminatory index, and competence levels. Results: Three of the five clusters can be seen as “performance” clusters: cluster 0 (n = 761) consisted predominantly of students close to graduation. Relevant questions tended to be difficult, but students answered confidently and correctly. Students in cluster 1 (n = 1,357) were advanced; cluster 3 (n = 1,453) consisted mainly of beginners. Relevant questions for these clusters were rather easy. The number of guessed answers increased. There were two “drop-out” clusters: students in cluster 2 (n = 384) dropped out of the test about halfway through after initially performing well; cluster 4 (n = 1,489) included students from the first semesters as well as “non-serious” students, both with mostly incorrect guesses or no answers. Conclusion: Clusters placed performance in the context of participating universities. Relevant questions served as good cluster separators and further supported our “performance” cluster groupings.
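The clustering step can be sketched on toy data. The block below is a deliberately small, pure-Python version of Lloyd's k-means algorithm applied to binary answer vectors (1 = correct, 0 = incorrect or unanswered). It is an illustration only: the study used 5,444 students with k = 5, and the subsequent XGBoost and SHAP steps are not reproduced. The answer coding, the deterministic initialization from the first k rows, and all names are assumptions made for this sketch.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, n_iter=20):
    """Minimal Lloyd's algorithm. Centers start at the first k rows
    (an assumption for reproducibility; real tools use random restarts)."""
    centers = [[float(v) for v in row] for row in rows[:k]]
    labels = [0] * len(rows)
    for _ in range(n_iter):
        # Assignment step: each student goes to the nearest center.
        labels = [
            min(range(k), key=lambda c: squared_distance(row, centers[c]))
            for row in rows
        ]
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [row for row, label in zip(rows, labels) if label == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Toy answer matrix: 6 students x 6 questions.
answers = [
    [1, 1, 1, 1, 1, 1],  # answers everything correctly
    [1, 1, 1, 0, 0, 0],  # stops halfway through
    [0, 0, 0, 0, 0, 0],  # no correct answers
    [1, 1, 1, 1, 1, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
]

labels, centers = kmeans(answers, k=3)
print("cluster labels:", labels)
for c, center in enumerate(centers):
    print(f"cluster {c} proportion correct per question:", center)
```

With binary answers, each cluster center is the per-question proportion of correct answers within that cluster, which gives a first, rough view of the response patterns before any model-based ranking of questions.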

    Association between genetic variants of the cholinergic system and postoperative delirium and cognitive dysfunction in elderly patients

    Background: Postoperative delirium (POD) and postoperative cognitive dysfunction (POCD) are frequent and serious complications after surgery. We aim to investigate the association between genetic variants in cholinergic candidate genes according to the Kyoto Encyclopedia of Genes and Genomes pathway "cholinergic neurotransmission" and the development of POD or POCD in elderly patients. Methods: This analysis is part of the European BioCog project (), a prospective multicenter observational study with elderly surgical patients. Patients with a Mini-Mental-State-Examination score <= 23 points were excluded. POD was assessed up to seven days after surgery using the Nursing Delirium Screening Scale, Confusion Assessment Method and a patient chart review. POCD was assessed three months after surgery with a neuropsychological test battery. Genotyping was performed on the Illumina Infinium Global Screening Array. Associations with POD and POCD were analyzed using logistic regression analysis, adjusted for age, comorbidities and duration of anesthesia (for POCD analysis additionally for education). Odds ratios (OR) refer to minor allele counts (0, 1, 2). Results: 745 patients could be included in the POD analysis, and 452 in the POCD analysis. The rate of POD within this group was 20.8% (155 patients), and the rate of POCD was 10.2% (46 patients). In a candidate gene approach, three genetic variants of the cholinergic genes CHRM2 and CHRM4 were associated with POD (OR [95% confidence interval], rs8191992: 0.61 [0.46; 0.80]; rs8191992: 1.60 [1.22; 2.09]; rs2067482: 1.64 [1.10; 2.44]). No associations were found for POCD. Conclusions: We found an association between genetic variants of CHRM2 and CHRM4 and POD. Further studies are needed to investigate whether disturbances in acetylcholine release and synaptic plasticity are involved in the development of POD. Trial registration: ClinicalTrials.gov: NCT02265263