390 research outputs found

    Batch effect confounding leads to strong bias in performance estimates obtained by cross-validation.

    BACKGROUND: With the large amount of biological data that is currently publicly available, many investigators combine multiple data sets to increase the sample size and potentially also the power of their analyses. However, technical differences ("batch effects") as well as differences in sample composition between the data sets may significantly affect the ability to draw generalizable conclusions from such studies.
    FOCUS: The current study focuses on the construction of classifiers and the use of cross-validation to estimate their performance. In particular, we investigate the impact of batch effects and differences in sample composition between batches on the accuracy of the classification performance estimate obtained via cross-validation. This focus on estimation bias is the main difference from previous studies, which have mostly examined predictive performance and how it relates to the presence of batch effects.
    DATA: We work on simulated data sets. To obtain realistic intensity distributions, we use real gene expression data as the basis for our simulation. Random samples from this expression matrix are selected and assigned to group 1 (e.g., 'control') or group 2 (e.g., 'treated'). We introduce batch effects and select some features to be differentially expressed between the two groups. We consider several scenarios, most importantly different levels of confounding between groups and batch effects.
    METHODS: We focus on well-known classifiers: logistic regression, Support Vector Machines (SVM), k-nearest neighbors (kNN) and Random Forests (RF). Feature selection is performed with the Wilcoxon test or the lasso. Parameter tuning and feature selection, as well as the estimation of the prediction performance of each classifier, are performed within a nested cross-validation scheme. The estimated classification performance is then compared to what is obtained when applying the classifier to independent data
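    The nested cross-validation scheme mentioned in the abstract above can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the authors' pipeline: the synthetic Gaussian data, the nearest-centroid classifier, and the mean-difference feature ranking (a stand-in for the Wilcoxon test) are all assumptions made for brevity, and no batch effects are simulated. The point it shows is only that feature selection and tuning of the number of features happen inside the training folds, so the outer test fold never influences them.

```python
import random
from statistics import mean

def make_data(n=60, n_features=20, n_informative=3, shift=1.5, seed=0):
    """Synthetic two-group data (hypothetical stand-in for expression data)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2  # alternate labels: 30 samples per group
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in range(n_informative):
            row[j] += shift * label  # first features differ between groups
        X.append(row)
        y.append(label)
    return X, y

def folds(indices, k, rng):
    """Shuffle the indices and split them into k disjoint folds."""
    idx = list(indices)
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def centroids(X, y, idx, feats):
    """Per-class mean of the chosen features, using only samples in idx."""
    feats = list(feats)
    return {
        label: [mean(X[i][f] for i in idx if y[i] == label) for f in feats]
        for label in (0, 1)
    }

def select_features(X, y, idx, k):
    """Rank features by absolute class-mean difference on training data only."""
    n_features = len(X[0])
    c = centroids(X, y, idx, range(n_features))
    ranked = sorted(range(n_features), key=lambda f: -abs(c[1][f] - c[0][f]))
    return ranked[:k]

def fit_predict(X, y, train, test, k):
    """Select k features and fit a nearest-centroid rule on train; predict test."""
    feats = select_features(X, y, train, k)
    c = centroids(X, y, train, feats)
    preds = []
    for i in test:
        dist = {lab: sum((X[i][f] - m) ** 2 for f, m in zip(feats, c[lab]))
                for lab in c}
        preds.append(min(dist, key=dist.get))
    return preds

def accuracy(y, test, preds):
    return mean(1.0 if y[i] == p else 0.0 for i, p in zip(test, preds))

def nested_cv(X, y, ks=(1, 3, 10), outer_k=5, inner_k=3, seed=0):
    """Outer loop estimates performance; inner loop tunes k on training data."""
    rng = random.Random(seed)
    outer = folds(range(len(y)), outer_k, rng)
    outer_scores = []
    for o, test in enumerate(outer):
        train = [i for j, fold in enumerate(outer) if j != o for i in fold]
        inner = folds(train, inner_k, rng)

        def inner_score(k):
            scores = []
            for v, val in enumerate(inner):
                fit = [i for j, fold in enumerate(inner) if j != v for i in fold]
                scores.append(accuracy(y, val, fit_predict(X, y, fit, val, k)))
            return mean(scores)

        best_k = max(ks, key=inner_score)  # tuned without touching the test fold
        outer_scores.append(accuracy(y, test, fit_predict(X, y, train, test, best_k)))
    return mean(outer_scores)

X, y = make_data()
cv_estimate = nested_cv(X, y)
print(f"nested CV accuracy estimate: {cv_estimate:.3f}")
```

    Comparing such an estimate with the accuracy on separately generated data is the kind of check the abstract describes; introducing a batch variable that is confounded with the group label would be a natural extension of this sketch.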

    On-line mass spectrometry: membrane inlet sampling

    Significant insights into plant photosynthesis and respiration have been achieved using membrane inlet mass spectrometry (MIMS) for the analysis of stable isotope distribution of gases. The MIMS approach is based on using a gas permeable membrane to enable the entry of gas molecules into the mass spectrometer source. This is a simple yet durable approach for the analysis of volatile gases, particularly atmospheric gases. The MIMS technique strongly lends itself to the study of reaction flux where isotopic labeling is employed to differentiate two competing processes; i.e., O2 evolution versus O2 uptake reactions from PSII or terminal oxidase/rubisco reactions. Such investigations have been used for in vitro studies of whole leaves and isolated cells. The MIMS approach is also able to follow rates of isotopic exchange, which is useful for obtaining chemical exchange rates. These types of measurements have been employed for oxygen ligand exchange in PSII and to discern reaction rates of the carbonic anhydrase reactions. Recent developments have also engaged MIMS for online isotopic fractionation and for the study of reactions in inorganic systems that are capable of water splitting or H2 generation. The simplicity of the sampling approach coupled to the high sensitivity of modern instrumentation is a reason for the growing applicability of this technique for a range of problems in plant photosynthesis and respiration. This review offers some insights into the sampling approaches and the experiments that have been conducted with MIMS

    Dietary Supplements and Sports Performance: Introduction and Vitamins

    Sports success is dependent primarily on genetic endowment in athletes with morphologic, psychologic, physiologic and metabolic traits specific to performance characteristics vital to their sport. Such genetically-endowed athletes must also receive optimal training to increase physical power, enhance mental strength, and provide a mechanical advantage. However, athletes often attempt to go beyond training and use substances and techniques, often referred to as ergogenics, in attempts to gain a competitive advantage. Pharmacological agents, such as anabolic steroids and amphetamines, have been used in the past, but such practices by athletes have led to the establishment of anti-doping legislation and effective testing protocols to help deter their use. Thus, many athletes have turned to various dietary strategies, including the use of various dietary supplements (sports supplements), which they presume to be effective, safe and legal

    The consensus molecular subtypes of colorectal cancer

    Colorectal cancer (CRC) is a frequently lethal disease with heterogeneous outcomes and drug responses. To resolve inconsistencies among the reported gene expression-based CRC classifications and facilitate clinical translation, we formed an international consortium dedicated to large-scale data sharing and analytics across expert groups. We show marked interconnectivity between six independent classification systems coalescing into four consensus molecular subtypes (CMSs) with distinguishing features: CMS1 (microsatellite instability immune, 14%), hypermutated, microsatellite unstable and strong immune activation; CMS2 (canonical, 37%), epithelial, marked WNT and MYC signaling activation; CMS3 (metabolic, 13%), epithelial and evident metabolic dysregulation; and CMS4 (mesenchymal, 23%), prominent transforming growth factor-beta activation, stromal invasion and angiogenesis. Samples with mixed features (13%) possibly represent a transition phenotype or intratumoral heterogeneity. We consider the CMS groups the most robust classification system currently available for CRC, with clear biological interpretability, and the basis for future clinical stratification and subtype-based targeted interventions

    Comparison of performance of the Assessment of Spondyloarthritis International Society, the European Spondyloarthropathy Study Group and the modified New York criteria in a cohort of Chinese patients with spondyloarthritis

    Early diagnosis of spondyloarthritis (SpA) is essential as anti-tumor necrosis factor therapy can achieve significant symptomatic relief and control of disease activity. This study aims to compare the clinical characteristics, disease activity, and functional status of a Chinese cohort of SpA patients who were re-classified into three groups: ankylosing spondylitis (AS) patients fulfilling the modified New York (MNY) criteria, those with undifferentiated SpA (USpA) fulfilling the European Spondyloarthropathy Study Group (ESSG) classification criteria only (USpA/ESSG), and those fulfilling the Assessment of SpondyloArthritis International Society (ASAS) criteria only (USpA/ASAS). Disease activity was evaluated by Bath Ankylosing Spondylitis Disease Activity Index (BASDAI), severity of morning stiffness, patient global assessment, and C-reactive protein. Functional status was evaluated by Bath Ankylosing Spondylitis Functional Index (BASFI), modified Schober index, and dimension of chest expansion. One hundred and twenty-eight patients with disease duration of 16.3 ± 10.4 years were recruited. Patients in USpA/ESSG and USpA/ASAS were significantly younger (p = 0.01), had shorter disease duration (p < 0.01), and lower BASFI (p = 0.03) than established AS patients. All three groups had active disease with comparable BASDAI >3. In AS patients, BASFI correlated inversely with chest expansion and modified Schober index (p < 0.01) and modestly with BASDAI (r = 0.25, p < 0.01). BASFI correlated moderately with BASDAI in USpA/ESSG (r = 0.61, p < 0.01) but not with chest expansion or modified Schober index. Compared with established AS patients recognized by MNY criteria, patients with USpA defined by the ESSG or ASAS criteria had earlier disease, equally active disease, and less irreversible functional deficit

    Towards an automated data cleaning with deep learning in CRESST

    The CRESST experiment employs cryogenic calorimeters for the sensitive measurement of nuclear recoils induced by dark matter particles. The recorded signals need to undergo a careful cleaning process to avoid wrongly reconstructed recoil energies caused by pile-up and read-out artefacts. We frame this process as a time series classification task and propose to automate it with neural networks. With a data set of over one million labeled records from 68 detectors, recorded between 2013 and 2019 by CRESST, we test the capability of four commonly used neural network architectures to learn the data cleaning task. Our best performing model achieves a balanced accuracy of 0.932 on our test set. We show on an exemplary detector that about half of the wrongly predicted events are in fact wrongly labeled events, and a large share of the remaining ones have a context-dependent ground truth. We furthermore evaluate the recall and selectivity of our classifiers with simulated data. The results confirm that the trained classifiers are well suited for the data cleaning task.
    Comment: 12 pages, 8 figures, 6 table
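    For readers unfamiliar with the metrics named in the abstract above, the following is a small sketch of how recall, selectivity and balanced accuracy relate in the binary case. It uses the standard textbook definitions; the function name and the toy labels are hypothetical and are not taken from the CRESST analysis.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (recall, selectivity, balanced accuracy) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    recall = tp / (tp + fn)        # fraction of positives that are found
    selectivity = tn / (tn + fp)   # fraction of negatives that are rejected
    return recall, selectivity, (recall + selectivity) / 2

# Toy, imbalanced example (made-up labels): 2 positives, 8 negatives.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
recall, selectivity, balanced = binary_metrics(y_true, y_pred)
plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(recall, selectivity, balanced, plain_accuracy)
```

    On imbalanced data the plain accuracy can look high even when half of the rare class is missed, which is why balanced accuracy is often the more informative figure for a cleaning task of this kind.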

    Latest observations on the low energy excess in CRESST-III

    The CRESST experiment observes an unexplained excess of events at low energies. In the current CRESST-III data-taking campaign we are operating detector modules with different designs to narrow down the possible explanations. In this work, we show first observations of the ongoing measurement, focusing on the comparison of time, energy and temperature dependence of the excess in several detectors. These exclude dark matter, radioactive backgrounds and intrinsic sources related to the crystal bulk as a major contribution.
    Comment: 10 pages, 5 figures; to be published in IDM2022 proceeding

    Observation of a low energy nuclear recoil peak in the neutron calibration data of the CRESST-III Experiment

    New-generation direct searches for low mass dark matter feature detection thresholds at energies well below 100 eV, much lower than the energies of commonly used X-ray calibration sources. This requires new calibration sources with sub-keV energies. When searching for nuclear recoil signals, the calibration source should ideally cause mono-energetic nuclear recoils in the relevant energy range. Recently, a new calibration method based on the radiative neutron capture on ¹⁸²W with subsequent de-excitation via single γ-emission leading to a nuclear recoil peak at 112 eV was proposed. The CRESST-III dark matter search operated several CaWO₄-based detector modules with detection thresholds below 100 eV in the past years. We report the observation of a peak around the expected energy of 112 eV in the data of three different detector modules recorded while irradiated with neutrons from different AmBe calibration sources. We compare the properties of the observed peaks with Geant-4 simulations and assess the prospects of using this for the energy calibration of CRESST-III detectors.
    Comment: 8 pages, 4 figures; submitted to Phys. Rev.
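    The 112 eV figure in the abstract above can be reproduced with a back-of-the-envelope calculation: a nucleus that emits a single photon recoils with momentum E_γ/c, so its kinetic energy is about E_γ² / (2 M c²). The sketch below assumes values that the abstract does not state: a de-excitation energy of roughly 6.19 MeV for the ¹⁸³W formed by the capture, and a nuclear mass approximated as 183 atomic mass units.

```python
U_IN_EV = 931.494e6  # atomic mass unit expressed in eV/c^2

def recoil_energy_ev(gamma_energy_ev, nucleus_mass_ev):
    """Non-relativistic recoil energy of a nucleus emitting one photon."""
    return gamma_energy_ev ** 2 / (2.0 * nucleus_mass_ev)

# Assumed inputs (not given in the abstract):
E_GAMMA_EV = 6.19e6         # approximate single-gamma de-excitation energy
M_W183_EV = 183 * U_IN_EV   # mass of 183W, approximated as 183 u

recoil = recoil_energy_ev(E_GAMMA_EV, M_W183_EV)
print(f"recoil energy: {recoil:.1f} eV")
```

    Under these assumptions the result lands close to the 112 eV quoted in the abstract; the small dependence on the exact mass and photon energy is well below 1 eV.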