390 research outputs found
Batch effect confounding leads to strong bias in performance estimates obtained by cross-validation.
BACKGROUND: With the large amount of biological data that is currently publicly available, many investigators combine multiple data sets to increase the sample size and potentially also the power of their analyses. However, technical differences ("batch effects") as well as differences in sample composition between the data sets may significantly affect the ability to draw generalizable conclusions from such studies.
FOCUS: The current study focuses on the construction of classifiers, and the use of cross-validation to estimate their performance. In particular, we investigate the impact of batch effects and differences in sample composition between batches on the accuracy of the classification performance estimate obtained via cross-validation. The focus on estimation bias is a main difference compared to previous studies, which have mostly focused on the predictive performance and how it relates to the presence of batch effects.
DATA: We work on simulated data sets. To have realistic intensity distributions, we use real gene expression data as the basis for our simulation. Random samples from this expression matrix are selected and assigned to group 1 (e.g., 'control') or group 2 (e.g., 'treated'). We introduce batch effects and select some features to be differentially expressed between the two groups. We consider several scenarios for our study, most importantly different levels of confounding between groups and batch effects.
METHODS: We focus on well-known classifiers: logistic regression, Support Vector Machines (SVM), k-nearest neighbors (kNN) and Random Forests (RF). Feature selection is performed with the Wilcoxon test or the lasso. Parameter tuning and feature selection, as well as the estimation of the prediction performance of each classifier, are performed within a nested cross-validation scheme. The estimated classification performance is then compared to what is obtained when applying the classifier to independent data.
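The bias described in this abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' simulation: it assumes complete confounding between group and batch, no true group signal, a nearest-centroid classifier and plain 5-fold cross-validation (no nesting, tuning or feature selection). The names `N_FEATURES`, `BATCH_SHIFT`, `make_sample` and `cv_accuracy` are hypothetical and chosen only for illustration.

```python
import random

N_FEATURES = 20    # hypothetical number of features
BATCH_SHIFT = 1.0  # hypothetical additive batch effect on every feature


def make_sample(batch, rng):
    # No true group signal: feature values depend only on the batch.
    shift = BATCH_SHIFT if batch == 1 else 0.0
    return [rng.gauss(shift, 1.0) for _ in range(N_FEATURES)]


def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit(X, y):
    # Nearest-centroid classifier: one mean vector per group label.
    return {g: centroid([x for x, lab in zip(X, y) if lab == g]) for g in set(y)}


def predict(model, x):
    return min(model, key=lambda g: sq_dist(model[g], x))


def accuracy(model, X, y):
    return sum(predict(model, x) == lab for x, lab in zip(X, y)) / len(y)


def cv_accuracy(X, y, k, rng):
    # Plain k-fold cross-validation on the (confounded) training data.
    idx = list(range(len(y)))
    rng.shuffle(idx)
    correct = 0
    for fold in (idx[i::k] for i in range(k)):
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(model, X[i]) == y[i] for i in fold)
    return correct / len(y)


rng = random.Random(0)

# Training data: group 0 measured only in batch 0, group 1 only in batch 1.
y_train = [0] * 50 + [1] * 50
X_train = [make_sample(batch=g, rng=rng) for g in y_train]

# Independent data: both groups measured in one batch, so no confounding.
y_test = [0] * 100 + [1] * 100
X_test = [make_sample(batch=0, rng=rng) for _ in y_test]

cv_estimate = cv_accuracy(X_train, y_train, k=5, rng=rng)
independent = accuracy(fit(X_train, y_train), X_test, y_test)
print(f"cross-validated accuracy:     {cv_estimate:.2f}")
print(f"accuracy on independent data: {independent:.2f}")
```

Under these assumptions the classifier learns the batch rather than the group, so the cross-validated estimate is expected to be far higher than the accuracy on independent data, which should sit near chance.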
On-line mass spectrometry: membrane inlet sampling
Significant insights into plant photosynthesis and respiration have been achieved using membrane inlet mass spectrometry (MIMS) for the analysis of stable isotope distribution of gases. The MIMS approach is based on using a gas permeable membrane to enable the entry of gas molecules into the mass spectrometer source. This is a simple yet durable approach for the analysis of volatile gases, particularly atmospheric gases. The MIMS technique strongly lends itself to the study of reaction flux where isotopic labeling is employed to differentiate two competing processes; i.e., O2 evolution versus O2 uptake reactions from PSII or terminal oxidase/rubisco reactions. Such investigations have been used for in vitro studies of whole leaves and isolated cells. The MIMS approach is also able to follow rates of isotopic exchange, which is useful for obtaining chemical exchange rates. These types of measurements have been employed for oxygen ligand exchange in PSII and to discern reaction rates of the carbonic anhydrase reactions. Recent developments have also engaged MIMS for online isotopic fractionation and for the study of reactions in inorganic systems that are capable of water splitting or H2 generation. The simplicity of the sampling approach coupled to the high sensitivity of modern instrumentation is a reason for the growing applicability of this technique for a range of problems in plant photosynthesis and respiration. This review offers some insights into the sampling approaches and the experiments that have been conducted with MIMS
Dietary Supplements and Sports Performance: Introduction and Vitamins
Sports success is dependent primarily on genetic endowment in athletes with morphologic, psychologic, physiologic and metabolic traits specific to performance characteristics vital to their sport. Such genetically-endowed athletes must also receive optimal training to increase physical power, enhance mental strength, and provide a mechanical advantage. However, athletes often attempt to go beyond training and use substances and techniques, often referred to as ergogenics, in attempts to gain a competitive advantage. Pharmacological agents, such as anabolic steroids and amphetamines, have been used in the past, but such practices by athletes have led to the establishment of anti-doping legislation and effective testing protocols to help deter their use. Thus, many athletes have turned to various dietary strategies, including the use of various dietary supplements (sports supplements), which they presume to be effective, safe and legal
The consensus molecular subtypes of colorectal cancer
Colorectal cancer (CRC) is a frequently lethal disease with heterogeneous outcomes and drug responses. To resolve inconsistencies among the reported gene expression-based CRC classifications and facilitate clinical translation, we formed an international consortium dedicated to large-scale data sharing and analytics across expert groups. We show marked interconnectivity between six independent classification systems coalescing into four consensus molecular subtypes (CMSs) with distinguishing features: CMS1 (microsatellite instability immune, 14%), hypermutated, microsatellite unstable and strong immune activation; CMS2 (canonical, 37%), epithelial, marked WNT and MYC signaling activation; CMS3 (metabolic, 13%), epithelial and evident metabolic dysregulation; and CMS4 (mesenchymal, 23%), prominent transforming growth factor-beta activation, stromal invasion and angiogenesis. Samples with mixed features (13%) possibly represent a transition phenotype or intratumoral heterogeneity. We consider the CMS groups the most robust classification system currently available for CRC, with clear biological interpretability, and the basis for future clinical stratification and subtype-based targeted interventions.
Comparison of performance of the Assessment of Spondyloarthritis International Society, the European Spondyloarthropathy Study Group and the modified New York criteria in a cohort of Chinese patients with spondyloarthritis
Early diagnosis of spondyloarthritis (SpA) is essential as anti-tumor necrosis factor therapy can achieve significant symptomatic relief and control of disease activity. This study aims to compare the clinical characteristics, disease activity, and functional status of a Chinese cohort of SpA patients who were re-classified into ankylosing spondylitis (AS) patients fulfilling the modified New York (MNY) criteria, those with undifferentiated SpA (USpA) fulfilling the European Spondyloarthropathy Study Group (ESSG) classification criteria only (USpA/ESSG), and those fulfilling the Assessment of SpondyloArthritis International Society (ASAS) criteria only (USpA/ASAS). Disease activity was evaluated by the Bath Ankylosing Spondylitis Disease Activity Index (BASDAI), severity of morning stiffness, patient global assessment, and C-reactive protein. Functional status was evaluated by the Bath Ankylosing Spondylitis Functional Index (BASFI), modified Schober index, and dimension of chest expansion. One hundred and twenty-eight patients with disease duration of 16.3 ± 10.4 years were recruited. Patients in USpA/ESSG and USpA/ASAS were significantly younger (p = 0.01), had shorter disease duration (p < 0.01), and had lower BASFI (p = 0.03) than established AS patients. All three groups had active disease with comparable BASDAI > 3. In AS patients, BASFI correlated inversely with chest expansion and with the modified Schober index (p < 0.01) and modestly with BASDAI (r = 0.25, p < 0.01). BASFI correlated moderately with BASDAI in USpA/ESSG (r = 0.61, p < 0.01) but not with chest expansion or modified Schober index. Compared with established AS patients recognized by MNY criteria, patients fulfilling USpA defined by ESSG or ASAS criteria had earlier disease, equally active disease, and less irreversible functional deficit.
Towards an automated data cleaning with deep learning in CRESST
The CRESST experiment employs cryogenic calorimeters for the sensitive
measurement of nuclear recoils induced by dark matter particles. The recorded
signals need to undergo a careful cleaning process to avoid wrongly
reconstructed recoil energies caused by pile-up and read-out artefacts. We
frame this process as a time series classification task and propose to automate
it with neural networks. With a data set of over one million labeled records
from 68 detectors, recorded between 2013 and 2019 by CRESST, we test the
capability of four commonly used neural network architectures to learn the data
cleaning task. Our best performing model achieves a balanced accuracy of 0.932
on our test set. We show on an exemplary detector that about half of the
wrongly predicted events are in fact wrongly labeled events, and a large share
of the remaining ones have a context-dependent ground truth. We furthermore
evaluate the recall and selectivity of our classifiers with simulated data. The
results confirm that the trained classifiers are well suited for the data
cleaning task. Comment: 12 pages, 8 figures, 6 tables
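For readers unfamiliar with the metrics named in this abstract, here is a minimal sketch of how recall, selectivity and balanced accuracy relate for a binary classifier. It assumes a two-class labeling; the label values and the toy predictions are hypothetical and are not taken from the CRESST data.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (recall, selectivity, balanced accuracy) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    recall = tp / (tp + fn)       # fraction of positives that are found
    selectivity = tn / (tn + fp)  # fraction of negatives that are rejected
    # Balanced accuracy is the mean of the two per-class rates, so it is
    # not inflated when one class is much more frequent than the other.
    return recall, selectivity, (recall + selectivity) / 2


# Hypothetical toy labels: 4 positive records and 6 negative records.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

recall, selectivity, balanced = binary_metrics(y_true, y_pred)
print(f"recall={recall:.3f} selectivity={selectivity:.3f} balanced={balanced:.3f}")
```

In this toy case three of four positives and five of six negatives are classified correctly, giving a recall of 0.75 and a selectivity of 5/6.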
Latest observations on the low energy excess in CRESST-III
The CRESST experiment observes an unexplained excess of events at low
energies. In the current CRESST-III data-taking campaign we are operating
detector modules with different designs to narrow down the possible
explanations. In this work, we show first observations of the ongoing
measurement, focusing on the comparison of time, energy and temperature
dependence of the excess in several detectors. These exclude dark matter,
radioactive backgrounds and intrinsic sources related to the crystal bulk as a
major contribution. Comment: 10 pages, 5 figures; to be published in IDM2022 proceedings
Observation of a low energy nuclear recoil peak in the neutron calibration data of the CRESST-III Experiment
New-generation direct searches for low mass dark matter feature detection
thresholds at energies well below 100 eV, much lower than the energies of
commonly used X-ray calibration sources. This requires new calibration sources
with sub-keV energies. When searching for nuclear recoil signals, the
calibration source should ideally cause mono-energetic nuclear recoils in the
relevant energy range. Recently, a new calibration method based on the
radiative neutron capture on ¹⁸²W with subsequent de-excitation via single
γ-emission leading to a nuclear recoil peak at 112 eV was proposed. The
CRESST-III dark matter search operated several CaWO₄-based detector
modules with detection thresholds below 100 eV in the past years. We report the
observation of a peak around the expected energy of 112 eV in the data of three
different detector modules recorded while irradiated with neutrons from
different AmBe calibration sources. We compare the properties of the observed
peaks with Geant-4 simulations and assess the prospects of using this for the
energy calibration of CRESST-III detectors. Comment: 8 pages, 4 figures; submitted to Phys. Rev.
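The 112 eV figure in this abstract can be checked with simple two-body kinematics. The sketch below is illustrative and rests on assumptions not stated in the abstract: the capture leaves a ¹⁸³W nucleus, it de-excites with a single γ of about 6.19 MeV (the approximate neutron separation energy of ¹⁸³W), and the nuclear mass is approximated as 183 atomic mass units. The non-relativistic recoil energy is then E_R = E_γ² / (2 M c²).

```python
# Assumed inputs (not given in the abstract):
E_GAMMA_MEV = 6.19         # energy of the single de-excitation gamma, in MeV
MASS_NUMBER = 183          # recoiling nucleus taken as 183W
U_MEV = 931.494            # one atomic mass unit in MeV/c^2

# Momentum conservation: the nucleus recoils with p = E_gamma / c, so its
# kinetic energy is p^2 / (2 M) = E_gamma^2 / (2 M c^2).
mass_mev = MASS_NUMBER * U_MEV
recoil_ev = E_GAMMA_MEV ** 2 / (2 * mass_mev) * 1e6  # convert MeV to eV

print(f"nuclear recoil energy: {recoil_ev:.1f} eV")
```

With these inputs the result lands close to the 112 eV peak energy quoted in the abstract.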