825 research outputs found

    Batch effect confounding leads to strong bias in performance estimates obtained by cross-validation.

    BACKGROUND: With the large amount of biological data that is currently publicly available, many investigators combine multiple data sets to increase the sample size and potentially also the power of their analyses. However, technical differences ("batch effects") as well as differences in sample composition between the data sets may significantly affect the ability to draw generalizable conclusions from such studies.
    FOCUS: The current study focuses on the construction of classifiers and the use of cross-validation to estimate their performance. In particular, we investigate the impact of batch effects, and of differences in sample composition between batches, on the accuracy of the classification performance estimate obtained via cross-validation. The focus on estimation bias is a main difference from previous studies, which have mostly focused on predictive performance and how it relates to the presence of batch effects.
    DATA: We work on simulated data sets. To obtain realistic intensity distributions, we use real gene expression data as the basis for our simulation. Random samples from this expression matrix are selected and assigned to group 1 (e.g., 'control') or group 2 (e.g., 'treated'). We introduce batch effects and select some features to be differentially expressed between the two groups. We consider several scenarios, most importantly different levels of confounding between groups and batch effects.
    METHODS: We focus on well-known classifiers: logistic regression, support vector machines (SVM), k-nearest neighbors (kNN) and random forests (RF). Feature selection is performed with the Wilcoxon test or the lasso. Parameter tuning and feature selection, as well as the estimation of each classifier's prediction performance, are performed within a nested cross-validation scheme. The estimated classification performance is then compared to the performance obtained when applying the classifier to independent data.
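
    The defining property of a nested scheme is that feature selection and parameter tuning only ever see the outer training folds. A minimal pure-Python sketch of that structure follows. It is not the study's code: all function names are hypothetical, features are ranked by the absolute difference in group means (a stand-in for the Wilcoxon test or the lasso), a nearest-centroid rule replaces logistic regression, SVM, kNN and RF, the tuned parameter is the number of selected features, the two groups are coded 0/1, and no batch effects are simulated.

```python
import random
from statistics import mean

def stratified_folds(labels, k, seed):
    """Split positions 0..len(labels)-1 into k folds, dealing each class evenly."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for c in sorted(set(labels)):
        members = [i for i, v in enumerate(labels) if v == c]
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    return folds

def rank_features(X, y, rows):
    """Rank features by |mean(group 1) - mean(group 0)|, using only `rows`."""
    def gap(j):
        g1 = [X[r][j] for r in rows if y[r] == 1]
        g0 = [X[r][j] for r in rows if y[r] == 0]
        return abs(mean(g1) - mean(g0))
    return sorted(range(len(X[0])), key=gap, reverse=True)

def fit_and_score(X, y, train, test, n_feat):
    """Select features and fit class centroids on `train`; return accuracy on `test`."""
    feats = rank_features(X, y, train)[:n_feat]
    centroids = {
        c: [mean(X[r][j] for r in train if y[r] == c) for j in feats] for c in (0, 1)
    }

    def predict(r):
        dist = {c: sum((X[r][j] - m) ** 2 for j, m in zip(feats, cen))
                for c, cen in centroids.items()}
        return min(dist, key=dist.get)

    return mean(1.0 if predict(r) == y[r] else 0.0 for r in test)

def nested_cv(X, y, grid, k_outer=5, k_inner=3, seed=0):
    """Outer folds estimate performance; inner folds pick n_feat from `grid`.

    Feature ranking and tuning use only outer-training rows, so each outer
    test fold stays untouched until its final score.
    """
    outer = stratified_folds(y, k_outer, seed)
    scores = []
    for i, test in enumerate(outer):
        train = [r for j, fold in enumerate(outer) if j != i for r in fold]
        inner = stratified_folds([y[r] for r in train], k_inner, seed + 1 + i)

        def inner_score(n_feat):
            accs = []
            for a, val_pos in enumerate(inner):
                fit = [train[p] for b, fold in enumerate(inner) if b != a for p in fold]
                val = [train[p] for p in val_pos]
                accs.append(fit_and_score(X, y, fit, val, n_feat))
            return mean(accs)

        best = max(grid, key=inner_score)
        scores.append(fit_and_score(X, y, train, test, best))
    return mean(scores)
```

    `nested_cv(X, y, grid=[5, 10, 20])` takes `X` as a list of rows and returns the mean outer-fold accuracy; the abstract's question is how far such an estimate drifts from the accuracy on independent data when batches and groups are confounded.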

    Single hadron response measurement and calorimeter jet energy scale uncertainty with the ATLAS detector at the LHC

    The uncertainty on the calorimeter energy response to jets of particles is derived for the ATLAS experiment at the Large Hadron Collider (LHC). First, the calorimeter response to single isolated charged hadrons is measured and compared to the Monte Carlo simulation using proton-proton collisions at centre-of-mass energies of √s = 900 GeV and 7 TeV collected during 2009 and 2010. Then, using the decays of K_S⁰ and Λ particles, the calorimeter response to specific types of particles (positively and negatively charged pions, protons, and anti-protons) is measured and compared to the Monte Carlo predictions. Finally, the jet energy scale uncertainty is determined by propagating the response uncertainty for single charged and neutral particles to jets. The response uncertainty is 2-5% for central isolated hadrons and 1-3% for the final calorimeter jet energy scale.
    Comment: 24 pages plus author list (36 pages total), 23 figures, 1 table, submitted to European Physical Journal
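
    A toy illustration of what propagating per-particle response uncertainties to a jet can mean, under two limiting assumptions: all constituent shifts fully correlated, or fully independent. This is not the ATLAS procedure, which the abstract does not detail; the function names and the constituent energies and uncertainties are hypothetical.

```python
import math

# Each constituent is (energy_GeV, fractional_response_uncertainty).

def jet_shift_correlated(constituents):
    """All constituents shift coherently: energy-weighted mean of the uncertainties."""
    total = sum(e for e, _ in constituents)
    return sum(e * u for e, u in constituents) / total

def jet_shift_uncorrelated(constituents):
    """Independent shifts: absolute shifts added in quadrature, divided by E_jet."""
    total = sum(e for e, _ in constituents)
    return math.sqrt(sum((e * u) ** 2 for e, u in constituents)) / total

jet = [(60.0, 0.03), (40.0, 0.01)]  # hypothetical charged and neutral components
print(f"{jet_shift_correlated(jet):.4f} {jet_shift_uncorrelated(jet):.4f}")
```

    The correlated case gives the larger (more conservative) jet-level shift; a realistic treatment lies between the two limits.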

    Standalone vertex finding in the ATLAS muon spectrometer

    A dedicated reconstruction algorithm to find decay vertices in the ATLAS muon spectrometer is presented. The algorithm searches the region just upstream of or inside the muon spectrometer volume for multi-particle vertices that originate from the decay of particles with long decay paths. The performance of the algorithm is evaluated using both a sample of simulated Higgs boson events, in which the Higgs boson decays to long-lived neutral particles that in turn decay to b b̄ final states, and pp collision data at √s = 7 TeV collected with the ATLAS detector at the LHC during 2011.

    Measurements of Higgs boson production and couplings in diboson final states with the ATLAS detector at the LHC

    Measurements are presented of production properties and couplings of the recently discovered Higgs boson using the decays into boson pairs, H → γγ, H → ZZ* → 4ℓ and H → WW* → ℓνℓν. The results are based on the complete pp collision data sample recorded by the ATLAS experiment at the CERN Large Hadron Collider at centre-of-mass energies of √s = 7 TeV and √s = 8 TeV, corresponding to an integrated luminosity of about 25 fb⁻¹. Evidence for Higgs boson production through vector-boson fusion is reported. Results of combined fits probing Higgs boson couplings to fermions and bosons, as well as anomalous contributions to loop-induced production and decay modes, are presented. All measurements are consistent with expectations for the Standard Model Higgs boson.

    Measurement of the top quark-pair production cross section with ATLAS in pp collisions at √s = 7 TeV

    A measurement of the production cross-section for top quark pairs (tt̄) in pp collisions at √s = 7 TeV is presented using data recorded with the ATLAS detector at the Large Hadron Collider. Events are selected in two different topologies: single lepton (electron e or muon μ) with large missing transverse energy and at least four jets, and dilepton (ee, μμ or eμ) with large missing transverse energy and at least two jets. In a data sample of 2.9 pb⁻¹, 37 candidate events are observed in the single-lepton topology and 9 events in the dilepton topology. The corresponding expected backgrounds from non-tt̄ Standard Model processes are estimated using data-driven methods and determined to be 12.2 ± 3.9 events and 2.5 ± 0.6 events, respectively. The kinematic properties of the selected events are consistent with SM tt̄ production. The inclusive top quark pair production cross-section is measured to be σ(tt̄) = 145 ± 31 +42/−27 pb, where the first uncertainty is statistical and the second systematic. The measurement agrees with perturbative QCD calculations.
    Comment: 30 pages plus author list (50 pages total), 9 figures, 11 tables, CERN-PH number and final journal added
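
    In the simplest counting picture (an assumption for illustration; the abstract does not say how the two topologies are combined), a cross-section follows from the background-subtracted event count, the integrated luminosity and an acceptance-times-efficiency factor ε that also absorbs the relevant branching fractions. The event counts and luminosity below are the ones quoted above:

$$
\sigma_{t\bar{t}} = \frac{N_{\mathrm{obs}} - N_{\mathrm{bkg}}}{\varepsilon \int L\,\mathrm{d}t},
\qquad
N_{\mathrm{obs}} - N_{\mathrm{bkg}} =
\begin{cases}
37 - 12.2 = 24.8 & \text{(single lepton)}\\
9 - 2.5 = 6.5 & \text{(dilepton)}
\end{cases},
\qquad
\int L\,\mathrm{d}t = 2.9\ \mathrm{pb}^{-1}.
$$

    The abstract does not quote ε for either topology, so the 145 pb result cannot be reproduced from these numbers alone.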

    Measurement of the top quark pair cross section with ATLAS in pp collisions at √s=7 TeV using final states with an electron or a muon and a hadronically decaying τ lepton

    A measurement of the cross section of top quark pair production in proton-proton collisions recorded with the ATLAS detector at the Large Hadron Collider at a centre-of-mass energy of 7 TeV is reported. The data sample used corresponds to an integrated luminosity of 2.05 fb⁻¹. Events with an isolated electron or muon and a τ lepton decaying hadronically are used. In addition, a large missing transverse momentum and two or more energetic jets are required. At least one of the jets must be identified as originating from a b quark. The measured cross section, σ(tt̄) = 186 ± 13 (stat.) ± 20 (syst.) ± 7 (lumi.) pb, is in good agreement with the Standard Model prediction.
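
    If the three quoted uncertainty components are treated as independent (an assumption; the abstract lists them separately and does not combine them), they add in quadrature. A short sketch with a hypothetical helper name:

```python
import math

def quadrature_sum(*components):
    """Total uncertainty of independent components: sqrt of the sum of squares."""
    return math.sqrt(sum(c * c for c in components))

# Components quoted in the abstract, in pb: stat., syst., lumi.
total = quadrature_sum(13.0, 20.0, 7.0)
print(f"sigma_ttbar = 186 +/- {total:.1f} pb")
```

    The combined uncertainty is about 25 pb, roughly 13% of the central value, and is dominated by the systematic component.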

    Measurement of χc1 and χc2 production with √s = 7 TeV pp collisions at ATLAS

    The prompt and non-prompt production cross-sections for the χc1 and χc2 charmonium states are measured in pp collisions at √s = 7 TeV with the ATLAS detector at the LHC using 4.5 fb⁻¹ of integrated luminosity. The χc states are reconstructed through the radiative decay χc → J/ψ γ (with J/ψ → μ⁺μ⁻), where photons are reconstructed from γ → e⁺e⁻ conversions. The production rate of the χc2 state relative to the χc1 state is measured for prompt and non-prompt χc as a function of J/ψ transverse momentum. The prompt χc cross-sections are combined with existing measurements of prompt J/ψ production to derive the fraction of prompt J/ψ produced in feed-down from χc decays. The fractions of χc1 and χc2 produced in b-hadron decays are also measured.
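
    The feed-down fraction mentioned in this abstract can be written as below. This is a sketch of the usual definition, assuming the cross-sections are compared in the same J/ψ transverse-momentum bin and including only the two χc states measured here:

$$
R^{\chi_c}_{\text{prompt}}\!\left(p_T^{J/\psi}\right) =
\frac{\sum_{J=1,2} \sigma_{\text{prompt}}(pp \to \chi_{cJ})\,\mathcal{B}(\chi_{cJ} \to J/\psi\,\gamma)}
{\sigma_{\text{prompt}}(pp \to J/\psi)}
$$

    The numerator counts prompt J/ψ that arrive through the radiative χc decays; the denominator is the total prompt J/ψ cross-section taken from the existing measurements.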

    Measurements of fiducial and differential cross sections for Higgs boson production in the diphoton decay channel at √s = 8 TeV with ATLAS

    Measurements of fiducial and differential cross sections are presented for Higgs boson production in proton-proton collisions at a centre-of-mass energy of √s = 8 TeV. The analysis is performed in the H → γγ decay channel using 20.3 fb⁻¹ of data recorded by the ATLAS experiment at the CERN Large Hadron Collider. The signal is extracted using a fit to the diphoton invariant mass spectrum assuming that the width of the resonance is much smaller than the experimental resolution. The signal yields are corrected for the effects of detector inefficiency and resolution. The pp → H → γγ fiducial cross section is measured to be 43.2 ± 9.4 (stat.) +3.2/−2.9 (syst.) ± 1.2 (lumi.) fb for a Higgs boson of mass 125.4 GeV decaying to two isolated photons that have transverse momentum greater than 35% and 25% of the diphoton invariant mass and each with absolute pseudorapidity less than 2.37. Four additional fiducial cross sections and two cross-section limits are presented in phase space regions that test the theoretical modelling of different Higgs boson production mechanisms, or are sensitive to physics beyond the Standard Model. Differential cross sections are also presented, as a function of variables related to the diphoton kinematics and the jet activity produced in the Higgs boson events. The observed spectra are statistically limited but broadly in line with the theoretical expectations.
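
    The kinematic part of the fiducial definition quoted above can be sketched as a selection function. Assumptions: photons are massless, each photon is a hypothetical `(pt_GeV, eta, phi, is_isolated)` tuple, and isolation is a precomputed flag because the abstract does not state the isolation criterion; the function names are illustrative, not ATLAS software.

```python
import math

def diphoton_mass(pt1, eta1, phi1, pt2, eta2, phi2):
    """Invariant mass of two massless photons: m^2 = 2 pT1 pT2 (cosh(d_eta) - cos(d_phi))."""
    return math.sqrt(2.0 * pt1 * pt2 * (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2)))

def in_fiducial_region(photons):
    """Apply the abstract's fiducial cuts to exactly two isolated photons."""
    if len(photons) != 2 or not all(p[3] for p in photons):
        return False
    # Order by transverse momentum: leading photon first.
    (pt1, eta1, phi1, _), (pt2, eta2, phi2, _) = sorted(
        photons, key=lambda p: p[0], reverse=True
    )
    m = diphoton_mass(pt1, eta1, phi1, pt2, eta2, phi2)
    return (
        pt1 > 0.35 * m and pt2 > 0.25 * m  # pT thresholds relative to m_gg
        and abs(eta1) < 2.37 and abs(eta2) < 2.37
    )
```

    Because the pT thresholds scale with the diphoton mass rather than being fixed in GeV, the mass must be computed before the cuts are applied.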

    Measurement of the production of a W boson in association with a charm quark in pp collisions at √s = 7 TeV with the ATLAS detector

    The production of a W boson in association with a single charm quark is studied using 4.6 fb⁻¹ of pp collision data at √s = 7 TeV collected with the ATLAS detector at the Large Hadron Collider. In events in which a W boson decays to an electron or muon, the charm quark is tagged either by its semileptonic decay to a muon or by the presence of a charmed meson. The integrated and differential cross sections as a function of the pseudorapidity of the lepton from the W-boson decay are measured. Results are compared to the predictions of next-to-leading-order QCD calculations obtained from various parton distribution function parameterisations. The ratio of the strange-to-down sea-quark distributions is determined to be 0.96 +0.26/−0.30 at Q² = 1.9 GeV², which supports the hypothesis of an SU(3)-symmetric composition of the light-quark sea. Additionally, the cross-section ratio σ(W⁺ + c̄)/σ(W⁻ + c) is compared to the predictions obtained using parton distribution function parameterisations with different assumptions about the s-s̄ quark asymmetry.

    Bayesian reassessment of the epigenetic architecture of complex traits

    Linking epigenetic marks to clinical outcomes improves insight into molecular processes, disease prediction, and therapeutic target identification. Here, a statistical approach is presented to infer the epigenetic architecture of complex disease, determine the variation captured by epigenetic effects, and estimate phenotype-epigenetic probe associations jointly. Implicitly adjusting for probe correlations, data structure (cell count or relatedness), and single-nucleotide polymorphism (SNP) marker effects improves association estimates. In 9,448 individuals, whole-blood methylation array data captured 75.7% (95% CI 71.70–79.3) of body mass index (BMI) variation and 45.6% (95% CI 37.3–51.9) of cigarette consumption variation. Pathway-linked probes of blood cholesterol, lipid transport and sterol metabolism for BMI, and of xenobiotic stimuli response for smoking, showed >1.5 times larger associations with >95% posterior inclusion probability. Prediction accuracy improved by 28.7% for BMI and 10.2% for smoking over a LASSO model, with age- and tissue-specificity, implying that the associations are a phenotypic consequence rather than causal.
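
    The abstract reports associations "with >95% posterior inclusion probability". In sampling-based Bayesian variable-selection models (assumed here; the abstract does not describe the sampler), a probe's inclusion probability is commonly estimated as the fraction of posterior draws in which it is included. A minimal sketch with a hypothetical input format and toy indicators:

```python
def posterior_inclusion_probabilities(draws):
    """Fraction of posterior draws in which each probe is included.

    `draws` is a list of iterations; each iteration is a list of 0/1
    inclusion indicators, one per probe (hypothetical input format).
    """
    n_iter = len(draws)
    n_probes = len(draws[0])
    return [sum(it[j] for it in draws) / n_iter for j in range(n_probes)]

def high_confidence_probes(pips, threshold=0.95):
    """Indices of probes whose inclusion probability exceeds `threshold`."""
    return [j for j, p in enumerate(pips) if p > threshold]

draws = [[1, 0, 1], [1, 0, 0], [1, 1, 1], [1, 0, 1]]  # toy: 4 draws x 3 probes
pips = posterior_inclusion_probabilities(draws)
print(pips, high_confidence_probes(pips))
```

    Real analyses would use many thousands of draws after burn-in; four draws are used here only to keep the arithmetic checkable.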