
    An Introspective Comparison of Random Forest-Based Classifiers for the Analysis of Cluster-Correlated Data by Way of RF++

    Many mass spectrometry-based studies, as well as other biological experiments, produce cluster-correlated data. Failure to account for correlation among observations may cause a classification algorithm to overfit the training data, produce overoptimistic estimated error rates, and make subsequent classifications unreliable. Current common practice for dealing with replicated data is to average each subject's replicate sample set, which reduces the dataset size and loses information. In this manuscript we compare three approaches to dealing with cluster-correlated data: unmodified Breiman's Random Forest (URF), forests grown using subject-level averages (SLA), and RF++ with subject-level bootstrapping (SLB). RF++, a novel Random Forest-based algorithm implemented in C++, handles cluster-correlated data through a modification of the original resampling algorithm and accommodates subject-level classification. Subject-level bootstrapping is an alternative sampling method that obviates the need to average or otherwise reduce each set of replicates to a single independent sample. Our experiments show nearly identical median classification and variable selection accuracy for SLB forests and URF forests when applied to both simulated and real datasets. However, the run-time estimated error rate was severely underestimated for URF forests. Predictably, SLA forests were more severely affected by the reduction in sample size, which led to poorer classification and variable selection accuracy. Perhaps most importantly, our results suggest that it is reasonable to use URF for the analysis of cluster-correlated data. Two caveats should be noted: first, correct classification error rates must be obtained using a separate test dataset, and second, an additional post-processing step is required to obtain subject-level classifications. RF++ is shown to be an effective alternative for classifying both clustered and non-clustered data.
Source code and stand-alone compiled versions of command-line and easy-to-use graphical user interface (GUI) versions of RF++ for Windows and Linux as well as a user manual (Supplementary File S2) are available for download at: http://sourceforge.org/projects/rfpp/ under the GNU public license
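
The subject-level bootstrapping idea described in the abstract above can be illustrated with a minimal Python sketch. It is not the RF++ implementation (which is written in C++), and the function name `subject_level_bootstrap`, the toy subject labels, and the choice to keep every replicate of a drawn subject are assumptions made purely for illustration. The point it shows is that whole subjects, rather than individual replicate rows, are drawn with replacement, so replicates of one subject never appear on both the in-bag and out-of-bag sides.

```python
import random
from collections import defaultdict


def subject_level_bootstrap(subject_ids, rng):
    """Draw one bootstrap sample at the subject level (illustrative sketch).

    subject_ids[i] is the subject that row i (one replicate) belongs to.
    Returns (in_bag_rows, out_of_bag_rows) as lists of row indices.
    """
    # Group replicate rows by the subject they came from.
    rows_by_subject = defaultdict(list)
    for row, subject in enumerate(subject_ids):
        rows_by_subject[subject].append(row)

    subjects = sorted(rows_by_subject)
    # Sample subjects (not rows) with replacement.
    drawn = [rng.choice(subjects) for _ in subjects]

    # Assumption: every replicate of a drawn subject goes in-bag.
    in_bag = [row for s in drawn for row in rows_by_subject[s]]

    # Subjects never drawn contribute all their rows to the out-of-bag set.
    drawn_set = set(drawn)
    out_of_bag = [
        row for s in subjects if s not in drawn_set for row in rows_by_subject[s]
    ]
    return in_bag, out_of_bag


# Hypothetical toy data: four subjects with two or three replicates each.
subject_ids = ["A", "A", "A", "B", "B", "C", "C", "C", "D", "D"]
in_bag, out_of_bag = subject_level_bootstrap(subject_ids, random.Random(0))
```

With row-level resampling, by contrast, replicates of the same subject can fall on both sides of the split, which is consistent with the abstract's observation that the run-time error estimate of URF forests was too optimistic.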

    Fine-Scale Mapping of the 4q24 Locus Identifies Two Independent Loci Associated with Breast Cancer Risk

    Background: A recent association study identified a common variant (rs9790517) at 4q24 to be associated with breast cancer risk. Independent association signals and potential functional variants in this locus have not been explored. Methods: We conducted a fine-mapping analysis in 55,540 breast cancer cases and 51,168 controls from the Breast Cancer Association Consortium. Results: Conditional analyses identified two independent association signals among women of European ancestry, represented by rs9790517 [conditional P = 2.51 × 10⁻⁴; OR, 1.04; 95% confidence interval (CI), 1.02–1.07] and rs77928427 (P = 1.86 × 10⁻⁴; OR, 1.04; 95% CI, 1.02–1.07). Functional annotation using data from the Encyclopedia of DNA Elements (ENCODE) project revealed two putative functional variants, rs62331150 and rs73838678, in linkage disequilibrium (LD) with rs9790517 (r² ≥ 0.90) residing in the active promoter or enhancer, respectively, of the nearest gene, TET2. Both variants are located in DNase I hypersensitivity and transcription factor–binding sites. Using data from both The Cancer Genome Atlas (TCGA) and Molecular Taxonomy of Breast Cancer International Consortium (METABRIC), we showed that rs62331150 was associated with level of expression of TET2 in breast normal and tumor tissue. Conclusion: Our study identified two independent association signals at 4q24 in relation to breast cancer risk and suggested that observed association in this locus may be mediated through the regulation of TET2. Impact: Fine-mapping study with large sample size warranted for identification of independent loci for breast cancer risk

    Synthetic Biology: Mapping the Scientific Landscape

    This article uses data from Thomson Reuters Web of Science to map and analyse the scientific landscape for synthetic biology. The article draws on recent advances in data visualisation and analytics with the aim of informing upcoming international policy debates on the governance of synthetic biology by the Subsidiary Body on Scientific, Technical and Technological Advice (SBSTTA) of the United Nations Convention on Biological Diversity. We use mapping techniques to identify how synthetic biology can best be understood and the range of institutions, researchers and funding agencies involved. Debates under the Convention are likely to focus on a possible moratorium on the field release of synthetic organisms, cells or genomes. Based on the empirical evidence we propose that guidance could be provided to funding agencies to respect the letter and spirit of the Convention on Biological Diversity in making research investments. Building on the recommendations of the United States Presidential Commission for the Study of Bioethical Issues we demonstrate that it is possible to promote independent and transparent monitoring of developments in synthetic biology using modern information tools. In particular, public and policy understanding and engagement with synthetic biology can be enhanced through the use of online interactive tools. As a step forward in this process we make existing data on the scientific literature on synthetic biology available in an online interactive workbook so that researchers, policy makers and civil society can explore the data and draw conclusions for themselves

    Studies of new Higgs boson interactions through nonresonant HH production in the bb̄γγ final state in pp collisions at √s = 13 TeV with the ATLAS detector

    A search for nonresonant Higgs boson pair production in the bb̄γγ final state is performed using 140 fb⁻¹ of proton-proton collisions at a centre-of-mass energy of 13 TeV recorded by the ATLAS detector at the CERN Large Hadron Collider. This analysis supersedes and expands upon the previous nonresonant ATLAS results in this final state based on the same data sample. The analysis strategy is optimised to probe anomalous values not only of the Higgs (H) boson self-coupling modifier κλ but also of the quartic HHVV (V = W, Z) coupling modifier κ2V. No significant excess above the expected background from Standard Model processes is observed. An observed upper limit µHH < 4.0 is set at 95% confidence level on the Higgs boson pair production cross-section normalised to its Standard Model prediction. The 95% confidence intervals for the coupling modifiers are −1.4 < κλ < 6.9 and −0.5 < κ2V < 2.7, assuming all other Higgs boson couplings except the one under study are fixed to the Standard Model predictions. The results are interpreted in the Standard Model effective field theory and Higgs effective field theory frameworks in terms of constraints on the couplings of anomalous Higgs boson (self-)interactions

    Measurement of the H → γγ and H → ZZ∗ → 4ℓ cross-sections in pp collisions at √s = 13.6 TeV with the ATLAS detector

    The inclusive Higgs boson production cross section is measured in the di-photon and the ZZ∗ → 4ℓ decay channels using 31.4 and 29.0 fb⁻¹ of pp collision data respectively, collected with the ATLAS detector at a centre-of-mass energy of √s = 13.6 TeV. To reduce the model dependence, the measurement in each channel is restricted to a particle-level phase space that closely matches the channel's detector-level kinematic selection, and it is corrected for detector effects. These measured fiducial cross-sections are σfid,γγ = 76 +14/−13 fb and σfid,4ℓ = 2.80 ± 0.74 fb, in agreement with the corresponding Standard Model predictions of 67.6 ± 3.7 fb and 3.67 ± 0.19 fb. Assuming Standard Model acceptances and branching fractions for the two channels, the fiducial measurements are extrapolated to the full phase space, yielding total cross-sections of σ(pp → H) = 67 +12/−11 pb and 46 ± 12 pb at 13.6 TeV from the di-photon and ZZ∗ → 4ℓ measurements respectively. The two measurements are combined into a total cross-section measurement of σ(pp → H) = 58.2 ± 8.7 pb, to be compared with the Standard Model prediction of σ(pp → H)SM = 59.9 ± 2.6 pb

    Search for pairs of muons with small displacements in pp collisions at √s = 13 TeV with the ATLAS detector

    A search for new phenomena giving rise to pairs of opposite electrically charged muons with impact parameters in the millimeter range is presented, using 139 fb⁻¹ of √s = 13 TeV pp collision data from the ATLAS detector at the LHC. The search targets the gap in coverage between existing searches targeting final states with leptons with large displacement and prompt leptons. No significant excess over the background expectation is observed and exclusion limits are set on the mass of long-lived scalar supersymmetric muon-partners (smuons) with much lower lifetimes than previously targeted by displaced muon searches. Smuon lifetimes down to 1 ps are excluded for a smuon mass of 100 GeV, and smuon masses up to 520 GeV are excluded for a proper lifetime of 10 ps, at 95% confidence level. Finally, model-independent limits are set on the contribution from new phenomena to the signal-region yield

    Comparison of inclusive and photon-tagged jet suppression in 5.02 TeV Pb+Pb collisions with ATLAS

    Parton energy loss in the quark–gluon plasma (QGP) is studied with a measurement of photon-tagged jet production in 1.7 nb⁻¹ of Pb+Pb data and 260 pb⁻¹ of pp data, both at √sNN = 5.02 TeV, with the ATLAS detector. The process pp → γ + jet + X and its analogue in Pb+Pb collisions is measured in events containing an isolated photon with transverse momentum (pT) above 50 GeV and reported as a function of jet pT. This selection results in a sample of jets with a steeply falling pT distribution that are mostly initiated by the showering of quarks. The pp and Pb+Pb measurements are used to report the nuclear modification factor, RAA, and the fractional energy loss, Sloss, for photon-tagged jets. In addition, the results are compared with the analogous ones for inclusive jets, which have a significantly smaller quark-initiated fraction. The RAA and Sloss values are found to be significantly different between those for photon-tagged jets and inclusive jets, demonstrating that energy loss in the QGP is sensitive to the colour-charge of the initiating parton. The results are also compared with a variety of theoretical models of colour-charge-dependent energy loss

    Measurement of the cross-sections of the electroweak and total production of a Zγ pair in association with two jets in pp collisions at √s = 13 TeV with the ATLAS detector
