    Analysis Grand Challenge benchmarking tests on selected sites

    A fast turn-around time and ease of use are important factors for systems supporting the analysis of large HEP data samples. We study and compare multiple technical approaches. This article describes the setup and benchmarking of the Analysis Grand Challenge (AGC) [1] using CMS Open Data. The AGC is an effort to provide a realistic physics analysis with the intent of showcasing the functionality, scalability and feature-completeness of the Scikit-HEP Python ecosystem. We present the results of setting up the necessary software environment for the AGC and of benchmarking the analysis run time on various computing clusters: the institute SLURM cluster at LMU Munich, a SLURM cluster at LRZ (a WLCG Tier-2 site) and the Vispa analysis facility [2] operated by RWTH Aachen. Each site provides a slightly different software environment and mode of operation, which poses interesting challenges for the flexibility of a setup like the one intended for the AGC. Comparing these benchmarks with each other also provides insights into the different storage and caching systems: at LRZ and LMU we have regular Grid storage (HDD) as well as an SSD-based XCache server, while Vispa uses a sophisticated per-node caching system.
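
    A rough illustration of what such a benchmark measures is sketched below: timing a columnar read of a few branches from a NanoAOD-style CMS Open Data file with Uproot. This is not the AGC implementation itself; the file URL and branch names are placeholders, not the actual AGC inputs.

```python
# Minimal benchmarking sketch, not the AGC implementation itself: time a
# columnar read of a few branches from a (hypothetical) CMS Open Data file.
import time

import uproot

FILE_URL = "root://eospublic.cern.ch//eos/opendata/cms/example/nanoaod.root"  # placeholder URL
BRANCHES = ["Jet_pt", "Jet_eta", "MET_pt"]  # typical NanoAOD branch names (assumed)

start = time.perf_counter()
with uproot.open(FILE_URL) as f:
    events = f["Events"]              # NanoAOD-style event tree
    arrays = events.arrays(BRANCHES)  # read the columns into awkward arrays
elapsed = time.perf_counter() - start

print(f"read {len(arrays)} events in {elapsed:.1f} s")
```

    Repeating such a read against the HDD Grid storage, the SSD-based XCache server or a local per-node cache gives a first, coarse picture of how the storage backend affects turn-around time.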

    Search for direct pair production of the top squark in all-hadronic final states in proton-proton collisions at $\sqrt{s} = 8$ TeV with the ATLAS detector

    The results of a search for direct pair production of the scalar partner to the top quark using an integrated luminosity of 20.1 fb$^{-1}$ of proton-proton collision data at $\sqrt{s} = 8$ TeV recorded with the ATLAS detector at the LHC are reported. The top squark is assumed to decay via $\tilde{t} \to t\tilde{\chi}_1^0$ or $\tilde{t} \to b\tilde{\chi}_1^\pm \to b W^{(*)}\tilde{\chi}_1^0$, where $\tilde{\chi}_1^0$ ($\tilde{\chi}_1^\pm$) denotes the lightest neutralino (chargino) in supersymmetric models. The search targets a fully-hadronic final state in events with four or more jets and large missing transverse momentum. No significant excess over the Standard Model background prediction is observed, and exclusion limits are reported in terms of the top squark and neutralino masses and as a function of the branching fraction of $\tilde{t} \to t\tilde{\chi}_1^0$. For a branching fraction of 100%, top squark masses in the range 270–645 GeV are excluded for $\tilde{\chi}_1^0$ masses below 30 GeV. For a branching fraction of 50% to either $\tilde{t} \to t\tilde{\chi}_1^0$ or $\tilde{t} \to b\tilde{\chi}_1^\pm$, and assuming the $\tilde{\chi}_1^\pm$ mass to be twice the $\tilde{\chi}_1^0$ mass, top squark masses in the range 250–550 GeV are excluded for $\tilde{\chi}_1^0$ masses below 60 GeV.
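
    Purely as an illustration of the kind of selection described above (at least four jets plus large missing transverse momentum), a hedged awkward-array sketch follows; the field names, toy values and threshold are assumptions for the example, not the cuts used in the paper.

```python
# Illustrative selection sketch only; field names, values and thresholds are assumed.
import awkward as ak

events = ak.Array(
    {
        "jet_pt": [[160.0, 85.0, 60.0, 35.0], [90.0, 40.0]],  # toy jet pT values in GeV
        "met": [320.0, 95.0],                                  # toy missing transverse momentum
    }
)

njets = ak.num(events.jet_pt, axis=1)
passed = events[(njets >= 4) & (events.met > 150.0)]  # 150 GeV threshold is a placeholder
print(f"{len(passed)} of {len(events)} toy events pass the selection")
```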

    Columnar data analysis with ATLAS analysis formats

    Future analysis of ATLAS data will involve new small-sized analysis formats to cope with the increased storage needs. The smallest of these, named DAOD_PHYSLITE, has calibrations already applied to allow fast downstream analysis and to avoid the need for further analysis-specific intermediate formats. This allows for the application of the “columnar analysis” paradigm, where operations are applied on a per-array rather than a per-event basis. We will present methods to read the data into memory using Uproot, discuss I/O aspects of columnar data and alternatives to the ROOT data format, and furthermore show a representation of the event data model using the Awkward Array package and present a proof of concept for a simple analysis application.
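
    A minimal proof-of-concept along these lines might look as follows; the file name and aux-branch names are assumptions about a typical DAOD_PHYSLITE layout rather than a verified schema.

```python
# Sketch: read electron kinematics from a (hypothetical) PHYSLITE file with
# Uproot and build per-event records with Awkward Array.
import awkward as ak
import uproot

with uproot.open("DAOD_PHYSLITE.example.root") as f:  # placeholder file name
    tree = f["CollectionTree"]                         # assumed main event tree
    pt = tree["AnalysisElectronsAuxDyn.pt"].array()    # assumed branch names
    eta = tree["AnalysisElectronsAuxDyn.eta"].array()

# group the flat aux branches into one jagged record array per event
electrons = ak.zip({"pt": pt, "eta": eta})
print(ak.num(electrons, axis=1))  # number of electrons in each event
```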

    Bringing the ATLAS HammerCloud setup to the next level with containerization

    HammerCloud (HC) is a testing service and framework for continuous functional tests, on-demand large-scale stress tests, and performance benchmarks. It checks the computing resources and various components of distributed systems with realistic full-chain experiment workflows. The HammerCloud software was initially developed in Python 2. After support for Python 2 was discontinued in 2020, migration to Python 3 became vital in order to meet the latest security standards and to use the new CERN Single Sign-On, which requires Python 3. The existing RPM-based deployment setup has allowed stable deployment and secure maintenance over several years of operation for the ATLAS and CMS experiments. However, this model is not flexible enough to support an agile and rapid development process, so we decided to adopt a containerization solution and to switch to industry-standard technologies and processes. Having an “easy to spawn” instance of HC enables a more agile development cycle and easier deployment. With such a containerized setup, CI/CD pipelines can be integrated into the automation process as an extra layer of verification. A quick onboarding process for new team members and communities is essential, given frequent personnel rotation and a general lack of personpower; the container-based setup achieves this, as developers can now work locally with a quick turnaround, without first needing to set up a production-like environment. These developments empower the whole community to test and prototype new ideas and to deliver new types of resources and workflows.

    Performance and impact of dynamic data placement in ATLAS

    For high-throughput computing, the efficient use of distributed computing resources relies on an evenly distributed workload, which in turn requires wide availability of the input data used in physics analysis. In ATLAS, the dynamic data placement agent C3PO was implemented in the ATLAS distributed data management system Rucio; it identifies popular data and creates additional, transient replicas to make the data more widely and more reliably available. These proceedings present studies of the performance of C3PO and its impact on the throughput rates of distributed computing in ATLAS. Furthermore, results of a study on popularity prediction using machine learning techniques are presented.
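
    To give a flavour of the popularity-prediction part, here is a hedged, self-contained sketch that trains a regressor on synthetic weekly access counts; the features, toy data and model choice are illustrative assumptions, not the study's actual setup.

```python
# Toy popularity-prediction sketch (synthetic data, assumed features); the real
# study's inputs and model are described in the proceedings, not reproduced here.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(seed=7)

# toy features: dataset accesses in each of the last four weeks
X = rng.poisson(lam=5.0, size=(500, 4)).astype(float)
# toy target: next week's accesses, loosely tied to recent activity plus noise
y = 1.2 * X.mean(axis=1) + rng.normal(scale=1.0, size=500)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X[:400], y[:400])
predicted = model.predict(X[400:])

# datasets predicted to become popular would be candidates for extra transient replicas
print("indices predicted most popular:", np.argsort(predicted)[-5:][::-1])
```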
