
    The Grid-distributed data analysis in CMS

    The CMS experiment will soon produce a huge amount of data (a few PBytes per year) that will be distributed and stored in many computing centres spread across the countries participating in the collaboration. The data will be available to all CMS physicists thanks to the services provided by the supported Grids. CRAB is the tool developed by the CMS collaboration to allow physicists to access and analyze data stored at sites world-wide. It aims to simplify the data discovery process and the tasks of job creation, execution and monitoring, hiding the details of both the Grid infrastructures and the CMS analysis framework. We discuss the recent evolution of this tool from its standalone version to the client-server architecture adopted for particularly challenging workload volumes, and we report usage statistics collected from the CRAB community, which so far involves almost 600 distinct users.
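
    A rough sketch of the kind of bookkeeping CRAB hides from the user is shown below: splitting the files of a dataset into analysis jobs. The dataset file names, the files-per-job parameter and the AnalysisJob structure are invented for illustration and do not correspond to the actual CRAB configuration or code.

        # Minimal sketch of job splitting, the kind of task a tool such as CRAB
        # automates. File names and parameters are invented for illustration.
        from dataclasses import dataclass
        from typing import List

        @dataclass
        class AnalysisJob:
            job_id: int
            input_files: List[str]

        def split_dataset(files: List[str], files_per_job: int) -> List[AnalysisJob]:
            """Group the input files of a dataset into jobs of fixed size."""
            jobs: List[AnalysisJob] = []
            for i in range(0, len(files), files_per_job):
                jobs.append(AnalysisJob(job_id=len(jobs), input_files=files[i:i + files_per_job]))
            return jobs

        if __name__ == "__main__":
            # Hypothetical file names standing in for the output of data discovery.
            dataset_files = [f"/store/data/ExampleRun/file_{n}.root" for n in range(10)]
            for job in split_dataset(dataset_files, files_per_job=3):
                print(job.job_id, job.input_files)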

    CMS@home: Integrating the Volunteer Cloud and High‑Throughput Computing

    Volunteer computing has the potential to provide significant additional computing capacity for the LHC experiments. Initiatives such as the CMS@home project aim to integrate volunteer computing resources into the experiments' computational frameworks to support their scientific workloads. This is especially important, as over the next few years the demands on computing capacity will increase beyond what can be supported by general technology trends. This paper describes how a volunteer computing project that uses virtualization to run high energy physics simulations can integrate those resources into its computing infrastructure. The concept of the volunteer cloud is introduced and it is described how this model can simplify the integration. An architecture for implementing the volunteer cloud model is presented, along with an implementation for the CMS@home project. Finally, the submission of real CMS workloads to this volunteer cloud is compared to identical workloads submitted to the grid.
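
    The pull model behind the volunteer cloud can be pictured as a simple loop running inside the virtual machine started on the volunteer's computer: it repeatedly asks a central service for work and runs whatever payload it receives, so the volunteer host never needs to be reachable from the experiment's infrastructure. The endpoint URL, payload format and timings below are placeholder assumptions, not the actual CMS@home interfaces.

        # Sketch of a volunteer-cloud pull loop (illustrative only): the VM asks a
        # hypothetical central queue for work and runs the returned command.
        import json
        import subprocess
        import time
        import urllib.request

        WORK_QUEUE_URL = "https://example.org/volunteer/work"  # hypothetical endpoint

        def fetch_work():
            """Ask the central queue for one unit of work; None if nothing is available."""
            try:
                with urllib.request.urlopen(WORK_QUEUE_URL, timeout=30) as resp:
                    return json.load(resp)
            except Exception:
                return None

        def run_forever():
            while True:
                work = fetch_work()
                if work is None:
                    time.sleep(600)  # back off when the queue is empty
                    continue
                # The payload is assumed to be a self-contained command, e.g. a
                # wrapper around the simulation executable inside the VM image.
                subprocess.run(work["command"], shell=True, check=False)

        if __name__ == "__main__":
            run_forever()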

    Use of the gLite-WMS in CMS for production and analysis

    The CMS experiment at the LHC started using the Resource Broker (from the EDG and LCG projects) to submit Monte Carlo production and analysis jobs to the distributed computing resources of the WLCG infrastructure over six years ago. Since 2006 the gLite Workload Management System (WMS) and Logging & Bookkeeping (LB) service have been used. The interaction with the gLite-WMS/LB happens through the CMS production and analysis frameworks, respectively ProdAgent and CRAB, via a common component, BOSSLite. The important improvements recently made in the gLite-WMS/LB as well as in the CMS tools, together with the intrinsic independence of different WMS/LB instances, allow CMS to reach the stability and scalability needed for LHC operations. In particular, the use of a multi-threaded approach in BOSSLite significantly increased the scalability of the system. In this work we present the operational setup of CMS production and analysis based on the gLite-WMS, and the performance obtained in past data challenges as well as in the experiment's daily Monte Carlo production and user analysis activity.
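
    The multi-threaded approach mentioned above can be illustrated with a short sketch: many submissions are handed to a pool of worker threads so that the latency of each WMS round trip is overlapped. The submit_job function below is a stand-in for the real gLite-WMS interaction in BOSSLite, which is not reproduced here.

        # Sketch of multi-threaded bulk submission in the spirit of BOSSLite.
        # submit_job() simulates a network-bound WMS call; it is not the real client.
        from concurrent.futures import ThreadPoolExecutor, as_completed
        import random
        import time

        def submit_job(job_id: int) -> str:
            """Placeholder for a single gLite-WMS submission."""
            time.sleep(random.uniform(0.1, 0.3))  # simulated WMS round-trip latency
            return f"https://wms.example.org/job/{job_id}"  # hypothetical job identifier

        def submit_bulk(job_ids, max_workers: int = 10):
            """Submit many jobs concurrently and collect their identifiers."""
            results = {}
            with ThreadPoolExecutor(max_workers=max_workers) as pool:
                futures = {pool.submit(submit_job, j): j for j in job_ids}
                for fut in as_completed(futures):
                    results[futures[fut]] = fut.result()
            return results

        if __name__ == "__main__":
            print(submit_bulk(range(20)))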

    Observation of a new boson at a mass of 125 GeV with the CMS experiment at the LHC


    CHEP2012

    The CMS distributed data analysis workflow assumes that jobs run in a different location from where their results are finally stored. Typically the user output must be transferred across the network from one site to another, possibly on a different continent or over links not necessarily validated for high-bandwidth, high-reliability transfer. This step is named stage-out and in CMS was originally implemented as a synchronous step of the analysis job execution. However, our experience showed the weakness of this approach, both in terms of low overall job execution efficiency and in terms of failure rates, wasting precious CPU resources. The nature of analysis data makes it inappropriate to use PhEDEx, the core data placement system for CMS. As part of the new generation of CMS Workload Management tools, the Asynchronous Stage-Out system (AsyncStageOut) has been developed to enable third-party copy of the user output. The AsyncStageOut component manages gLite FTS transfers of data from the temporary store at the site where the job ran to the final location of the data, on behalf of the data owner. The tool uses Python daemons, built using the WMCore framework, and CouchDB to manage the queue of work and the FTS transfers. CouchDB also provides the platform for a dedicated operations monitoring system. In this paper, we present the motivations for the asynchronous stage-out system. We give an insight into the design and the implementation of its key features, describing how it is coupled with the CMS workload management system. Finally, we show the results and the commissioning experience.
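
    A much-reduced sketch of the asynchronous stage-out idea follows: a Python daemon reads pending transfer documents from CouchDB over its HTTP API, hands them to a transfer backend and records the outcome. The database name, view path, document fields and the submit_fts_transfer stand-in are assumptions for illustration; the real component is built on the WMCore framework and the gLite FTS client.

        # Reduced sketch of an asynchronous stage-out loop (illustrative only).
        import json
        import time
        import urllib.request

        COUCH_URL = "http://localhost:5984/asyncstageout"      # hypothetical database
        PENDING_VIEW = "/_design/transfers/_view/pending"      # hypothetical view

        def get_pending_transfers():
            """Read pending transfers; the view is assumed to emit the document as its value."""
            with urllib.request.urlopen(COUCH_URL + PENDING_VIEW, timeout=30) as resp:
                rows = json.load(resp)["rows"]
            return [row["value"] for row in rows]

        def submit_fts_transfer(source, destination):
            """Stand-in for a third-party copy request to an FTS server."""
            print(f"FTS transfer requested: {source} -> {destination}")
            return True

        def mark_done(doc_id, rev):
            # A real implementation would update the existing document; this
            # sketch simply overwrites it with a minimal 'done' record.
            doc = {"_id": doc_id, "_rev": rev, "state": "done"}
            req = urllib.request.Request(
                f"{COUCH_URL}/{doc_id}",
                data=json.dumps(doc).encode(),
                method="PUT",
                headers={"Content-Type": "application/json"},
            )
            urllib.request.urlopen(req, timeout=30)

        def main_loop(poll_seconds=60):
            while True:
                for transfer in get_pending_transfers():
                    if submit_fts_transfer(transfer["source_lfn"], transfer["dest_lfn"]):
                        mark_done(transfer["_id"], transfer["_rev"])
                time.sleep(poll_seconds)

        if __name__ == "__main__":
            main_loop()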

    CMS users data management service integration and first experiences with its NoSQL data storage

    The distributed data analysis workflow in CMS assumes that jobs run in a different location to where their results are finally stored. Typically the user outputs must be transferred from one site to another by a dedicated CMS service, AsyncStageOut. This service was originally developed to address the inefficient use of CMS computing resources caused by transferring the analysis job outputs synchronously to the remote site as soon as they are produced on the job execution node. The AsyncStageOut is designed as a thin application relying only on a NoSQL database (CouchDB) for input and data storage. It has progressed from a limited prototype to a highly adaptable service which manages and monitors all the steps in the handling of user files, namely file transfer and publication. The AsyncStageOut is integrated with the Common CMS/ATLAS Analysis Framework. It is foreseen to manage nearly 200k user files per day from close to 1000 individual users per month with minimal delays, while providing real-time monitoring and reports to users and service operators and remaining highly available. The associated data volume represents a new set of challenges in the areas of database scalability and service performance and efficiency. In this paper, we present an overview of the AsyncStageOut model and the integration strategy with the Common Analysis Framework. The motivations for using NoSQL technology are also presented, as well as the data design and the techniques used for efficient indexing and monitoring of the data. We describe the deployment model for the high availability and scalability of the service. We also discuss the hardware requirements and the results achieved, as determined by testing with actual data and realistic loads during the commissioning and the initial production phase with the Common Analysis Framework.
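
    In CouchDB, the indexing and monitoring mentioned above is typically done with map/reduce views stored in a design document. The sketch below installs a hypothetical view that counts transfer documents per state and then queries it; the design-document name and the "state" field are assumptions, not the actual AsyncStageOut schema.

        # Sketch of a CouchDB map/reduce view for monitoring transfer states
        # (illustrative only; names and fields are invented).
        import json
        import urllib.request

        COUCH_URL = "http://localhost:5984/asyncstageout"  # hypothetical database

        DESIGN_DOC = {
            "_id": "_design/monitor",
            "views": {
                "by_state": {
                    # Map function (JavaScript, as CouchDB views require):
                    "map": "function(doc) { if (doc.state) emit(doc.state, 1); }",
                    # Built-in reduce that counts rows per key when grouped.
                    "reduce": "_count",
                }
            },
        }

        def install_view():
            # First installation only; updating would require the current _rev.
            req = urllib.request.Request(
                f"{COUCH_URL}/{DESIGN_DOC['_id']}",
                data=json.dumps(DESIGN_DOC).encode(),
                method="PUT",
                headers={"Content-Type": "application/json"},
            )
            urllib.request.urlopen(req, timeout=30)

        def transfers_per_state():
            url = f"{COUCH_URL}/_design/monitor/_view/by_state?group=true"
            with urllib.request.urlopen(url, timeout=30) as resp:
                return {row["key"]: row["value"] for row in json.load(resp)["rows"]}

        if __name__ == "__main__":
            install_view()
            print(transfers_per_state())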

    CHEP2012

    In CMS Computing the highest priorities for analysis tools are the improvement of the end users' ability to produce and publish reliable samples and analysis results, as well as a transition to a sustainable development and operations model. To achieve these goals, CMS decided to incorporate analysis processing into the same framework as data and simulation processing. This strategy foresees that all workload tools (Tier0, Tier1, production, analysis) share a common core with long-term maintainability, as well as the standardization of the operator interfaces. The re-engineered analysis workload manager, called CRAB3, makes use of newer technologies, such as RESTful web services and NoSQL databases, aiming to increase the scalability and reliability of the system. As opposed to CRAB2, in CRAB3 all work is centrally injected and managed in a global queue. A pool of agents, which can be geographically distributed, consumes work from the central services, serving the user tasks. The new architecture of CRAB substantially changes the deployment model and operations activities. In this paper we present the implementation of CRAB3, emphasizing how the new architecture improves workflow automation and simplifies maintainability. In particular, we highlight the impact of the new design on daily operations.
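
    The "central injection plus pool of agents" model can be pictured schematically as follows: user tasks land in a single global queue and several agents, possibly at different locations, consume from it. The task fields and the in-memory queue below stand in for the real REST service and database-backed queue; they are assumptions for illustration only.

        # Schematic sketch of a global task queue consumed by a pool of agents.
        import queue
        import threading

        global_queue: "queue.Queue[dict]" = queue.Queue()

        def inject_task(username, dataset):
            """What a central REST front-end would do on task submission."""
            global_queue.put({"user": username, "dataset": dataset})

        def agent(agent_name):
            """One of the (possibly geographically distributed) agents."""
            while True:
                try:
                    task = global_queue.get(timeout=1)
                except queue.Empty:
                    return  # no more work in this toy example
                print(f"{agent_name} processing {task['dataset']} for {task['user']}")
                global_queue.task_done()

        if __name__ == "__main__":
            for i in range(5):
                inject_task(f"user{i}", f"/Hypothetical/Dataset_{i}/AOD")
            agents = [threading.Thread(target=agent, args=(f"agent-{n}",)) for n in range(2)]
            for a in agents:
                a.start()
            for a in agents:
                a.join()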

    A comparison of different database technologies for the CMS AsyncStageOut transfer database

    AsyncStageOut (ASO) is the component of the CMS distributed data analysis system (CRAB) that manages user transfers in a centrally controlled way using the File Transfer System (FTS3) at CERN. It addresses a major weakness of the previous, decentralized model, namely that the transfer of the user's output data to a single remote site was part of the job execution, resulting in inefficient use of job slots and an unacceptable failure rate. Currently ASO manages up to 600k files of various sizes per day from more than 500 users per month, spread over more than 100 sites. ASO uses a NoSQL database (CouchDB) for internal bookkeeping and as a way to communicate with the other CRAB components. Since ASO/CRAB was put in production in 2014, the number of transfers has constantly increased, up to a point where the pressure on the central CouchDB instance became critical, creating new challenges for the system's scalability, performance and monitoring. This forced a re-engineering of the ASO application to increase its scalability and lower its operational effort. In this contribution we present a comparison of the performance of the current NoSQL implementation and a new SQL implementation, and how their different strengths and features influenced the design choices and the operational experience. We also discuss other architectural changes introduced in the system to handle the increasing load and the latency in delivering output to the users.
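
    To make the comparison concrete, the sketch below shows the kind of state-based access pattern the transfer database has to serve in a relational implementation: inserting transfer records and acquiring a batch of pending ones. SQLite is used here for portability, and the table and column names are invented; they do not reflect the actual schemas compared in the paper.

        # Illustrative sketch of the ASO transfer-table access pattern in SQL.
        import sqlite3

        conn = sqlite3.connect(":memory:")
        conn.execute(
            """CREATE TABLE transfers (
                   id INTEGER PRIMARY KEY,
                   username TEXT,
                   source_lfn TEXT,
                   dest_site TEXT,
                   state TEXT DEFAULT 'pending'
               )"""
        )
        conn.execute("CREATE INDEX idx_state ON transfers(state)")

        # Insert a few invented transfer records.
        rows = [("alice", f"/store/user/alice/file_{n}.root", "T2_EXAMPLE") for n in range(5)]
        conn.executemany(
            "INSERT INTO transfers (username, source_lfn, dest_site) VALUES (?, ?, ?)", rows
        )

        # Acquire a batch of pending transfers and mark them as being worked on;
        # this is where indexing and locking behaviour differ most between the
        # SQL and the document-store implementations.
        batch = conn.execute(
            "SELECT id, source_lfn FROM transfers WHERE state = 'pending' LIMIT 3"
        ).fetchall()
        conn.executemany(
            "UPDATE transfers SET state = 'acquired' WHERE id = ?", [(i,) for i, _ in batch]
        )
        conn.commit()
        print(batch)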