
    The ATLAS EventIndex: Full chain deployment and first operation

    Abstract: The EventIndex project consists of the development and deployment of a complete catalogue of events for experiments with large amounts of data, such as the ATLAS experiment at the LHC accelerator at CERN. The data to be stored in the EventIndex are produced by all production jobs that run at CERN or on the Grid; for every permanent output file, a snippet of information containing the unique file identifier and the relevant attributes of each event is sent to the central catalogue. The estimated insertion rate during LHC Run 2 is about 80 Hz of file records containing ~15 kHz of event records. This contribution describes the system design, the initial performance tests of the full data collection and cataloguing chain, and the project evolution towards full deployment and operation by the end of 2014.
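    For illustration only, the sketch below shows what such a per-file snippet could look like in Python: one unique file identifier plus a compact record of attributes for each event in the file. The field names and sample values are assumptions made for the example, not the actual EventIndex schema.

```python
import json
import uuid

# Hypothetical sketch of the per-file "snippet" a production job might send to the
# central EventIndex catalogue: a unique file identifier plus a compact record of
# attributes for each event. Field names and values are illustrative only.

def build_file_snippet(events):
    """Package one output file's event records for transmission to the catalogue."""
    return {
        "file_guid": str(uuid.uuid4()),           # unique identifier of the output file
        "events": [
            {
                "run_number": ev["run"],          # run the event belongs to
                "event_number": ev["event"],      # event number within the run
                "lumi_block": ev["lb"],           # luminosity block
                "trigger_masks": ev["triggers"],  # trigger decision summary
            }
            for ev in events
        ],
    }

if __name__ == "__main__":
    # Illustrative sample event; the numbers and trigger name are made up.
    sample = [{"run": 266904, "event": 1234567, "lb": 42, "triggers": ["HLT_e26_lhtight"]}]
    print(json.dumps(build_file_snippet(sample), indent=2))
```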

    QuerySpaces on Hadoop for the ATLAS EventIndex

    The new ATLAS EventIndex catalogue uses a Hadoop cluster to store information on each event processed by ATLAS. Several tools belonging to the Hadoop ecosystem are used to organise the data in HDFS, catalogue it internally, and provide the search functionality. This contribution describes the Hadoop-based implementation of the adaptive query engine serving as the back-end for the ATLAS EventIndex. The QuerySpaces implementation handles both the original data and the search results, providing fast and efficient mechanisms for answering new user queries using the knowledge already accumulated from previous ones. Detailed descriptions and statistics of user requests are collected in HBase tables and HDFS files. Requests are associated with their results, and a graph of relations between them is built and used to find the most efficient way of answering new requests. The environment is completely transparent to users and is accessible through several command-line interfaces, a web service and a programming API.
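    A minimal sketch of the idea, assuming HBase is accessed from Python via the happybase client; the table and column names ("queries", "meta:", "link:") are stand-ins for the real schema. Each user request is recorded together with a pointer to its cached result on HDFS, so that a later identical request can reuse it instead of being re-executed.

```python
import time

import happybase  # one possible Python client for HBase; the real back-end is not specified here

# Illustrative sketch in the spirit of the QuerySpaces design: record each query and a
# link to its result, then consult the accumulated records before running a new query.

connection = happybase.Connection("hbase-master.example.cern.ch")  # hypothetical host
queries = connection.table("queries")

def record_query(query_text, result_path):
    """Store a user request together with a pointer to its result files on HDFS."""
    row_key = f"{int(time.time() * 1000)}-{hash(query_text) & 0xffffffff:x}"
    queries.put(row_key.encode(), {
        b"meta:text": query_text.encode(),             # the original user query
        b"meta:submitted": str(time.time()).encode(),  # submission timestamp
        b"link:result": result_path.encode(),          # HDFS path of the cached result
    })
    return row_key

def find_reusable_result(query_text):
    """Scan previously recorded queries and reuse a cached result if the text matches."""
    for _row_key, data in queries.scan(columns=[b"meta:text", b"link:result"]):
        if data[b"meta:text"].decode() == query_text:
            return data[b"link:result"].decode()
    return None
```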

    ATLAS EventIndex general dataflow and monitoring infrastructure

    The ATLAS EventIndex has been running in production since mid-2015, reliably collecting information worldwide about all produced events and storing it in a central Hadoop infrastructure at CERN. A subset of this information is copied to an Oracle relational database for fast dataset discovery, event picking, cross-checks with other ATLAS systems and checks for event duplication. The system design and its optimization serve event-picking requests ranging from a few events up to tens of thousands of events; in addition, data consistency checks are performed for large production campaigns. Detecting duplicate events within the scope of physics collections has recently emerged as an important use case. This paper describes the general architecture of the project, the data flow and the operational issues addressed by recent developments to improve the throughput of the overall system. In this direction, the data collection system reduces its use of the messaging infrastructure to overcome the performance shortcomings detected during production peaks; an object storage approach is used instead to convey the event index information, with messages only signalling its location and status. Recent changes in the producer/consumer architecture are also presented in detail, as well as the monitoring infrastructure.
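    A minimal sketch of the object-storage-plus-messaging pattern described above, assuming an S3-compatible store reached through boto3 and a STOMP broker reached through stomp.py; the endpoints, bucket and queue names are hypothetical and do not reflect the actual ATLAS infrastructure. The bulky payload goes to object storage, while only a small notification with its location and status travels over the messaging system.

```python
import json

import boto3   # S3-compatible client, standing in for the actual object store
import stomp   # STOMP messaging client, standing in for the actual broker setup

# Hypothetical endpoints and credentials for the sketch.
s3 = boto3.client("s3", endpoint_url="https://objectstore.example.cern.ch")
broker = stomp.Connection([("broker.example.cern.ch", 61613)])
broker.connect("producer", "secret", wait=True)

def publish_index_data(dataset, payload_bytes):
    """Upload the event-index payload and announce only its location and status."""
    key = f"eventindex/{dataset}.json.gz"
    s3.put_object(Bucket="eventindex-staging", Key=key, Body=payload_bytes)

    notification = {
        "dataset": dataset,
        "object_key": key,           # where the consumer can fetch the payload
        "size": len(payload_bytes),  # sanity check for the consumer
        "status": "ready",
    }
    broker.send(destination="/queue/eventindex.notifications", body=json.dumps(notification))
```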

    The ATLAS EventIndex: Full chain deployment and first operation

    Proceedings of the 37th International Conference on High Energy Physics (ICHEP 2014), to be published in Nuclear Physics B - Proceedings Supplements. The EventIndex project consists of the development and deployment of a complete catalogue of events for experiments with large amounts of data, such as the ATLAS experiment at the LHC accelerator at CERN. The data to be stored in the EventIndex are produced by all production jobs that run at CERN or on the Grid; for every permanent output file, a snippet of information containing the unique file identifier and the relevant attributes of each event is sent to the central catalogue. The estimated insertion rate during LHC Run 2 is about 80 Hz of file records containing ~15 kHz of event records. This contribution describes the system design, the initial performance tests of the full data collection and cataloguing chain, and the project evolution towards full deployment and operation by the end of 2014.

    The CMS CERN Analysis Facility (CAF)

    The CMS CERN Analysis Facility (CAF) was primarily designed to host a large variety of latency-critical workflows. These break down into alignment and calibration, detector commissioning and diagnosis, and high-interest physics analyses requiring fast turnaround. In addition to the low-latency requirement on the batch farm, another mandatory condition is efficient access to the RAW detector data stored at the CERN Tier-0 facility. The CMS CAF also foresees resources for interactive login by a large number of CMS collaborators located at CERN, as an entry point for their day-to-day analysis. These resources will run on a separate partition in order to protect the high-priority use cases described above. While the CMS CAF represents only a modest fraction of the overall CMS resources on the WLCG Grid, an appropriately sized user-support service needs to be provided. We describe the building, commissioning and operation of the CMS CAF during 2008. The facility was heavily and routinely used by almost 250 users during multiple commissioning and data-challenge periods, reaching a CPU capacity of 1.4 MSI2K and a disk capacity at the petabyte scale. In particular, we focus on the performance in terms of networking, disk access and job efficiency, and extrapolate prospects for the first year of LHC data taking. We also present the experience gained and the limitations observed in operating such a large facility, in which well-controlled workflows are combined with more chaotic analysis activity by a large number of physicists.