
    Object level physics data replication in the Grid

    To support distributed physics analysis on the scale foreseen by the LHC experiments, 'Grid' systems are needed that manage and streamline data distribution, replication, and synchronization. We report on the development of a tool that allows large physics datasets to be managed and replicated at the granularity of single objects. Efficient and convenient support for data extraction and replication at the level of individual objects and events will enable types of interactive data analysis that would be too inconvenient or costly to perform with tools that work on the file level only. Our tool development effort is intended both as a demonstrator project for various types of existing Grid technology and as a research effort to develop Grid technology further. The basic use case supported by our tool is one in which a physicist repeatedly selects some physics objects located at a central repository and replicates them to a local site. The selection can be done using 'tag' or 'ntuple' analysis at the local site. The tool replicates the selected objects and merges all replicated objects into a single coherent 'virtual' dataset. This allows all objects to be used together seamlessly, even if they were replicated at different times or from different locations. The version of the tool reported on in this paper replicates ORCA-based physics data created by CMS in its ongoing high-level trigger design studies. The basic capabilities and limitations of the tool are discussed, together with some performance results. Some tool internals are also presented. Finally, we report on experiences so far and on future plans.
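
    The select-replicate-merge cycle described above can be pictured with a short sketch. This is a minimal illustration only: the names EventTag, VirtualDataset, select_events, and replicate are assumptions made for this example and are not part of the actual ORCA-based tool.

        # Hypothetical sketch of the object-level replication use case: select events
        # via local 'tag' analysis, replicate the matching objects from a central
        # repository, and merge them into one coherent 'virtual' dataset.
        from dataclasses import dataclass, field

        @dataclass(frozen=True)
        class EventTag:
            event_id: int
            missing_et: float          # example selection variable (assumed)

        @dataclass
        class VirtualDataset:
            objects: dict = field(default_factory=dict)   # event_id -> object payload

            def merge(self, replicated: dict) -> None:
                # Objects replicated at different times or from different sites
                # end up in one seamless local view.
                self.objects.update(replicated)

        def select_events(tags):
            # Tag-level selection performed at the local site.
            return [t.event_id for t in tags if t.missing_et > 100.0]

        def replicate(central_repo, event_ids):
            # Stand-in for object-level replication from the central repository.
            return {eid: central_repo[eid] for eid in event_ids if eid in central_repo}

        # Repeated selection rounds all merge into the same local virtual dataset.
        central = {i: f"reconstructed event {i}" for i in range(1000)}
        tags = [EventTag(i, missing_et=float(i % 200)) for i in range(1000)]
        local = VirtualDataset()
        local.merge(replicate(central, select_events(tags)))
        print(f"{len(local.objects)} objects now usable locally")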

    Object Database Scalability for Scientific Workloads

    We describe the PetaByte-scale computing challenges posed by the next generation of particle physics experiments, due to start operation in 2005. The computing models adopted by the experiments call for systems capable of handling sustained data acquisition rates of at least 100 MBytes/second into an Object Database, which will have to handle several PetaBytes of accumulated data per year. The systems will be used to schedule CPU-intensive reconstruction and analysis tasks on highly complex physics Object data, which then need to be served to clients located at universities and laboratories worldwide. We report on measurements with a prototype system that makes use of a 256-CPU HP Exemplar X Class machine running the Objectivity/DB database. Our results show excellent scalability for up to 240 simultaneous database clients and aggregate I/O rates exceeding 150 MBytes/second, indicating the viability of the computing models.
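
    As a rough consistency check (not a calculation from the paper), the quoted sustained rate and client count can be turned into a yearly volume and a per-client throughput, assuming a canonical 10^7 seconds of effective data taking per year.

        # Back-of-the-envelope figures derived from the rates quoted above.
        # The 1e7 seconds of effective running time per year is an assumption.
        ACQ_RATE_MB_S = 100        # sustained acquisition rate into the object database
        SECONDS_PER_YEAR = 1e7     # assumed effective data-taking time per year

        yearly_pb = ACQ_RATE_MB_S * SECONDS_PER_YEAR / 1e9    # MB -> PB
        print(f"~{yearly_pb:.0f} PB per year from the acquisition stream alone")

        CLIENTS = 240              # simultaneous database clients in the prototype test
        AGGREGATE_MB_S = 150       # measured aggregate I/O rate
        print(f"~{AGGREGATE_MB_S / CLIENTS:.2f} MB/s available per client")

    Reconstructed and simulated data add further volume, which is consistent with the several PetaBytes of accumulated data per year quoted above.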

    Views of CMS Event Data: Objects, Files, Collections, Virtual Data Products

    The CMS data grid system will store many types of data maintained by the CMS collaboration. An important type of data is the event data, defined in this note as all data that directly represents simulated, raw, or reconstructed CMS physics events. Many views of this data will exist simultaneously: to a CMS physics code implementer the data will appear as C++ objects, while to a tape robot operator it will appear as files. This note identifies the different views that can exist, describes each of them, and interrelates them by placing them into a vertical stack. This particular stack integrates several existing architectural structures and is therefore a plausible basis for further prototyping and architectural work. This document is intended as a contribution to, and as common (terminological) reference material for, the CMS architectural efforts and for the Grid projects PPDG, GriPhyN, and the EU DataGrid.
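
    The vertical stack of views can be sketched as a simple layered mapping. The ordering and the role attached to each layer below are illustrative assumptions; only the four view types themselves come from the note's title, and only the implementer/objects and tape-operator/files pairings come from the abstract.

        # Illustrative stack of views on the same CMS event data. Each entry pairs
        # a view (from the note's title) with the kind of user who sees it that way.
        EVENT_DATA_VIEWS = [
            ("objects",               "C++ objects seen by the physics code implementer"),
            ("collections",           "event collections seen by the analysis user (assumed role)"),
            ("files",                 "files seen by the tape robot / storage operator"),
            ("virtual data products", "products tracked by the grid middleware (assumed role)"),
        ]

        for view, description in EVENT_DATA_VIEWS:
            print(f"{view:22s} -> {description}")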

    Prototyping of CMS Storage Management

    We report on a nine-month prototyping project concerning storage management in the CMS offline system. The work focused on the issue of using large disk farms efficiently. We discuss various hard disk performance characteristics that are important for physics analysis applications. It is shown that the layout of physics data on disk (clustering) has a significant impact on performance. We develop a storage management architecture which ensures high disk performance under a typical physics analysis workload.
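
    A minimal sketch of why clustering matters is to read the same total volume of 'objects' once laid out contiguously and once scattered across a file. All names, sizes, and the access pattern below are assumptions for illustration, and on a machine where the test file fits in the page cache the timing difference will be largely masked.

        # Compare 'clustered' (sequential) versus 'scattered' (random-offset) reads of
        # fixed-size objects from one file. Purely illustrative; real analysis workloads
        # and Objectivity/DB page layouts are far more involved.
        import os
        import random
        import time

        PATH = "clustering_demo.bin"         # hypothetical scratch file
        OBJECT_SIZE = 64 * 1024              # 64 KB per 'object'
        N_OBJECTS = 2000                     # 128 MB total

        with open(PATH, "wb") as f:
            f.write(os.urandom(OBJECT_SIZE * N_OBJECTS))

        def read_at(offsets):
            # Read one object at each byte offset and return the elapsed time.
            start = time.perf_counter()
            with open(PATH, "rb") as f:
                for off in offsets:
                    f.seek(off)
                    f.read(OBJECT_SIZE)
            return time.perf_counter() - start

        clustered = [i * OBJECT_SIZE for i in range(N_OBJECTS)]
        scattered = random.sample(clustered, len(clustered))

        print(f"clustered layout: {read_at(clustered):.2f} s")
        print(f"scattered layout: {read_at(scattered):.2f} s")
        os.remove(PATH)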

    HEPGRID2001: A Model of a Virtual Data Grid Application

    Future high energy physics experiments will require huge distributed computational infrastructures, called data grids, to satisfy their data processing and analysis needs. This paper records the current understanding of the demands that will be put on a data grid around 2006 by the hundreds of physicists working with data from the CMS experiment. This understanding is recorded by defining a model of the CMS physics analysis application running on a 'virtual data grid' as proposed by the GriPhyN project. The complete model consists of a hardware model, a data model, and an application workload model. The main utility of the HEPGRID2001 model is that it encodes high energy physics (HEP) application domain knowledge and makes it available in a form that is understandable for the CS community, so that architectural and performance requirements for data grid middleware components can be derived. © Springer-Verlag. To be published in Proc. of HPCN Europe 2001.
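
    The three-part decomposition of the model (hardware, data, and workload sub-models) can be written down as a small schema. The field names below and the idea of carrying them in plain dataclasses are assumptions for illustration, not the paper's actual parameter list.

        # Sketch of the HEPGRID2001 model's decomposition into three sub-models.
        # All field names are illustrative placeholders, not figures from the paper.
        from dataclasses import dataclass

        @dataclass
        class HardwareModel:
            sites: int                     # CERN plus regional centres
            cpus_per_site: int
            wan_bandwidth_mb_s: float

        @dataclass
        class DataModel:
            raw_event_kb: float
            reconstructed_event_kb: float
            events_per_year: float

        @dataclass
        class WorkloadModel:
            active_physicists: int
            analysis_jobs_per_day: float

        @dataclass
        class HEPGrid2001Model:
            hardware: HardwareModel
            data: DataModel
            workload: WorkloadModel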

    CMS Data Grid System - Overview and Requirements

    This document gives a comprehensive overview of the data grid system that CMS intends to operate around December 2003. This CMS data grid system will support both CMS production and analysis and will seamlessly tie together resources at CERN and at international CMS regional centers. The document focuses on the relation between the CMS software components and the grid software components operating inside the 2003 CMS data grid system. In addition, it includes overview and reference material to introduce the members of the grid projects (GriPhyN, PPDG, and the EU DataGrid) to the CMS data handling environment. This document contains a snapshot, taken in 2001, of the vision that CMS has of the intended software capabilities of its production data grid system in 2003, and of the expected scaling towards 2007. To capture the expected level of complexity, the vision is sometimes worked out in considerable detail, even though some of these details are likely to be adjusted in the future. Though the vision captured in this document will likely evolve, it does yield the current requirements for the grid projects in which CMS is involved as a 'customer': the requirements for the grid components that are to be created from now until the end of 2003. The major CMS software milestones affecting the grid projects are the 'delivery of baseline core software' milestone for December 2002, which includes the choice and integration of grid components into the baseline software, and the '20% data challenge' milestone, for which the work starts in January 2004 with milestone completion in December 2004. The 20% data challenge includes the full range of distributed operations required for the analysis of CMS data under realistic conditions as occurring during...