
    Extending DIRAC File Management with Erasure-Coding for efficient storage

    The state of the art in Grid-style data management is to achieve increased resilience of data via multiple complete replicas of data files across multiple storage endpoints. While this is effective, it is not the most space-efficient approach to resilience, especially when the reliability of individual storage endpoints is sufficiently high that only a few will be inactive at any point in time. We report on work performed as part of GridPP\cite{GridPP}, extending the DIRAC File Catalogue and file management interface to allow the placement of erasure-coded files: each file is distributed as N identically sized chunks of data striped across a vector of storage endpoints, encoded such that any M chunks can be lost and the original file still reconstructed. The tools developed are transparent to the user and, as well as allowing uploading and downloading of data to Grid storage, also provide the possibility of parallelising access across all of the distributed chunks at once, improving data transfer and IO performance. We expect this approach to be of most interest to smaller VOs, who have tighter bounds on the storage available to them, but larger (WLCG) VOs may be interested as their total data increases during Run 2. We provide an analysis of the costs and benefits of the approach, along with future development and implementation plans in this area. At present, the overheads of multiple file transfers are the largest obstacle to the competitiveness of this approach. Comment: 21st International Conference on Computing in High Energy and Nuclear Physics (CHEP2015)
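
    As an illustration of the chunk-and-parity idea (not the DIRAC implementation itself), the toy Python sketch below stripes a byte string into N equal chunks plus a single XOR parity chunk, so that any one lost chunk can be rebuilt; a real deployment would use a proper (N, M) erasure code.

        # Toy striping with one XOR parity chunk: any single lost chunk is recoverable.
        # This only illustrates the idea; the paper's scheme tolerates M lost chunks.
        import functools, operator

        def stripe(data: bytes, n: int):
            """Split data into n equal chunks plus one XOR parity chunk."""
            size = -(-len(data) // n)                     # ceil(len/n) bytes per chunk
            padded = data.ljust(n * size, b"\x00")        # pad so all chunks are equal
            chunks = [padded[i*size:(i+1)*size] for i in range(n)]
            parity = bytes(functools.reduce(operator.xor, col) for col in zip(*chunks))
            return chunks + [parity]

        def rebuild(pieces, missing: int) -> bytes:
            """Recover the piece at index `missing` by XOR-ing the survivors."""
            survivors = [p for i, p in enumerate(pieces) if i != missing]
            return bytes(functools.reduce(operator.xor, col) for col in zip(*survivors))

        pieces = stripe(b"example payload destined for grid storage", 4)
        assert rebuild(pieces, 2) == pieces[2]            # chunk 2 "lost" and rebuilt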

    Replica Selection in the Globus Data Grid

    The Globus Data Grid architecture provides a scalable infrastructure for the management of storage resources and data that are distributed across Grid environments. These services are designed to support a variety of scientific applications, ranging from high-energy physics to computational genomics, that require access to large amounts of data (terabytes or even petabytes) with varied quality of service requirements. By layering on a set of core services, such as data transport, security, and replica cataloging, one can construct various higher-level services. In this paper, we discuss the design and implementation of a high-level replica selection service that uses information regarding replica location and user preferences to guide selection from among storage replica alternatives. We first present a basic replica selection service design, then show how dynamic information about storage system properties, collected using Globus information service capabilities, can help improve and optimize the selection process. We demonstrate the use of Condor's ClassAds resource description and matchmaking mechanism as an efficient tool for representing and matching storage resource capabilities and policies against application requirements. Comment: 8 pages, 6 figures
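
    A schematic Python sketch of the matchmaking idea follows (it does not use the real ClassAds library): each replica's storage endpoint advertises attributes, the request carries a requirements predicate and a rank expression, and the highest-ranked matching replica is selected. All attribute names and values are invented for illustration.

        # Toy matchmaking in the spirit of ClassAds: advertised attributes are matched
        # against a requirements predicate, then ranked. Attribute names are invented.
        replicas = [
            {"site": "siteA", "bandwidth_mbps": 90,  "free_tb": 2,  "protocol": "gridftp"},
            {"site": "siteB", "bandwidth_mbps": 300, "free_tb": 10, "protocol": "gridftp"},
            {"site": "siteC", "bandwidth_mbps": 300, "free_tb": 1,  "protocol": "http"},
        ]

        request = {
            "requirements": lambda ad: ad["protocol"] == "gridftp" and ad["free_tb"] >= 2,
            "rank":         lambda ad: ad["bandwidth_mbps"],     # prefer the fastest link
        }

        def select_replica(request, ads):
            matches = [ad for ad in ads if request["requirements"](ad)]
            return max(matches, key=request["rank"]) if matches else None

        print(select_replica(request, replicas)["site"])         # -> siteB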

    ScotGrid: Providing an Effective Distributed Tier-2 in the LHC Era

    ScotGrid is a distributed Tier-2 centre in the UK with sites in Durham, Edinburgh and Glasgow. ScotGrid has undergone a huge expansion in hardware in anticipation of the LHC and now provides more than 4 MSI2K and 500 TB to the LHC VOs. Scaling up to this level of provision has brought many challenges to the Tier-2, and we show in this paper how we have adopted new methods of organising the centres, from fabric management and monitoring, through remote management of sites, to management and operational procedures, to meet these challenges. We describe how we have coped with different operational models at the sites, where the Glasgow and Durham sites are managed "in house" but the resources at Edinburgh are managed as a central university resource. This required the adoption of a different fabric management model at Edinburgh and a special engagement with the cluster managers. Challenges also arose from the different job models of local and grid submission, which required special attention to resolve. We show how ScotGrid has successfully provided an infrastructure for ATLAS and LHCb Monte Carlo production. Special attention has been paid to ensuring that user analysis functions efficiently, which has required optimisation of local storage and networking to cope with the demands of user analysis. Finally, although these Tier-2 resources are pledged to the whole VO, we have established close links with our local physics user communities as the best way to ensure that the Tier-2 functions effectively as a part of the LHC grid computing framework. Comment: Preprint for 17th International Conference on Computing in High Energy and Nuclear Physics, 7 pages, 1 figure

    dCache, scalable managed storage

    At the end of 2007, the most challenging high-energy physics experiment ever, the Large Hadron Collider (LHC)[9] at CERN, will start to produce a sustained stream of data on the order of 300 MB/sec, equivalent to a stack of CDs as high as the Eiffel Tower once per week. While it is being produced, this data is distributed and persistently stored at several dozen sites around the world, which together build the LHC data grid. The destination sites are expected to provide the necessary middleware, so-called Storage Elements, offering standard protocols to receive the data and optionally store it in the site-specific tertiary storage systems. Besides its actual functionality, discussed subsequently, the Storage Element software has to be able to fit into a large variety of environments, known to range from sites providing a single storage box of some terabytes and nearly no maintenance personnel up to Tier-1 sites with estimated disk storage capacities reaching into the petabyte range. Moreover, sites expected to store data permanently may want to use their already existing Hierarchical Storage Management (HSM) system to drive the robotics. This requires the Storage Element to be aware of HSM systems and to be able to manage external file copies. This wide range of scalability, from the very small to the limits of affordable storage, is one of the primary goals of dCache, the Storage Element introduced in this presentation. By being strictly compliant with standard data transfer and control protocols, like gsiFtp, xRootd and the Storage Resource Manager protocol SRM, we are pursuing our second goal, which is to make dCache available and useful beyond the borders of the High Energy Physics community. Besides storing and preparing data for transfer, dCache provides a rich palette of functions to manage the available storage, as will be described subsequently. This includes replication of datasets upon automated detection of busy storage components as well as optimization of access to tertiary storage systems.
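
    As a rough sanity check of the quoted rate (assuming a ~700 MB CD about 1.2 mm thick and an Eiffel Tower height of roughly 300 m), the arithmetic below reproduces the comparison; the figures are approximate.

        # Back-of-the-envelope check: 300 MB/s sustained for one week.
        rate_mb_s = 300
        weekly_mb = rate_mb_s * 7 * 24 * 3600        # ~1.8e8 MB, i.e. ~180 TB per week
        cds       = weekly_mb / 700                  # ~260,000 CDs of ~700 MB each
        stack_m   = cds * 1.2e-3                     # ~310 m at ~1.2 mm per CD
        print(f"{weekly_mb / 1e6:.0f} TB/week, CD stack ≈ {stack_m:.0f} m")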

    Exploiting Big Data solutions for CMS computing operations analytics

    Computing operations at the Large Hadron Collider (LHC) at CERN rely on the Worldwide LHC Computing Grid (WLCG) infrastructure, designed to efficiently allow storage, access, and processing of data at the pre-exascale level. A close and detailed study of the computing systems exploited for the LHC physics mission represents an increasingly crucial aspect in the roadmap of High Energy Physics (HEP) towards the exascale regime. In this context, the Compact Muon Solenoid (CMS) experiment has been collecting and storing over the last few years a large set of heterogeneous non-collision data (e.g. meta-data about replica placement, transfer operations, and actual user access to physics datasets). All of this rich data currently resides on a distributed Hadoop cluster, organized so that running fast, arbitrary queries using the Spark analytics framework is a viable approach for Big Data mining efforts. Using a data-driven approach oriented to the analysis of this meta-data deriving from several CMS computing services, such as DBS (Data Bookkeeping Service) and MCM (Monte Carlo Management system), we started to focus on data storage and data access over the WLCG infrastructure, and we drafted an embryonic software toolkit to investigate recurrent patterns and provide indicators of physics dataset popularity. As a long-term goal, this aims at contributing to the overall design of a predictive/adaptive system that would eventually reduce the costs and complexity of the CMS computing operations, while taking into account the stringent requests of the physics analyst community.
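
    A hedged sketch of the kind of Spark query such a toolkit might run is shown below; the HDFS path and the column names ("dataset", "access_time") are hypothetical, not the real CMS schema.

        # Sketch: dataset-popularity counts from access meta-data stored on HDFS.
        # The input path and column names are hypothetical placeholders.
        from pyspark.sql import SparkSession, functions as F

        spark = SparkSession.builder.appName("dataset-popularity").getOrCreate()
        accesses = spark.read.json("hdfs:///placeholder/path/access-records/*.json")

        popularity = (
            accesses
            .withColumn("month", F.date_format("access_time", "yyyy-MM"))
            .groupBy("dataset", "month")
            .count()                                 # accesses per dataset per month
            .orderBy(F.desc("count"))
        )
        popularity.show(20, truncate=False)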

    Hiding the complexity: building a distributed ATLAS Tier-2 with a single resource interface using ARC middleware

    Since their inception, Grids for high energy physics have found management of data to be the most challenging aspect of operations. This problem has generally been tackled by the experiment's data management framework controlling in fine detail the distribution of data around the grid and the careful brokering of jobs to sites with co-located data. This approach, however, presents experiments with a difficult and complex system to manage, as well as introducing a rigidity into the framework which is very far from the original conception of the grid. In this paper we describe how the ScotGrid distributed Tier-2, which has sites in Glasgow, Edinburgh and Durham, was presented to ATLAS as a single, unified resource using the ARC middleware stack. In this model the ScotGrid 'data store' is hosted at Glasgow and presented as a single ATLAS storage resource. As jobs are taken from the ATLAS PanDA framework, they are dispatched to the computing cluster with the fastest response time. An ARC compute element at each site then asynchronously stages the data from the data store into a local cache hosted at each site. The job is then launched in the batch system and accesses the data locally. We discuss the merits of this system compared to other operational models, from the point of view of both the resource providers (sites) and the resource consumers (experiments), and consider the issues involved in transitioning to this model.
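
    The dispatch-then-stage model can be sketched as below (this is not ARC code; the site names, probe and stage functions are stand-ins): the job goes to the fastest-responding cluster, and any input missing from that cluster's local cache is staged in from the central data store.

        # Schematic dispatch and cache-filling logic; all names are illustrative stubs.
        import random

        CLUSTERS = ["glasgow", "edinburgh", "durham"]
        CACHES = {c: set() for c in CLUSTERS}          # per-site local file caches

        def probe(cluster: str) -> float:
            """Stand-in for measuring a site's compute-element response time."""
            return random.uniform(0.01, 0.5)

        def stage_from_datastore(lfn: str, cluster: str) -> None:
            """Stand-in for the asynchronous copy from the central data store."""
            CACHES[cluster].add(lfn)

        def dispatch(job_inputs):
            target = min(CLUSTERS, key=probe)          # fastest-responding cluster wins
            for lfn in job_inputs:
                if lfn not in CACHES[target]:          # cache miss: stage the file in
                    stage_from_datastore(lfn, target)
            return target                              # the job then runs in the local batch system

        print(dispatch(["/atlas/example/file1.root", "/atlas/example/file2.root"]))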

    AliEn - EDG Interoperability in ALICE

    AliEn (ALICE Environment) is a Grid-like system for large-scale job submission and distributed data management developed and used in the context of ALICE, the CERN LHC heavy-ion experiment. With the aim of exploiting upcoming Grid resources to run AliEn-managed jobs and store the produced data, the problem of AliEn-EDG interoperability was addressed and an interface was designed. One or more EDG (European Data Grid) User Interface machines run the AliEn software suite (Cluster Monitor, Storage Element and Computing Element) and act as interface nodes between the systems. An EDG Resource Broker is seen by the AliEn server as a single Computing Element, while the EDG storage is seen by AliEn as a single, large Storage Element; files produced at EDG sites are registered both in the EDG Replica Catalogue and in the AliEn Data Catalogue, thus ensuring accessibility from both worlds. In fact, both registrations are required: the AliEn one is used for data management, the EDG one to guarantee the integrity of and access to EDG-produced data. A prototype interface has been successfully deployed using the ALICE AliEn Server and the EDG and DataTAG Testbeds. Comment: Talk from the 2003 Computing in High Energy and Nuclear Physics (CHEP03), La Jolla, CA, USA, March 2003, 4 pages, PDF, 2 figures. PSN TUCP00
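
    A minimal sketch of the dual-registration rule, with the two catalogues reduced to plain dictionaries and all names invented, is given below: a file produced on an EDG site is visible from both worlds only once it appears in both catalogues.

        # Toy dual registration: record each EDG-produced file in both catalogues.
        # Catalogue structures, paths and metadata fields are invented for illustration.
        edg_replica_catalogue = {}   # logical file name -> physical EDG location
        alien_data_catalogue  = {}   # logical file name -> AliEn catalogue entry

        def register_output(lfn: str, edg_location: str, metadata: dict) -> None:
            """Register a file produced at an EDG site in both catalogues."""
            edg_replica_catalogue[lfn] = edg_location                            # EDG-side integrity/access
            alien_data_catalogue[lfn]  = {"location": edg_location, **metadata}  # AliEn data management

        register_output(
            "/alice/sim/example/galice.root",
            "edg-se.example.org:/storage/alice/example/galice.root",
            {"run": 1},
        )
        assert "/alice/sim/example/galice.root" in edg_replica_catalogue
        assert "/alice/sim/example/galice.root" in alien_data_catalogue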

    Mass production of event simulations for the BaBar experiment using the Grid

    The BaBar experiment has been taking data since 1999, investigating the violation of charge and parity (CP) symmetry in the field of High Energy Physics. Event simulation is an intensive computing task, due to the complexity of the algorithm, based on the Monte Carlo method and implemented using the GEANT engine. The simulation input data are stored in ROOT format and classified into two categories: conditions data, describing the detector status when data are recorded, and background trigger data, providing the noise signal necessary to obtain a realistic simulation. To satisfy these requirements, in the traditional BaBar computing model event simulation is distributed over several sites involved in the collaboration, where each site manager centrally manages a private farm dedicated to simulation production. The new grid approach applied to the BaBar production framework is discussed, along with the schema adopted for data deployment via Xrootd/Scalla servers, including data management using grid middleware on distributed storage facilities spread over the INFN-GRID network. A comparison between the two models is provided, also describing the custom applications developed for performing the whole production task on the grid and showing the results achieved.

    Distributed Computing Grid Experiences in CMS

    The CMS experiment is currently developing a computing system capable of serving, processing and archiving the large number of events that will be generated when the CMS detector starts taking data. During 2004 CMS undertook a large-scale data challenge to demonstrate the ability of the CMS computing system to cope with a sustained data-taking rate equivalent to 25% of the startup rate. Its goals were: to run CMS event reconstruction at CERN for a sustained period at a 25 Hz input rate; to distribute the data to several regional centers; and to enable data access at those centers for analysis. Grid middleware was utilized to help complete all aspects of the challenge. To continue to provide scalable access to the data from anywhere in the world, CMS is developing a layer of software that uses Grid tools to gain access to data and resources, and that aims to provide physicists with a user-friendly interface for submitting their analysis jobs. This paper describes the data challenge experience with Grid infrastructure and the current development of the CMS analysis system.
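
    For scale, the sustained 25 Hz input rate works out to roughly two million reconstructed events per day, as the trivial calculation below shows.

        # Events per day at a sustained 25 Hz reconstruction rate.
        rate_hz = 25
        events_per_day = rate_hz * 24 * 3600        # 2,160,000 events per day
        print(f"{events_per_day:,} events/day at {rate_hz} Hz")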

    CMS Monte Carlo production in the WLCG computing Grid

    Monte Carlo production in CMS has received a major boost in performance and scale since the CHEP06 conference. The production system has been re-engineered in order to incorporate the experience gained in running the previous system and to integrate production with the new CMS event data model, data management system and data processing framework. The system is interfaced to the two major computing Grids used by CMS, the LHC Computing Grid (LCG) and the Open Science Grid (OSG). Operational experience and integration aspects of the new CMS Monte Carlo production system are presented, together with an analysis of production statistics. The new system automatically handles job submission, resource monitoring, job queuing, job distribution according to the available resources, data merging, and registration of data into the data bookkeeping, data location, data transfer and placement systems. Compared to the previous production system, automation, reliability and performance have been considerably improved. A more efficient use of computing resources and a better handling of the inherent Grid unreliability have resulted in an increase of production scale by about an order of magnitude, with the system capable of running on the order of ten thousand jobs in parallel and yielding more than two million events per day.
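
    Taking the quoted figures together, running on the order of ten thousand jobs in parallel to yield more than two million events per day implies an average of around two hundred events per job per day, as the rough division below shows.

        # Rough per-job throughput implied by the quoted production scale.
        events_per_day = 2_000_000
        parallel_jobs  = 10_000
        print(events_per_day / parallel_jobs, "events per job per day, on average")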