Run II data analysis on the grid
In this document, we begin the technical design of the distributed Run II computing for CDF and D0. The present paper defines the three components of the data handling area of Run II computing: the Data Handling System, the Storage System, and the Application. We outline their functionality and the interactions between them, and we identify necessary and desirable elements of their interfaces.
Distributed processing and analysis of physics data in the D0 SAM system at Fermilab
SAM (Sequential Access through Meta-data) is the data access system for the D0 high energy physics (HEP) experiment at Fermilab. The system is being developed and used to handle petabyte-scale experiment data. The D0 applications, like virtually all HEP applications, are data-intensive, which poses special problems for the data management and job control facilities in a distributed environment. The fundamental problem is to bring the user applications and the data together, and SAM attacks the problem from both sides. First, we describe how the system moves data through the distributed disk cache. Second, we describe how SAM interacts with the batch system to synchronize parallel user jobs with data availability. All the design solutions herein have been implemented in a real system that handles the mission-critical data of the D0 experiment; thus, we present our work from the standpoint of real experience.
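The second point above, synchronizing parallel consumers with data availability, can be illustrated by a minimal sketch. The class and method names here are invented for illustration and are not SAM's actual interface; the real system coordinates separate processes, not threads.

```python
import threading

# Illustrative sketch: a dispatcher that hands each file of a dataset to
# exactly one of several parallel consumers, blocking a consumer until
# some file it still needs has been staged to the local disk cache.
class FileDispatcher:
    def __init__(self, dataset_files):
        self._pending = list(dataset_files)   # files not yet consumed
        self._on_disk = set()                 # files already staged to cache
        self._cond = threading.Condition()

    def file_staged(self, name):
        """Called by the cache layer when a file lands on local disk."""
        with self._cond:
            self._on_disk.add(name)
            self._cond.notify_all()           # wake any waiting consumers

    def next_file(self):
        """Called by a consumer: block until a needed file is on disk and
        return it (each file is handed out exactly once); None when done."""
        with self._cond:
            while True:
                if not self._pending:
                    return None               # dataset fully consumed
                for f in self._pending:
                    if f in self._on_disk:
                        self._pending.remove(f)
                        return f
                self._cond.wait()             # nothing staged yet; block
```

Consumers simply loop on `next_file()` until it returns `None`; the order in which files are delivered is irrelevant, matching the sequential access model.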
Distributed data access in the sequential access model at the D0 experiment at Fermilab
The authors present the Sequential Access Model (SAM), the data handling system for D0, one of the two primary high energy physics experiments at Fermilab. During the next several years, the D0 experiment will store a total of about 1 PByte of data, including raw detector data and data processed at various levels. The design of SAM is not specific to the D0 experiment and carries few assumptions about the underlying mass storage level; its ideas are applicable to any sequential data access. By definition, in the sequential access mode a user application needs to process a stream of data by accessing each data unit exactly once, the order of data units in the stream being irrelevant. The units of data are laid out sequentially in files. The adopted model allows for significant optimizations of system performance, decreased user file latency, and increased overall throughput. In particular, caching is done with knowledge of all the files needed in the near future, defined as all the files of already running or submitted jobs. The bulk of the data is stored in files on tape in the mass storage system (MSS) called Enstore [2], also developed at Fermilab. (The tape drives are served by an ADIC AML/2 Automated Tape Library.) At any given time, SAM has a small fraction of the data cached on disk for processing. In the present paper, the authors discuss how data is delivered onto disk and how it is accessed by user applications. They concentrate on data retrieval (consumption) from the MSS; when SAM is used for storing data, the mechanisms are largely symmetrical. All of the data managed by SAM is cataloged in great detail in a relational database (Oracle). The database also serves as the persistency mechanism for the SAM servers described in this paper. Any client or server in the SAM system that needs to store or retrieve information from the database does so through the interfaces of a CORBA-based database server.
The users (physicists) use the database to define, based on physics selection criteria, datasets of interest. Once the query is defined and resolved into a set of files, actual data processing, called a project, may begin. Obviously, running projects involves data transfer and resource management. The computing facilities, with their CPU, disk, and other hardware resources, are logically partitioned into collections of resources called stations. A station may be a single node, a fraction thereof (some of the machine's disks and/or CPUs may constitute a station), or a collection of smaller nodes. It is equipped with a server, called the station master (SM), that coordinates data delivery and the projects using the data. User requests to actually run a project proceed through the SM, which determines the amount of cache replacement, if any, needed to run the project. If viable, the user job is submitted into a station-associated batch queue; otherwise the project is rejected and the user may try another station.
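The station master's admission decision described above can be sketched as follows. This is a simplified illustration with invented names; it counts files as uniform cache slots, whereas the real system accounts for actual file sizes, and the eviction policy shown (any unpinned file not needed by the project) stands in for SAM's real cache-replacement logic.

```python
# Hypothetical sketch of the station master's admission check: can the
# project's files fit in the station cache after evicting replaceable files?
def admit_project(cache_size, cached_files, project_files, pinned_files):
    """Return (admitted, files_to_evict).

    cache_size    -- total cache capacity, in file slots (simplification)
    cached_files  -- files currently on the station's disk cache
    project_files -- files the submitted project needs
    pinned_files  -- cached files that must not be replaced
    """
    needed = set(project_files)
    missing = needed - set(cached_files)       # files still to be fetched
    free = cache_size - len(cached_files)      # slots currently free
    shortfall = len(missing) - free            # extra slots we must reclaim
    if shortfall <= 0:
        return True, []                        # fits without any eviction
    # Replacement candidates: cached, not pinned, not needed by this project
    evictable = [f for f in cached_files
                 if f not in pinned_files and f not in needed]
    if shortfall > len(evictable):
        return False, []                       # reject; user may try elsewhere
    return True, evictable[:shortfall]
```

A rejected project, as in the abstract, is simply resubmitted by the user to a different station.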
Uniformity on the grid via a configuration framework
As grid computing permeates modern computing, grid solutions continue to emerge and take shape. Grid development projects continue to provide higher-level services that evolve in functionality and operate with application-level concepts, which are often specific to the virtual organizations that use them. Physically, however, grids are composed of sites whose resources are diverse and seldom project readily onto a grid's set of concepts. In practice, this also creates problems for the site administrators who actually instantiate grid services. In this paper, we present a flexible, uniform framework to configure a grid site and its facilities, and otherwise describe the resources and services it offers. We start from a site configuration and instantiate services for resource advertisement, monitoring, and data handling; we also apply our framework to hosting environment creation. We use our ideas in the Information Management part of the SAM-Grid project, a grid system that will deliver petabyte-scale data to hundreds of users. Our users are High Energy Physics experimenters who are scattered worldwide across dozens of institutions and use facilities that are shared with other experiments as well as with other grids. Our implementation represents information in the XML format and includes tools written in XQuery and XSLT.
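The core idea, one declarative site description driving the configuration of several services, can be sketched in miniature. The element and attribute names below are invented for illustration; the real SAM-Grid schema and its XQuery/XSLT tooling differ.

```python
import xml.etree.ElementTree as ET

# Toy site description (invented schema): one XML document from which
# services such as resource advertisement derive their configuration.
SITE_XML = """
<site name="example-site">
  <resource type="batch" queue="long" cpus="64"/>
  <resource type="cache" path="/scratch/sam" size_gb="500"/>
  <service kind="monitoring" port="8080"/>
</site>
"""

def advertise(site_xml):
    """Derive a resource advertisement from the site description."""
    root = ET.fromstring(site_xml)
    ad = {"site": root.get("name")}
    for res in root.findall("resource"):
        # Publish every attribute of the resource except its type tag
        ad[res.get("type")] = {k: v for k, v in res.attrib.items()
                               if k != "type"}
    return ad
```

Monitoring and data handling services would read the same document through analogous functions, which is what gives the framework its uniformity: the site is described once, and each service projects out the view it needs.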
Distributed data access and resource management in the D0 SAM system
SAM (Sequential Access through Meta-data) is the data access and job management system for the D0 high energy physics experiment at Fermilab. The SAM system is being developed and used to handle petabyte-scale experiment data, accessed by hundreds of D0 collaborators scattered around the world. In this paper, we present solutions to some distributed data processing problems from the perspective of real experience dealing with mission-critical data. We concentrate on distributed disk caching, resource management, and job control. The system has elements of grid computing and features applicable to data-intensive computing in general.
Management of Grid Jobs and Data within SAMGrid
When designing SAMGrid, a project for distributing high-energy physics computations on a grid, we discovered that it was challenging to decide where to place users' jobs. Jobs typically need to access hundreds of files, and each site has a different subset of the files. Our data system, SAM, knows what portion of a user's data may be at each site, but does not know how to submit grid jobs. Our job submission system, Condor-G, knows how to submit grid jobs, but originally it required users to choose grid sites and gave them no assistance in choosing. This paper describes how we enhanced Condor-G to interact with SAM to make good decisions about where jobs should be executed, and thereby improve the performance of grid jobs that access large amounts of data. All these enhancements are general enough to be applicable to grid computing beyond data-intensive computing with SAMGrid.
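The placement decision described above amounts to ranking candidate sites by how much of a job's input data is already present there. The following is a minimal sketch of that idea with invented names; it is not the actual Condor-G matchmaking interface or SAM's API, and the real ranking could weigh file sizes, queue lengths, and other factors.

```python
# Sketch: rank candidate grid sites by the fraction of a job's input
# files already cached at each site, as reported by the data system.
def rank_sites(job_files, site_catalog):
    """job_files    -- files the job needs
    site_catalog -- maps site name -> set of files cached at that site
    Returns site names ordered best-first by data availability."""
    needed = set(job_files)

    def availability(site):
        # Fraction of the job's files the site already holds locally
        return len(needed & site_catalog[site]) / len(needed)

    return sorted(site_catalog, key=availability, reverse=True)
```

A job submission layer would then prefer the top-ranked site, falling back down the list if that site's queue is unavailable, so that most input files are read from local cache rather than transferred across the grid.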