SAM managed cache and processing for clusters in a worldwide grid-enabled system
SAM has been developed within the Computing Division at Fermilab as a versatile, distributed data management system. Among its many features is the ability to control processing and manage a distributed cache within a cluster of compute servers. The requirements, concepts, and features of this system are described, and the issues involved in interfacing it to several batch systems are discussed. The system is used within the Dzero experimental collaboration to distribute hundreds of terabytes of data for processing and analysis around the world. Several hardware configurations deployed at Fermilab are described. Data is currently disseminated using this system to over two dozen sites worldwide, and this number will grow to nearly one hundred in the coming years. The planned design evolution to accommodate this growth is discussed, and the transition of the system to grid-standard middleware is described.
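As a rough illustration of the cache-management idea described above, the sketch below implements a least-recently-used disk cache that stages files in on demand, in the spirit of a SAM station cache. The class and the `stage_in` hook are hypothetical stand-ins, not SAM's actual interfaces.

```python
# Illustrative sketch only: names (CacheManager, stage_in) are hypothetical
# and do not correspond to SAM's real station/cache interfaces.
import collections

class CacheManager:
    """Toy LRU disk cache: files requested by jobs are staged in from
    mass storage, and old files are evicted when the cache fills up."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        # file name -> size, ordered oldest-first for LRU eviction
        self.files = collections.OrderedDict()

    def request(self, name, size, stage_in):
        """Return a cached file, staging it in (and evicting) if needed."""
        if name in self.files:
            self.files.move_to_end(name)   # mark as recently used
            return name
        while self.used + size > self.capacity and self.files:
            _, old_size = self.files.popitem(last=False)
            self.used -= old_size          # evict least recently used file
        stage_in(name)                     # e.g. copy in from mass storage
        self.files[name] = size
        self.used += size
        return name

cache = CacheManager(capacity_bytes=10 * 2**30)  # 10 GiB toy cache
cache.request("run123.raw", 2 * 2**30, stage_in=lambda f: print("staging", f))
```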
Hadoop distributed file system for the Grid
Data distribution, storage, and access are essential to CPU-intensive and data-intensive high-performance Grid computing. A newly emerged file system, the Hadoop Distributed File System (HDFS), is deployed and tested within the Open Science Grid (OSG) middleware stack. Efforts have been made to integrate HDFS with other Grid tools to build a complete service framework for the Storage Element (SE). Scalability tests show that sustained, high inter-DataNode data transfer rates can be achieved while the cluster is fully loaded with data-processing jobs. WAN transfer to HDFS, supported by BeStMan and tuned GridFTP servers, demonstrates the scalability and robustness of the system. The Hadoop client can be deployed on interactive machines to support remote data access. The ability to automatically replicate precious data is especially important for computing sites, a capability demonstrated at the Large Hadron Collider (LHC) computing centers. The simplicity of operating an HDFS-based SE significantly reduces the cost of ownership of petabyte-scale data storage relative to alternative solutions.
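The automatic replication highlighted here is driven by HDFS's standard per-file replication factor, which can be read and raised with the stock `hdfs dfs` CLI. A minimal sketch, assuming `hdfs` is on the PATH; the dataset path and policy threshold are illustrative, not OSG configuration.

```python
# Minimal sketch: enforce a higher replication factor on "precious"
# datasets via the stock HDFS CLI. Path and target factor are examples;
# sites would apply their own policy.
import subprocess

def current_replication(path: str) -> int:
    """Read a file's replication factor via `hdfs dfs -stat %r`."""
    out = subprocess.check_output(
        ["hdfs", "dfs", "-stat", "%r", path], text=True
    )
    return int(out.strip())

def ensure_replication(path: str, minimum: int = 3) -> None:
    """Raise the replication factor if it falls below the site policy."""
    if current_replication(path) < minimum:
        # -w waits until the NameNode reports the new factor is satisfied
        subprocess.check_call(
            ["hdfs", "dfs", "-setrep", "-w", str(minimum), path]
        )

ensure_replication("/store/user/precious-dataset.root", minimum=3)
```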
Management of Grid Jobs and Data within SAMGrid
When designing SAMGrid, a project for distributing high-energy physics computations on a grid, we discovered that it was challenging to decide where to place users' jobs. Jobs typically need to access hundreds of files, and each site holds a different subset of those files. Our data system, SAM, knows what portion of a user's data may be at each site, but does not know how to submit grid jobs. Our job submission system, Condor-G, knows how to submit grid jobs, but originally required users to choose grid sites themselves and gave them no assistance in choosing. This paper describes how we enhanced Condor-G to interact with SAM to make good decisions about where jobs should be executed, and thereby improve the performance of grid jobs that access large amounts of data. These enhancements are general enough to be applicable to grid computing beyond data-intensive computing with SAMGrid.
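The core placement idea, ranking candidate sites by how much of a job's input data is already local, can be sketched as follows. The function and the site catalogue are hypothetical stand-ins for the Condor-G/SAM interaction, not real SAM or Condor-G APIs.

```python
# Hypothetical sketch of data-aware matchmaking: rank candidate sites by
# the fraction of a job's input files already cached there, in the spirit
# of Condor-G consulting SAM. Not a real SAM/Condor-G interface.

def rank_sites(job_files: set[str],
               site_catalog: dict[str, set[str]]) -> list[tuple[str, float]]:
    """Return (site, locality) pairs sorted best-first, where locality is
    the fraction of the job's input files already present at the site."""
    scores = []
    for site, cached in site_catalog.items():
        locality = len(job_files & cached) / len(job_files)
        scores.append((site, locality))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Toy usage: three sites holding different subsets of the job's input.
job_files = {"f1", "f2", "f3", "f4"}
site_catalog = {
    "fnal": {"f1", "f2", "f3"},
    "in2p3": {"f2"},
    "nikhef": {"f1", "f4"},
}
print(rank_sites(job_files, site_catalog))
# [('fnal', 0.75), ('nikhef', 0.5), ('in2p3', 0.25)]
```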
Metrics Correlation and Analysis Service (MCAS)
The complexity of Grid workflow activities and their associated software stacks inevitably involves multiple organizations, ownership, and deployment domains. In this setting, important and common tasks such as the correlation and display of metrics and debugging information (fundamental ingredients of troubleshooting) are challenged by the informational entropy inherent in independently maintained and operated software components. Because such an information pool is disorganized, it is a difficult environment for business intelligence analysis, i.e., troubleshooting, incident investigation, and trend spotting. The mission of the MCAS project is to deliver a software solution that helps with the adaptation, retrieval, correlation, and display of workflow-driven data and of type-agnostic events generated by loosely coupled or fully decoupled middleware.
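As a toy illustration of what correlating "type-agnostic events" involves, the sketch below normalizes events from two unrelated components into a common shape and buckets them into shared time windows for side-by-side display. All field names and sources are assumptions for illustration, not MCAS's actual data model.

```python
# Toy illustration of metric correlation across decoupled components:
# normalize heterogeneous events to one schema, then group events that
# fall into the same time window. Field names are hypothetical.
from collections import defaultdict

def normalize(source: str, raw: dict) -> dict:
    """Map a source-specific event into a common (time, source, value) shape."""
    if source == "gridftp":
        return {"time": raw["ts"], "source": source, "value": raw["mb_per_s"]}
    if source == "batch":
        return {"time": raw["epoch"], "source": source, "value": raw["queued"]}
    raise ValueError(f"unknown source: {source}")

def correlate(events: list[dict], window_s: int = 60) -> dict[int, list[dict]]:
    """Bucket normalized events into fixed windows so metrics from
    different components can be viewed side by side."""
    buckets: dict[int, list[dict]] = defaultdict(list)
    for ev in events:
        buckets[ev["time"] // window_s].append(ev)
    return buckets

events = [
    normalize("gridftp", {"ts": 1000, "mb_per_s": 85.0}),
    normalize("batch", {"epoch": 1010, "queued": 120}),
    normalize("gridftp", {"ts": 1100, "mb_per_s": 12.0}),
]
for window, evs in sorted(correlate(events).items()):
    print(window, [(e["source"], e["value"]) for e in evs])
```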