SAM managed cache and processing for clusters in a worldwide grid-enabled system
SAM has been developed within the Computing Division at Fermilab as a versatile, distributed data management system. Among its many features is the ability to control processing and manage a distributed cache within a cluster of compute servers. The requirements, concepts, and features of this system are described, and the issues involved in interfacing it to several batch systems are discussed. The system is used within the Dzero experimental collaboration to distribute hundreds of terabytes of data for processing and analysis around the world. Several hardware configurations deployed at Fermilab are described. Data is currently disseminated using this system to over two dozen sites worldwide, and this number will grow to nearly one hundred in the coming years. The planned design evolution to accommodate this growth is discussed, and the transition of the system to grid-standard middleware is described.
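As a rough illustration of the cache-management idea described above, the sketch below implements a least-recently-used disk cache that stages files in on demand, in the spirit of a SAM station cache. The class and the `stage_in` hook are hypothetical stand-ins, not SAM's actual interfaces.

```python
# Illustrative sketch only: names (CacheManager, stage_in) are hypothetical
# and do not correspond to SAM's real station/cache interfaces.
import collections

class CacheManager:
    """Toy LRU disk cache: files requested by jobs are staged in from
    mass storage, and old files are evicted when the cache fills up."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        # file name -> size, ordered oldest-first for LRU eviction
        self.files = collections.OrderedDict()

    def request(self, name, size, stage_in):
        """Return a cached file, staging it in (and evicting) if needed."""
        if name in self.files:
            self.files.move_to_end(name)   # mark as recently used
            return name
        while self.used + size > self.capacity and self.files:
            _, old_size = self.files.popitem(last=False)
            self.used -= old_size          # evict least recently used file
        stage_in(name)                     # e.g. copy in from mass storage
        self.files[name] = size
        self.used += size
        return name

cache = CacheManager(capacity_bytes=10 * 2**30)  # 10 GiB toy cache
cache.request("run123.raw", 2 * 2**30, stage_in=lambda f: print("staging", f))
```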
Hadoop distributed file system for the Grid
Data distribution, storage, and access are essential to CPU-intensive and data-intensive high-performance Grid computing. A newly emerged file system, the Hadoop Distributed File System (HDFS), is deployed and tested within the Open Science Grid (OSG) middleware stack. Efforts have been made to integrate HDFS with other Grid tools to build a complete service framework for the Storage Element (SE). Scalability tests show that sustained, high inter-DataNode data transfer rates can be achieved while the cluster is fully loaded with data-processing jobs. WAN transfer to HDFS, supported by BeStMan and tuned GridFTP servers, demonstrates the scalability and robustness of the system. The Hadoop client can be deployed on interactive machines to support remote data access. The ability to automatically replicate precious data is especially important for computing sites, a capability demonstrated at the Large Hadron Collider (LHC) computing centers. The simplicity of operating an HDFS-based SE significantly reduces the cost of ownership of petabyte-scale data storage relative to alternative solutions.
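The automatic replication highlighted here is driven by HDFS's standard per-file replication factor, which can be read and raised with the stock `hdfs dfs` CLI. A minimal sketch, assuming `hdfs` is on the PATH; the dataset path and policy threshold are illustrative, not OSG configuration.

```python
# Minimal sketch: enforce a higher replication factor on "precious"
# datasets via the stock HDFS CLI. Path and target factor are examples;
# sites would apply their own policy.
import subprocess

def current_replication(path: str) -> int:
    """Read a file's replication factor via `hdfs dfs -stat %r`."""
    out = subprocess.check_output(
        ["hdfs", "dfs", "-stat", "%r", path], text=True
    )
    return int(out.strip())

def ensure_replication(path: str, minimum: int = 3) -> None:
    """Raise the replication factor if it falls below the site policy."""
    if current_replication(path) < minimum:
        # -w waits until the NameNode reports the new factor is satisfied
        subprocess.check_call(
            ["hdfs", "dfs", "-setrep", "-w", str(minimum), path]
        )

ensure_replication("/store/user/precious-dataset.root", minimum=3)
```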
Management of Grid Jobs and Data within SAMGrid
When designing SAMGrid, a project for distributing high-energy physics computations on a grid, we discovered that it was challenging to decide where to place users' jobs. Jobs typically need to access hundreds of files, and each site holds a different subset of those files. Our data system, SAM, knows what portion of a user's data may be at each site, but does not know how to submit grid jobs. Our job submission system, Condor-G, knows how to submit grid jobs, but originally required users to choose grid sites themselves and gave them no assistance in choosing. This paper describes how we enhanced Condor-G to interact with SAM to make good decisions about where jobs should be executed, and thereby improve the performance of grid jobs that access large amounts of data. These enhancements are general enough to be applicable to grid computing beyond data-intensive computing with SAMGrid.
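The core placement idea, ranking candidate sites by how much of a job's input data is already local, can be sketched as follows. The function and the site catalogue are hypothetical stand-ins for the Condor-G/SAM interaction, not real SAM or Condor-G APIs.

```python
# Hypothetical sketch of data-aware matchmaking: rank candidate sites by
# the fraction of a job's input files already cached there, in the spirit
# of Condor-G consulting SAM. Not a real SAM/Condor-G interface.

def rank_sites(job_files: set[str],
               site_catalog: dict[str, set[str]]) -> list[tuple[str, float]]:
    """Return (site, locality) pairs sorted best-first, where locality is
    the fraction of the job's input files already present at the site."""
    scores = []
    for site, cached in site_catalog.items():
        locality = len(job_files & cached) / len(job_files)
        scores.append((site, locality))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Toy usage: three sites holding different subsets of the job's input.
job_files = {"f1", "f2", "f3", "f4"}
site_catalog = {
    "fnal": {"f1", "f2", "f3"},
    "in2p3": {"f2"},
    "nikhef": {"f1", "f4"},
}
print(rank_sites(job_files, site_catalog))
# [('fnal', 0.75), ('nikhef', 0.5), ('in2p3', 0.25)]
```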
Metrics Correlation and Analysis Service (MCAS)
The complexity of Grid workflow activities and their associated software stacks inevitably involves multiple organizations, ownership, and deployment domains. In this setting, important and common tasks such as the correlation and display of metrics and debugging information (fundamental ingredients of troubleshooting) are challenged by the informational entropy inherent in independently maintained and operated software components. Because such an information pool is disorganized, it is a difficult environment for business intelligence analysis, i.e., troubleshooting, incident investigation, and trend spotting. The mission of the MCAS project is to deliver a software solution that helps with the adaptation, retrieval, correlation, and display of workflow-driven data and of type-agnostic events generated by loosely coupled or fully decoupled middleware.
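As a toy illustration of what correlating "type-agnostic events" involves, the sketch below normalizes events from two unrelated components into a common shape and buckets them into shared time windows for side-by-side display. All field names and sources are assumptions for illustration, not MCAS's actual data model.

```python
# Toy illustration of metric correlation across decoupled components:
# normalize heterogeneous events to one schema, then group events that
# fall into the same time window. Field names are hypothetical.
from collections import defaultdict

def normalize(source: str, raw: dict) -> dict:
    """Map a source-specific event into a common (time, source, value) shape."""
    if source == "gridftp":
        return {"time": raw["ts"], "source": source, "value": raw["mb_per_s"]}
    if source == "batch":
        return {"time": raw["epoch"], "source": source, "value": raw["queued"]}
    raise ValueError(f"unknown source: {source}")

def correlate(events: list[dict], window_s: int = 60) -> dict[int, list[dict]]:
    """Bucket normalized events into fixed windows so metrics from
    different components can be viewed side by side."""
    buckets: dict[int, list[dict]] = defaultdict(list)
    for ev in events:
        buckets[ev["time"] // window_s].append(ev)
    return buckets

events = [
    normalize("gridftp", {"ts": 1000, "mb_per_s": 85.0}),
    normalize("batch", {"epoch": 1010, "queued": 120}),
    normalize("gridftp", {"ts": 1100, "mb_per_s": 12.0}),
]
for window, evs in sorted(correlate(events).items()):
    print(window, [(e["source"], e["value"]) for e in evs])
```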