A Globally Distributed System for Job, Data, and Information Handling for High Energy Physics
The computing infrastructures of modern high energy physics experiments must address an unprecedented set of requirements. The collaborations consist of hundreds of members from dozens of institutions around the world, and the computing power necessary to analyze the data produced already surpasses the capabilities of any single computing center. A software infrastructure capable of seamlessly integrating dozens of computing centers around the world, enabling computing for a large and dynamic group of users, is of fundamental importance for the production of scientific results. Such a computing infrastructure is called a computational grid. The SAM-Grid offers a solution to these problems for CDF and DZero, two of the largest high energy physics experiments in the world, running at Fermilab. The SAM-Grid integrates standard grid middleware, such as Condor-G and the Globus Toolkit, with software developed at Fermilab, organizing the system into three major components: data handling, job handling, and information management. This dissertation presents the challenges addressed and the solutions provided in such a computing infrastructure.
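To make the role of the standard middleware concrete, the following is a minimal sketch of how a job-handling layer might submit a Condor-G grid-universe job through the HTCondor Python bindings. The gatekeeper hostname and file names are hypothetical, and SAM-Grid's actual job handling wraps such submissions with its data-handling and information-management services.

```python
# Minimal sketch: submitting a Condor-G grid-universe job through the
# classic HTCondor Python bindings. The gatekeeper hostname, executable,
# and file names are hypothetical placeholders.
import htcondor

submit = htcondor.Submit({
    "universe": "grid",
    # Route the job to a remote Globus (GRAM) gatekeeper, as Condor-G does.
    "grid_resource": "gt2 gatekeeper.example.org/jobmanager-condor",
    "executable": "run_analysis.sh",
    "arguments": "dataset=sample01",
    "output": "job.out",
    "error": "job.err",
    "log": "job.log",
})

schedd = htcondor.Schedd()           # local queue that manages the grid job
with schedd.transaction() as txn:    # classic-bindings transaction API
    cluster_id = submit.queue(txn)
print(f"Submitted Condor-G job in cluster {cluster_id}")
```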
ReSS: A Resource Selection Service for the Open Science Grid
The Open Science Grid offers access to hundreds of computing and storage resources via standard Grid interfaces. Before the deployment of an automated resource selection system, users had to submit jobs directly to these resources: they would manually select a resource and specify all relevant attributes in the job description prior to submitting the job. The need for human intervention in resource selection and attribute specification prevents automated job management components from accessing OSG resources and is inconvenient for users. The Resource Selection Service (ReSS) project addresses these shortcomings. The system integrates Condor technology, for the core matchmaking service, with the gLite CEMon component, for gathering and publishing resource information in the Glue Schema format. These components communicate over secure protocols via web services interfaces. The system is currently used in production on OSG by the DZero experiment, the Engagement Virtual Organization, and the Dark Energy Survey. It is also the resource selection service for FermiGrid, the Fermilab Campus Grid. ReSS is considered a lightweight solution to push-based workload management. This paper describes the architecture, performance, and typical usage of the system.
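The core matchmaking can be illustrated with the HTCondor classad Python bindings: a resource advertisement, of the kind CEMon publishes in Glue Schema format, is matched against a job's requirements. The attribute names and values below are simplified stand-ins, not the exact Glue Schema attributes.

```python
# Illustrative sketch of the ClassAd matchmaking at ReSS's core, using the
# HTCondor "classad" Python bindings. Attribute names and values are
# simplified stand-ins for what CEMon publishes.
import classad

# A compute-element advertisement, as ReSS might receive it from CEMon.
resource = classad.ClassAd({
    "Name": "ce01.example.org",
    "GlueCEPolicyMaxWallClockTime": 4320,   # minutes
    "Requirements": classad.ExprTree(
        "TARGET.WallTimeNeeded <= MY.GlueCEPolicyMaxWallClockTime"
    ),
})

# A job ad whose Requirements select a CE with enough wall-clock time.
job = classad.ClassAd({
    "WallTimeNeeded": 1440,
    "Requirements": classad.ExprTree(
        "TARGET.GlueCEPolicyMaxWallClockTime >= MY.WallTimeNeeded"
    ),
})

# symmetricMatch evaluates each ad's Requirements against the other,
# mirroring what the Condor matchmaker does for jobs and resources.
print(job.symmetricMatch(resource))   # expected: True
```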
ReSS: Resource Selection Service for National and Campus Grid Infrastructure
The Open Science Grid (OSG) offers access to around a hundred Compute Elements (CE) and Storage Elements (SE) via standard Grid interfaces. The Resource Selection Service (ReSS) is a push-based workload management system that is integrated with the OSG information systems and resources. ReSS integrates standard Grid tools such as Condor, as a brokering service, and the gLite CEMon, for gathering and publishing resource information in Glue Schema format. ReSS is used in OSG by Virtual Organizations (VO) such as the Dark Energy Survey (DES), DZero, and the Engagement VO. ReSS is also used as a resource selection service for campus Grids, such as FermiGrid. VOs use ReSS to automate the resource selection in their workload management systems to run jobs over the grid. In the past year, the system has been enhanced to enable publication and selection of storage resources and of any special software or software libraries (such as MPI libraries) installed at computing resources. In this paper, we discuss the Resource Selection Service and its typical usage on the two scales of a national cyberinfrastructure Grid, such as OSG, and of a campus Grid, such as FermiGrid.
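Building on the matchmaking sketch above, the following illustrates how a job could select a resource by an installed-software tag, in the spirit of the enhanced ReSS advertisements; the attribute name and tag values are hypothetical.

```python
# Illustrative follow-on: selecting a CE by an installed-software tag.
# "InstalledSoftware" and the tag strings are hypothetical stand-ins for
# the Glue software runtime-environment attributes ReSS publishes.
import classad

resource = classad.ClassAd({
    "Name": "ce02.example.org",
    "InstalledSoftware": "VO-cms-CMSSW,MPI-OPENMPI-1.4",
    "Requirements": classad.ExprTree("true"),
})

job = classad.ClassAd({
    # stringListMember checks membership in a comma-separated list.
    "Requirements": classad.ExprTree(
        'stringListMember("MPI-OPENMPI-1.4", TARGET.InstalledSoftware)'
    ),
})

print(job.symmetricMatch(resource))   # expected: True
```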
SVOPME: A scalable virtual organization privileges management environment
Grids enable uniform access to resources by implementing standard interfaces to resource gateways. In the Open Science Grid (OSG), privileges are granted on the basis of a user's membership in a Virtual Organization (VO). However, Grid sites remain solely responsible for determining and controlling access privileges to their resources, using the users' identities and personal attributes available through Grid credentials. While this guarantees the sites full control over access rights, it makes VO privileges heterogeneous throughout the Grid and hardly fits the Grid paradigm of uniform access to resources. To address these challenges, we are developing the Scalable Virtual Organization Privileges Management Environment (SVOPME), which provides tools for VOs to define and publish desired privileges and assists sites in providing the appropriate access policies. Moreover, SVOPME provides tools for Grid sites to analyze site access policies for various resources, verify compliance with the preferred VO policies, and generate directives for site administrators on how the local access policies can be amended to achieve such compliance, without taking control of local configurations away from site administrators. This paper discusses which access policies are of interest to the OSG community and how SVOPME implements privilege management for OSG.
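As a rough illustration of the compliance check described above, the following sketch compares the privileges a VO publishes as desired against those a site grants, and emits directives for what is missing. The policy representation is invented for illustration and is not SVOPME's actual format.

```python
# Hypothetical sketch of a SVOPME-style policy-compliance check: compare
# the privileges a VO desires against what a site grants, and emit
# directives for anything missing. Roles and privilege names are made up.

# Privileges the VO would like its roles to have at each site.
vo_desired = {
    "dzero/production": {"batch_priority", "outbound_network", "local_scratch"},
    "dzero/analysis": {"outbound_network"},
}

# Privileges a particular site actually grants to those roles.
site_granted = {
    "dzero/production": {"outbound_network", "local_scratch"},
    "dzero/analysis": {"outbound_network"},
}

for role, desired in vo_desired.items():
    missing = desired - site_granted.get(role, set())
    for privilege in sorted(missing):
        # A real deployment would translate this into concrete directives
        # for the site's batch-system or gatekeeper configuration.
        print(f"Directive: grant '{privilege}' to role '{role}'")
```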
HEPCloud, a New Paradigm for HEP Facilities: CMS Amazon Web Services Investigation
Historically, high energy physics computing has been performed on large purpose-built computing systems. These began as single-site compute facilities, but have evolved into the distributed computing grids used today. Recently, there has been an exponential increase in the capacity and capability of commercial clouds. Cloud resources are highly virtualized and intended to be flexibly deployed for a variety of computing tasks. There is a growing interest among cloud providers in demonstrating the capability to perform large-scale scientific computing. In this paper, we discuss results from the CMS experiment using the Fermilab HEPCloud facility, which utilized both local Fermilab resources and virtual machines in the Amazon Web Services Elastic Compute Cloud. We discuss the planning, technical challenges, and lessons learned in performing physics workflows on a large-scale set of virtualized resources. In addition, we discuss the economics and operational efficiencies of executing workflows both in the cloud and on dedicated resources.
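As a rough illustration of the cloud-provisioning step, the following sketch launches a batch of worker virtual machines in EC2 with boto3. The AMI ID, instance type, and counts are placeholders, and HEPCloud's actual provisioning is driven by its decision engine and the experiment's workload rather than a static script.

```python
# Minimal sketch of provisioning worker virtual machines in EC2 with boto3.
# ImageId, InstanceType, and the counts are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder worker-node image
    InstanceType="m4.xlarge",          # placeholder instance type
    MinCount=1,
    MaxCount=10,                       # burst out to ten workers
)

for instance in response["Instances"]:
    print("Launched", instance["InstanceId"])
```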
Investigation of Storage Options for Scientific Computing on Grid and Cloud Facilities
In recent years, several new storage technologies, such as Lustre, Hadoop, OrangeFS, and BlueArc, have emerged. While several groups have run benchmarks to characterize them under a variety of configurations, more work is needed to evaluate these technologies for the use cases of scientific computing on Grid clusters and Cloud facilities. This paper discusses our evaluation of the technologies as deployed on a test bed at FermiCloud, one of the Fermilab infrastructure-as-a-service Cloud facilities. The test bed consists of 4 server-class nodes with 40 TB of disk space and up to 50 virtual machine clients, some running on the storage server nodes themselves. With this configuration, the evaluation compares the performance of some of these technologies when deployed on virtual machines and on "bare metal" nodes. In addition to running standard benchmarks, such as IOzone, to check the sanity of our installation, we have run I/O-intensive tests using physics-analysis applications. This paper presents how the storage solutions perform in a variety of realistic use cases of scientific computing. One interesting difference among the storage systems tested is a decrease in total read throughput as the number of client processes increases, which occurs in some implementations but not others.
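As an illustration of the benchmark runs, the following sketch drives IOzone in throughput mode while sweeping the number of client processes, the variable behind the read-throughput effect noted above. The mount point and file sizes are placeholders, and the paper's actual evaluation also used physics-analysis applications.

```python
# Illustrative sketch of driving IOzone from Python to sweep client
# concurrency on a mounted test file system. Paths and sizes are placeholders.
import subprocess

MOUNT_POINT = "/mnt/teststore"   # hypothetical mount of the storage under test

for num_clients in (1, 2, 4, 8, 16):
    # -t: throughput mode with N parallel processes
    # -s: per-process file size, -r: record size
    # -i 0 -i 1: run the write/rewrite and read/reread tests
    # -F: one target file per process
    cmd = [
        "iozone", "-t", str(num_clients),
        "-s", "1g", "-r", "1m",
        "-i", "0", "-i", "1",
        "-F", *[f"{MOUNT_POINT}/iozone.{i}" for i in range(num_clients)],
    ]
    print("Running:", " ".join(cmd))
    subprocess.run(cmd, check=True)
```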
A code inspection process for security reviews
In recent years, it has become increasingly evident that software threat communities are taking a growing interest in Grid infrastructures. To mitigate the security risk associated with the increased number of attacks, the Grid software development community needs to scale up its effort to reduce software vulnerabilities. This can be achieved by introducing security review processes as a standard project management practice. The Grid Facilities Department of the Fermilab Computing Division has developed a code inspection process tailored to reviewing the security properties of software. The goal of the process is to identify the technical risks associated with an application and their impact. This is achieved by focusing on the business needs of the application (what it does and protects), on understanding threats and exploit communities (what an exploiter gains), and on uncovering potential vulnerabilities (what defects can be exploited). The desired outcome of the process is an improvement in the quality of the software artifact and an enhanced understanding of possible mitigation strategies for residual risks. This paper describes the inspection process and the lessons learned in applying it to Grid middleware.
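As a rough illustration of the process's three focus areas, the following hypothetical sketch records a single finding along those axes; the data structure and example content are invented and are not the process's actual artifacts.

```python
# Hypothetical record of one inspection finding, structured around the
# three focus areas the process names: business needs, threats, and
# vulnerabilities. All fields and example values are illustrative only.
from dataclasses import dataclass

@dataclass
class InspectionFinding:
    asset: str            # what the application does or protects
    threat: str           # what an exploiter would gain
    vulnerability: str    # what defect could be exploited
    impact: str           # assessed technical risk
    mitigation: str       # proposed strategy for the residual risk

finding = InspectionFinding(
    asset="Grid proxy credentials cached on disk",
    threat="Impersonation of VO members on remote sites",
    vulnerability="World-readable temporary files in the job wrapper",
    impact="High: credential theft enables lateral movement",
    mitigation="Create temporary files with mode 0600 and remove them on exit",
)
print(finding)
```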