HEPCloud, a New Paradigm for HEP Facilities: CMS Amazon Web Services Investigation
Historically, high energy physics computing has been performed on large purpose-built computing systems. These began as single-site compute facilities, but have evolved into the distributed computing grids used today. Recently, there has been an exponential increase in the capacity and capability of commercial clouds. Cloud resources are highly virtualized and intended to be flexibly deployed for a variety of computing tasks. There is a growing interest among cloud providers in demonstrating the capability to perform large-scale scientific computing. In this paper, we discuss results from the CMS experiment using the Fermilab HEPCloud facility, which utilized both local Fermilab resources and virtual machines in the Amazon Web Services Elastic Compute Cloud. We discuss the planning, technical challenges, and lessons learned in performing physics workflows on a large-scale set of virtualized resources. In addition, we discuss the economics and operational efficiencies of executing workflows both in the cloud and on dedicated resources.
Comment: 15 pages, 9 figures
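As an illustration of the elastic provisioning described above, here is a minimal sketch of launching a batch of identical EC2 worker VMs with the boto3 API. This is not the HEPCloud implementation, which provisions through its own middleware; the AMI ID, instance type, and tag values are placeholders.

```python
# Minimal sketch: burst batch capacity into Amazon EC2 with boto3.
# The AMI, instance type, count, and tags below are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder: a worker-node image
    InstanceType="m4.xlarge",          # placeholder instance type
    MinCount=1,
    MaxCount=100,                      # scale out to many identical workers
    TagSpecifications=[{
        "ResourceType": "instance",
        "Tags": [{"Key": "purpose", "Value": "cms-workflow"}],
    }],
)
print("launched:", [i["InstanceId"] for i in response["Instances"]])
```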
The Effect of Networking Performance on High Energy Physics Computing
High Energy Physics (HEP) data analysis consists of simulating and analysing events in particle physics. In order to understand physics phenomena, one must collect and process a very large quantity of data generated by particle accelerators and software simulations. This data analysis can be done using the cloud computing paradigm in a distributed computing environment, where data and computation can be located in different, geographically distant data centres. This adds complexity and overhead to networking. In this paper, we study how the networking solution and its performance affect the efficiency and energy consumption of HEP computing. Our results indicate that higher latency both prolongs the processing time and increases the energy consumption.
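The latency effect can be made concrete with a back-of-the-envelope model, which is our own assumption rather than a formula from the paper: if each event stalls on a fixed number of synchronous remote reads, wall-clock time grows linearly with the round-trip time.

```python
# Toy model (an assumption for illustration, not the paper's formula):
# every uncached remote read stalls the CPU for one round trip.
def wall_clock_s(n_events, cpu_per_event_s, reads_per_event, rtt_s):
    """Total time = pure CPU time + per-read round-trip stalls."""
    return n_events * (cpu_per_event_s + reads_per_event * rtt_s)

n = 1_000_000
for rtt_ms in (0.1, 10, 100):          # LAN vs. intra-EU vs. transatlantic
    t = wall_clock_s(n, 1e-3, 2, rtt_ms / 1e3)
    print(f"RTT {rtt_ms:6.1f} ms -> {t / 3600:6.2f} h")
```

With these placeholder numbers, raising the round-trip time from 0.1 ms to 100 ms stretches the same job from about 0.3 hours to about 56 hours, and the idle-but-powered CPU time drives up energy use accordingly.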
CernVM Online and Cloud Gateway: a uniform interface for CernVM contextualization and deployment
In a virtualized environment, contextualization is the process of configuring
a VM instance for the needs of various deployment use cases. Contextualization
in CernVM can be done by passing a handwritten context to the user data field
of cloud APIs, when running CernVM on the cloud, or by using CernVM web
interface when running the VM locally. CernVM Online is a publicly accessible
web interface that unifies these two procedures. A user is able to define,
store and share CernVM contexts using CernVM Online and then apply them either
in a cloud by using CernVM Cloud Gateway or on a local VM with the single-step
pairing mechanism. CernVM Cloud Gateway is a distributed system that provides a
single interface to use multiple and different clouds (by location or type,
private or public). Cloud gateway has been so far integrated with OpenNebula,
CloudStack and EC2 tools interfaces. A user, with access to a number of clouds,
can run CernVM cloud agents that will communicate with these clouds using their
interfaces, and then use one single interface to deploy and scale CernVM
clusters. CernVM clusters are defined in CernVM Online and consist of a set of
CernVM instances that are contextualized and can communicate with each other.Comment: Conference paper at the 2013 Computing in High Energy Physics (CHEP)
Conference, Amsterda
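For illustration, here is a minimal sketch of the first path described above: passing a context through the cloud user-data field, shown with boto3 against EC2. The context snippet and the image ID are placeholders, not a complete CernVM context.

```python
# Sketch: hand a contextualization payload to a VM via the EC2 user-data
# field. The [amiconfig] snippet and AMI ID are illustrative placeholders.
import boto3

context = """\
[amiconfig]
plugins: cernvm
[cernvm]
organisations: cms
repositories: cms
"""

ec2 = boto3.client("ec2")
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder CernVM image ID
    InstanceType="m4.large",
    MinCount=1,
    MaxCount=1,
    UserData=context,                  # the contextualization payload
)
```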
Boosting Performance of Data-intensive Analysis Workflows with Distributed Coordinated Caching
Data-intensive end-user analyses in high energy physics require high data throughput to reach short turnaround cycles. This leads to enormous challenges for storage and network infrastructure, especially when facing the tremendously increasing amount of data to be processed during High-Luminosity LHC runs. Including opportunistic resources with volatile storage systems into the traditional HEP computing facilities makes this situation more complex.
Bringing data close to the computing units is a promising approach to solve throughput limitations and improve the overall performance. We focus on coordinated distributed caching by steering workflows to the most suitable hosts in terms of cached files. This allows optimizing the overall processing efficiency of data-intensive workflows and making efficient use of the limited cache volume by reducing the replication of data across distributed caches.
We developed the NaviX coordination service at KIT, which realizes coordinated distributed caching using an XRootD cache proxy server infrastructure and the HTCondor batch system. In this paper, we present the experience gained in operating coordinated distributed caches on cloud and HPC resources. Furthermore, we show benchmarks of a dedicated high-throughput cluster, the Throughput-Optimized Analysis-System (TOpAS), which is based on the above-mentioned concept.
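The coordination idea can be sketched in a few lines. This is a toy illustration, not NaviX's actual logic: rank the candidate hosts by how many of a job's input files their caches already hold, and steer the job to the warmest cache.

```python
# Toy sketch of cache-aware job steering (not NaviX's real algorithm):
# score each host by the number of the job's input files it already caches.
def rank_hosts(job_inputs, host_caches):
    """Return hosts sorted by number of already-cached input files."""
    scores = {
        host: len(job_inputs & cached)   # set intersection = cache hits
        for host, cached in host_caches.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

job = {"/store/a.root", "/store/b.root", "/store/c.root"}
caches = {
    "wn01": {"/store/a.root", "/store/b.root"},
    "wn02": {"/store/c.root"},
    "wn03": set(),
}
print(rank_hosts(job, caches))   # ['wn01', 'wn02', 'wn03']
```

Sending the job to wn01 turns two of its three reads into local cache hits, which is exactly the replication-avoiding placement the paragraph above describes.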
Prototyping a ROOT-based distributed analysis workflow for HL-LHC: the CMS use case
The challenges expected for the next era of the Large Hadron Collider (LHC), both in terms of storage and computing resources, provide LHC experiments with a strong motivation for evaluating ways of rethinking their computing models at many levels. Great efforts have been put into optimizing the computing resource utilization for data analysis, which leads both to lower hardware requirements and faster turnaround for physics analyses. In this scenario, the Compact Muon Solenoid (CMS) collaboration is involved in several activities aimed at benchmarking different solutions for running High Energy Physics (HEP) analysis workflows. A promising solution is evolving the software towards more user-friendly approaches featuring a declarative programming model and interactive workflows. The computing infrastructure should keep up with this trend by offering, on the one hand, modern interfaces and, on the other, hiding the complexity of the underlying environment, while efficiently leveraging the already deployed grid infrastructure and scaling toward opportunistic resources like public clouds or HPC centers. This article presents the first example of using the ROOT RDataFrame technology to exploit such next-generation approaches for a production-grade CMS physics analysis. A new analysis facility is created to offer users a modern interactive web interface based on JupyterLab that can leverage HTCondor-based grid resources at different geographical sites. The physics analysis is converted from a legacy iterative approach to the modern declarative approach offered by RDataFrame and distributed over multiple computing nodes. The new scenario offers not only an overall improved programming experience, but also an order-of-magnitude speedup with respect to the previous approach.
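A minimal sketch of the declarative RDataFrame style the abstract contrasts with a legacy event loop is shown below; the tree name, file, branches, and cuts are placeholders. In the distributed setup, plain RDataFrame is swapped for its Dask-backed counterpart while the analysis code stays essentially unchanged.

```python
# Sketch of a declarative RDataFrame analysis; dataset, branches, and
# cuts are placeholders. For distributed runs, recent ROOT versions offer
# a Dask-backed RDataFrame under ROOT.RDF.Experimental.Distributed.
import ROOT

df = ROOT.RDataFrame("Events", "data.root")      # placeholder tree/file

h = (df.Filter("nMuon >= 2", "at least two muons")
       .Define("pt_lead", "Muon_pt[0]")
       .Histo1D(("pt_lead", "Leading muon p_{T};p_{T} [GeV];Events",
                 50, 0.0, 200.0),
                "pt_lead"))

h.Draw()   # the computation graph is lazy: it runs only when results are used
```

The declarative form states *what* to compute (filters, definitions, histograms) and leaves the scheduling of the event loop, including its distribution over nodes, to the framework.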
A Web portal to simplify the scientific communities in using Grid and Cloud resources
Modern scientific applications demand increasing availability of computing and storage resources in order to collect and analyse large volumes of data, resources that single laboratories are often unable to provide. Distributed computing models have proved to be a valid and effective solution: proof is the Grid, widely used in high energy physics experiments, and Cloud solutions, which are gaining increasing acceptance.
These infrastructures require robust Authentication and Authorization mechanisms. The X.509 certificate is the standard used to authenticate Grid users and, although it represents a valid security mechanism, many communities complain about the difficulty of handling digital certificates and the complexity of the Grid middleware. These are the main obstacles to the full exploitation of distributed computing and data infrastructures.
In order to simplify the use of these resources, a Web-based portal has been developed that provides users with several important functionalities, such as job and workflow submission, interactive services, and data management for both Grid and Cloud environments. The thesis describes the Portal architecture, its features, the main benefits for users, and the custom views that have been defined and tested in collaboration with several communities to address relevant use cases.
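As a purely hypothetical illustration of the kind of interface such a portal could expose, the snippet below submits a job through an invented REST endpoint; the URL, payload fields, and token scheme are all placeholders, not the portal's documented API.

```python
# Hypothetical sketch of a portal-style job submission over REST.
# Endpoint, payload fields, and auth scheme are invented for illustration.
import requests

payload = {
    "executable": "analysis.sh",
    "arguments": ["--dataset", "sample01"],
    "backend": "grid",          # or "cloud", per the portal's dual support
}
resp = requests.post(
    "https://portal.example.org/api/jobs",   # placeholder URL
    json=payload,
    headers={"Authorization": "Bearer <token>"},
)
resp.raise_for_status()
print("job id:", resp.json().get("id"))
```

The point of such a layer is that the user never touches X.509 certificates or Grid middleware directly; the portal authenticates once and brokers the credentials behind the scenes.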
Resource provisioning in Science Clouds: Requirements and challenges
Cloud computing has permeated the information technology industry in the last few years, and it is now emerging in scientific environments. Science user communities demand a broad range of computing power to satisfy the needs of high-performance applications, power so far provided by local clusters, high-performance computing systems, and computing grids. Different computational models produce different workloads, and the cloud is already considered a promising paradigm. The scheduling and allocation of resources is always a challenging matter in any form of computation, and clouds are no exception. Science applications have unique features that differentiate their workloads; hence, their requirements have to be taken into consideration when building a Science Cloud. This paper discusses the main scheduling and resource allocation challenges for any Infrastructure as a Service provider supporting scientific applications.
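To make the allocation problem concrete, here is a toy sketch, our own illustration rather than the paper's proposal, of one classic heuristic an IaaS scheduler might apply: first-fit decreasing placement of VM requests onto hosts by core count.

```python
# Toy sketch (not the paper's method): first-fit decreasing placement of
# VM requests onto hosts, packing the largest requests first.
def first_fit_decreasing(requests, hosts):
    """Place each VM (cores) on the first host with room; None if it fails."""
    placement = {}
    for vm, cores in sorted(requests.items(), key=lambda kv: -kv[1]):
        for host in hosts:
            if hosts[host] >= cores:
                hosts[host] -= cores   # consume the host's free cores
                placement[vm] = host
                break
        else:
            placement[vm] = None       # request cannot be satisfied
    return placement

print(first_fit_decreasing({"vm1": 8, "vm2": 4, "vm3": 4},
                           {"h1": 8, "h2": 8}))
# {'vm1': 'h1', 'vm2': 'h2', 'vm3': 'h2'}
```

Even this toy exposes the tension the abstract points to: a policy tuned for utilization (tight packing) can conflict with science workloads that need co-located cores, data locality, or guaranteed turnaround.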