
    Flexible Session Management in a Distributed Environment

    Many secure communication libraries used by distributed systems, such as SSL, TLS, and Kerberos, fail to make a clear distinction between the authentication, session, and communication layers. In this paper we introduce CEDAR, the secure communication library used by the Condor High Throughput Computing software, and present the advantages that a distributed computing system gains from CEDAR's separation of these layers. Regardless of the authentication method used, CEDAR establishes a secure session key that can flexibly be used for multiple capabilities. We demonstrate how a layered approach to security sessions can avoid the round-trips and latency inherent in network authentication. The creation of a distinct session management layer allows for optimizations that improve scalability by delegating sessions to other components in the system. This session delegation creates a chain of trust that reduces the overhead of establishing secure connections and enables centralized enforcement of system-wide security policies. Additionally, secure channels based upon UDP datagrams are often overlooked by existing libraries; we show how CEDAR's structure accommodates these as well. As an example of the utility of this work, we show how delegated security sessions and other techniques inherent in CEDAR's architecture enable US CMS to meet its scalability requirements in deploying Condor over large-scale, wide-area grid systems.
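
The session-delegation idea can be illustrated with a small, self-contained sketch (not CEDAR's actual API; all names are hypothetical): a master session key is established once after authentication, and delegated keys are derived from it with an HMAC over the delegate's identity, so a component can hand a derived key to another component without a fresh round of network authentication.

```python
import hashlib
import hmac
import os

def new_master_session_key() -> bytes:
    """Established once, after whatever authentication method was used."""
    return os.urandom(32)

def delegate_session_key(parent_key: bytes, delegate_id: str) -> bytes:
    """Derive a child session key bound to the delegate's identity.

    Both ends of the delegation can recompute this key locally, so no
    additional authentication round-trips are needed on the network.
    """
    return hmac.new(parent_key, delegate_id.encode(), hashlib.sha256).digest()

def authenticate_message(session_key: bytes, payload: bytes) -> bytes:
    """Integrity-protect a message (e.g. a UDP datagram) with the session key."""
    return hmac.new(session_key, payload, hashlib.sha256).digest()

# Example: a submit node delegates a session to a worker component.
master = new_master_session_key()
worker_key = delegate_session_key(master, "worker-017")
tag = authenticate_message(worker_key, b"status update")
```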

    Characterizing Networking as Experienced by Users (White Paper)


    Demonstrating 100 Gbps in and out of the public Clouds

    There is increased awareness and recognition that public Cloud providers offer capabilities not found elsewhere, with elasticity being a major driver. The value of elastic scaling is, however, tightly coupled to the capabilities of the networks that connect all involved resources, both in the public Clouds and at the various research institutions. This paper presents results of measurements involving file transfers inside public Cloud providers, fetching data from on-prem resources into public Cloud instances, and fetching data from public Cloud storage into on-prem nodes. The networking of the three major Cloud providers, namely Amazon Web Services, Microsoft Azure and the Google Cloud Platform, has been benchmarked. The on-prem nodes were either managed by the Pacific Research Platform or located at the University of Wisconsin-Madison. The observed sustained throughput was on the order of 100 Gbps in all the tests moving data in and out of the public Clouds, and reached into the Tbps range for data movements inside the public Cloud providers themselves. All the tests used HTTP as the transfer protocol.
    Comment: 4 pages, 6 figures, 3 tables
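
A minimal sketch of the kind of measurement involved: stream an HTTP download and compute sustained throughput from the bytes transferred and the elapsed time. The URL below is a placeholder, and this is not the benchmarking harness used in the paper, which would drive many parallel streams across many nodes.

```python
import time
import urllib.request

def measure_http_throughput_gbps(url: str, chunk_size: int = 1 << 20) -> float:
    """Download `url` and return the observed sustained throughput in Gbps."""
    total_bytes = 0
    start = time.monotonic()
    with urllib.request.urlopen(url) as response:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            total_bytes += len(chunk)
    elapsed = time.monotonic() - start
    return (total_bytes * 8) / elapsed / 1e9

# Placeholder endpoint; a real test aggregates many such streams.
print(measure_http_throughput_gbps("https://example.org/testfile"))
```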

    Defining a canonical unit for accounting purposes

    Compute resource providers often put in place batch compute systems to maximize the utilization of their resources. However, compute nodes in such clusters, both physical and logical, contain several complementary resources, notable examples being CPUs, GPUs, memory, and ephemeral storage. User jobs will typically require more than one such resource, resulting in co-scheduling trade-offs over partial nodes, especially in multi-user environments. When accounting for either user billing or scheduling overhead, it is thus important to consider all such resources together. We therefore define the concept of a threshold-based "canonical unit" that combines several resource types into a single discrete unit, and use it to characterize scheduling overhead and make resource billing fairer for both resource providers and users. Note that the exact definition of a canonical unit is not prescribed and may change between resource providers. Nevertheless, we provide a template and two example definitions that we consider appropriate in the context of the Open Science Grid.
    Comment: 6 pages, 2 figures, To be published in proceedings of PEARC2
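
As a rough illustration of one possible threshold-based definition (the paper stresses that the exact definition is provider-specific; the thresholds and names below are made up for the example): a job is charged the largest number of units it consumes across the resource types.

```python
import math

# Hypothetical per-unit thresholds: one canonical unit "contains" this much
# of each resource type.  A real provider would tune these values.
UNIT_THRESHOLDS = {
    "cpus": 1,          # cores
    "gpus": 0.125,      # fraction of a GPU
    "memory_gb": 4.0,   # GB of RAM
    "disk_gb": 20.0,    # GB of ephemeral storage
}

def canonical_units(request: dict) -> int:
    """Charge the job for the resource type it consumes the most of,
    expressed as a whole number of canonical units."""
    ratios = [request.get(name, 0) / per_unit
              for name, per_unit in UNIT_THRESHOLDS.items()]
    return math.ceil(max(ratios))

# A job asking for 2 cores, 16 GB RAM and no GPU is dominated by memory:
print(canonical_units({"cpus": 2, "memory_gb": 16}))  # -> 4
```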

    Any Data, Any Time, Anywhere: Global Data Access for Science

    Data access is key to science driven by distributed high-throughput computing (DHTC), an essential technology for many major research projects such as High Energy Physics (HEP) experiments. However, achieving efficient data access becomes quite difficult when many independent storage sites are involved, because users are burdened with learning the intricacies of accessing each system and keeping careful track of data location. We present an alternative approach: the Any Data, Any Time, Anywhere (AAA) infrastructure. Combining several existing software products, AAA presents a global, unified view of storage systems (a "data federation"), a global filesystem for software delivery, and a workflow management system. We present how one HEP experiment, the Compact Muon Solenoid (CMS), is utilizing the AAA infrastructure, along with some simple performance metrics.
    Comment: 9 pages, 6 figures, submitted to 2nd IEEE/ACM International Symposium on Big Data Computing (BDC) 201
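
Conceptually, a data federation lets a job refer to a file by a single logical name and lets the infrastructure find a working replica. The sketch below illustrates that fallback idea with plain HTTP and hypothetical endpoints; it is not the actual AAA/XRootD implementation.

```python
import urllib.error
import urllib.request

# Hypothetical federation endpoints, ordered from "closest" to the origin.
ENDPOINTS = [
    "https://site-cache.example.org",
    "https://regional-cache.example.org",
    "https://origin-storage.example.org",
]

def open_logical_file(logical_name: str):
    """Try each endpoint in turn and return the first readable replica."""
    last_error = None
    for base in ENDPOINTS:
        url = f"{base}/{logical_name.lstrip('/')}"
        try:
            return urllib.request.urlopen(url, timeout=10)
        except OSError as err:  # URLError, timeouts, connection failures
            last_error = err
    raise RuntimeError(f"no working replica for {logical_name}") from last_error

# A job only needs the logical name, not the physical location of the data:
# stream = open_logical_file("/store/data/run2012/some_dataset.root")
```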

    Porting and optimizing UniFrac for GPUs

    UniFrac is a commonly used metric in microbiome research for comparing microbiome profiles to one another ("beta diversity"). The recently implemented Striped UniFrac added the capability to split the problem into many independent subproblems and exhibits near-linear scaling. In this paper we describe the steps undertaken in porting and optimizing Striped UniFrac for GPUs. We reduced the run time of computing UniFrac on the published Earth Microbiome Project dataset from 13 hours on an Intel Xeon E5-2680 v4 CPU to 12 minutes on an NVIDIA Tesla V100 GPU, and to about one hour on a laptop with an NVIDIA GTX 1050 (with minor loss in precision). For a larger dataset containing 113k samples, the run time was reduced from over one month on the CPU to less than 2 hours on the V100 and 9 hours on an NVIDIA RTX 2080 Ti GPU (with minor loss in precision). This was achieved by using OpenACC for generating the GPU offload code and by improving the memory access patterns. A BSD-licensed implementation is available, which produces a C shared library linkable from any programming language.
    Comment: 4 pages, 3 figures, 4 tables
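
The key scalability idea, splitting the all-pairs computation into independent stripes, can be sketched generically. This is not the UniFrac metric itself; the per-pair function below is a placeholder Euclidean distance, and the code only illustrates why each stripe can be computed on a separate core, GPU, or node.

```python
import numpy as np

def pairwise_stripe(data: np.ndarray, offset: int) -> np.ndarray:
    """One 'stripe' of a pairwise distance matrix: the distance between
    sample i and sample (i + offset) mod n, for all i.  Stripes are
    independent, so they can be distributed and merged afterwards."""
    shifted = np.roll(data, -offset, axis=0)
    return np.linalg.norm(data - shifted, axis=1)  # placeholder metric

def pairwise_all(data: np.ndarray) -> np.ndarray:
    n = data.shape[0]
    full = np.zeros((n, n))
    for offset in range(1, n // 2 + 1):      # each iteration is independent
        stripe = pairwise_stripe(data, offset)
        for i in range(n):
            j = (i + offset) % n
            full[i, j] = full[j, i] = stripe[i]
    return full

samples = np.random.rand(8, 100)     # 8 samples, 100 features
print(pairwise_all(samples).shape)   # (8, 8)
```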

    Characterizing network paths in and out of the clouds

    Commercial Cloud computing is becoming mainstream, with funding agencies moving beyond prototyping and starting to fund production campaigns, too. An important aspect of any scientific computing production campaign is data movement, both incoming and outgoing. While the performance and cost of VMs are relatively well understood, network performance and cost are not. This paper provides a characterization of networking in various regions of Amazon Web Services, Microsoft Azure and the Google Cloud Platform, both between Cloud resources and major data transfer nodes (DTNs) in the Pacific Research Platform, including OSG data federation caches in the network backbone, and inside the Clouds themselves. The paper contains a qualitative analysis of the results as well as latency and throughput measurements. It also includes an analysis of the costs involved with Cloud-based networking.
    Comment: 7 pages, 1 figure, 5 tables, to be published in CHEP19 proceedings
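
A minimal sketch of a latency measurement of the kind reported, timing TCP connection establishment to a remote endpoint. The host and port are placeholders, and the paper's actual methodology may differ.

```python
import socket
import statistics
import time

def tcp_connect_latency_ms(host: str, port: int = 443, samples: int = 10) -> float:
    """Median time, in milliseconds, to establish a TCP connection."""
    timings = []
    for _ in range(samples):
        start = time.monotonic()
        with socket.create_connection((host, port), timeout=5):
            pass
        timings.append((time.monotonic() - start) * 1000)
    return statistics.median(timings)

# Placeholder endpoint standing in for a cloud VM or a DTN.
print(tcp_connect_latency_ms("example.org"))
```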

    glideinWMS - A generic pilot-based Workload Management System

    Grid resources are distributed among hundreds of independent Grid sites, requiring a higher-level Workload Management System (WMS) for them to be used efficiently. Pilot jobs have been used for this purpose by many communities, bringing increased reliability, global fair share and just-in-time resource matching. GlideinWMS is a WMS based on the Condor glidein concept, i.e. a regular Condor pool in which the Condor daemons (startds) are started by pilot jobs, and real jobs run as vanilla, standard or MPI universe jobs. The glideinWMS system is composed of a set of Glidein Factories, handling the submission of pilot jobs to a set of Grid sites, and a set of VO Frontends, requesting pilot submission based on the status of user jobs. This paper contains a structural overview of glideinWMS as well as a detailed description of the current implementation and the current scalability limits.
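
The division of labour can be sketched conceptually: a frontend compares demand (idle user jobs) against supply (pilots already submitted) and asks factories for more pilots when demand exceeds supply. This is an illustration of the idea only, with made-up numbers and function names, not glideinWMS code.

```python
def pilots_to_request(idle_jobs: int, idle_pilots: int, running_pilots: int,
                      max_pilots: int = 1000) -> int:
    """Very simplified frontend logic: request enough pilots to cover the
    idle user jobs that existing pilots cannot absorb, up to a cap."""
    uncovered = idle_jobs - idle_pilots
    headroom = max_pilots - (idle_pilots + running_pilots)
    return max(0, min(uncovered, headroom))

def frontend_cycle(sites, idle_jobs: int):
    """One matchmaking cycle (a real frontend would also split the demand
    across sites instead of asking each site to cover all of it)."""
    for site in sites:
        n = pilots_to_request(idle_jobs, site["idle_pilots"], site["running_pilots"])
        if n > 0:
            print(f"request {n} pilots from factory entry {site['name']}")

frontend_cycle(
    [{"name": "SiteA", "idle_pilots": 5, "running_pilots": 50},
     {"name": "SiteB", "idle_pilots": 0, "running_pilots": 10}],
    idle_jobs=120,
)
```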

    Testing GitHub projects on custom resources using unprivileged Kubernetes runners

    GitHub is a popular platform for hosting software projects, both due to its ease of use and its seamless integration with a testing environment. Native GitHub Actions make it easy for software developers to validate new commits and have confidence that new code does not introduce major bugs. The freely available test environments are limited to only a few popular setups but can be extended with custom Action Runners. Our team had access to a Kubernetes cluster with GPU accelerators, so we explored the feasibility of automatically deploying GPU-providing runners there. All available Kubernetes-based setups, however, require cluster-admin-level privileges. To address this problem, we developed a simple custom setup that operates in a completely unprivileged manner. In this paper we provide a summary description of the setup and our experience using it in the context of two Knight lab projects on the Prototype National Research Platform system.
    Comment: 5 pages, 1 figure, To be published in proceedings of PEARC2
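
A hedged sketch of the general approach, using the official Kubernetes Python client to create a non-privileged pod running a self-hosted runner image from a namespace-scoped account. The image name, namespace, and token handling are placeholders, and this is not the setup described in the paper.

```python
from kubernetes import client, config

def launch_runner_pod(namespace: str, repo_url: str, runner_token: str):
    """Create an unprivileged pod running a (hypothetical) GitHub Actions
    runner image; only namespace-level permission to create pods is needed."""
    config.load_kube_config()  # or config.load_incluster_config()
    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(generate_name="gha-runner-"),
        spec=client.V1PodSpec(
            restart_policy="Never",
            containers=[client.V1Container(
                name="runner",
                image="example.org/actions-runner:latest",   # placeholder image
                env=[client.V1EnvVar(name="REPO_URL", value=repo_url),
                     client.V1EnvVar(name="RUNNER_TOKEN", value=runner_token)],
                resources=client.V1ResourceRequirements(
                    limits={"nvidia.com/gpu": "1"}),          # expose one GPU
                security_context=client.V1SecurityContext(
                    run_as_non_root=True,
                    allow_privilege_escalation=False),
            )],
        ),
    )
    client.CoreV1Api().create_namespaced_pod(namespace=namespace, body=pod)

# launch_runner_pod("my-namespace", "https://github.com/org/repo", "<token>")
```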