Replica Selection in the Globus Data Grid

Author: Foster Ian
Tuecke Steven
Vazhkudai Sudharshan
Publication date: 01/01/2000

The Globus Data Grid architecture provides a scalable infrastructure for the management of storage resources and data that are distributed across Grid environments. These services are designed to support a variety of scientific applications, ranging from high-energy physics to computational genomics, that require access to large amounts of data (terabytes or even petabytes) with varied quality of service requirements. By layering on a set of core services, such as data transport, security, and replica cataloging, one can construct various higher-level services. In this paper, we discuss the design and implementation of a high-level replica selection service that uses information regarding replica location and user preferences to guide selection from among storage replica alternatives. We first present a basic replica selection service design, then show how dynamic information collected using Globus information service capabilities concerning storage system properties can help improve and optimize the selection process. We demonstrate the use of Condor's ClassAds resource description and matchmaking mechanism as an efficient tool for representing and matching storage resource capabilities and policies against application requirements.

Comment: 8 pages, 6 figures

arXiv.org e-Print Archive

CiteSeerX

UNT Digital Library
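
The matchmaking idea in the abstract above can be illustrated with a minimal sketch. This is not the paper's implementation and does not use the real ClassAds language or any Condor API: `Offer`, `Request`, `select_replica`, and the attribute names are hypothetical, and plain Python callables stand in for ClassAd requirement and rank expressions. It only shows the general pattern of two-sided matching (application requirements against storage attributes, storage policy against the request) followed by ranking.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical flat attribute map standing in for a ClassAd.
Ad = Dict[str, float]


@dataclass
class Offer:
    """A storage replica advertisement (illustrative, not a real ClassAd)."""
    url: str
    attrs: Ad
    # Storage-side policy: does this site accept the given request?
    accepts: Callable[[Ad], bool]


@dataclass
class Request:
    """An application request (illustrative, not a real ClassAd)."""
    attrs: Ad
    # Application-side requirement on the storage attributes.
    requirements: Callable[[Ad], bool]
    # Preference among matching replicas; higher is better.
    rank: Callable[[Ad], float]


def select_replica(request: Request, offers: List[Offer]) -> Optional[Offer]:
    """Return the highest-ranked offer that matches in both directions."""
    matches = [
        o for o in offers
        if request.requirements(o.attrs) and o.accepts(request.attrs)
    ]
    if not matches:
        return None
    return max(matches, key=lambda o: request.rank(o.attrs))
```

In this sketch, a request might require a minimum read bandwidth and rank matching replicas by that bandwidth, while each storage site might refuse requests above a size limit. Dynamic attributes would, under the abstract's description, come from an information service rather than being hard-coded.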

Reducing data movement costs using energy-efficient, active computation on SSD

Author: Devesh Tiwari
Peter J Desnoyers
Simona Boboila
Sudharshan S Vazhkudai
Xiaosong Ma
Youngjae Kim
Publication date: 01/01/2012

Modern scientific discovery often involves running complex application simulations on supercomputers, followed by a sequence of data analysis tasks on smaller clusters. This offline approach suffers from significant data movement costs such as redundant I/O, storage bandwidth bottlenecks, and wasted CPU cycles, all of which contribute to increased energy consumption and delayed end-to-end performance. Technology projections for an exascale machine indicate that energy efficiency will become the primary design metric. It is estimated that the energy cost of data movement will soon rival the cost of computation. Consequently, we can no longer ignore the data movement costs in data analysis. To address these challenges, we advocate executing data analysis tasks on emerging storage devices, such as SSDs. Typically, in extreme-scale systems, SSDs serve only as a temporary storage system for the simulation output data. In our approach, Active Flash, we propose to conduct in-situ data analysis on the SSD controller without degrading the performance of the simulation job. Migrating analysis tasks closer to where the data resides helps reduce the data movement cost. We present detailed energy and performance models for both the Active Flash and offline strategies, and study them using extreme-scale application simulations, commonly used data analytics kernels, and supercomputer system configurations. Our evaluation suggests that Active Flash is a promising approach to alleviate the storage bandwidth bottleneck, reduce the data movement cost, and improve the overall energy efficiency.

CiteSeerX

High Performance Computing Facility Operational Assessment, 2012 Oak Ridge Leadership Computing Facility

Author: Barker Ashley D
Bernholdt David E
Bland Arthur S Buddy
Hack James J
Hudson Douglas L
Messer Bronson
Rogers James H
Thach Kevin G
Vazhkudai Sudharshan S
Wells Jack C
White Julia C
Publication venue: 'Office of Scientific and Technical Information (OSTI)'
Publication date: 01/02/2012

Crossref

UNT Digital Library

Enabling the Co-Allocation of Grid Data Transfers

Author: Sudharshan Vazhkudai
Publication date: 01/01/2003

Data-sharing scientific communities use storage systems as distributed data stores by replicating content. In such highly replicated environments, a particular dataset can reside at multiple locations and can thus be downloaded from any one of them. Since datasets of interest are large, improving download speeds either by server selection or by co-allocation can offer substantial benefits. In this paper, we present an architecture for co-allocating Grid data transfers across multiple connections, enabling the parallel download of datasets from multiple servers. We have developed several co-allocation strategies, comprising simple brute-force, history-based, and dynamic load-balancing techniques, as a means both to exploit rate differences among the various client-server links and to address dynamic rate fluctuations. We evaluate our approaches using the GridFTP data movement protocol in a wide-area testbed and present our results.

CiteSeerX
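
As a rough illustration of the first two strategy families named in the abstract above, the sketch below splits a download into byte ranges, one per server. It is an assumption-laden toy, not the paper's algorithm: the function names are hypothetical, "brute-force" is read here as an equal split across servers, and "history-based" as a split proportional to previously observed transfer rates. Dynamic load balancing, which would reassign work while transfers are in progress, is not covered.

```python
from typing import Dict, List, Tuple

# Half-open byte range [start, end) assigned to one server.
ByteRange = Tuple[int, int]


def proportional_ranges(size: int, weights: Dict[str, float]) -> Dict[str, ByteRange]:
    """Split [0, size) into contiguous ranges proportional to the weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    ranges: Dict[str, ByteRange] = {}
    start = 0
    cumulative = 0.0
    items = list(weights.items())
    for i, (server, weight) in enumerate(items):
        cumulative += weight
        # The last server always ends at `size`, so no byte is dropped to rounding.
        end = size if i == len(items) - 1 else int(size * cumulative / total)
        ranges[server] = (start, end)
        start = end
    return ranges


def brute_force_split(size: int, servers: List[str]) -> Dict[str, ByteRange]:
    """Assumed reading of 'brute-force': an equal share for every server."""
    return proportional_ranges(size, {s: 1.0 for s in servers})


def history_based_split(size: int, past_rates: Dict[str, float]) -> Dict[str, ByteRange]:
    """Assumed reading of 'history-based': shares proportional to past rates."""
    return proportional_ranges(size, past_rates)
```

With two servers whose past rates were 30 and 10 (in any common unit), the history-based split gives the faster server three quarters of the file, whereas the equal split gives each server half and leaves the download waiting on the slower link.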

Using Regression Techniques to Predict Large Data Transfers

Author: Jennifer M. Schopf
Sudharshan Vazhkudai

The recent proliferation of Data Grids and the increasingly common practice of using resources as distributed data stores provide a convenient environment for communities of researchers to share, replicate, and manage access to copies of large datasets. This has led to the question of which replica can be accessed most efficiently. In such environments, fetching data from one of several replica locations requires accurate predictions of end-to-end transfer times. The answer to this question can depend on many factors, including the physical characteristics of the resources and the load behavior on the CPUs, networks, and storage devices that are part of the end-to-end data path linking possible sources and sinks.

CiteSeerX