54,090 research outputs found
Recommended from our members
FutureGRID: A Program for long-term research into GRID systems architecture
Proceedings of the 2003 UK e-Science All Hands Meeting, 31st August - 3rd September, Nottingham UKThis is a project to carry out research into long-term GRID architecture, in the University of Cambridge
Computer Laboratory and the Cambridge eScience Center, with support from the Microsoft Research
Laboratory, Cambridge.
It is part of a larger vision for future systems architectures for public computing platforms, including
both scientitic GRID and commodity level computing such as games, peer2peer computing and storage
services and so forth, based on work in the laboratories in recent years into massively scaleable distributed systems for storage, computation, content distribution and collaboration[26]
A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing
Data Grids have been adopted as the platform for scientific communities that
need to share, access, transport, process and manage large data collections
distributed worldwide. They combine high-end computing technologies with
high-performance networking and wide-area storage management techniques. In
this paper, we discuss the key concepts behind Data Grids and compare them with
other data sharing and distribution paradigms such as content delivery
networks, peer-to-peer networks and distributed databases. We then provide
comprehensive taxonomies that cover various aspects of architecture, data
transportation, data replication and resource allocation and scheduling.
Finally, we map the proposed taxonomy to various Data Grid systems not only to
validate the taxonomy but also to identify areas for future exploration.
Through this taxonomy, we aim to categorise existing systems to better
understand their goals and their methodology. This would help evaluate their
applicability for solving similar problems. This taxonomy also provides a "gap
analysis" of this area through which researchers can potentially identify new
issues for investigation. Finally, we hope that the proposed taxonomy and
mapping also helps to provide an easy way for new practitioners to understand
this complex area of research.Comment: 46 pages, 16 figures, Technical Repor
Enhancing Job Scheduling of an Atmospheric Intensive Data Application
Nowadays, e-Science applications involve great deal of data to have more accurate analysis. One of its application domains is the Radio Occultation which manages satellite data. Grid Processing Management is a physical infrastructure geographically distributed based on Grid Computing, that is implemented for the overall processing Radio Occultation analysis. After a brief description of algorithms adopted to characterize atmospheric profiles, the paper presents an improvement of job scheduling in order to decrease processing time and optimize resource utilization. Extension of grid computing capacity is implemented by virtual machines in existing physical Grid in order to satisfy temporary job requests. Also scheduling plays an important role in the infrastructure that is handled by a couple of schedulers which are developed to manage data automaticall
Survey and Analysis of Production Distributed Computing Infrastructures
This report has two objectives. First, we describe a set of the production
distributed infrastructures currently available, so that the reader has a basic
understanding of them. This includes explaining why each infrastructure was
created and made available and how it has succeeded and failed. The set is not
complete, but we believe it is representative.
Second, we describe the infrastructures in terms of their use, which is a
combination of how they were designed to be used and how users have found ways
to use them. Applications are often designed and created with specific
infrastructures in mind, with both an appreciation of the existing capabilities
provided by those infrastructures and an anticipation of their future
capabilities. Here, the infrastructures we discuss were often designed and
created with specific applications in mind, or at least specific types of
applications. The reader should understand how the interplay between the
infrastructure providers and the users leads to such usages, which we call
usage modalities. These usage modalities are really abstractions that exist
between the infrastructures and the applications; they influence the
infrastructures by representing the applications, and they influence the ap-
plications by representing the infrastructures
Predicting Intermediate Storage Performance for Workflow Applications
Configuring a storage system to better serve an application is a challenging
task complicated by a multidimensional, discrete configuration space and the
high cost of space exploration (e.g., by running the application with different
storage configurations). To enable selecting the best configuration in a
reasonable time, we design an end-to-end performance prediction mechanism that
estimates the turn-around time of an application using storage system under a
given configuration. This approach focuses on a generic object-based storage
system design, supports exploring the impact of optimizations targeting
workflow applications (e.g., various data placement schemes) in addition to
other, more traditional, configuration knobs (e.g., stripe size or replication
level), and models the system operation at data-chunk and control message
level.
This paper presents our experience to date with designing and using this
prediction mechanism. We evaluate this mechanism using micro- as well as
synthetic benchmarks mimicking real workflow applications, and a real
application.. A preliminary evaluation shows that we are on a good track to
meet our objectives: it can scale to model a workflow application run on an
entire cluster while offering an over 200x speedup factor (normalized by
resource) compared to running the actual application, and can achieve, in the
limited number of scenarios we study, a prediction accuracy that enables
identifying the best storage system configuration
Many-Task Computing and Blue Waters
This report discusses many-task computing (MTC) generically and in the
context of the proposed Blue Waters systems, which is planned to be the largest
NSF-funded supercomputer when it begins production use in 2012. The aim of this
report is to inform the BW project about MTC, including understanding aspects
of MTC applications that can be used to characterize the domain and
understanding the implications of these aspects to middleware and policies.
Many MTC applications do not neatly fit the stereotypes of high-performance
computing (HPC) or high-throughput computing (HTC) applications. Like HTC
applications, by definition MTC applications are structured as graphs of
discrete tasks, with explicit input and output dependencies forming the graph
edges. However, MTC applications have significant features that distinguish
them from typical HTC applications. In particular, different engineering
constraints for hardware and software must be met in order to support these
applications. HTC applications have traditionally run on platforms such as
grids and clusters, through either workflow systems or parallel programming
systems. MTC applications, in contrast, will often demand a short time to
solution, may be communication intensive or data intensive, and may comprise
very short tasks. Therefore, hardware and software for MTC must be engineered
to support the additional communication and I/O and must minimize task dispatch
overheads. The hardware of large-scale HPC systems, with its high degree of
parallelism and support for intensive communication, is well suited for MTC
applications. However, HPC systems often lack a dynamic resource-provisioning
feature, are not ideal for task communication via the file system, and have an
I/O system that is not optimized for MTC-style applications. Hence, additional
software support is likely to be required to gain full benefit from the HPC
hardware
- ā¦