1,746 research outputs found

    A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing

    Full text link
    Data Grids have been adopted as the platform for scientific communities that need to share, access, transport, process and manage large data collections distributed worldwide. They combine high-end computing technologies with high-performance networking and wide-area storage management techniques. In this paper, we discuss the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks and distributed databases. We then provide comprehensive taxonomies that cover various aspects of architecture, data transportation, data replication and resource allocation and scheduling. Finally, we map the proposed taxonomy to various Data Grid systems not only to validate the taxonomy but also to identify areas for future exploration. Through this taxonomy, we aim to categorise existing systems to better understand their goals and their methodology. This would help evaluate their applicability for solving similar problems. This taxonomy also provides a "gap analysis" of this area through which researchers can potentially identify new issues for investigation. Finally, we hope that the proposed taxonomy and mapping also helps to provide an easy way for new practitioners to understand this complex area of research.Comment: 46 pages, 16 figures, Technical Repor

    Towards Loosely-Coupled Programming on Petascale Systems

    Full text link
    We have extended the Falkon lightweight task execution framework to make loosely coupled programming on petascale systems a practical and useful programming model. This work studies and measures the performance factors involved in applying this approach to enable the use of petascale systems by a broader user community, and with greater ease. Our work enables the execution of highly parallel computations composed of loosely coupled serial jobs with no modifications to the respective applications. This approach allows a new-and potentially far larger-class of applications to leverage petascale systems, such as the IBM Blue Gene/P supercomputer. We present the challenges of I/O performance encountered in making this model practical, and show results using both microbenchmarks and real applications from two domains: economic energy modeling and molecular dynamics. Our benchmarks show that we can scale up to 160K processor-cores with high efficiency, and can achieve sustained execution rates of thousands of tasks per second.Comment: IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SuperComputing/SC) 200

    Many-Task Computing and Blue Waters

    Full text link
    This report discusses many-task computing (MTC) generically and in the context of the proposed Blue Waters systems, which is planned to be the largest NSF-funded supercomputer when it begins production use in 2012. The aim of this report is to inform the BW project about MTC, including understanding aspects of MTC applications that can be used to characterize the domain and understanding the implications of these aspects to middleware and policies. Many MTC applications do not neatly fit the stereotypes of high-performance computing (HPC) or high-throughput computing (HTC) applications. Like HTC applications, by definition MTC applications are structured as graphs of discrete tasks, with explicit input and output dependencies forming the graph edges. However, MTC applications have significant features that distinguish them from typical HTC applications. In particular, different engineering constraints for hardware and software must be met in order to support these applications. HTC applications have traditionally run on platforms such as grids and clusters, through either workflow systems or parallel programming systems. MTC applications, in contrast, will often demand a short time to solution, may be communication intensive or data intensive, and may comprise very short tasks. Therefore, hardware and software for MTC must be engineered to support the additional communication and I/O and must minimize task dispatch overheads. The hardware of large-scale HPC systems, with its high degree of parallelism and support for intensive communication, is well suited for MTC applications. However, HPC systems often lack a dynamic resource-provisioning feature, are not ideal for task communication via the file system, and have an I/O system that is not optimized for MTC-style applications. Hence, additional software support is likely to be required to gain full benefit from the HPC hardware

    Armada: a Parallel I/O Framework for Computational Grids

    Get PDF
    High-performance computing increasingly occurs on “computational grids” composed of heterogeneous and geographically distributed systems of computers, networks, and storage devices that collectively act as a single “virtual” computer. One of the great challenges for this environment is to provide efficient access to data that is distributed across remote data servers in a grid. In this paper, we describe our solution, a framework we call Armada. Armada allows applications to flexibly compose modules to access their data, and to place those modules at appropriate hosts within the grid to reduce network traffic

    Polish grid infrastructure for science and research

    Full text link
    Structure, functionality, parameters and organization of the computing Grid in Poland is described, mainly from the perspective of high-energy particle physics community, currently its largest consumer and developer. It represents distributed Tier-2 in the worldwide Grid infrastructure. It also provides services and resources for data-intensive applications in other sciences.Comment: Proceeedings of IEEE Eurocon 2007, Warsaw, Poland, 9-12 Sep. 2007, p.44

    Implementation of Sub-Grid-Federation Model for Performance Improvement in Federated Data Grid

    Get PDF
    In this work, a new model for federation data grid system called Sub-Grid-Federation was designed to improve access latency by accessing data from the nearest possible sites. The strategy in optimising data access was based on the process of searching into the area identified as ‘Network Core Area’ (NCA). The performance of access latency in Sub-Grid-Federation was tested based on the mathematical proving and simulated using OptorSim simulator. Four case studies were carried out and tested in Optimal Downloading Replication Strategy (ODRS) and the Sub-Grid-Federation. The results show that Sub-Grid-Federation is 20% better in terms of access latency and 21% better in terms of reducing remotes sites access compared to ODRS. The results indicate that the Sub-Grid-Federation is a better alternative for the implementation of collaboration and data sharing in data grid system.                                                                                    Keywords: Data grid, replication, scheduling, access latenc
    corecore