
    On-line data archives

    Digital libraries and other large archives of electronically retrievable and manipulable material are becoming widespread in both commercial and scientific arenas. Advances in networking technologies have led to a greater proliferation of wide-area distributed data warehousing with associated data management challenges. We review tools and technologies for supporting distributed on-line data archives and explain our key concept of active data archives, in which data can be processed on demand before delivery. We are developing wide-area data warehousing software infrastructure for geographically distributed archives of large scientific data sets, such as satellite image data, that are stored hierarchically on disk arrays and tape silos and are accessed by a variety of scientific and decision support applications. Interoperability is a major issue for distributed data archives and requires standards for server interfaces and metadata. We review present activities and our contributions in developing such standards for different application areas.
    K. Hawick, P. Coddington, H. James, C. Patten
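    The "active archive" idea above can be summarised in code: the archive applies a named processing operation to a stored data set on demand, before delivery, instead of shipping raw files. The sketch below is purely illustrative; the class and operation names are invented and are not part of the authors' infrastructure.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Illustrative sketch only: an "active" archive applies a named
// processing step to stored data on demand, before delivery.
public class ActiveArchive {
    private final Map<String, byte[]> store = new HashMap<>();
    private final Map<String, Function<byte[], byte[]>> operations = new HashMap<>();

    public void put(String key, byte[] data) { store.put(key, data); }

    public void registerOperation(String name, Function<byte[], byte[]> op) {
        operations.put(name, op);
    }

    // Retrieve a data set, optionally transformed server-side first.
    public byte[] retrieve(String key, String operation) {
        byte[] raw = store.get(key);
        if (raw == null) throw new IllegalArgumentException("unknown data set: " + key);
        if (operation == null) return raw;                 // plain archive behaviour
        Function<byte[], byte[]> op = operations.get(operation);
        if (op == null) throw new IllegalArgumentException("unknown operation: " + operation);
        return op.apply(raw);                              // "active" behaviour
    }

    public static void main(String[] args) {
        ActiveArchive archive = new ActiveArchive();
        archive.put("scene-42", new byte[]{1, 2, 3, 4});
        // A trivial subsetting step standing in for real server-side image processing.
        archive.registerOperation("first-half",
                d -> java.util.Arrays.copyOfRange(d, 0, d.length / 2));
        System.out.println(archive.retrieve("scene-42", "first-half").length); // prints 2
    }
}
```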

    Global Grids and Software Toolkits: A Study of Four Grid Middleware Technologies

    A Grid is an infrastructure that involves the integrated and collaborative use of computers, networks, databases and scientific instruments owned and managed by multiple organizations. Grid applications often involve large amounts of data and/or computing resources that require secure resource sharing across organizational boundaries, which makes Grid application management and deployment a complex undertaking. Grid middleware provides users with seamless computing and uniform access to resources in the heterogeneous Grid environment. Several software toolkits and systems have been developed around the world, most of them the results of academic research projects. This chapter focuses on four of these middleware systems: UNICORE, Globus, Legion and Gridbus. It also presents our implementation of a resource broker for UNICORE, since this functionality was not originally supported. A comparison of these systems on the basis of their architecture, implementation model and several other features is included.
    Comment: 19 pages, 10 figures
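    The resource broker mentioned in the abstract is essentially a matchmaker between job requirements and available resources. The following is a minimal, hypothetical sketch of that role; it is not the UNICORE or Gridbus broker API, and the Resource/Job fields are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Illustrative only: a broker selects a resource that satisfies a job's
// requirements, the role a Grid resource broker plays.
public class SimpleResourceBroker {
    record Resource(String name, int freeCpus, double costPerCpuHour) {}   // assumed fields
    record Job(String id, int cpusNeeded) {}                               // assumed fields

    private final List<Resource> resources = new ArrayList<>();

    public void register(Resource r) { resources.add(r); }

    // Pick the cheapest resource that still has enough free CPUs.
    public Optional<Resource> select(Job job) {
        return resources.stream()
                .filter(r -> r.freeCpus() >= job.cpusNeeded())
                .min(Comparator.comparingDouble(Resource::costPerCpuHour));
    }

    public static void main(String[] args) {
        SimpleResourceBroker broker = new SimpleResourceBroker();
        broker.register(new Resource("clusterA", 64, 0.10));
        broker.register(new Resource("clusterB", 16, 0.05));
        // clusterB is cheaper but too small, so clusterA is chosen.
        System.out.println(broker.select(new Job("job-1", 32)).map(Resource::name).orElse("none"));
    }
}
```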

    Advanced Message Routing for Scalable Distributed Simulations

    The Joint Forces Command (JFCOM) Experimentation Directorate (J9)'s recent Joint Urban Operations (JUO) experiments have demonstrated the viability of Forces Modeling and Simulation in a distributed environment. The JSAF application suite, combined with the RTI-s communications system, provides the ability to run distributed simulations with sites located across the United States, from Norfolk, Virginia to Maui, Hawaii. Interest-aware routers are essential for communications in these large, distributed environments, and the current RTI-s framework provides such routers connected in a straightforward tree topology. This approach is successful for small to medium-sized simulations, but faces a number of significant limitations for very large simulations over high-latency, wide-area networks. In particular, traffic is forced through a single site, drastically increasing the distances messages must travel to sites not near the top of the tree. Aggregate bandwidth is limited to the bandwidth of the site hosting the top router, and failures in the upper levels of the router tree can result in widespread communications losses throughout the system. To resolve these issues, this work extends the RTI-s software router infrastructure to accommodate more sophisticated, general router topologies, including both the existing tree framework and a new generalization of the fully connected mesh topologies used in the SF Express ModSAF simulations of 100K fully interacting vehicles. The new software router objects incorporate the scalable features of the SF Express design, while optionally using low-level RTI-s objects to perform actual site-to-site communications. The (substantial) limitations of the original mesh router formalism have been eliminated, allowing fully dynamic operations. The mesh topology capabilities allow aggregate bandwidth and site-to-site latencies to match actual network performance. The heavy resource load at the root node can now be distributed across routers at the participating sites.
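    The interest-aware routing idea can be illustrated with a small sketch: each router forwards a message only to neighbours that have declared interest in the message's interest state, over an arbitrary graph of direct links rather than a single tree. This is not the RTI-s implementation; interest propagation beyond one hop and duplicate suppression are deliberately omitted to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch of interest-aware forwarding over an arbitrary
// router graph (tree or mesh); not the actual RTI-s router.
public class InterestRouter {
    private final String name;
    private final List<InterestRouter> neighbours = new ArrayList<>();
    private final Map<InterestRouter, Set<String>> neighbourInterests = new HashMap<>();
    private final Set<String> localInterests = new HashSet<>();

    public InterestRouter(String name) { this.name = name; }

    public void connect(InterestRouter other) {
        neighbours.add(other);
        other.neighbours.add(this);
        neighbourInterests.put(other, new HashSet<>());
        other.neighbourInterests.put(this, new HashSet<>());
    }

    // A local client declares interest; direct neighbours are told about it.
    public void declareLocalInterest(String interest) {
        localInterests.add(interest);
        for (InterestRouter n : neighbours) n.neighbourInterests.get(this).add(interest);
    }

    // Forward only to neighbours that declared interest; avoids flooding every link.
    public void publish(String interest, String payload, InterestRouter from) {
        if (localInterests.contains(interest))
            System.out.println(name + " delivers [" + interest + "] " + payload);
        for (InterestRouter n : neighbours) {
            if (n == from) continue;
            if (neighbourInterests.get(n).contains(interest)) n.publish(interest, payload, this);
        }
    }

    public static void main(String[] args) {
        InterestRouter norfolk = new InterestRouter("norfolk");
        InterestRouter maui = new InterestRouter("maui");
        norfolk.connect(maui);                  // direct mesh link, no root router in between
        maui.declareLocalInterest("sector-7");
        norfolk.publish("sector-7", "entity update", null);
    }
}
```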

    A reconfigurable component-based problem solving environment

    Problem solving environments are an attractive approach to the integration of calculation and management tools for various scientific and engineering applications. These applications often require high performance computing components in order to be computationally feasible. It is therefore a challenge to construct integration technology, suitable for problem solving environments, that allows both flexibility and the embedding of parallel and high performance computing systems. Our DISCWorld system is designed to meet these needs and provides a Java-based middleware to integrate component applications across wide-area networks. Key features of our design are the abilities to: access remotely stored data; compose complex processing requests either graphically or through a scripting language; execute components on heterogeneous and remote platforms; and reconfigure task sub-graphs to run across multiple servers. Operators in task graphs can be slow (but portable) “pure Java” implementations or wrappers to fast (platform-specific) supercomputer implementations.
    K. Hawick, H. James, P. Coddington
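    A minimal sketch of the operator model described above: one common interface, a slow but portable pure-Java implementation, and a wrapper standing in for a fast platform-specific implementation, chosen at run time. The names are illustrative and not DISCWorld's actual API; the "native" wrapper simply falls back to the portable code so the example remains runnable.

```java
// Illustrative sketch: one operator interface, a portable pure-Java
// implementation, and a wrapper that would delegate to platform-specific code.
interface Operator {
    double[] apply(double[] input);
}

// Slow but portable reference implementation.
class PureJavaSmooth implements Operator {
    public double[] apply(double[] input) {
        double[] out = new double[input.length];
        for (int i = 0; i < input.length; i++) {
            double left = i > 0 ? input[i - 1] : input[i];
            double right = i < input.length - 1 ? input[i + 1] : input[i];
            out[i] = (left + input[i] + right) / 3.0;   // simple 3-point smoothing
        }
        return out;
    }
}

// Stand-in for a wrapper around a fast native/supercomputer implementation;
// a real wrapper would call out via JNI or a remote service here.
class NativeSmoothWrapper implements Operator {
    private final Operator fallback = new PureJavaSmooth();
    public double[] apply(double[] input) {
        return fallback.apply(input);       // fallback keeps the sketch runnable
    }
}

public class OperatorDemo {
    public static void main(String[] args) {
        boolean nativeAvailable = false;    // would be probed at run time
        Operator op = nativeAvailable ? new NativeSmoothWrapper() : new PureJavaSmooth();
        System.out.println(java.util.Arrays.toString(op.apply(new double[]{1, 2, 9, 2, 1})));
    }
}
```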

    Querying Large Physics Data Sets Over an Information Grid

    Optimising use of the Web (WWW) for LHC data analysis is a complex problem and illustrates the challenges arising from the integration of, and computation across, massive amounts of information distributed worldwide. Finding the right piece of information can, at times, be extremely time-consuming, if not impossible. So-called Grids have been proposed to facilitate LHC computing, and many groups have embarked on studies of data replication, data migration and networking philosophies. Other aspects, such as the role of 'middleware' for Grids, are emerging as areas requiring research. This paper argues the need for appropriate middleware that enables users to resolve physics queries across massive data sets. It identifies the role of meta-data for query resolution and the importance of Information Grids for high-energy physics analysis, rather than just Computational or Data Grids. This paper identifies software that is being implemented at CERN to enable the querying of very large collaborating HEP data sets, initially being employed for the construction of CMS detectors.
    Comment: 4 pages, 3 figures
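    The role of meta-data in query resolution can be sketched as follows: a query is first resolved against a metadata catalogue to decide which sites hold relevant data sets, before any bulk data are touched. This is a hypothetical illustration, not the CERN software referred to in the abstract; the catalogue fields (detector, run-number range, site) are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: resolve a physics query against metadata first, so that
// only the sites actually holding matching data sets are contacted.
public class MetadataQueryResolver {
    record DatasetEntry(String name, String site, String detector,
                        int runNumberLow, int runNumberHigh) {}    // assumed fields

    private final List<DatasetEntry> catalogue = new ArrayList<>();

    public void register(DatasetEntry e) { catalogue.add(e); }

    // Return the sites holding data for the requested detector and run number.
    public List<String> resolve(String detector, int run) {
        return catalogue.stream()
                .filter(e -> e.detector().equals(detector))
                .filter(e -> run >= e.runNumberLow() && run <= e.runNumberHigh())
                .map(DatasetEntry::site)
                .distinct()
                .toList();
    }

    public static void main(String[] args) {
        MetadataQueryResolver resolver = new MetadataQueryResolver();
        resolver.register(new DatasetEntry("cms-tracker-2001a", "cern", "CMS", 100, 199));
        resolver.register(new DatasetEntry("cms-tracker-2001b", "fnal", "CMS", 200, 299));
        System.out.println(resolver.resolve("CMS", 250));   // prints [fnal]
    }
}
```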

    An environment for workflow applications on wide-area distributed systems

    Workflow techniques are emerging as an important approach to the specification and management of complex processing tasks. This approach is especially powerful for utilising distributed data and processing resources in widely distributed heterogeneous systems. We describe our DISCWorld distributed workflow environment for composing complex processing chains, which are specified as directed acyclic graphs of operators. Users of our system can formulate processing chains using either graphical or scripting tools. We have deployed our system for image processing applications and decision support systems. We describe the technologies we have developed to enable the execution of these processing chains across wide-area computing systems. In particular, we present our Distributed Job Placement Language (based on XML) and the various Java interface approaches we have developed for implementing the workflow metaphor. We outline a number of key issues for implementing a high-performance, reliable, distributed workflow management system.
    James, H.A.; Hawick, K.A.; Coddington, P.D.
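    A small sketch of the processing-chain idea: operators and their data dependencies form a directed acyclic graph, and a topological sort yields an execution order (and detects cycles) before tasks would be placed on servers. This is illustrative only; it is neither the DISCWorld scripting language nor the XML-based Distributed Job Placement Language.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a processing chain as a DAG of named operators,
// ordered by topological sort before tasks would be placed on servers.
public class ProcessingChain {
    private final Map<String, List<String>> dependants = new HashMap<>();
    private final Map<String, Integer> inDegree = new HashMap<>();

    public void addOperator(String name) {
        dependants.putIfAbsent(name, new ArrayList<>());
        inDegree.putIfAbsent(name, 0);
    }

    // "to" consumes the output of "from".
    public void addEdge(String from, String to) {
        addOperator(from);
        addOperator(to);
        dependants.get(from).add(to);
        inDegree.merge(to, 1, Integer::sum);
    }

    // Kahn's algorithm: an execution order, or an error if the graph has a cycle.
    public List<String> executionOrder() {
        Map<String, Integer> remaining = new HashMap<>(inDegree);
        Deque<String> ready = new ArrayDeque<>();
        remaining.forEach((op, d) -> { if (d == 0) ready.add(op); });
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String op = ready.poll();
            order.add(op);
            for (String next : dependants.get(op))
                if (remaining.merge(next, -1, Integer::sum) == 0) ready.add(next);
        }
        if (order.size() != inDegree.size()) throw new IllegalStateException("cycle in chain");
        return order;
    }

    public static void main(String[] args) {
        ProcessingChain chain = new ProcessingChain();
        chain.addEdge("load-image", "geo-rectify");
        chain.addEdge("geo-rectify", "classify");
        chain.addEdge("load-map", "classify");
        System.out.println(chain.executionOrder()); // e.g. [load-image, load-map, geo-rectify, classify]
    }
}
```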

    MPICH-G2: A Grid-Enabled Implementation of the Message Passing Interface

    Application development for distributed computing "Grids" can benefit from tools that variously hide or enable application-level management of critical aspects of the heterogeneous environment. As part of an investigation of these issues, we have developed MPICH-G2, a Grid-enabled implementation of the Message Passing Interface (MPI) that allows a user to run MPI programs across multiple computers, at the same or different sites, using the same commands that would be used on a parallel computer. This library extends the Argonne MPICH implementation of MPI to use services provided by the Globus Toolkit for authentication, authorization, resource allocation, executable staging, and I/O, as well as for process creation, monitoring, and control. Various performance-critical operations, including startup and collective operations, are configured to exploit network topology information. The library also exploits MPI constructs for performance management; for example, the MPI communicator construct is used for application-level discovery of, and adaptation to, both network topology and network quality-of-service mechanisms. We describe the MPICH-G2 design and implementation, present performance results, and review application experiences, including record-setting distributed simulations.
    Comment: 20 pages, 8 figures
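    The communicator-based topology adaptation mentioned above can be sketched by splitting MPI.COMM_WORLD into per-site communicators. The example below uses the Java bindings shipped with Open MPI (package mpi) rather than MPICH-G2 itself, and simply assumes a site assignment instead of discovering it from the Grid environment.

```java
import mpi.Intracomm;
import mpi.MPI;
import mpi.MPIException;

// Sketch of the communicator idea: group processes into per-site
// communicators so communication patterns can be adapted to the topology.
// Uses the Open MPI Java bindings, not MPICH-G2.
public class SiteAwareGrouping {
    public static void main(String[] args) throws MPIException {
        MPI.Init(args);
        int rank = MPI.COMM_WORLD.getRank();
        int size = MPI.COMM_WORLD.getSize();

        // In MPICH-G2 the site/topology information comes from the Grid
        // environment; here we simply assume the first half of the ranks
        // run at one site and the second half at another.
        int site = rank < size / 2 ? 0 : 1;

        // Ranks with the same "colour" (site) end up in the same communicator.
        Intracomm siteComm = MPI.COMM_WORLD.split(site, rank);
        System.out.println("world rank " + rank + " -> site " + site
                + ", site-local rank " + siteComm.getRank());

        MPI.Finalize();
    }
}
```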