29,516 research outputs found
A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing
Data Grids have been adopted as the platform for scientific communities that
need to share, access, transport, process and manage large data collections
distributed worldwide. They combine high-end computing technologies with
high-performance networking and wide-area storage management techniques. In
this paper, we discuss the key concepts behind Data Grids and compare them with
other data sharing and distribution paradigms such as content delivery
networks, peer-to-peer networks and distributed databases. We then provide
comprehensive taxonomies that cover various aspects of architecture, data
transportation, data replication and resource allocation and scheduling.
Finally, we map the proposed taxonomy to various Data Grid systems not only to
validate the taxonomy but also to identify areas for future exploration.
Through this taxonomy, we aim to categorise existing systems to better
understand their goals and their methodology. This would help evaluate their
applicability for solving similar problems. This taxonomy also provides a "gap
analysis" of this area through which researchers can potentially identify new
issues for investigation. Finally, we hope that the proposed taxonomy and
mapping also helps to provide an easy way for new practitioners to understand
this complex area of research.Comment: 46 pages, 16 figures, Technical Repor
Towards More Data-Aware Application Integration (extended version)
Although most business application data is stored in relational databases,
programming languages and wire formats in integration middleware systems are
not table-centric. Due to costly format conversions, data-shipments and faster
computation, the trend is to "push-down" the integration operations closer to
the storage representation.
We address the alternative case of defining declarative, table-centric
integration semantics within standard integration systems. For that, we replace
the current operator implementations for the well-known Enterprise Integration
Patterns by equivalent "in-memory" table processing, and show a practical
realization in a conventional integration system for a non-reliable,
"data-intensive" messaging example. The results of the runtime analysis show
that table-centric processing is promising already in standard, "single-record"
message routing and transformations, and can potentially excel the message
throughput for "multi-record" table messages.Comment: 18 Pages, extended version of the contribution to British
International Conference on Databases (BICOD), 2015, Edinburgh, Scotlan
MPICH-G2: A Grid-Enabled Implementation of the Message Passing Interface
Application development for distributed computing "Grids" can benefit from
tools that variously hide or enable application-level management of critical
aspects of the heterogeneous environment. As part of an investigation of these
issues, we have developed MPICH-G2, a Grid-enabled implementation of the
Message Passing Interface (MPI) that allows a user to run MPI programs across
multiple computers, at the same or different sites, using the same commands
that would be used on a parallel computer. This library extends the Argonne
MPICH implementation of MPI to use services provided by the Globus Toolkit for
authentication, authorization, resource allocation, executable staging, and
I/O, as well as for process creation, monitoring, and control. Various
performance-critical operations, including startup and collective operations,
are configured to exploit network topology information. The library also
exploits MPI constructs for performance management; for example, the MPI
communicator construct is used for application-level discovery of, and
adaptation to, both network topology and network quality-of-service mechanisms.
We describe the MPICH-G2 design and implementation, present performance
results, and review application experiences, including record-setting
distributed simulations.Comment: 20 pages, 8 figure
The MeshRouter Architecture
The Joint Forces Command (JFCOM) Experimentation Directorate (J9)'s recent Joint Urban Operations (JUO)
experiments have demonstrated the viability of Forces Modeling and Simulation in a distributed environment. The
JSAF application suite, combined with the RTI-s communications system, provides the ability to run distributed
simulations with sites located across the United States, from Norfolk, Virginia to Maui, Hawaii. Interest-aware
routers are essential for communications in the large, distributed environments, and the current RTI-s framework
provides such routers connected in a straightforward tree topology. This approach is successful for small to medium
sized simulations, but faces a number of significant limitations for very large simulations over high-latency, wide
area networks. In particular, traffic is forced through a single site, drastically increasing distances messages must
travel to sites not near the top of the tree. Aggregate bandwidth is limited to the bandwidth of the site hosting the
top router, and failures in the upper levels of the router tree can result in widespread communications losses
throughout the system.
To resolve these issues, this work extends the RTI-s software router infrastructure to accommodate more
sophisticated, general router topologies, including both the existing tree framework and a new generalization of the
fully connected mesh topologies used in the SF Express ModSAF simulations of 100K fully interacting vehicles.
The new software router objects incorporate the scalable features of the SF Express design, while optionally using
low-level RTI-s objects to perform actual site-to-site communications. The (substantial) limitations of the original
mesh router formalism have been eliminated, allowing fully dynamic operations. The mesh topology capabilities
allow aggregate bandwidth and site-to-site latencies to match actual network performance. The heavy resource load at
the root node can now be distributed across routers at the participating sites
- …