22,466 research outputs found
A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing
Data Grids have been adopted as the platform for scientific communities that
need to share, access, transport, process and manage large data collections
distributed worldwide. They combine high-end computing technologies with
high-performance networking and wide-area storage management techniques. In
this paper, we discuss the key concepts behind Data Grids and compare them with
other data sharing and distribution paradigms such as content delivery
networks, peer-to-peer networks and distributed databases. We then provide
comprehensive taxonomies that cover various aspects of architecture, data
transportation, data replication and resource allocation and scheduling.
Finally, we map the proposed taxonomy to various Data Grid systems not only to
validate the taxonomy but also to identify areas for future exploration.
Through this taxonomy, we aim to categorise existing systems to better
understand their goals and their methodology. This would help evaluate their
applicability for solving similar problems. This taxonomy also provides a "gap
analysis" of this area through which researchers can potentially identify new
issues for investigation. Finally, we hope that the proposed taxonomy and
mapping also helps to provide an easy way for new practitioners to understand
this complex area of research.Comment: 46 pages, 16 figures, Technical Repor
Heterogeneous Relational Databases for a Grid-enabled Analysis Environment
Grid based systems require a database access mechanism that can provide seamless homogeneous access to the requested data through a virtual data access system, i.e. a system which can take care of tracking the data that is stored in geographically distributed heterogeneous databases. This system should provide an integrated view of the data that is stored in the different repositories by using a virtual data access mechanism, i.e. a mechanism which can hide the heterogeneity of the backend databases from the client applications. This paper focuses on accessing data stored in disparate relational databases through a web service interface, and exploits the features of a Data Warehouse and Data Marts. We present a middleware that enables applications to access data stored in geographically distributed relational databases without being aware of their physical locations and underlying schema. A web service interface is provided to enable applications to access this middleware in a language and platform independent way. A prototype implementation was created based on Clarens [4], Unity [7] and POOL [8]. This ability to access the data stored in the distributed relational databases transparently is likely to be a very powerful one for Grid users, especially the scientific community wishing to collate and analyze data distributed over the Grid
A study of two transaction-processing architectures for distributed real-time database systems
Cataloged from PDF version of article.A real-time data base system (RTDBS) is designed to provide timely response to the transactions of data-intensive applications. Processing a transaction in a distributed RTDBS environment presents the design choice of how to provide access to remote data referenced by the transaction. Satisfaction of the timing constraints of transactions should be the primary factor to be considered in scheduling accesses to remote data. In this article, we describe and analyze two different alternative approaches to this fundamental design decision. With the first alternative, transaction operations are executed at the sites where required data pages reside. The other alternative is based on transmitting data pages wherever they are needed. Although the latter approach is characterized by large message volumes carrying data pages, it is shown in our experiments to perform better than the other approach under most of the work loads and system configurations tested. The performance metric used in the evaluations is the fraction of transactions that satisfy their timing constraints. © 1995
Protocols for Integrity Constraint Checking in Federated Databases
A federated database is comprised of multiple interconnected database systems that primarily operate independently but cooperate to a certain extent. Global integrity constraints can be very useful in federated databases, but the lack of global queries, global transaction mechanisms, and global concurrency control renders traditional constraint management techniques inapplicable. This paper presents a threefold contribution to integrity constraint checking in federated databases: (1) The problem of constraint checking in a federated database environment is clearly formulated. (2) A family of protocols for constraint checking is presented. (3) The differences across protocols in the family are analyzed with respect to system requirements, properties guaranteed by the protocols, and processing and communication costs. Thus, our work yields a suite of options from which a protocol can be chosen to suit the system capabilities and integrity requirements of a particular federated database environment
The Homeostasis Protocol: Avoiding Transaction Coordination Through Program Analysis
Datastores today rely on distribution and replication to achieve improved
performance and fault-tolerance. But correctness of many applications depends
on strong consistency properties - something that can impose substantial
overheads, since it requires coordinating the behavior of multiple nodes. This
paper describes a new approach to achieving strong consistency in distributed
systems while minimizing communication between nodes. The key insight is to
allow the state of the system to be inconsistent during execution, as long as
this inconsistency is bounded and does not affect transaction correctness. In
contrast to previous work, our approach uses program analysis to extract
semantic information about permissible levels of inconsistency and is fully
automated. We then employ a novel homeostasis protocol to allow sites to
operate independently, without communicating, as long as any inconsistency is
governed by appropriate treaties between the nodes. We discuss mechanisms for
optimizing treaties based on workload characteristics to minimize
communication, as well as a prototype implementation and experiments that
demonstrate the benefits of our approach on common transactional benchmarks
Advanced information processing system: Inter-computer communication services
The purpose is to document the functional requirements and detailed specifications for the Inter-Computer Communications Services (ICCS) of the Advanced Information Processing System (AIPS). An introductory section is provided to outline the overall architecture and functional requirements of the AIPS and to present an overview of the ICCS. An overview of the AIPS architecture as well as a brief description of the AIPS software is given. The guarantees of the ICCS are provided, and the ICCS is described as a seven-layered International Standards Organization (ISO) Model. The ICCS functional requirements, functional design, and detailed specifications as well as each layer of the ICCS are also described. A summary of results and suggestions for future work are presented
Recommended from our members
A Comparison of Cache Performance in Server-Based and Symmetric Database Architectures
We study the cache performance in a symmetric distributed main-memory database. The high performance networks in many large distributed systems enable a machine to reach the main memory of other nodes more quickly than the time to access local disks. We therefore introduce remote memory as an additional layer in the memory hierarchy between local memory and disks. In order to appreciate the tradeoffs of memory and cpu in the symmetric architecture, we compare system performance in alternative architectures. Simulations show that, by exploiting remote memory (in each node‘s cache), performance improves over a wide range of cache sizes as compared to a distributed client/server architecture. We also compare the symmetric model to a centralized-server model and parameterize the performance tradeoffs
A Covert Channel Using Named Resources
A network covert channel is created that uses resource names such as
addresses to convey information, and that approximates typical user behavior in
order to blend in with its environment. The channel correlates available
resource names with a user defined code-space, and transmits its covert message
by selectively accessing resources associated with the message codes. In this
paper we focus on an implementation of the channel using the Hypertext Transfer
Protocol (HTTP) with Uniform Resource Locators (URLs) as the message names,
though the system can be used in conjunction with a variety of protocols. The
covert channel does not modify expected protocol structure as might be detected
by simple inspection, and our HTTP implementation emulates transaction level
web user behavior in order to avoid detection by statistical or behavioral
analysis.Comment: 9 page
- …