16,363 research outputs found
A Peer-to-Peer Middleware Framework for Resilient Persistent Programming
The persistent programming systems of the 1980s offered a programming model
that integrated computation and long-term storage. In these systems, reliable
applications could be engineered without requiring the programmer to write
translation code to manage the transfer of data to and from non-volatile
storage. More importantly, it simplified the programmer's conceptual model of
an application, and avoided the many coherency problems that result from
multiple cached copies of the same information. Although technically
innovative, persistent languages were not widely adopted, perhaps due in part
to their closed-world model. Each persistent store was located on a single
host, and there were no flexible mechanisms for communication or transfer of
data between separate stores. Here we re-open the work on persistence and
combine it with modern peer-to-peer techniques in order to provide support for
orthogonal persistence in resilient and potentially long-running distributed
applications. Our vision is of an infrastructure within which an application
can be developed and distributed with minimal modification, whereupon the
application becomes resilient to certain failure modes. If a node, or the
connection to it, fails during execution of the application, the objects are
re-instantiated from distributed replicas, without their reference holders
being aware of the failure. Furthermore, we believe that this can be achieved
within a spectrum of application programmer intervention, ranging from minimal
to totally prescriptive, as desired. The same mechanisms encompass an
orthogonally persistent programming model. We outline our approach to
implementing this vision, and describe current progress.Comment: Submitted to EuroSys 200
CATS: linearizability and partition tolerance in scalable and self-organizing key-value stores
Distributed key-value stores provide scalable, fault-tolerant, and self-organizing
storage services, but fall short of guaranteeing linearizable consistency
in partially synchronous, lossy, partitionable, and dynamic networks, when data
is distributed and replicated automatically by the principle of consistent hashing.
This paper introduces consistent quorums as a solution for achieving atomic
consistency. We present the design and implementation of CATS, a distributed
key-value store which uses consistent quorums to guarantee linearizability and partition tolerance in such adverse and dynamic network conditions. CATS is
scalable, elastic, and self-organizing; key properties for modern cloud storage
middleware. Our system shows that consistency can be achieved with practical
performance and modest throughput overhead (5%) for read-intensive workloads
Search optimizations in structured peer-to-peer systems
DHT systems are structured overlay networks capable of using P2P resources as a scalable platform for very large data storage applications. However, their efficiency expects a level of uniformity in the association of data to index keys that is often not present in inverted indexes. Index data tends to follow non-uniform distributions, often power law distributions, creating intense local storage hotspots and network bottlenecks on specific hosts. Current techniques like caching cannot, alone, cope with this issue. We propose a distributed data structure based on a decentralized balanced tree to balance storage data and network load more uniformly across hosts. The results show that the data structure is capable of balancing resources, in particular when performing multiple keyword searches
Distributed Management of Massive Data: an Efficient Fine-Grain Data Access Scheme
This paper addresses the problem of efficiently storing and accessing massive
data blocks in a large-scale distributed environment, while providing efficient
fine-grain access to data subsets. This issue is crucial in the context of
applications in the field of databases, data mining and multimedia. We propose
a data sharing service based on distributed, RAM-based storage of data, while
leveraging a DHT-based, natively parallel metadata management scheme. As
opposed to the most commonly used grid storage infrastructures that provide
mechanisms for explicit data localization and transfer, we provide a
transparent access model, where data are accessed through global identifiers.
Our proposal has been validated through a prototype implementation whose
preliminary evaluation provides promising results
A novel Hash-Based File Clustering scheme for efficient distributing, storing and retrieving of large scale Health Records
Cloud computing has been adopted as an efficient computing infrastructure model for provisioning resources and providing services to users. Several distributed resource models such as Hadoop and parallel databases have been deployed in healthcare-related services to manage electronic health records (EHR). However, these models are inefficient for managing a large number of small files and hence they are not widely deployed in Healthcare Information Systems. This paper proposed a novel Hash-Based File Clustering Scheme (HBFC) to distribute, store and retrieve EHR efficiently in cloud environments. The HBFC possesses two distinctive features: it utilizes hashing to distribute files into clusters in a control way and it utilizes P2P structures for data management. HBFC scheme is demonstrated to be effective in handling big health data that comprises of a large number of small files in various formats. It allows users to retrieve and access data records efficiently. The initial implementation results demonstrate that the proposed scheme outperforms original P2P system in term of data lookup latency
Taming hot-spots in DHT inverted indexes
DHT systems are structured overlay networks capable of using
P2P resources as a scalable platform for very large data storage
applications. However, their efficiency expects a level of uni-
formity in the association of data to index keys that is often
not present in inverted indexes. Index data tends to follow non-
uniform distributions, often power law distributions, creating in-
tense local storage hotspots and network bottlenecks on specific
hosts. Current techniques like caching cannot, alone, cope with
this issue.
We propose a new distributed data structure based on a decen-
tralized balanced tree to balance storage data and network load
more uniformly across all hosts. The approach is stackable with
standard DHTs and ensures that the DHT storage subsystem re-
ceives an uniform load by assigning fixed sized, or low variance,
blocks
- …