8,926 research outputs found
A Peer-to-Peer Middleware Framework for Resilient Persistent Programming
The persistent programming systems of the 1980s offered a programming model
that integrated computation and long-term storage. In these systems, reliable
applications could be engineered without requiring the programmer to write
translation code to manage the transfer of data to and from non-volatile
storage. More importantly, it simplified the programmer's conceptual model of
an application, and avoided the many coherency problems that result from
multiple cached copies of the same information. Although technically
innovative, persistent languages were not widely adopted, perhaps due in part
to their closed-world model. Each persistent store was located on a single
host, and there were no flexible mechanisms for communication or transfer of
data between separate stores. Here we re-open the work on persistence and
combine it with modern peer-to-peer techniques in order to provide support for
orthogonal persistence in resilient and potentially long-running distributed
applications. Our vision is of an infrastructure within which an application
can be developed and distributed with minimal modification, whereupon the
application becomes resilient to certain failure modes. If a node, or the
connection to it, fails during execution of the application, the objects are
re-instantiated from distributed replicas, without their reference holders
being aware of the failure. Furthermore, we believe that this can be achieved
within a spectrum of application programmer intervention, ranging from minimal
to totally prescriptive, as desired. The same mechanisms encompass an
orthogonally persistent programming model. We outline our approach to
implementing this vision, and describe current progress.Comment: Submitted to EuroSys 200
A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing
Data Grids have been adopted as the platform for scientific communities that
need to share, access, transport, process and manage large data collections
distributed worldwide. They combine high-end computing technologies with
high-performance networking and wide-area storage management techniques. In
this paper, we discuss the key concepts behind Data Grids and compare them with
other data sharing and distribution paradigms such as content delivery
networks, peer-to-peer networks and distributed databases. We then provide
comprehensive taxonomies that cover various aspects of architecture, data
transportation, data replication and resource allocation and scheduling.
Finally, we map the proposed taxonomy to various Data Grid systems not only to
validate the taxonomy but also to identify areas for future exploration.
Through this taxonomy, we aim to categorise existing systems to better
understand their goals and their methodology. This would help evaluate their
applicability for solving similar problems. This taxonomy also provides a "gap
analysis" of this area through which researchers can potentially identify new
issues for investigation. Finally, we hope that the proposed taxonomy and
mapping also helps to provide an easy way for new practitioners to understand
this complex area of research.Comment: 46 pages, 16 figures, Technical Repor
CamFlow: Managed Data-sharing for Cloud Services
A model of cloud services is emerging whereby a few trusted providers manage
the underlying hardware and communications whereas many companies build on this
infrastructure to offer higher level, cloud-hosted PaaS services and/or SaaS
applications. From the start, strong isolation between cloud tenants was seen
to be of paramount importance, provided first by virtual machines (VM) and
later by containers, which share the operating system (OS) kernel. Increasingly
it is the case that applications also require facilities to effect isolation
and protection of data managed by those applications. They also require
flexible data sharing with other applications, often across the traditional
cloud-isolation boundaries; for example, when government provides many related
services for its citizens on a common platform. Similar considerations apply to
the end-users of applications. But in particular, the incorporation of cloud
services within `Internet of Things' architectures is driving the requirements
for both protection and cross-application data sharing.
These concerns relate to the management of data. Traditional access control
is application and principal/role specific, applied at policy enforcement
points, after which there is no subsequent control over where data flows; a
crucial issue once data has left its owner's control by cloud-hosted
applications and within cloud-services. Information Flow Control (IFC), in
addition, offers system-wide, end-to-end, flow control based on the properties
of the data. We discuss the potential of cloud-deployed IFC for enforcing
owners' dataflow policy with regard to protection and sharing, as well as
safeguarding against malicious or buggy software. In addition, the audit log
associated with IFC provides transparency, giving configurable system-wide
visibility over data flows. [...]Comment: 14 pages, 8 figure
Invest to Save: Report and Recommendations of the NSF-DELOS Working Group on Digital Archiving and Preservation
Digital archiving and preservation are important areas for research and development, but there is no agreed upon set of priorities or coherent plan for research in this area. Research projects in this area tend to be small and driven by particular institutional problems or concerns. As a consequence, proposed solutions from experimental projects and prototypes tend not to scale to millions of digital objects, nor do the results from disparate projects readily build on each other. It is also unclear whether it is worthwhile to seek general solutions or whether different strategies are needed for different types of digital objects and collections. The lack of coordination in both research and development means that there are some areas where researchers are reinventing the wheel while other areas are neglected.
Digital archiving and preservation is an area that will benefit from an exercise in analysis, priority setting, and planning for future research. The WG aims to survey current research activities, identify gaps, and develop a white paper proposing future research directions in the area of digital preservation. Some of the potential areas for research include repository architectures and inter-operability among digital archives; automated tools for capture, ingest, and normalization of digital objects; and harmonization of preservation formats and metadata. There can also be opportunities for development of commercial products in the areas of mass storage systems, repositories and repository management systems, and data management software and tools.
Pattern Reification as the Basis for Description-Driven Systems
One of the main factors driving object-oriented software development for
information systems is the requirement for systems to be tolerant to change. To
address this issue in designing systems, this paper proposes a pattern-based,
object-oriented, description-driven system (DDS) architecture as an extension
to the standard UML four-layer meta-model. A DDS architecture is proposed in
which aspects of both static and dynamic systems behavior can be captured via
descriptive models and meta-models. The proposed architecture embodies four
main elements - firstly, the adoption of a multi-layered meta-modeling
architecture and reflective meta-level architecture, secondly the
identification of four data modeling relationships that can be made explicit
such that they can be modified dynamically, thirdly the identification of five
design patterns which have emerged from practice and have proved essential in
providing reusable building blocks for data management, and fourthly the
encoding of the structural properties of the five design patterns by means of
one fundamental pattern, the Graph pattern. A practical example of this
philosophy, the CRISTAL project, is used to demonstrate the use of
description-driven data objects to handle system evolution.Comment: 20 pages, 10 figure
A Robust Fault-Tolerant and Scalable Cluster-wide Deduplication for Shared-Nothing Storage Systems
Deduplication has been largely employed in distributed storage systems to
improve space efficiency. Traditional deduplication research ignores the design
specifications of shared-nothing distributed storage systems such as no central
metadata bottleneck, scalability, and storage rebalancing. Further,
deduplication introduces transactional changes, which are prone to errors in
the event of a system failure, resulting in inconsistencies in data and
deduplication metadata. In this paper, we propose a robust, fault-tolerant and
scalable cluster-wide deduplication that can eliminate duplicate copies across
the cluster. We design a distributed deduplication metadata shard which
guarantees performance scalability while preserving the design constraints of
shared- nothing storage systems. The placement of chunks and deduplication
metadata is made cluster-wide based on the content fingerprint of chunks. To
ensure transactional consistency and garbage identification, we employ a
flag-based asynchronous consistency mechanism. We implement the proposed
deduplication on Ceph. The evaluation shows high disk-space savings with
minimal performance degradation as well as high robustness in the event of
sudden server failure.Comment: 6 Pages including reference
- âŚ