17,579 research outputs found
Transformative Effects of NDIIPP, the Case of the Henry A. Murray Archive
This article comprises reflections on the changes to the Henry A. Murray Research Archive catalyzed by involvement with the National Digital Information Infrastructure and Preservation Program (NDIIPP) partnership, and by the accompanying introduction of next-generation digital library software.
Founded in 1976 at Radcliffe, the Henry A. Murray Research Archive is the endowed, permanent repository for quantitative and qualitative research data at the Institute for Quantitative Social Science at Harvard University. The Murray preserves in perpetuity all types of data of interest to the research community, including numerical data, video, audio, and interview notes. The archive is unique among data archives in the United States in the extent of its holdings in quantitative, qualitative, and mixed quantitative-qualitative research.
The Murray took part in Data-PASS, an NDIIPP-funded collaboration with four other archival partners, whose purpose was to identify and acquire data at risk and to jointly develop best practices for the shared stewardship, preservation, and exchange of these data. During this time, the Dataverse Network (DVN) software was introduced, facilitating the creation of virtual archives. The combination of institutional collaboration and new technology led the Murray to re-engineer its entire acquisition process; completely rewrite its ingest, dissemination, and other licensing agreements; and adopt a new model for ingest, discovery, access, and presentation of its collections.
Through the Data-PASS project, the Murray has acquired a number of important data collections. The resulting changes within the Murray have been dramatic: it has increased its overall rate of acquisitions fourfold and disseminates acquisitions far more rapidly. Furthermore, the new licensing and processing procedures allow a previously undreamed-of level of interoperability and collaboration with partner archives, facilitating integrated discovery and presentation services, and joint stewardship of collections.
Middleware-based Database Replication: The Gaps between Theory and Practice
The need for high availability and performance in data management systems has
been fueling a long running interest in database replication from both academia
and industry. However, academic groups often attack replication problems in
isolation, overlooking the need for completeness in their solutions, while
commercial teams take a holistic approach that often misses opportunities for
fundamental innovation. Over time, this has created a gap between academic research and industrial practice.
This paper aims to characterize the gap along three axes: performance,
availability, and administration. We build on our own experience developing and
deploying replication systems in commercial and academic settings, as well as
on a large body of prior related work. We sift through representative examples
from the last decade of open-source, academic, and commercial database
replication systems and combine this material with case studies from real
systems deployed at Fortune 500 customers. We propose two agendas, one for
academic research and one for industrial R&D, which we believe can bridge the
gap within 5-10 years. This way, we hope to both motivate and help researchers
in making the theory and practice of middleware-based database replication more
relevant to each other.
Comment: 14 pages. Appears in Proc. ACM SIGMOD International Conference on Management of Data, Vancouver, Canada, June 200
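As a concrete, if drastically simplified, illustration of the middleware approach the paper examines, the sketch below places a proxy between clients and a set of replicas, applying writes to every replica and load-balancing reads across them (read-one/write-all). The class and its interface are illustrative assumptions, not the authors' system, and it uses in-memory SQLite databases as stand-in replicas; it deliberately omits the failure handling, ordering, and recovery concerns that the paper argues separate theory from practice.

```python
# Minimal sketch of middleware-based replication (ROWA: read-one/write-all).
# Assumed names and interface for illustration only; not the paper's system.
import itertools
import sqlite3


class ReplicationMiddleware:
    def __init__(self, n_replicas=3):
        # In-memory SQLite databases stand in for independent replicas.
        self.replicas = [sqlite3.connect(":memory:") for _ in range(n_replicas)]
        self._reader = itertools.cycle(self.replicas)  # round-robin read balancing

    def execute_write(self, sql, params=()):
        # Eagerly apply the write to every replica; no failure handling here.
        for conn in self.replicas:
            conn.execute(sql, params)
            conn.commit()

    def execute_read(self, sql, params=()):
        # Route the read to a single replica chosen round-robin.
        return next(self._reader).execute(sql, params).fetchall()


mw = ReplicationMiddleware()
mw.execute_write("CREATE TABLE accounts (id INTEGER, balance INTEGER)")
mw.execute_write("INSERT INTO accounts VALUES (?, ?)", (1, 100))
print(mw.execute_read("SELECT balance FROM accounts WHERE id = ?", (1,)))
```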
A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing
Data Grids have been adopted as the platform for scientific communities that
need to share, access, transport, process and manage large data collections
distributed worldwide. They combine high-end computing technologies with
high-performance networking and wide-area storage management techniques. In
this paper, we discuss the key concepts behind Data Grids and compare them with
other data sharing and distribution paradigms such as content delivery
networks, peer-to-peer networks and distributed databases. We then provide
comprehensive taxonomies that cover various aspects of architecture, data
transportation, data replication and resource allocation and scheduling.
Finally, we map the proposed taxonomy to various Data Grid systems not only to
validate the taxonomy but also to identify areas for future exploration.
Through this taxonomy, we aim to categorise existing systems to better
understand their goals and their methodology. This would help evaluate their
applicability for solving similar problems. This taxonomy also provides a "gap
analysis" of this area through which researchers can potentially identify new
issues for investigation. Finally, we hope that the proposed taxonomy and
mapping also helps to provide an easy way for new practitioners to understand
this complex area of research.
Comment: 46 pages, 16 figures, Technical Repor
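To make the idea of mapping systems onto a taxonomy concrete, here is a minimal sketch. The category names and systems below are illustrative placeholders, not the paper's actual taxonomy terms; the point is how such a mapping supports the "gap analysis" the abstract describes, by enumerating taxonomy cells that no surveyed system occupies.

```python
# Hedged sketch of taxonomy mapping and gap analysis.
# Categories and systems are invented placeholders, not the paper's terms.
from dataclasses import dataclass
from enum import Enum


class Organization(Enum):
    HIERARCHICAL = "hierarchical"
    FEDERATED = "federated"


class ReplicationModel(Enum):
    SYNCHRONOUS = "synchronous"
    ASYNCHRONOUS = "asynchronous"


@dataclass(frozen=True)
class DataGridSystem:
    name: str
    organization: Organization
    replication: ReplicationModel


systems = [
    DataGridSystem("SystemA", Organization.HIERARCHICAL, ReplicationModel.ASYNCHRONOUS),
    DataGridSystem("SystemB", Organization.FEDERATED, ReplicationModel.ASYNCHRONOUS),
]

# "Gap analysis": report taxonomy cells that no surveyed system occupies.
occupied = {(s.organization, s.replication) for s in systems}
for org in Organization:
    for rep in ReplicationModel:
        if (org, rep) not in occupied:
            print(f"unexplored: {org.value} + {rep.value}")
```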
A Scalable Middleware Solution for Advanced Wide Area Web Services
To alleviate scalability problems in the Web, many researchers concentrate on how to incorporate advanced caching and replication techniques. Many solutions incorporate object-based techniques. In particular, Web resources are considered as distributed objects offering a well-defined interface. We argue that most proposals ignore two important aspects. First, there is little discussion on what kind of coherence should be provided. Proposing specific caching or replication solutions makes sense only if we know what coherence model they should implement. Second, most proposals treat all Web resources alike. Such a one-size-fits-all approach will never work in a wide-area system. We propose a solution in which Web resources are encapsulated in physically distributed shared objects. Each object should encapsulate not only state and operations, but also the policy by which its state is distributed, cached, replicated, migrated, etc.
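A minimal sketch of the core idea, a per-object distribution policy, might look like the following. The policy names and interfaces are assumptions made for illustration, not the authors' actual design; what matters is that each shared object carries its own coherence strategy rather than a one-size-fits-all one.

```python
# Sketch: each shared Web object encapsulates its own coherence policy.
# Class and method names are illustrative assumptions.
class CoherencePolicy:
    def on_write(self, obj, replicas): ...


class WriteThrough(CoherencePolicy):
    # Strong coherence: push every update to all replicas immediately.
    def on_write(self, obj, replicas):
        for r in replicas:
            r.state = dict(obj.state)


class LazyInvalidate(CoherencePolicy):
    # Weak coherence: mark replicas stale; they refetch on their next read.
    def on_write(self, obj, replicas):
        for r in replicas:
            r.stale = True


class SharedWebObject:
    def __init__(self, policy: CoherencePolicy):
        self.state, self.stale, self.policy = {}, False, policy
        self.replicas = []

    def write(self, key, value):
        self.state[key] = value
        # The object's own policy, not the system, decides how to propagate.
        self.policy.on_write(self, self.replicas)


# A frequently updated page might pick WriteThrough; a mostly static one,
# LazyInvalidate. The choice travels with the object.
page = SharedWebObject(policy=WriteThrough())
page.replicas = [SharedWebObject(policy=WriteThrough()) for _ in range(2)]
page.write("body", "<html>...</html>")
```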
Archiving the Relaxed Consistency Web
The historical, cultural, and intellectual importance of archiving the web
has been widely recognized. Today, all countries with high Internet penetration
rate have established high-profile archiving initiatives to crawl and archive
the fast-disappearing web content for long-term use. As web technologies
evolve, established web archiving techniques face challenges. This paper
focuses on the potential impact of relaxed-consistency web design on
crawler-driven web archiving. Relaxed-consistency websites may disseminate,
albeit ephemerally, inaccurate and even contradictory information. If captured
and preserved in the web archives as historical records, such information will
degrade the overall archival quality. To assess the extent of such quality
degradation, we build a simplified feed-following application and simulate its
operation with synthetic workloads. The results indicate that a non-trivial
portion of a relaxed consistency web archive may contain observable
inconsistency, and the inconsistency window may extend significantly longer
than that observed at the data store. We discuss the nature of such quality
degradation and propose a few possible remedies.
Comment: 10 pages, 6 figures, CIKM 201
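The flavor of the paper's simulation can be conveyed with a toy model: a post becomes visible at a replica only after a random propagation delay, so an archive crawl that lands inside that window captures inconsistent state. The workload, delay distribution, and parameters below are synthetic assumptions for illustration, not the paper's experimental setup.

```python
# Toy model of crawling a relaxed-consistency site: count snapshots taken
# inside some post's inconsistency window. All parameters are assumptions.
import random

random.seed(42)

SIM_END = 1000.0
posts = sorted(random.uniform(0, SIM_END) for _ in range(200))   # post times
crawls = sorted(random.uniform(0, SIM_END) for _ in range(50))   # snapshot times

stale_snapshots = 0
for t_crawl in crawls:
    for t_post in posts:
        # Each post reaches the crawled replica after a random lag (mean 5s).
        delay = random.expovariate(1 / 5.0)
        if t_post < t_crawl < t_post + delay:
            stale_snapshots += 1  # snapshot fell inside an inconsistency window
            break

print(f"{stale_snapshots}/{len(crawls)} snapshots captured inconsistent state")
```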