17,579 research outputs found

    Transformative Effects of NDIIPP, the Case of the Henry A. Murray Archive

    This article comprises reflections on the changes to the Henry A. Murray Research Archive catalyzed by involvement in the National Digital Information Infrastructure and Preservation Program (NDIIPP) partnership and by the accompanying introduction of next-generation digital library software. Founded in 1976 at Radcliffe, the Henry A. Murray Research Archive is the endowed, permanent repository for quantitative and qualitative research data at the Institute for Quantitative Social Science at Harvard University. The Murray preserves in perpetuity all types of data of interest to the research community, including numerical data, video, audio, and interview notes. The center is unique among data archives in the United States in the extent of its holdings in quantitative, qualitative, and mixed quantitative-qualitative research. The Murray took part in an NDIIPP-funded collaboration with four other archival partners, Data-PASS, whose purpose was the identification and acquisition of data at risk and the joint development of best practices for shared stewardship, preservation, and exchange of these data. During this time, the Dataverse Network (DVN) software was introduced, facilitating the creation of virtual archives. The combination of institutional collaboration and new technology led the Murray to re-engineer its entire acquisition process; completely rewrite its ingest, dissemination, and other licensing agreements; and adopt a new model for ingest, discovery, access, and presentation of its collections. Through the Data-PASS project, the Murray has acquired a number of important data collections. The resulting changes within the Murray have been dramatic, including a fourfold increase in its overall rate of acquisitions and far more rapid dissemination of acquisitions. Furthermore, the new licensing and processing procedures allow a previously undreamed-of level of interoperability and collaboration with partner archives, facilitating integrated discovery and presentation services and joint stewardship of collections.

    Middleware-based Database Replication: The Gaps between Theory and Practice

    The need for high availability and performance in data management systems has been fueling a long-running interest in database replication from both academia and industry. However, academic groups often attack replication problems in isolation, overlooking the need for completeness in their solutions, while commercial teams take a holistic approach that often misses opportunities for fundamental innovation. Over time this has created a gap between academic research and industrial practice. This paper aims to characterize the gap along three axes: performance, availability, and administration. We build on our own experience developing and deploying replication systems in commercial and academic settings, as well as on a large body of prior related work. We sift through representative examples from the last decade of open-source, academic, and commercial database replication systems and combine this material with case studies from real systems deployed at Fortune 500 customers. We propose two agendas, one for academic research and one for industrial R&D, which we believe can bridge the gap within 5-10 years. In this way, we hope both to motivate and to help researchers in making the theory and practice of middleware-based database replication more relevant to each other.
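    As a rough illustration of what a middleware-based approach looks like in practice (not the design of any system surveyed in the paper; all class and method names below are invented), the following sketch interposes a thin layer between clients and a primary/replica group, sending writes to the primary, load-balancing reads across replicas, and propagating updates lazily. The staleness visible between a write and its propagation is one concrete face of the performance, availability, and administration trade-offs the paper discusses.

```python
import random
from typing import Any, Dict, List, Tuple


class InMemoryNode:
    """Stand-in for a database node; a real deployment would wrap a DBMS driver."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, Any] = {}

    def write(self, key: str, value: Any) -> None:
        self.data[key] = value

    def read(self, key: str) -> Any:
        return self.data.get(key)


class ReplicationMiddleware:
    """Hypothetical middleware: primary-copy replication with lazy propagation."""

    def __init__(self, primary: InMemoryNode, replicas: List[InMemoryNode]) -> None:
        self.primary = primary
        self.replicas = replicas
        self.pending: List[Tuple[str, Any]] = []  # writes not yet applied to replicas

    def write(self, key: str, value: Any) -> None:
        # All writes go to the primary; replicas are updated later (lazily).
        self.primary.write(key, value)
        self.pending.append((key, value))

    def read(self, key: str) -> Any:
        # Reads are load-balanced across replicas and may observe stale data.
        return random.choice(self.replicas).read(key)

    def propagate(self) -> None:
        # A background task in a real system; invoked explicitly in this sketch.
        for key, value in self.pending:
            for replica in self.replicas:
                replica.write(key, value)
        self.pending.clear()


if __name__ == "__main__":
    mw = ReplicationMiddleware(InMemoryNode("primary"),
                               [InMemoryNode("r1"), InMemoryNode("r2")])
    mw.write("account:42", 100)
    print(mw.read("account:42"))   # None: replicas not yet updated
    mw.propagate()
    print(mw.read("account:42"))   # 100
```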

    A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing

    Full text link
    Data Grids have been adopted as the platform for scientific communities that need to share, access, transport, process, and manage large data collections distributed worldwide. They combine high-end computing technologies with high-performance networking and wide-area storage management techniques. In this paper, we discuss the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks, and distributed databases. We then provide comprehensive taxonomies that cover various aspects of architecture, data transportation, data replication, and resource allocation and scheduling. Finally, we map the proposed taxonomy to various Data Grid systems, not only to validate the taxonomy but also to identify areas for future exploration. Through this taxonomy, we aim to categorise existing systems in order to better understand their goals and methodologies, which in turn helps evaluate their applicability to similar problems. The taxonomy also provides a "gap analysis" of the area, through which researchers can identify new issues for investigation. Finally, we hope that the proposed taxonomy and mapping provide an easy way for new practitioners to understand this complex area of research.
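    Mapping systems onto taxonomy dimensions lends itself to a simple data-structure sketch. The dimensions and values below are broad paraphrases (organization, replication topology, scheduling scope) rather than the paper's actual categories, and the example profile is hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Organization(Enum):
    HIERARCHICAL = "hierarchical"
    FEDERATED = "federated"
    HYBRID = "hybrid"


class ReplicationTopology(Enum):
    CENTRALIZED = "centralized"
    TREE = "tree"
    PEER_TO_PEER = "peer-to-peer"


class SchedulingScope(Enum):
    INDIVIDUAL_JOBS = "individual jobs"
    WORKFLOWS = "workflows"


@dataclass
class DataGridProfile:
    """One system's position along a few illustrative taxonomy dimensions."""
    name: str
    organization: Organization
    replication: ReplicationTopology
    scheduling: SchedulingScope


# Hypothetical classification of a made-up system; not taken from the paper.
example = DataGridProfile(
    name="ExampleGrid",
    organization=Organization.HIERARCHICAL,
    replication=ReplicationTopology.TREE,
    scheduling=SchedulingScope.INDIVIDUAL_JOBS,
)
print(example)
```

    A "gap analysis" in this framing amounts to enumerating combinations of dimension values that no catalogued system occupies.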

    A Scalable Middleware Solution for Advanced Wide Area Web Services

    To alleviate scalability problems in the Web, many researchers concentrate on how to incorporate advanced caching and replication techniques. Many solutions rely on object-based techniques, in which Web resources are treated as distributed objects offering a well-defined interface. We argue that most proposals ignore two important aspects. First, there is little discussion of what kind of coherence should be provided: proposing specific caching or replication solutions makes sense only if we know what coherence model they should implement. Second, most proposals treat all Web resources alike; such a one-size-fits-all approach will never work in a wide-area system. We propose a solution in which Web resources are encapsulated in physically distributed shared objects. Each object should encapsulate not only state and operations, but also the policy by which its state is distributed, cached, replicated, migrated, and so on.
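    A minimal sketch of that idea, with invented class names and a TTL-based policy chosen purely for illustration: each shared object bundles its state and operations with a per-object coherence policy, so different resources can be cached or replicated under different rules.

```python
from abc import ABC, abstractmethod
import time


class CoherencePolicy(ABC):
    """Per-object policy deciding when a cached copy must be refreshed."""

    @abstractmethod
    def is_fresh(self, fetched_at: float) -> bool: ...


class TimeToLivePolicy(CoherencePolicy):
    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds

    def is_fresh(self, fetched_at: float) -> bool:
        return time.time() - fetched_at < self.ttl


class SharedWebObject:
    """Encapsulates state, operations, and the policy governing its distribution."""

    def __init__(self, url: str, policy: CoherencePolicy) -> None:
        self.url = url
        self.policy = policy
        self._state = None
        self._fetched_at = 0.0

    def _fetch_from_origin(self) -> str:
        # Placeholder; a real implementation would contact the origin server.
        return f"<html>content of {self.url}</html>"

    def get(self) -> str:
        if self._state is None or not self.policy.is_fresh(self._fetched_at):
            self._state = self._fetch_from_origin()
            self._fetched_at = time.time()
        return self._state


# A news page might tolerate minutes of staleness; a stock ticker only seconds.
news = SharedWebObject("https://example.org/news", TimeToLivePolicy(300))
print(news.get())
```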

    Archiving the Relaxed Consistency Web

    Full text link
    The historical, cultural, and intellectual importance of archiving the web has been widely recognized. Today, all countries with high Internet penetration rates have established high-profile archiving initiatives to crawl and archive fast-disappearing web content for long-term use. As web technologies evolve, established web archiving techniques face challenges. This paper focuses on the potential impact of relaxed-consistency web design on crawler-driven web archiving. Relaxed-consistency websites may disseminate, albeit ephemerally, inaccurate and even contradictory information. If captured and preserved in web archives as historical records, such information will degrade the overall archival quality. To assess the extent of such quality degradation, we build a simplified feed-following application and simulate its operation with synthetic workloads. The results indicate that a non-trivial portion of a relaxed-consistency web archive may contain observable inconsistency, and that the inconsistency window may extend significantly longer than that observed at the data store. We discuss the nature of such quality degradation and propose a few possible remedies.
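    A toy version of this kind of simulation, with all parameters and names invented rather than taken from the paper: posts become visible in the feed only after a random propagation delay, and an archival crawler that snapshots the feed on a fixed schedule sometimes records a view that omits already-published posts, i.e., an observable inconsistency.

```python
import random

# Hypothetical parameters for the synthetic workload.
NUM_POSTS = 1000
MAX_PROPAGATION_DELAY = 5   # time steps before a post reaches the feed view
CRAWL_INTERVAL = 7          # the archive crawler visits every N time steps

posts_published = []   # (visibility_time, post_id)
feed_view = set()      # post ids currently visible to followers
inconsistent_snapshots = 0
snapshots = 0

for t in range(NUM_POSTS):
    # One post is published per time step, becoming visible after a delay.
    delay = random.randint(0, MAX_PROPAGATION_DELAY)
    posts_published.append((t + delay, t))

    # Apply every post whose (delayed) visibility time has arrived.
    feed_view.update(pid for ready, pid in posts_published if ready <= t)

    # The crawler archives whatever the feed shows at crawl time.
    if t % CRAWL_INTERVAL == 0:
        snapshots += 1
        published_so_far = {pid for _, pid in posts_published}
        if feed_view != published_so_far:
            inconsistent_snapshots += 1   # archived view omits in-flight posts

print(f"{inconsistent_snapshots}/{snapshots} archived snapshots were inconsistent")
```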