Cold Storage Data Archives: More Than Just a Bunch of Tapes
The abundance of available sensor and derived data from large scientific
experiments, such as earth observation programs, radio astronomy sky surveys,
and high-energy physics already exceeds the storage hardware globally
fabricated per year. As a result, cold storage data archives are the---often
overlooked---spearheads of modern big data analytics in scientific,
data-intensive application domains. While high-performance data analytics has
received much attention from the research community, the growing number of
problems in designing and deploying cold storage archives has received
comparatively little attention.
In this paper, we take the first step towards bridging this gap in knowledge
by presenting an analysis of four real-world cold storage archives from three
different application domains. In doing so, we highlight (i) workload
characteristics that differentiate these archives from traditional,
performance-sensitive data analytics, (ii) design trade-offs involved in
building cold storage systems for these archives, and (iii) deployment
trade-offs with respect to migration to the public cloud. Based on our
analysis, we discuss several other important research challenges that need to
be addressed by the data management community.
Alpha Entanglement Codes: Practical Erasure Codes to Archive Data in Unreliable Environments
Data centres that use consumer-grade disk drives and distributed
peer-to-peer systems are unreliable environments for archiving data without
sufficient redundancy. Most redundancy schemes are not completely effective at
providing high availability, durability, and integrity in the long term. We propose alpha
entanglement codes, a mechanism that creates a virtual layer of highly
interconnected storage devices to propagate redundant information across a
large scale storage system. Our motivation is to design flexible and practical
erasure codes with high fault-tolerance to improve data durability and
availability even in catastrophic scenarios. By flexible and practical, we mean
code settings that can be adapted to future requirements and practical
implementations with reasonable trade-offs between security, resource usage and
performance. The codes have three parameters. Alpha increases storage overhead
linearly but increases the possible paths to recover data exponentially. Two
other parameters increase fault-tolerance even further without the need for
additional storage. As a result, an entangled storage system can provide high
availability, durability and offer additional integrity: it is more difficult
to modify data undetectably. We evaluate how several redundancy schemes perform
in unreliable environments and show that alpha entanglement codes are flexible
and practical codes. Remarkably, they excel at code locality; hence, they
reduce repair costs and become less dependent on storage locations with poor
availability. Our solution outperforms Reed-Solomon codes in many disaster
recovery scenarios.
Comment: 12 pages, 13 figures. This work was partially supported by Swiss
National Science Foundation SNSF Doc.Mobility 162014. Published in the 2018
48th Annual IEEE/IFIP International Conference on Dependable Systems and
Networks (DSN).
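As a rough illustration of the entanglement idea behind this abstract, the simplest (chain) form XORs each data block into a running parity, so any lost block can be rebuilt from its two neighbouring parities; alpha entanglement codes generalize this to multiple interleaved strands. The sketch below is a toy illustration only, and the function names are illustrative, not taken from the paper.

```python
# Toy sketch of chain XOR entanglement (illustrative, not the paper's code).
# Parity p_i = d_1 ^ d_2 ^ ... ^ d_i, with p_0 = all zeros.
# A lost data block d_i is repaired from its neighbouring parities:
# d_i = p_{i-1} ^ p_i.

def entangle(blocks):
    """Return the parity chain [p_0, p_1, ..., p_n] for equal-length blocks."""
    parities = [bytes(len(blocks[0]))]  # p_0 = zeros
    for block in blocks:
        prev = parities[-1]
        parities.append(bytes(x ^ y for x, y in zip(prev, block)))
    return parities

def repair(i, parities):
    """Recover data block d_i (1-indexed) from parities p_{i-1} and p_i."""
    return bytes(x ^ y for x, y in zip(parities[i - 1], parities[i]))

data = [b"AAAA", b"BBBB", b"CCCC"]
ps = entangle(data)
assert repair(2, ps) == b"BBBB"  # second block rebuilt from its neighbours
```

In this toy chain, storing only the parities already lets any single lost block be recomputed from its neighbours; the paper's alpha parameter multiplies the number of such independent recovery paths at a linear cost in storage.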
A Guide to Distributed Digital Preservation
This volume is devoted to the broad topic of distributed digital preservation, a still-emerging field of practice for the cultural memory arena. Replication and distribution hold out the promise of indefinite preservation of materials without degradation, but establishing effective organizational and technical processes to enable this form of digital preservation is daunting. Institutions need practical examples of how this task can be accomplished in manageable, low-cost ways." --P. [4] of cover
Open-Source ANSS Quake Monitoring System Software
ANSS stands for the Advanced National Seismic System of the U.S.A., and the ANSS Quake Monitoring System (AQMS) is the earthquake management system (EMS) that most of its member regional seismic networks (RSNs) use. AQMS is based on Earthworm, but instead of storing files on disk, it uses a relational database with replication capability to store pick, amplitude, waveform, and event parameters. The replicated database and other features of AQMS make it a fully redundant system. A graphical user interface written in Java, Jiggle, is used to review automatically generated picks and event solutions, relocate events, and recalculate magnitudes. Add-on mechanisms to produce various post-earthquake products, such as ShakeMaps and focal mechanisms, are available as well. AQMS also provides a configurable automatic alarming and notification system. The Pacific Northwest Seismic Network, one of the Tier 1 ANSS RSNs, has modified AQMS to be compatible with a freely available, capable, open-source database system, PostgreSQL, and is running this version successfully in production. The AQMS Software Working Group has moved the software from a Subversion repository server hosted at the California Institute of Technology to a public repository at gitlab.com. The drawback of AQMS as a whole is that it is complex to fully configure and comprehend. Nevertheless, the fact that it is very capable, documented, and now free to use might make it an attractive EMS choice for many seismic networks.
Euclid's US Science Data Center: lessons learned from building a small part of a big system
Euclid is an ESA M-class mission to study the geometry and nature of the dark universe, slated for launch in mid-2022. NASA is participating in the mission through the contribution of the near-infrared detectors and associated electronics, the nomination of scientists for membership in the Euclid Consortium, and by establishing the Euclid NASA Science Center at IPAC (ENSCI) to support the US community. As part of ENSCI's work, we will participate in the Euclid Science Ground Segment (SGS) and build and operate the US Science Data Center (SDC-US), which will be a node in the distributed data processing system for the mission. SDC-US is one of 10 data centers and will contribute about 5% of the computing and data storage for the distributed system. We discuss lessons learned in developing a node in a distributed system. For example, there is a significant advantage to SDC-US development in sharing knowledge, problem solving, and resource burden with other parts of the system. On the other hand, fitting into a system that is distributed geographically and relies on diverse computing environments results in added complexity in constructing SDC-US.
Storing and manipulating environmental big data with JASMIN
JASMIN is a super-data-cluster designed to provide
a high-performance high-volume data analysis environment for
the UK environmental science community. Thus far JASMIN
has been used primarily by the atmospheric science and earth
observation communities, both to support their direct scientific workflows and to curate data products in the STFC Centre for Environmental Data Archival (CEDA). Initial JASMIN configuration and first experiences are reported here. Useful improvements in scientific workflow are presented. It is clear from the explosive growth in stored data and usage that there was pent-up demand for a suitable big-data analysis environment.
This demand is not yet satisfied, in part because JASMIN does not yet have enough compute, the storage is fully allocated, and not all software needs are met. Plans to address these constraints are introduced.
SOAR (Support Office for Aerogeophysical Research) Annual Report 1995/1996
The Support Office for Aerogeophysical Research (SOAR) was a facility of the National Science Foundation's Office of Polar Programs whose mission was to make airborne geophysical observations available to the broad research community of geology, glaciology, and other sciences. The central office of the SOAR facility was located in Austin, Texas, within the University of Texas Institute for Geophysics. Other institutions with significant responsibilities were the Lamont-Doherty Earth Observatory of Columbia University and the Geophysics Branch of the U.S. Geological Survey. This report summarizes the goals and accomplishments of the SOAR facility during 1995/1996 and plans for the next year. (National Science Foundation's Office of Polar Programs; Institute for Geophysics)
Preserving Our Collections, Preserving Our Missions
A Guide to Distributed Digital Preservation is intentionally structured such that every chapter can stand on its own or be paired with other segments of the book at will, allowing readers to pick their own pathway through the guide as best suits their needs. This approach has necessitated that the authors and editors include some level of repetition of basic principles across chapters, and has also made the Glossary (included at the back of this guide) an essential reference resource for all readers. This guide is written with a broad audience in mind that includes librarians, curators, archivists, scholars, technologists, lawyers, and administrators. Any resourceful reader should be able to use this guide to gain both a philosophical and practical understanding of the emerging field of distributed digital preservation (DDP), including how to establish or join a Private LOCKSS Network (PLN).