34,069 research outputs found
Robo-line storage: Low latency, high capacity storage systems over geographically distributed networks
Rapid advances in high performance computing are making possible more complete and accurate computer-based modeling of complex physical phenomena, such as weather front interactions, dynamics of chemical reactions, numerical aerodynamic analysis of airframes, and ocean-land-atmosphere interactions. Many of these 'grand challenge' applications are as demanding of the underlying storage system, in terms of their capacity and bandwidth requirements, as they are on the computational power of the processor. A global view of the Earth's ocean chlorophyll and land vegetation requires over 2 terabytes of raw satellite image data. In this paper, we describe our planned research program in high capacity, high bandwidth storage systems. The project has four overall goals. First, we will examine new methods for high capacity storage systems, made possible by low cost, small form factor magnetic and optical tape systems. Second, access to the storage system will be low latency and high bandwidth. To achieve this, we must interleave data transfer at all levels of the storage system, including devices, controllers, servers, and communications links. Latency will be reduced by extensive caching throughout the storage hierarchy. Third, we will provide effective management of a storage hierarchy, extending the techniques already developed for the Log Structured File System. Finally, we will construct a protototype high capacity file server, suitable for use on the National Research and Education Network (NREN). Such research must be a Cornerstone of any coherent program in high performance computing and communications
Community standards for open cell migration data
Cell migration research has become a high-content field. However, the quantitative information encapsulated in these complex and high-dimensional datasets is not fully exploited owing to the diversity of experimental protocols and non-standardized output formats. In addition, typically the datasets are not open for reuse. Making the data open and Findable, Accessible, Interoperable, and Reusable (FAIR) will enable meta-analysis, data integration, and data mining. Standardized data formats and controlled vocabularies are essential for building a suitable infrastructure for that purpose but are not available in the cell migration domain. We here present standardization efforts by the Cell Migration Standardisation Organisation (CMSO), an open community-driven organization to facilitate the development of standards for cell migration data. This work will foster the development of improved algorithms and tools and enable secondary analysis of public datasets, ultimately unlocking new knowledge of the complex biological process of cell migration
Cold Storage Data Archives: More Than Just a Bunch of Tapes
The abundance of available sensor and derived data from large scientific
experiments, such as earth observation programs, radio astronomy sky surveys,
and high-energy physics already exceeds the storage hardware globally
fabricated per year. To that end, cold storage data archives are the---often
overlooked---spearheads of modern big data analytics in scientific,
data-intensive application domains. While high-performance data analytics has
received much attention from the research community, the growing number of
problems in designing and deploying cold storage archives has only received
very little attention.
In this paper, we take the first step towards bridging this gap in knowledge
by presenting an analysis of four real-world cold storage archives from three
different application domains. In doing so, we highlight (i) workload
characteristics that differentiate these archives from traditional,
performance-sensitive data analytics, (ii) design trade-offs involved in
building cold storage systems for these archives, and (iii) deployment
trade-offs with respect to migration to the public cloud. Based on our
analysis, we discuss several other important research challenges that need to
be addressed by the data management community
Data management in NOAA
NOAA has 11 terabytes of digital data stored on 240,000 computer tapes. There are an additional 100 terabytes (TB) of geostationary satellite data stored in digital form on specially configured SONY U-Matic video tapes at the University of Wisconsin. There are over 90,000,000 non-digital form records in manuscript, film, printed, and chart form which are not easily accessible. The three NOAA Data Centers service 6,000 requests per year and publish 5,000 bulletins which are distributed to 40,000 subscribers. Seventeen CD-ROM's have been produced. Thirty thousand computer tapes containing polar satellite data are being copied to 12 inch WORM optical disks for research applications. The present annual data accumulation rate of 10 TB will grow to 30 TB in 1994 and to 100 TB by the year 2000. The present storage and distribution technologies with their attendant support systems will be overwhelmed by these increases if not improved. Increased user sophistication coupled with more precise measurement technologies will demand better quality control mechanisms, especially for those data maintained in an indefinite archive. There is optimism that the future will offer improved media technologies to accommodate the volumes of data. With the advanced technologies, storage and performance monitoring tools will be pivotal to the successful long-term management of data and information
BlogForever: D3.1 Preservation Strategy Report
This report describes preservation planning approaches and strategies recommended by the BlogForever project as a core component of a weblog repository design. More specifically, we start by discussing why we would want to preserve weblogs in the first place and what it is exactly that we are trying to preserve. We further present a review of past and present work and highlight why current practices in web archiving do not address the needs of weblog preservation adequately. We make three distinctive contributions in this volume: a) we propose transferable practical workflows for applying a combination of established metadata and repository standards in developing a weblog repository, b) we provide an automated approach to identifying significant properties of weblog content that uses the notion of communities and how this affects previous strategies, c) we propose a sustainability plan that draws upon community knowledge through innovative repository design
MoPS: A Modular Protection Scheme for Long-Term Storage
Current trends in technology, such as cloud computing, allow outsourcing the
storage, backup, and archiving of data. This provides efficiency and
flexibility, but also poses new risks for data security. It in particular
became crucial to develop protection schemes that ensure security even in the
long-term, i.e. beyond the lifetime of keys, certificates, and cryptographic
primitives. However, all current solutions fail to provide optimal performance
for different application scenarios. Thus, in this work, we present MoPS, a
modular protection scheme to ensure authenticity and integrity for data stored
over long periods of time. MoPS does not come with any requirements regarding
the storage architecture and can therefore be used together with existing
archiving or storage systems. It supports a set of techniques which can be
plugged together, combined, and migrated in order to create customized
solutions that fulfill the requirements of different application scenarios in
the best possible way. As a proof of concept we implemented MoPS and provide
performance measurements. Furthermore, our implementation provides additional
features, such as guidance for non-expert users and export functionalities for
external verifiers.Comment: Original Publication (in the same form): ASIACCS 201
- …