Scalable Storage for Digital Libraries
I propose a storage system optimised for digital libraries. Its key features are its heterogeneous scalability; its integration and exploitation of the rich semantic metadata associated with digital objects; its use of a name space; and its aggressive performance optimisation for the digital library domain.
Designing Reliable High-Performance Storage Systems for HPC Environments
Advances in processing capability have far outpaced advances in I/O throughput and latency. Storage systems based on distributed file systems help to address this performance discrepancy in high performance computing (HPC) environments; however, they can be difficult to deploy and challenging to maintain. This thesis explores the design considerations as well as the pitfalls faced when deploying high performance storage systems. It includes best practices in identifying system requirements, techniques for generating I/O profiles of applications, and recommendations for disk subsystem configuration and maintenance based upon a number of recent papers addressing latent sector errors and unrecoverable read errors.
An architecture for an ATM network continuous media server exploiting temporal locality of access
With the continuing drop in the price of memory, Video-on-Demand (VoD) solutions that have so far focused on maximising the throughput of disk units with a minimal use of physical memory may now employ significant amounts of cache memory. The subject of this thesis is the study of a technique to best utilise a memory buffer within such a VoD solution. In particular, knowledge of the streams active on the server is used to allocate cache memory. Stream optimised caching exploits reuse of data among streams that are temporally close to each other within the same clip; the data fetched on behalf of the leading stream may be cached and reused by the following streams. Therefore, only the leading stream requires access to the physical disk, and the potential level of service provision allowed by the server may be increased. The use of stream optimised caching may consequently be limited to environments where reuse of data is significant. As such, the technique examined within this thesis focuses on a classroom environment where user progress is generally linear and all users progress at approximately the same rate; for such an environment, reuse of data is guaranteed. The analysis of stream optimised caching begins with a detailed theoretical discussion of the technique and suggests possible implementations. Later chapters describe both the design and construction of a prototype server that employs the caching technique, and experiments that use the prototype to assess the effectiveness of the technique for the chosen environment using "emulated" users. The conclusions of these experiments indicate that stream optimised caching may be applicable to larger-scale VoD systems than small-scale teaching environments. Future development of stream optimised caching is considered.
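The core idea of stream optimised caching can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the class name, FIFO eviction policy, and block-level interface are assumptions made for the sketch.

```python
from collections import deque

class StreamOptimisedCache:
    """Sketch: among temporally close streams on the same clip, only the
    leading stream reads from disk; the blocks it fetches are cached and
    reused by the following streams."""

    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.cache = {}            # block id -> data
        self.order = deque()       # FIFO eviction order
        self.disk_reads = 0

    def read_block(self, block_id, read_from_disk):
        if block_id in self.cache:
            return self.cache[block_id]        # follower: served from memory
        data = read_from_disk(block_id)        # leader: hits the physical disk
        self.disk_reads += 1
        self.cache[block_id] = data
        self.order.append(block_id)
        if len(self.order) > self.capacity:    # evict the oldest cached block
            del self.cache[self.order.popleft()]
        return data

# Two streams on the same clip, the follower one block behind the leader:
cache = StreamOptimisedCache(capacity_blocks=8)
fetch = lambda b: f"clip-data-{b}"
for t in range(10):
    cache.read_block(t, fetch)          # leading stream
    if t > 0:
        cache.read_block(t - 1, fetch)  # following stream: cache hit
print(cache.disk_reads)                 # only the leader touched the disk
```

With a cache large enough to cover the temporal gap between streams, the follower never touches the disk, which is exactly why the freed disk bandwidth can raise the server's level of service provision.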
Fourth NASA Goddard Conference on Mass Storage Systems and Technologies
This report contains copies of all those technical papers received in time for publication just prior to the Fourth Goddard Conference on Mass Storage Systems and Technologies, held March 28-30, 1995, at the University of Maryland, University College Conference Center, in College Park, Maryland. This series of conferences continues to serve as a unique medium for the exchange of information on topics relating to the ingestion and management of substantial amounts of data and the attendant problems involved. This year's discussion topics include new storage technology, stability of recorded media, performance studies, storage system solutions, the National Information Infrastructure (Infobahn), the future for storage technology, and lessons learned from various projects. There will also be an update on the IEEE Mass Storage System Reference Model Version 5, on which the final vote was taken in July 1994.
A basic framework and overview of a network-based RAID-like distributed back-up system : NetRAID
NetRAID is a framework for a simple, open, and free system that allows end-users to create a geographically distributed, secure, redundant system for backing up important data. NetRAID is designed to be lightweight, cross-platform, low cost, extendable, and simple. As more important data becomes digitized, it is critical for even average home computer users to be able to ensure that their data is secure. Even for people with DVD burners who back up their data weekly, if the backups and their sources are kept in the same physical location, the value of the backup is greatly diminished. NetRAID can offer a more comprehensive end-user backup. NetRAID version 1 has some limitations with the types and speeds of networks it can run on; however, it provides a building block for future extension to almost any sort of TCP/IP network. NetRAID also has the potential capability to use a wide variety of encryption and data verification schemes to make sure that data is secure in transmission and storage. The NetRAID virtual file system, sockets, and program core are written in Visual Basic.NET 2003, and should be portable to a wide variety of operating systems and languages in the future.
Big Data Analytics on Traditional HPC Infrastructure Using Two-Level Storage
Data-intensive computing has become one of the major workloads on traditional high-performance computing (HPC) clusters. Currently, deploying data-intensive computing software frameworks on HPC clusters still faces performance and scalability issues. In this paper, we develop a new two-level storage system by integrating Tachyon, an in-memory file system, with OrangeFS, a parallel file system. We model the I/O throughputs of four storage structures: HDFS, OrangeFS, Tachyon, and the two-level storage. We conduct computational experiments to characterize the I/O throughput behavior of the two-level storage and compare its performance to that of HDFS and OrangeFS using the TeraSort benchmark. Theoretical models and experimental tests both show that the two-level storage system can increase aggregate I/O throughput. This work lays a solid foundation for future work in designing and building HPC systems that provide better support for I/O-intensive workloads while preserving existing computing resources.

Comment: Submitted to SC15, 8 pages, 7 figures, 3 tables
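The read path of such a two-level arrangement can be sketched as below. This is an illustrative model only, assuming a simple promote-on-miss policy; the class name and interface are invented for the sketch and are not the paper's API.

```python
class TwoLevelStore:
    """Sketch of a two-level read path: an in-memory tier (Tachyon-like)
    in front of a parallel file system (OrangeFS-like)."""

    def __init__(self, backing: dict[str, bytes], mem_capacity: int):
        self.backing = backing           # stands in for the parallel FS
        self.mem: dict[str, bytes] = {}  # in-memory tier
        self.mem_capacity = mem_capacity
        self.mem_hits = 0
        self.fs_reads = 0

    def read(self, path: str) -> bytes:
        if path in self.mem:             # fast path: memory tier
            self.mem_hits += 1
            return self.mem[path]
        data = self.backing[path]        # slow path: parallel file system
        self.fs_reads += 1
        if len(self.mem) < self.mem_capacity:
            self.mem[path] = data        # promote data into the memory tier
        return data

store = TwoLevelStore({"/a": b"x" * 4, "/b": b"y" * 4}, mem_capacity=1)
store.read("/a"); store.read("/a"); store.read("/b")
print(store.mem_hits, store.fs_reads)   # repeated reads of /a hit memory
```

The aggregate throughput gain the paper reports comes from exactly this effect: reads absorbed by the memory tier proceed at memory bandwidth, while only misses pay the parallel file system's cost.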
Redundant disk arrays: Reliable, parallel secondary storage
During the past decade, advances in processor and memory technology have given rise to increases in computational performance that far outstrip increases in the performance of secondary storage technology. Coupled with emerging small-disk technology, disk arrays provide the cost, volume, and capacity of current disk subsystems and, by leveraging parallelism, many times their performance. Unfortunately, arrays of small disks may have much higher failure rates than the single large disks they replace. Redundant arrays of inexpensive disks (RAID) use simple redundancy schemes to provide high data reliability. The data encoding, performance, and reliability of redundant disk arrays are investigated. Organizing redundant data into a disk array is treated as a coding problem. Among the alternatives examined, codes as simple as parity are shown to effectively correct single, self-identifying disk failures.
Flexible allocation and space management in storage systems
In this dissertation, we examine some of the challenges faced by emerging networked storage systems. We focus on two main issues. Current file systems allocate storage statically at the time of their creation. This results in many suboptimal scenarios, for example: (a) space on the disk is not allocated well across multiple file systems, and (b) data is not organized well for typical access patterns. We propose Virtual Allocation for flexible storage allocation. Virtual allocation separates storage allocation from the file system. It employs an allocate-on-write strategy, which binds physical storage only as data is actually written, so space consumption tracks actual usage rather than the configured file system size. This improves flexibility by allowing storage space to be shared across different file systems. We present the design of virtual allocation and an evaluation of it through benchmarks based on a prototype system on Linux.
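The allocate-on-write idea can be sketched as follows. This is a toy model, not the dissertation's interface: the class name, extent granularity, and mapping-table layout are assumptions made for illustration.

```python
class VirtualAllocator:
    """Sketch of allocate-on-write: each file system is created with a
    large *virtual* size, but physical extents from a shared pool are
    bound to it only when blocks are first written."""

    def __init__(self, physical_extents: int):
        self.free = list(range(physical_extents))  # shared physical pool
        self.maps: dict[str, dict[int, int]] = {}  # fs -> virtual->physical

    def create_fs(self, name: str, virtual_size: int):
        self.maps[name] = {}   # no physical space reserved at creation

    def write(self, fs: str, virtual_block: int) -> int:
        table = self.maps[fs]
        if virtual_block not in table:         # first write: allocate now
            table[virtual_block] = self.free.pop()
        return table[virtual_block]

alloc = VirtualAllocator(physical_extents=100)
alloc.create_fs("fs1", virtual_size=1000)   # "big" fs, zero space consumed
alloc.create_fs("fs2", virtual_size=1000)
alloc.write("fs1", 0); alloc.write("fs2", 7)
print(sum(len(m) for m in alloc.maps.values()))  # only 2 extents bound
```

Both file systems can claim a large configured size, yet the shared pool is drawn down only by blocks actually written, which is the flexibility the abstract describes.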
Next, based on virtual allocation, we consider the problem of balancing locality and load in networked storage systems with multiple storage devices (or bricks). Data distribution affects both locality and load balance across the devices in a networked storage system. We propose a user-optimal data migration scheme that balances locality and load in such networked storage systems. The presented approach automatically and transparently migrates data blocks among disks as data access patterns and loads change over time. We built a prototype system on Linux and present the design of user-optimal migration and an evaluation of it through realistic experiments.
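A load-driven migration step of this general flavor can be sketched as below. This is a stand-in greedy policy invented for illustration, ignoring the locality term; it is not the dissertation's user-optimal algorithm, and all names are hypothetical.

```python
def pick_migrations(load: dict[str, int], placement: dict[int, str],
                    heat: dict[int, int],
                    threshold: int) -> list[tuple[int, str, str]]:
    """Stand-in policy: while the gap between the most and least loaded
    bricks exceeds a threshold, move the hottest block that fits within
    half the gap (so every move strictly narrows the gap and terminates)."""
    moves = []
    while True:
        hot = max(load, key=load.get)
        cold = min(load, key=load.get)
        gap = load[hot] - load[cold]
        if gap <= threshold:
            break
        candidates = [b for b, d in placement.items()
                      if d == hot and 0 < heat[b] <= gap / 2]
        if not candidates:
            break                        # nothing movable without overshooting
        block = max(candidates, key=lambda b: heat[b])
        placement[block] = cold          # migrate the block's data
        load[hot] -= heat[block]
        load[cold] += heat[block]
        moves.append((block, hot, cold))
    return moves

# Per-block access heat and current placement across two bricks:
heat = {0: 50, 1: 30, 2: 10, 3: 10}
placement = {0: "brickA", 1: "brickA", 2: "brickB", 3: "brickB"}
load = {"brickA": 80, "brickB": 20}
moves = pick_migrations(load, placement, heat, threshold=20)
print(moves)   # a single move of block 1 equalises the two bricks
```

The "automatic and transparent" property in the abstract corresponds to running a decision step like this in the background as the observed heat values drift, with no application involvement.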