54 research outputs found

    Scalable Storage for Digital Libraries

    Get PDF
    I propose a storage system optimised for digital libraries. Its key features are its heterogeneous scalability; its integration and exploitation of rich semantic metadata associated with digital objects; its use of a name space; and its aggressive performance optimisation in the digital library domain

    Designing Reliable High-Performance Storage Systems for HPC Environments

    Get PDF
    Advances in processing capability have far outpaced advances in I/O throughput and latency. Distributed file system based storage systems help to address this performance discrepancy in high performance computing (HPC) environments; however, they can be difficult to deploy and challenging to maintain. This thesis explores the design considerations as well as the pitfalls faced when deploying high performance storage systems. It includes best practices in identifying system requirements, techniques for generating I/O profiles of applications, and recommendations for disk subsystem configuration and maintenance based upon a number of recent papers addressing latent sector and unrecoverable read errors

    An architecture for an ATM network continuous media server exploiting temporal locality of access

    Get PDF
    With the continuing drop in the price of memory, Video-on-Demand (VoD) solutions that have so far focused on maximising the throughput of disk units with a minimal use of physical memory may now employ significant amounts of cache memory. The subject of this thesis is the study of a technique to best utilise a memory buffer within such a VoD solution. In particular, knowledge of the streams active on the server is used to allocate cache memory. Stream optimised caching exploits reuse of data among streams that are temporally close to each other within the same clip; the data fetched on behalf of the leading stream may be cached and reused by the following streams. Therefore, only the leading stream requires access to the physical disk and the potential level of service provision allowed by the server may be increased. The use of stream optimised caching may consequently be limited to environments where reuse of data is significant. As such, the technique examined within this thesis focuses on a classroom environment where user progress is generally linear and all users progress at approximately the same rate for such an environment, reuse of data is guaranteed. The analysis of stream optimised caching begins with a detailed theoretical discussion of the technique and suggests possible implementations. Later chapters describe both the design and construction of a prototype server that employs the caching technique, and experiments that use of the prototype to assess the effectiveness of the technique for the chosen environment using `emulated' users. The conclusions of these experiments indicate that stream optimised caching may be applicable to larger scale VoD systems than small scale teaching environments. Future development of stream optimised caching is considered

    Fourth NASA Goddard Conference on Mass Storage Systems and Technologies

    Get PDF
    This report contains copies of all those technical papers received in time for publication just prior to the Fourth Goddard Conference on Mass Storage and Technologies, held March 28-30, 1995, at the University of Maryland, University College Conference Center, in College Park, Maryland. This series of conferences continues to serve as a unique medium for the exchange of information on topics relating to the ingestion and management of substantial amounts of data and the attendant problems involved. This year's discussion topics include new storage technology, stability of recorded media, performance studies, storage system solutions, the National Information infrastructure (Infobahn), the future for storage technology, and lessons learned from various projects. There also will be an update on the IEEE Mass Storage System Reference Model Version 5, on which the final vote was taken in July 1994

    A basic framework and overview of a network-based RAID-like distributed back-up system : NetRAID

    Get PDF
    NetRAID is a framework for a simple, open, and free system to allow end-users to have the capacity to create a geographically distributed, secure, redundant system that will provide end-users with the capacity to back up important data. NetRAID is designed to be lightweight, cross-platform, low cost, extendable, and simple. As more important data becomes digitalized it is critical for even average home computer users to be able to ensure that their data is secure. Even for people with DVD burners that back up their data weekly, if the back ups and their sources are kept in the same physical location the value of the back-up is greatly diminished. NetRAID can offer a more comprehensive end-user back-up. NetRAID version 1 has some limitations with the types and speeds of networks it can run on; however, it provides a building block for the future extension to almost any sort of TCP/IP network. NetRAID also has the potential capability to use a wide variety of encryption and data verification schemes to make sure that data is secure in transmission and storage. The NetRAID virtual file system, sockets, and program core are written in Visual Basic.NET 2003, and should be portable to a wide variety of operating systems and languages in the future

    Big Data Analytics on Traditional HPC Infrastructure Using Two-Level Storage

    Full text link
    Data-intensive computing has become one of the major workloads on traditional high-performance computing (HPC) clusters. Currently, deploying data-intensive computing software framework on HPC clusters still faces performance and scalability issues. In this paper, we develop a new two-level storage system by integrating Tachyon, an in-memory file system with OrangeFS, a parallel file system. We model the I/O throughputs of four storage structures: HDFS, OrangeFS, Tachyon and two-level storage. We conduct computational experiments to characterize I/O throughput behavior of two-level storage and compare its performance to that of HDFS and OrangeFS, using TeraSort benchmark. Theoretical models and experimental tests both show that the two-level storage system can increase the aggregate I/O throughputs. This work lays a solid foundation for future work in designing and building HPC systems that can provide a better support on I/O intensive workloads with preserving existing computing resources.Comment: Submitted to SC15, 8 pages, 7 figures, 3 table

    Redundant disk arrays: Reliable, parallel secondary storage

    Get PDF
    During the past decade, advances in processor and memory technology have given rise to increases in computational performance that far outstrip increases in the performance of secondary storage technology. Coupled with emerging small-disk technology, disk arrays provide the cost, volume, and capacity of current disk subsystems, by leveraging parallelism, many times their performance. Unfortunately, arrays of small disks may have much higher failure rates than the single large disks they replace. Redundant arrays of inexpensive disks (RAID) use simple redundancy schemes to provide high data reliability. The data encoding, performance, and reliability of redundant disk arrays are investigated. Organizing redundant data into a disk array is treated as a coding problem. Among alternatives examined, codes as simple as parity are shown to effectively correct single, self-identifying disk failures

    Flexible allocation and space management in storage systems

    Get PDF
    In this dissertation, we examine some of the challenges faced by the emerging networked storage systems. We focus on two main issues. Current file systems allocate storage statically at the time of their creation. This results in many suboptimal scenarios, for example: (a) space on the disk is not allocated well across multiple file systems, (b) data is not organized well for typical access patterns. We propose Virtual Allocation for flexible storage allocation. Virtual allocation separates storage allocation from the file system. It employs an allocate-on-write strategy, which lets applications fit into the actual usage of storage space without regard to the configured file system size. This improves flexibility by allowing storage space to be shared across different file systems. We present the design of virtual allocation and an evaluation of it through benchmarks based on a prototype system on Linux. Next, based on virtual allocation, we consider the problem of balancing locality and load in networked storage systems with multiple storage devices (or bricks). Data distribution affects locality and load balance across the devices in a networked storage system. We propose user-optimal data migration scheme which tries to balance locality and load balance in such networked storage systems. The presented approach automatically and transparently manages migration of data blocks among disks as data access patterns and loads change over time. We built a prototype system on Linux and present the design of user-optimal migration and an evaluation of it through realistic experiments
    • …
    corecore