124 research outputs found

    Using Intelligent Prefetching to Reduce the Energy Consumption of a Large-scale Storage System

    Get PDF
    Many high performance large-scale storage systems will experience significant workload increases as their user base and content availability grow over time. The U.S. Geological Survey (USGS) Earth Resources Observation and Science (EROS) center hosts one such system that has recently undergone a period of rapid growth as its user population grew nearly 400% in just about three years. When administrators of these massive storage systems face the challenge of meeting the demands of an ever increasing number of requests, the easiest solution is to integrate more advanced hardware to existing systems. However, additional investment in hardware may significantly increase the system cost as well as daily power consumption. In this paper, we present evidence that well-selected software level optimization is capable of achieving comparable levels of performance without the cost and power consumption overhead caused by physically expanding the system. Specifically, we develop intelligent prefetching algorithms that are suitable for the unique workloads and user behaviors of the world\u27s largest satellite images distribution system managed by USGS EROS. Our experimental results, derived from real-world traces with over five million requests sent by users around the globe, show that the EROS hybrid storage system could maintain the same performance with over 30% of energy savings by utilizing our proposed prefetching algorithms, compared to the alternative solution of doubling the size of the current FTP server farm

    Sharing a conceptual model of grid resources and services

    Full text link
    Grid technologies aim at enabling a coordinated resource-sharing and problem-solving capabilities over local and wide area networks and span locations, organizations, machine architectures and software boundaries. The heterogeneity of involved resources and the need for interoperability among different grid middlewares require the sharing of a common information model. Abstractions of different flavors of resources and services and conceptual schemas of domain specific entities require a collaboration effort in order to enable a coherent information services cooperation. With this paper, we present the result of our experience in grid resources and services modelling carried out within the Grid Laboratory Uniform Environment (GLUE) effort, a joint US and EU High Energy Physics projects collaboration towards grid interoperability. The first implementation-neutral agreement on services such as batch computing and storage manager, resources such as the hierarchy cluster, sub-cluster, host and the storage library are presented. Design guidelines and operational results are depicted together with open issues and future evolutions.Comment: 4 pages, 0 figures, CHEP 200

    Light database encryption design utilizing multicore processors for mobile devices

    Get PDF
    The confidentiality of data stored in embedded and handheld devices has become an urgent necessity more than ever before. Encryption of sensitive data is a well-known technique to preserve their confidentiality, however it comes with certain costs that can heavily impact the device processing resources. Utilizing multicore processors, which are equipped with current embedded devices, has brought a new era to enhance data confidentiality while maintaining suitable device performance. Encrypting the complete storage area, also known as Full Disk Encryption (FDE) can still be challenging, especially with newly emerging massive storage systems. Alternatively, since the most user sensitive data are residing inside persisting databases, it will be more efficient to focus on securing SQLite databases, through encryption, where SQLite is the most common RDBMS in handheld and embedded systems. This paper addresses the problem of ensuring data protection in embedded and mobile devices while maintaining suitable device performance by mitigating the impact of encryption. We presented here a proposed design for a parallel database encryption system, called SQLite-XTS. The proposed system encrypts data stored in databases transparently on-the-fly without the need for any user intervention. To maintain a proper device performance, the system takes advantage of the commodity multicore processors available with most embedded and mobile devices

    HFR Code: A Flexible Replication Scheme for Cloud Storage Systems

    Full text link
    Fractional repetition (FR) codes are a family of repair-efficient storage codes that provide exact and uncoded node repair at the minimum bandwidth regenerating point. The advantageous repair properties are achieved by a tailor-made two-layer encoding scheme which concatenates an outer maximum-distance-separable (MDS) code and an inner repetition code. In this paper, we generalize the application of FR codes and propose heterogeneous fractional repetition (HFR) code, which is adaptable to the scenario where the repetition degrees of coded packets are different. We provide explicit code constructions by utilizing group divisible designs, which allow the design of HFR codes over a large range of parameters. The constructed codes achieve the system storage capacity under random access repair and have multiple repair alternatives for node failures. Further, we take advantage of the systematic feature of MDS codes and present a novel design framework of HFR codes, in which storage nodes can be wisely partitioned into clusters such that data reconstruction time can be reduced when contacting nodes in the same cluster.Comment: Accepted for publication in IET Communications, Jul. 201

    HEC: Collaborative Research: SAM^2 Toolkit: Scalable and Adaptive Metadata Management for High-End Computing

    Get PDF
    The increasing demand for Exa-byte-scale storage capacity by high end computing applications requires a higher level of scalability and dependability than that provided by current file and storage systems. The proposal deals with file systems research for metadata management of scalable cluster-based parallel and distributed file storage systems in the HEC environment. It aims to develop a scalable and adaptive metadata management (SAM2) toolkit to extend features of and fully leverage the peak performance promised by state-of-the-art cluster-based parallel and distributed file storage systems used by the high performance computing community. There is a large body of research on data movement and management scaling, however, the need to scale up the attributes of cluster-based file systems and I/O, that is, metadata, has been underestimated. An understanding of the characteristics of metadata traffic, and an application of proper load-balancing, caching, prefetching and grouping mechanisms to perform metadata management correspondingly, will lead to a high scalability. It is anticipated that by appropriately plugging the scalable and adaptive metadata management components into the state-of-the-art cluster-based parallel and distributed file storage systems one could potentially increase the performance of applications and file systems, and help translate the promise and potential of high peak performance of such systems to real application performance improvements. The project involves the following components: 1. Develop multi-variable forecasting models to analyze and predict file metadata access patterns. 2. Develop scalable and adaptive file name mapping schemes using the duplicative Bloom filter array technique to enforce load balance and increase scalability 3. Develop decentralized, locality-aware metadata grouping schemes to facilitate the bulk metadata operations such as prefetching. 4. Develop an adaptive cache coherence protocol using a distributed shared object model for client-side and server-side metadata caching. 5. Prototype the SAM2 components into the state-of-the-art parallel virtual file system PVFS2 and a distributed storage data caching system, set up an experimental framework for a DOE CMS Tier 2 site at University of Nebraska-Lincoln and conduct benchmark, evaluation and validation studies

    Parallel detrended fluctuation analysis for fast event detection on massive PMU data

    Get PDF
    ("(c) 2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/ republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.")Phasor measurement units (PMUs) are being rapidly deployed in power grids due to their high sampling rates and synchronized measurements. The devices high data reporting rates present major computational challenges in the requirement to process potentially massive volumes of data, in addition to new issues surrounding data storage. Fast algorithms capable of processing massive volumes of data are now required in the field of power systems. This paper presents a novel parallel detrended fluctuation analysis (PDFA) approach for fast event detection on massive volumes of PMU data, taking advantage of a cluster computing platform. The PDFA algorithm is evaluated using data from installed PMUs on the transmission system of Great Britain from the aspects of speedup, scalability, and accuracy. The speedup of the PDFA in computation is initially analyzed through Amdahl's Law. A revision to the law is then proposed, suggesting enhancements to its capability to analyze the performance gain in computation when parallelizing data intensive applications in a cluster computing environment

    Robo-line storage: Low latency, high capacity storage systems over geographically distributed networks

    Get PDF
    Rapid advances in high performance computing are making possible more complete and accurate computer-based modeling of complex physical phenomena, such as weather front interactions, dynamics of chemical reactions, numerical aerodynamic analysis of airframes, and ocean-land-atmosphere interactions. Many of these 'grand challenge' applications are as demanding of the underlying storage system, in terms of their capacity and bandwidth requirements, as they are on the computational power of the processor. A global view of the Earth's ocean chlorophyll and land vegetation requires over 2 terabytes of raw satellite image data. In this paper, we describe our planned research program in high capacity, high bandwidth storage systems. The project has four overall goals. First, we will examine new methods for high capacity storage systems, made possible by low cost, small form factor magnetic and optical tape systems. Second, access to the storage system will be low latency and high bandwidth. To achieve this, we must interleave data transfer at all levels of the storage system, including devices, controllers, servers, and communications links. Latency will be reduced by extensive caching throughout the storage hierarchy. Third, we will provide effective management of a storage hierarchy, extending the techniques already developed for the Log Structured File System. Finally, we will construct a protototype high capacity file server, suitable for use on the National Research and Education Network (NREN). Such research must be a Cornerstone of any coherent program in high performance computing and communications

    Performance Evaluation and Modeling of HPC I/O on Non-Volatile Memory

    Full text link
    HPC applications pose high demands on I/O performance and storage capability. The emerging non-volatile memory (NVM) techniques offer low-latency, high bandwidth, and persistence for HPC applications. However, the existing I/O stack are designed and optimized based on an assumption of disk-based storage. To effectively use NVM, we must re-examine the existing high performance computing (HPC) I/O sub-system to properly integrate NVM into it. Using NVM as a fast storage, the previous assumption on the inferior performance of storage (e.g., hard drive) is not valid any more. The performance problem caused by slow storage may be mitigated; the existing mechanisms to narrow the performance gap between storage and CPU may be unnecessary and result in large overhead. Thus fully understanding the impact of introducing NVM into the HPC software stack demands a thorough performance study. In this paper, we analyze and model the performance of I/O intensive HPC applications with NVM as a block device. We study the performance from three perspectives: (1) the impact of NVM on the performance of traditional page cache; (2) a performance comparison between MPI individual I/O and POSIX I/O; and (3) the impact of NVM on the performance of collective I/O. We reveal the diminishing effects of page cache, minor performance difference between MPI individual I/O and POSIX I/O, and performance disadvantage of collective I/O on NVM due to unnecessary data shuffling. We also model the performance of MPI collective I/O and study the complex interaction between data shuffling, storage performance, and I/O access patterns.Comment: 10 page

    Data Serving Climate Simulation Science at the NASA Center for Climate Simulation

    Get PDF
    The NASA Center for Climate Simulation (NCCS) provides high performance computational resources, a multi-petabyte archive, and data services in support of climate simulation research and other NASA-sponsored science. This talk describes the NCCS's data-centric architecture and processing, which are evolving in anticipation of researchers' growing requirements for higher resolution simulations and increased data sharing among NCCS users and the external science community
    corecore