2,501 research outputs found
Optimizations for Energy-Aware, High-Performance and Reliable Distributed Storage Systems
With the decreasing cost and wide-spread use of commodity hard drives, it has become possible to create very large-scale storage systems with less expense. However, as we approach exabyte-scale storage systems, maintaining important features such as energy-efficiency, performance, reliability and usability became increasingly difficult. Despite the decreasing cost of storage systems, the energy consumption of these systems still needs to be addressed in order to retain cost-effectiveness. Any improvements in a storage system can be outweighed by high energy costs. On the other hand, large-scale storage systems can benefit more from the object storage features for improved performance and usability. One area of concern is metadata performance bottleneck of applications reading large directories or creating a large number of files. Similarly, computation on big data where data needs to be transferred between compute and storage clusters adversely affects I/O performance. As the storage systems become more complex and larger, transferring data between remote compute and storage tiers becomes impractical. Furthermore, storage systems implement reliability typically at the file system or client level. This approach might not always be practical in terms of performance. Lastly, object storage features are usually tailored to specific use cases that makes it harder to use them in various contexts.
In this thesis, we are presenting several approaches to enhance energy-efficiency, performance, reliability and usability of large-scale storage systems. To begin with, we improve the energy-efficiency of storage systems by moving I/O load to a subset of the storage nodes with energy-aware node allocation methods and turn off the unused nodes, while preserving load balance on demand. To address the metadata performance issue associated with large creates and directory reads, we represent directories with object storage collections and implement lazy creation of objects. Similarly, in-situ computation on large-scale data is enabled by using object storage features to integrate a computational framework with the existing object storage layer to eliminate the need to transfer data between compute and storage silos for better performance. We then present parity-based redundancy using object storage features to achieve reliability with less performance impact. Finally, unified storage brings together the object storage features to meet the needs of distinct use cases; such as cloud storage, big data or high-performance computing to alleviate the unnecessary fragmentation of storage resources. We evaluate each proposed approach thoroughly and validate their effectiveness in terms of improving energy-efficiency, performance, reliability and usability of a large-scale storage system
Wireless Communications in the Era of Big Data
The rapidly growing wave of wireless data service is pushing against the
boundary of our communication network's processing power. The pervasive and
exponentially increasing data traffic present imminent challenges to all the
aspects of the wireless system design, such as spectrum efficiency, computing
capabilities and fronthaul/backhaul link capacity. In this article, we discuss
the challenges and opportunities in the design of scalable wireless systems to
embrace such a "bigdata" era. On one hand, we review the state-of-the-art
networking architectures and signal processing techniques adaptable for
managing the bigdata traffic in wireless networks. On the other hand, instead
of viewing mobile bigdata as a unwanted burden, we introduce methods to
capitalize from the vast data traffic, for building a bigdata-aware wireless
network with better wireless service quality and new mobile applications. We
highlight several promising future research directions for wireless
communications in the mobile bigdata era.Comment: This article is accepted and to appear in IEEE Communications
Magazin
Potential of I/O aware workflows in climate and weather
The efficient, convenient, and robust execution of data-driven workflows and enhanced data
management are essential for productivity in scientific computing. In HPC, the concerns of storage
and computing are traditionally separated and optimised independently from each other and the
needs of the end-to-end user. However, in complex workflows, this is becoming problematic. These
problems are particularly acute in climate and weather workflows, which as well as becoming
increasingly complex and exploiting deep storage hierarchies, can involve multiple data centres.
The key contributions of this paper are: 1) A sketch of a vision for an integrated data-driven
approach, with a discussion of the associated challenges and implications, and 2) An architecture
and roadmap consistent with this vision that would allow a seamless integration into current
climate and weather workflows as it utilises versions of existing tools (ESDM, Cylc, XIOS, and
DDN’s IME).
The vision proposed here is built on the belief that workflows composed of data, computing, and communication-intensive tasks should drive interfaces and hardware configurations to
better support the programming models. When delivered, this work will increase the opportunity for smarter scheduling of computing by considering storage in heterogeneous storage systems.
We illustrate the performance-impact on an example workload using a model built on measured
performance data using ESDM at DKRZ
A Survey on the Integration of NAND Flash Storage in the Design of File Systems and the Host Storage Software Stack
With the ever-increasing amount of data generate in the world, estimated to
reach over 200 Zettabytes by 2025, pressure on efficient data storage systems
is intensifying. The shift from HDD to flash-based SSD provides one of the most
fundamental shifts in storage technology, increasing performance capabilities
significantly. However, flash storage comes with different characteristics than
prior HDD storage technology. Therefore, storage software was unsuitable for
leveraging the capabilities of flash storage. As a result, a plethora of
storage applications have been design to better integrate with flash storage
and align with flash characteristics.
In this literature study we evaluate the effect the introduction of flash
storage has had on the design of file systems, which providing one of the most
essential mechanisms for managing persistent storage. We analyze the mechanisms
for effectively managing flash storage, managing overheads of introduced design
requirements, and leverage the capabilities of flash storage. Numerous methods
have been adopted in file systems, however prominently revolve around similar
design decisions, adhering to the flash hardware constrains, and limiting
software intervention. Future design of storage software remains prominent with
the constant growth in flash-based storage devices and interfaces, providing an
increasing possibility to enhance flash integration in the host storage
software stack
A Survey on the Integration of NAND Flash Storage in the Design of File Systems and the Host Storage Software Stack
With the ever-increasing amount of data generate in the world, estimated to reach over 200 Zettabytes by 2025, pressure on efficient data storage systems is intensifying. The shift from HDD to flash-based SSD provides one of the most fundamental shifts in storage technology, increasing performance capabilities significantly. However, flash storage comes with different characteristics than prior HDD storage technology. Therefore, storage software was unsuitable for leveraging the capabilities of flash storage. As a result, a plethora of storage applications have been design to better integrate with flash storage and align with flash characteristics. In this literature study we evaluate the effect the introduction of flash storage has had on the design of file systems, which providing one of the most essential mechanisms for managing persistent storage. We analyze the mechanisms for effectively managing flash storage, managing overheads of introduced design requirements, and leverage the capabilities of flash storage. Numerous methods have been adopted in file systems, however prominently revolve around similar design decisions, adhering to the flash hardware constrains, and limiting software intervention. Future design of storage software remains prominent with the constant growth in flash-based storage devices and interfaces, providing an increasing possibility to enhance flash integration in the host storage software stack
- …