2,241 research outputs found

    HEC: Collaborative Research: SAM^2 Toolkit: Scalable and Adaptive Metadata Management for High-End Computing

    Get PDF
    The increasing demand for Exa-byte-scale storage capacity by high end computing applications requires a higher level of scalability and dependability than that provided by current file and storage systems. The proposal deals with file systems research for metadata management of scalable cluster-based parallel and distributed file storage systems in the HEC environment. It aims to develop a scalable and adaptive metadata management (SAM2) toolkit to extend features of and fully leverage the peak performance promised by state-of-the-art cluster-based parallel and distributed file storage systems used by the high performance computing community. There is a large body of research on data movement and management scaling, however, the need to scale up the attributes of cluster-based file systems and I/O, that is, metadata, has been underestimated. An understanding of the characteristics of metadata traffic, and an application of proper load-balancing, caching, prefetching and grouping mechanisms to perform metadata management correspondingly, will lead to a high scalability. It is anticipated that by appropriately plugging the scalable and adaptive metadata management components into the state-of-the-art cluster-based parallel and distributed file storage systems one could potentially increase the performance of applications and file systems, and help translate the promise and potential of high peak performance of such systems to real application performance improvements. The project involves the following components: 1. Develop multi-variable forecasting models to analyze and predict file metadata access patterns. 2. Develop scalable and adaptive file name mapping schemes using the duplicative Bloom filter array technique to enforce load balance and increase scalability 3. Develop decentralized, locality-aware metadata grouping schemes to facilitate the bulk metadata operations such as prefetching. 4. Develop an adaptive cache coherence protocol using a distributed shared object model for client-side and server-side metadata caching. 5. Prototype the SAM2 components into the state-of-the-art parallel virtual file system PVFS2 and a distributed storage data caching system, set up an experimental framework for a DOE CMS Tier 2 site at University of Nebraska-Lincoln and conduct benchmark, evaluation and validation studies

    Optimizations for Energy-Aware, High-Performance and Reliable Distributed Storage Systems

    Get PDF
    With the decreasing cost and wide-spread use of commodity hard drives, it has become possible to create very large-scale storage systems with less expense. However, as we approach exabyte-scale storage systems, maintaining important features such as energy-efficiency, performance, reliability and usability became increasingly difficult. Despite the decreasing cost of storage systems, the energy consumption of these systems still needs to be addressed in order to retain cost-effectiveness. Any improvements in a storage system can be outweighed by high energy costs. On the other hand, large-scale storage systems can benefit more from the object storage features for improved performance and usability. One area of concern is metadata performance bottleneck of applications reading large directories or creating a large number of files. Similarly, computation on big data where data needs to be transferred between compute and storage clusters adversely affects I/O performance. As the storage systems become more complex and larger, transferring data between remote compute and storage tiers becomes impractical. Furthermore, storage systems implement reliability typically at the file system or client level. This approach might not always be practical in terms of performance. Lastly, object storage features are usually tailored to specific use cases that makes it harder to use them in various contexts. In this thesis, we are presenting several approaches to enhance energy-efficiency, performance, reliability and usability of large-scale storage systems. To begin with, we improve the energy-efficiency of storage systems by moving I/O load to a subset of the storage nodes with energy-aware node allocation methods and turn off the unused nodes, while preserving load balance on demand. To address the metadata performance issue associated with large creates and directory reads, we represent directories with object storage collections and implement lazy creation of objects. Similarly, in-situ computation on large-scale data is enabled by using object storage features to integrate a computational framework with the existing object storage layer to eliminate the need to transfer data between compute and storage silos for better performance. We then present parity-based redundancy using object storage features to achieve reliability with less performance impact. Finally, unified storage brings together the object storage features to meet the needs of distinct use cases; such as cloud storage, big data or high-performance computing to alleviate the unnecessary fragmentation of storage resources. We evaluate each proposed approach thoroughly and validate their effectiveness in terms of improving energy-efficiency, performance, reliability and usability of a large-scale storage system

    DINOMO: An Elastic, Scalable, High-Performance Key-Value Store for Disaggregated Persistent Memory (Extended Version)

    Full text link
    We present Dinomo, a novel key-value store for disaggregated persistent memory (DPM). Dinomo is the first key-value store for DPM that simultaneously achieves high common-case performance, scalability, and lightweight online reconfiguration. We observe that previously proposed key-value stores for DPM had architectural limitations that prevent them from achieving all three goals simultaneously. Dinomo uses a novel combination of techniques such as ownership partitioning, disaggregated adaptive caching, selective replication, and lock-free and log-free indexing to achieve these goals. Compared to a state-of-the-art DPM key-value store, Dinomo achieves at least 3.8x better throughput on various workloads at scale and higher scalability, while providing fast reconfiguration.Comment: This is an extended version of the full paper to appear in PVLDB 15.13 (VLDB 2023

    Resumption of virtual machines after adaptive deduplication of virtual machine images in live migration

    Get PDF
    In cloud computing, load balancing, energy utilization are the critical problems solved by virtual machine (VM) migration. Live migration is the live movement of VMs from an overloaded/underloaded physical machine to a suitable one. During this process, transferring large disk image files take more time, hence more migration and down time. In the proposed adaptive deduplication, based on the image file size, the file undergoes both fixed, variable length deduplication processes. The significance of this paper is resumption of VMs with reunited deduplicated disk image files. The performance measured by calculating the percentage reduction of VM image size after deduplication, the time taken to migrate the deduplicated file and the time taken for each VM to resume after the migration. The results show that 83%, 89.76% reduction overall image size and migration time respectively. For a deduplication ratio of 92%, it takes an overall time of 3.52 minutes, 7% reduction in resumption time, compared with the time taken for the total QCOW2 files with original size. For VMDK files the resumption time reduced by a maximum 17% (7.63 mins) compared with that of for original files

    Overview of Caching Mechanisms to Improve Hadoop Performance

    Full text link
    Nowadays distributed computing environments, large amounts of data are generated from different resources with a high velocity, rendering the data difficult to capture, manage, and process within existing relational databases. Hadoop is a tool to store and process large datasets in a parallel manner across a cluster of machines in a distributed environment. Hadoop brings many benefits like flexibility, scalability, and high fault tolerance; however, it faces some challenges in terms of data access time, I/O operation, and duplicate computations resulting in extra overhead, resource wastage, and poor performance. Many researchers have utilized caching mechanisms to tackle these challenges. For example, they have presented approaches to improve data access time, enhance data locality rate, remove repetitive calculations, reduce the number of I/O operations, decrease the job execution time, and increase resource efficiency. In the current study, we provide a comprehensive overview of caching strategies to improve Hadoop performance. Additionally, a novel classification is introduced based on cache utilization. Using this classification, we analyze the impact on Hadoop performance and discuss the advantages and disadvantages of each group. Finally, a novel hybrid approach called Hybrid Intelligent Cache (HIC) that combines the benefits of two methods from different groups, H-SVM-LRU and CLQLMRS, is presented. Experimental results show that our hybrid method achieves an average improvement of 31.2% in job execution time

    Prototyping a scalable Aggregate Computing cluster with open-source solutions

    Get PDF
    L'Internet of Things è un concetto che è stato ora adottato in modo pervasivo per descrivere un vasto insieme di dispositivi connessi attraverso Internet. Comunemente, i sistemi IoT vengono creati con un approccio bottom-up e si concentrano principalmente sul singolo dispositivo, il quale è visto come la basilare unità programmabile. Da questo metodo può emergere un comportamento comune trovato in molti sistemi esistenti che deriva dall'interazione di singoli dispositivi. Tuttavia, questo crea un'applicazione distribuita spesso dove i componenti sono strettamente legati tra di loro. Quando tali applicazioni crescono in complessità, tendono a soffrire di problemi di progettazione, mancanza di modularità e riusabilità, difficoltà di implementazione e problemi di test e manutenzione. L'Aggregate Programming fornisce un approccio top-down a questi sistemi, in cui l'unità di calcolo di base è un'aggregazione anziché un singolo dispositivo. Questa tesi consiste nella progettazione e nella distribuzione di una piattaforma, basata su tecnologie open-source, per supportare l'Aggregate Computing nel cloud, in cui i dispositivi saranno in grado di scegliere dinamicamente se il calcolo si trova su se stessi o nel cloud. Anche se Aggregate Computing è intrinsecamente progettato per un calcolo distribuito, il Cloud Computing introduce un'alternativa scalabile, affidabile e altamente disponibile come strategia di esecuzione. Quest'opera descrive come sfruttare una Reactive Platform per creare un'applicazione scalabile nel cloud. Dopo che la struttura, l'interazione e il comportamento dell'applicazione sono stati progettati, viene descritto come la distribuzione dei suoi componenti viene effettuata attraverso un approccio di containerizzazione con Kubernetes come orchestratore per gestire lo stato desiderato del sistema con una strategia di Continuous Delivery

    A survey and classification of software-defined storage systems

    Get PDF
    The exponential growth of digital information is imposing increasing scale and efficiency demands on modern storage infrastructures. As infrastructure complexity increases, so does the difficulty in ensuring quality of service, maintainability, and resource fairness, raising unprecedented performance, scalability, and programmability challenges. Software-Defined Storage (SDS) addresses these challenges by cleanly disentangling control and data flows, easing management, and improving control functionality of conventional storage systems. Despite its momentum in the research community, many aspects of the paradigm are still unclear, undefined, and unexplored, leading to misunderstandings that hamper the research and development of novel SDS technologies. In this article, we present an in-depth study of SDS systems, providing a thorough description and categorization of each plane of functionality. Further, we propose a taxonomy and classification of existing SDS solutions according to different criteria. Finally, we provide key insights about the paradigm and discuss potential future research directions for the field.This work was financed by the Portuguese funding agency FCT-Fundacao para a Ciencia e a Tecnologia through national funds, the PhD grant SFRH/BD/146059/2019, the project ThreatAdapt (FCT-FNR/0002/2018), the LASIGE Research Unit (UIDB/00408/2020), and cofunded by the FEDER, where applicable
    corecore