
    HEC: Collaborative Research: SAM^2 Toolkit: Scalable and Adaptive Metadata Management for High-End Computing

    The increasing demand for exabyte-scale storage capacity by high-end computing applications requires a higher level of scalability and dependability than that provided by current file and storage systems. The proposal deals with file-systems research for metadata management of scalable cluster-based parallel and distributed file storage systems in the HEC environment. It aims to develop a scalable and adaptive metadata management (SAM2) toolkit to extend the features of, and fully leverage the peak performance promised by, state-of-the-art cluster-based parallel and distributed file storage systems used by the high-performance computing community. There is a large body of research on scaling data movement and management; however, the need to scale the attribute side of cluster-based file systems and I/O, that is, metadata, has been underestimated. An understanding of the characteristics of metadata traffic, and the application of proper load-balancing, caching, prefetching and grouping mechanisms to metadata management, will lead to high scalability. It is anticipated that by appropriately plugging the scalable and adaptive metadata management components into state-of-the-art cluster-based parallel and distributed file storage systems, one could potentially increase the performance of applications and file systems, and help translate the promised high peak performance of such systems into real application performance improvements. The project involves the following components:
    1. Develop multi-variable forecasting models to analyze and predict file metadata access patterns.
    2. Develop scalable and adaptive file-name mapping schemes using the duplicative Bloom filter array technique to enforce load balance and increase scalability (a simplified sketch of this idea follows the abstract).
    3. Develop decentralized, locality-aware metadata grouping schemes to facilitate bulk metadata operations such as prefetching.
    4. Develop an adaptive cache coherence protocol using a distributed shared object model for client-side and server-side metadata caching.
    5. Prototype the SAM2 components in the state-of-the-art parallel virtual file system PVFS2 and a distributed storage data caching system, set up an experimental framework for a DOE CMS Tier 2 site at the University of Nebraska-Lincoln, and conduct benchmark, evaluation and validation studies.
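    The duplicative Bloom filter array mentioned in component 2 can be pictured as one compact filter per metadata server, so that a file-name lookup probes a handful of bit arrays instead of querying every server. The sketch below is a minimal, generic illustration of that idea, not the SAM2 design itself; the BloomFilterArray class, its least-loaded insertion policy and all parameter values are assumptions made for this example.

```python
import hashlib

class BloomFilter:
    """Simple Bloom filter over a fixed-size bit array."""
    def __init__(self, size=8192, hashes=4):
        self.size = size
        self.hashes = hashes
        self.bits = bytearray(size // 8)

    def _positions(self, key):
        for i in range(self.hashes):
            h = hashlib.sha1(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:4], "big") % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


class BloomFilterArray:
    """One filter per metadata server; lookups probe all filters, while
    inserts go to the least-loaded server (illustrative policy only)."""
    def __init__(self, n_servers):
        self.filters = [BloomFilter() for _ in range(n_servers)]
        self.load = [0] * n_servers

    def insert(self, path):
        target = min(range(len(self.load)), key=self.load.__getitem__)
        self.filters[target].add(path)
        self.load[target] += 1
        return target

    def lookup(self, path):
        # Bloom filters admit false positives, so this only returns candidate
        # servers that must still be confirmed by an actual metadata query.
        return [i for i, f in enumerate(self.filters) if path in f]


if __name__ == "__main__":
    bfa = BloomFilterArray(n_servers=4)
    home = bfa.insert("/home/alice/results.dat")
    print("stored on server", home, "candidates:", bfa.lookup("/home/alice/results.dat"))
```

    Because the filters admit false positives, lookup only narrows the search to candidate servers; the client or metadata middleware would still confirm the name with the server that actually owns it.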

    The global unified parallel file system (GUPFS) project: FY 2002 activities and results


    Replication and Caching Systems for the support of VMs stored in File Systems with Snapshots

    Recently, and within a relatively short timeframe, fundamental changes have occurred in the way computing power is used. Virtualisation technology has changed both the model of a data centre's infrastructure and the way physical computers are now managed. This shift is a consequence of today's fast deployment rate of Virtual Machines (VM) in high-consolidation environments with minimal need for human management. New approaches to virtualisation techniques are being developed at a surprisingly fast rate, leading to an exciting and vibrant ecosystem of platforms and services. We see the big industry players tackling problems such as Desktop Virtualisation with moderate success, but completely ignoring the computation power already present in their clients' infrastructures and instead opting for costly solutions based on powerful new machines. There is still room for improvement in Virtual Desktop Infrastructure (VDI) and for the development of new architectures that take advantage of the computation power available at the user's desk with minimum effort on the management side; Infrastructure for Client-Based Desktops (iCBD) is one of these projects. This thesis focuses on the development of mechanisms for the replication and caching of VM images stored in a local filesystem, albeit one with the ability to perform snapshots. In this work, there are several challenges to address: the proposed architecture must be entirely distributed and completely integrated with the already existing client-based VDI platform, and it must be able to cope efficiently with very large, read-only files (some of them snapshots) and handle their multiple versions. This work also explores the challenges and advantages of deploying such a system on a high-throughput network, with both high availability and scalability, while efficiently supporting a large number of users (and their workstations).
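    Since the VM images and snapshots described above are very large and read-only, a cache never has to worry about invalidation, only about capacity. The following is a minimal sketch of a chunk-granular, read-through LRU cache keyed by (image, version, chunk index); the ImageChunkCache name, the 4 MiB chunk size and the fetch_chunk callback are assumptions for illustration, not the iCBD platform's actual interfaces.

```python
from collections import OrderedDict

class ImageChunkCache:
    """Illustrative read-through LRU cache for chunks of large, read-only
    VM image versions (e.g. filesystem snapshots). Because the images are
    immutable, a chunk can be cached indefinitely once fetched."""
    def __init__(self, fetch_chunk, chunk_size=4 << 20, capacity=256):
        self.fetch_chunk = fetch_chunk      # callable(image_id, version, index) -> bytes
        self.chunk_size = chunk_size
        self.capacity = capacity
        self.chunks = OrderedDict()         # (image_id, version, index) -> bytes, LRU order

    def read(self, image_id, version, offset, length):
        """Serve an arbitrary byte range by stitching together cached chunks."""
        data = bytearray()
        first = offset // self.chunk_size
        last = (offset + length - 1) // self.chunk_size
        for index in range(first, last + 1):
            chunk = self._get(image_id, version, index)
            start = offset - index * self.chunk_size if index == first else 0
            end = offset + length - index * self.chunk_size if index == last else len(chunk)
            data += chunk[start:end]
        return bytes(data)

    def _get(self, image_id, version, index):
        key = (image_id, version, index)
        if key in self.chunks:
            self.chunks.move_to_end(key)    # cache hit: refresh LRU position
            return self.chunks[key]
        chunk = self.fetch_chunk(image_id, version, index)
        self.chunks[key] = chunk
        if len(self.chunks) > self.capacity:
            self.chunks.popitem(last=False) # evict the least recently used chunk
        return chunk

if __name__ == "__main__":
    # Hypothetical fetch function standing in for a read from the remote image store.
    fake_store = lambda image, version, idx: bytes([idx % 256]) * (4 << 20)
    cache = ImageChunkCache(fake_store)
    blob = cache.read("base-image", "snap-3", offset=(4 << 20) - 16, length=32)
    print(len(blob), "bytes spanning two chunks")
```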

    Study of TCP Issues over Wireless and Implementation of iSCSI over Wireless for Storage Area Networks

    The Transmission Control Protocol (TCP) has proved proficient in classical wired networks, showing an ability to adapt to modern, high-speed networks and to scenarios for which it was not originally designed. Wireless access to the Internet requires that information reliability be preserved while data is transmitted over the radio channel. Automatic repeat request (ARQ) schemes and TCP techniques are often used for error control at the link layer and at the transport layer, respectively. TCP/IP is becoming a communication standard [1]. It was initially designed to provide reliable transmission over the IP protocol, operating principally in wired networks. Wireless networks are becoming more ubiquitous, and we have witnessed exceptional growth in heterogeneous networks. This report considers the problem of supporting TCP, the Internet data transport protocol, over a lossy wireless link whose characteristics vary over time. Experimental results from a wireless test bed in a research laboratory are reported.
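    The link-layer ARQ mentioned above retransmits individual frames locally so that TCP, at the transport layer, sees fewer losses on the radio channel. The snippet below is a toy stop-and-wait ARQ simulation intended only to illustrate that division of labour; the loss probabilities, retry budget and function name are assumptions, not measurements from the test bed.

```python
import random

def stop_and_wait_arq(frames, loss_prob=0.3, max_retries=5, seed=42):
    """Toy stop-and-wait ARQ: each frame is retransmitted until an ACK
    arrives or the retry budget is exhausted (channel loss is simulated)."""
    rng = random.Random(seed)
    delivered, dropped, transmissions = [], [], 0
    for seq, frame in enumerate(frames):
        for _attempt in range(max_retries + 1):
            transmissions += 1
            frame_lost = rng.random() < loss_prob   # data frame lost on the radio channel
            ack_lost = rng.random() < loss_prob     # acknowledgement lost on the way back
            if not frame_lost and not ack_lost:
                delivered.append((seq, frame))      # sender got the ACK, move to next frame
                break
        else:
            dropped.append(seq)                     # retry budget exhausted, give up on this frame
    return delivered, dropped, transmissions

if __name__ == "__main__":
    ok, lost, sent = stop_and_wait_arq([b"a", b"b", b"c", b"d"])
    print(f"delivered {len(ok)} frames, dropped {len(lost)}, using {sent} transmissions")
```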

    Peak Performance – Remote Memory Revisited

    Many database systems share a need for large amounts of fast storage. However, economies of scale limit the utility of extending a single machine with an arbitrary amount of memory. The recent broad availability of the zero-copy data transfer protocol RDMA over low-latency, high-throughput network connections such as InfiniBand prompts us to revisit the long-proposed use of memory provided by remote machines. In this paper, we present a solution for making use of remote memory without modifying the operating system, and investigate the impact on database performance.
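    One way to picture the use of memory on remote machines is as an extra tier between the local buffer pool and disk: pages evicted locally are spilled to a remote node and fetched back on demand. The sketch below models that tiering with plain Python objects; the RemoteMemoryStub dictionary stands in for memory that would really be accessed via one-sided RDMA, and the class names and eviction policy are assumptions made for this illustration, not the paper's implementation.

```python
from collections import OrderedDict

class RemoteMemoryStub:
    """Stand-in for memory exported by a remote machine; a real system would
    read and write these pages with one-sided RDMA verbs rather than a dict."""
    def __init__(self):
        self.pages = {}
    def put(self, page_id, data):
        self.pages[page_id] = data
    def take(self, page_id):
        return self.pages.pop(page_id)
    def __contains__(self, page_id):
        return page_id in self.pages

class TieredBufferPool:
    """Local buffer pool of fixed capacity whose evicted pages spill to remote memory."""
    def __init__(self, capacity, remote):
        self.capacity, self.remote = capacity, remote
        self.local = OrderedDict()              # page_id -> bytes, kept in LRU order

    def read(self, page_id, load_from_disk):
        if page_id in self.local:               # local hit: cheapest
            self.local.move_to_end(page_id)
            return self.local[page_id]
        if page_id in self.remote:              # remote-memory hit: cheaper than disk
            data = self.remote.take(page_id)
        else:
            data = load_from_disk(page_id)      # miss everywhere: go to disk
        self._install(page_id, data)
        return data

    def _install(self, page_id, data):
        self.local[page_id] = data
        if len(self.local) > self.capacity:
            victim, victim_data = self.local.popitem(last=False)
            self.remote.put(victim, victim_data)    # spill instead of discarding

if __name__ == "__main__":
    pool = TieredBufferPool(capacity=2, remote=RemoteMemoryStub())
    fake_disk = lambda pid: f"page-{pid}".encode()
    for pid in [1, 2, 3, 1, 2]:
        pool.read(pid, fake_disk)
    print("local:", list(pool.local), "remote:", list(pool.remote.pages))
```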

    CRAID: Online RAID upgrades using dynamic hot data reorganization

    Current algorithms used to upgrade RAID arrays typically require large amounts of data to be migrated, even those that move only the minimum amount of data required to keep a balanced data load. This paper presents CRAID, a self-optimizing RAID array that performs an online block reorganization of frequently used, long-term accessed data in order to reduce this migration even further. To achieve this objective, CRAID tracks frequently used, long-term data blocks and copies them to a dedicated partition spread across all the disks in the array. When new disks are added, CRAID only needs to extend this process to the new devices to redistribute this partition, thus greatly reducing the overhead of the upgrade process. In addition, the reorganized access patterns within this partition improve the array's performance, amortizing the copy overhead and allowing CRAID to offer performance competitive with traditional RAIDs. We describe CRAID's motivation and design, and we evaluate it by replaying seven real-world workloads, including a file server, a web server and a user share. Our experiments show that CRAID can successfully detect hot-data variations and begin using new disks as soon as they are added to the array. Moreover, the use of a dedicated partition improves the sequentiality of accesses to relevant data, which amortizes the cost of reorganizations. Finally, we show that a full-HDD CRAID array with a small distributed partition (<1.28% per disk) can compete in performance with an ideally restriped RAID-5 and a hybrid RAID-5 with a small SSD cache.
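    The core mechanism, tracking hot blocks and keeping copies of them in a small partition striped over every disk, can be illustrated with a few lines of bookkeeping. The sketch below is a simplified stand-in for CRAID's algorithm: the access-count threshold, the round-robin placement and the HotDataCache class are assumptions for this example, not the paper's actual policies.

```python
from collections import Counter

class HotDataCache:
    """Illustrative hot-block tracker: blocks whose access count crosses a
    threshold are copied into a small cache partition striped over all disks."""
    def __init__(self, n_disks, threshold=3):
        self.n_disks = n_disks
        self.threshold = threshold
        self.accesses = Counter()
        self.cached = set()                 # logical block numbers held in the cache partition

    def on_access(self, block):
        self.accesses[block] += 1
        if self.accesses[block] >= self.threshold:
            self.cached.add(block)          # promote the block to the striped partition

    def placement(self, block):
        """Disk holding the cached copy; round-robin striping for illustration."""
        return block % self.n_disks if block in self.cached else None

    def add_disks(self, extra):
        # Upgrade path: only the (small) cache partition is restriped,
        # instead of migrating the whole array.
        self.n_disks += extra

if __name__ == "__main__":
    cache = HotDataCache(n_disks=4)
    for block in [7, 7, 7, 42, 7]:
        cache.on_access(block)
    print("block 7 cached on disk", cache.placement(7))
    cache.add_disks(2)                      # new disks start serving hot data immediately
    print("after upgrade, block 7 on disk", cache.placement(7))
```

    The point of add_disks in this sketch is that only the placement of the small cached partition changes after an upgrade, which is why new devices can start absorbing hot traffic immediately instead of waiting for a full restripe.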

    Performance analysis of an iSCSI block device in virtualized environment

    Virtualization is relatively new to telecom, but it has already been widely implemented in the IT sector; its benefits are therefore already proven, which draws other sectors' attention towards it. Telecom organizations are now also focusing on virtualization to reap its full benefits. The main focus of this thesis is to conduct a performance analysis of a block storage device in a virtualized environment. Storage performance plays a vital role in the telecom sector: the performance and reliability of the storage device are key factors in fulfilling client requests with minimum latency. This thesis comprises three main areas. The first, a literature study, surveys the different storage networking possibilities and the different storage protocols used to establish communication between servers and storage in a storage area network; the study indicated that the Internet Small Computer System Interface (iSCSI) has more advantages than the other approaches. The second part covers the design of the storage area network (SAN) solution. The storage is offered by an iSCSI storage server, which provides block-level storage access to the compute server. The performance of the different iSCSI targets available on the market was compared, and the Linux-IO Target was concluded to be the better iSCSI target in terms of performance and reliability. The storage server was implemented as a virtual machine for better resource utilization; consequently, the hypervisor was studied and the different networking options for the virtual machines were compared. The final part optimizes the SAN solution: multipathing, the different caching options and the different driver options provided by the Kernel-based Virtual Machine (KVM) / QEMU (Quick Emulator) were considered for optimization.
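    A very rough way to sanity-check such a setup is to time random reads against the attached block device from inside the guest. The sketch below does that with synchronous os.pread calls; the device path /dev/sdb, the sample count and the span are placeholders, and unlike a real benchmark such as fio it does not bypass the page cache with O_DIRECT, so its numbers are only indicative.

```python
import os
import random
import statistics
import time

def random_read_latency(device_path, block_size=4096, samples=1000,
                        span_bytes=1 << 30, seed=0):
    """Measure random-read latency (microseconds) on a block device or large file.
    Page-cache effects are not bypassed here; a proper benchmark would use
    O_DIRECT and asynchronous I/O for more representative numbers."""
    rng = random.Random(seed)
    fd = os.open(device_path, os.O_RDONLY)
    latencies = []
    try:
        for _ in range(samples):
            offset = rng.randrange(0, span_bytes // block_size) * block_size
            start = time.perf_counter()
            os.pread(fd, block_size, offset)            # synchronous 4 KiB read
            latencies.append((time.perf_counter() - start) * 1e6)
    finally:
        os.close(fd)
    return statistics.mean(latencies), statistics.quantiles(latencies, n=100)[98]

if __name__ == "__main__":
    # "/dev/sdb" is a placeholder for the iSCSI block device attached to the VM.
    mean_us, p99_us = random_read_latency("/dev/sdb", samples=200)
    print(f"mean {mean_us:.1f} us, p99 {p99_us:.1f} us")
```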

    A Novel Storage Virtualization Scheme for Network Storage Systems

    Network storage systems are generally composed of clients, storage servers and metadata servers. In this paper, we propose a novel storage virtualization (NSV) scheme which is capable of alleviating the heavy load on the metadata server, guaranteeing storage quality of service and dynamically adapting storage resources. The metadata server automatically constructs a dedicated storage cluster according to the various requirements of storage quality of service. A storage cluster may consist of one or more storage servers, comprising one master storage server and zero or more slave storage servers; in other words, a network storage system consists of at least one storage cluster. The requests of each client are forwarded to the corresponding master storage server within a specific storage cluster. In addition, the master storage server determines the best storage server to handle the requests, based on the conditions of the storage servers. Next, the requests are redirected to the selected storage server. Finally, the responses are transmitted directly to the client.
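    The dispatch path described above (client to metadata server, to a cluster's master storage server, to the selected member server) can be summarised in a short sketch. The class names, the QoS-class registry and the least-loaded selection rule below are assumptions chosen for illustration; the paper does not specify these details at code level.

```python
from dataclasses import dataclass, field

@dataclass
class StorageServer:
    name: str
    pending_requests: int = 0

@dataclass
class StorageCluster:
    master: StorageServer
    slaves: list = field(default_factory=list)

    def dispatch(self, request):
        """Master redirects the request to the least-loaded member server;
        the response would then flow directly back to the client."""
        candidates = [self.master] + self.slaves
        target = min(candidates, key=lambda s: s.pending_requests)
        target.pending_requests += 1
        return target.name

class MetadataServer:
    """Maps a QoS class to its dedicated cluster and forwards each request
    to that cluster's master (hypothetical policy for illustration)."""
    def __init__(self):
        self.clusters = {}
    def register(self, qos_class, cluster):
        self.clusters[qos_class] = cluster
    def forward(self, qos_class, request):
        return self.clusters[qos_class].dispatch(request)

if __name__ == "__main__":
    gold = StorageCluster(master=StorageServer("m0"),
                          slaves=[StorageServer("s1"), StorageServer("s2")])
    mds = MetadataServer()
    mds.register("gold", gold)
    for i in range(4):
        print("request", i, "served by", mds.forward("gold", f"read-{i}"))
```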

    High availability using virtualization

    High availability has always been one of the main problems for a data center. Until now, high availability has been achieved by per-host redundancy, a highly expensive method in terms of hardware and human costs. A new approach to the problem is offered by virtualization. Using virtualization, it is possible to achieve a redundancy system for all the services running in a data center. This new approach to high availability allows the running virtual machines to be shared over the physical servers that are up and running, by exploiting the features of the virtualization layer: starting, stopping and moving virtual machines between physical hosts. The system (3RC) is based on a finite state machine with hysteresis, providing the possibility to restart each virtual machine on any physical host, or to reinstall it from scratch. A complete infrastructure has been developed to install the operating system and middleware in a few minutes. To virtualize the main servers of a data center, a new procedure has been developed to migrate physical hosts to virtual ones. The whole SNS-PISA Grid data center is currently running in a virtual environment under the high availability system. As an extension of the 3RC architecture, several storage solutions, from NAS to SAN, have been tested to store and centralize all the virtual disks, in order to guarantee data safety and access from anywhere. Exploiting virtualization and the ability to automatically reinstall a host, we provide a sort of host on demand, where action on a virtual machine is performed only when a disaster occurs.
    Comment: PhD Thesis in Information Technology Engineering: Electronics, Computer Science, Telecommunications, pp. 94, University of Pisa [Italy]
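    The finite state machine with hysteresis at the heart of 3RC can be illustrated by a host monitor that only changes state after several consecutive consistent observations, so that a single lost heartbeat does not trigger a full recovery. The states, thresholds and recovery action in the sketch below are assumptions for this example rather than the actual 3RC implementation.

```python
class HostMonitor:
    """Illustrative finite state machine with hysteresis: a host is declared
    FAILED only after several consecutive missed heartbeats, and declared OK
    again only after several consecutive good ones, to avoid flapping."""
    def __init__(self, fail_after=3, recover_after=2):
        self.state = "OK"
        self.fail_after = fail_after
        self.recover_after = recover_after
        self.bad = 0
        self.good = 0

    def heartbeat(self, alive):
        if alive:
            self.bad, self.good = 0, self.good + 1
            if self.state == "FAILED" and self.good >= self.recover_after:
                self.state = "OK"
        else:
            self.good, self.bad = 0, self.bad + 1
            if self.state == "OK" and self.bad >= self.fail_after:
                self.state = "FAILED"
                self.on_failure()
        return self.state

    def on_failure(self):
        # Placeholder recovery action: restart the failed host's virtual
        # machines on another physical host, or reinstall it from scratch.
        print("host failed: restarting its virtual machines elsewhere")

if __name__ == "__main__":
    mon = HostMonitor()
    for alive in [True, False, False, False, True, True]:
        print(mon.heartbeat(alive))
```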