Building Distributed Systems with Non-Volatile Main Memories and RDMA Networks
High-performance, byte-addressable non-volatile main memories (NVMMs) allow application developers to combine storage and memory into a single layer. These high-performance storage systems would be especially useful in large-scale data center environments where data is distributed and replicated across multiple servers. Unfortunately, existing approaches to providing remote storage access rest on the assumption that storage is slow, so the cost of the software and protocols is acceptable. This assumption no longer holds for fast NVMMs. As a result, taking full advantage of NVMMs' potential will require changes in system software and networking protocols. This thesis focuses on accessing remote NVMM efficiently using remote direct memory access (RDMA) networks. RDMA enables a client to directly access memory on a remote machine without involving the remote machine's CPU.

This thesis first presents Mojim, a system that provides replicated, reliable, and highly available NVMM as an operating system service. Applications can access data in Mojim using normal load and store instructions while controlling when and how updates propagate to replicas using system calls. Our evaluation shows that Mojim adds little overhead to the un-replicated system and provides 0.4x to 2.7x the throughput of the un-replicated system.

This thesis then presents Orion, a distributed file system designed for NVMM and RDMA networks. Traditional distributed file systems are designed for slower hard drives, and these slower media incentivize complex optimizations (e.g., queuing, striping, and batching) around disk accesses. Orion combines file system functions and network operations into a single layer. It provides low-latency metadata access and outperforms existing distributed file systems by a large margin.

Finally, an NVMM application can map files backed by an NVMM file system into its address space and access them using CPU instructions.
In this case, RDMA and NVMM file systems introduce duplication of effort on permissions, naming, and address translation. We introduce two changes to the existing RDMA protocol: the file memory region (FileMR) and range-based address translation. By eliminating redundant translations, FileMR minimizes the number of translations done at the NIC, reducing the load on the NIC's translation cache and improving application performance by 1.8x to 2.0x.
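The benefit of range-based translation can be seen with back-of-the-envelope arithmetic. The sketch below (file size, page size, and extent count are illustrative assumptions, not figures from the thesis) compares the NIC translation entries needed for a conventional page-granular memory region against a range-based scheme in the spirit of FileMR, where one entry covers an entire contiguous file extent.

```python
# Illustrative comparison of NIC translation-cache footprint:
# page-granular RDMA memory regions vs. range-based (extent) translation.
# All sizes below are assumptions for the sake of the arithmetic.

PAGE_SIZE = 4096          # typical 4 KiB page
FILE_SIZE = 1 << 30       # assume a 1 GiB NVMM-backed file

# Conventional memory region: one translation entry per page.
page_entries = FILE_SIZE // PAGE_SIZE

# Range-based translation: one entry per contiguous file extent.
# Assume the file system keeps the file in 8 large extents.
EXTENTS = 8
range_entries = EXTENTS

print(f"page-granular entries: {page_entries}")    # 262144
print(f"range-based entries:   {range_entries}")   # 8
print(f"reduction factor:      {page_entries // range_entries}x")
```

Fewer entries means fewer misses in the NIC's translation cache on large mapped files, which is where the reported application speedup comes from.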
A Privacy-Aware Distributed Storage and Replication Middleware for Heterogeneous Computing Platform
Cloud computing is an emerging research area that has drawn considerable interest in recent years. However, the current infrastructure raises significant concerns about how to protect users' privacy, in part because users store their data on cloud vendors' servers. In this paper, we address this challenge by proposing and implementing a novel middleware, called Uno, which separates the storage of physical data and their associated metadata. In our design, users' physical data are stored locally on devices under a user's full control, while the metadata can be uploaded to the commercial cloud. To ensure the reliability of users' data, we develop a novel fine-grained file replication algorithm that exploits both data access patterns and device state patterns. Based on a quantitative analysis of a data set from Rice University, this algorithm replicates data intelligently in different time slots, so that it not only significantly improves data availability, but also achieves satisfactory load balancing and storage diversification. We implement the Uno system on a heterogeneous testbed composed of both host servers and mobile devices, and demonstrate the programmability of Uno through the implementation and evaluation of two sample applications, Uno@Home and Uno@Sense.
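The core privacy idea, splitting cloud-safe metadata from locally held bytes, can be sketched as follows. All names and fields here are hypothetical illustrations, not Uno's actual API: the cloud sees only a descriptive record (name, size, content hash, replica locations), while the raw data never leaves the user-controlled devices.

```python
# Hypothetical sketch of metadata/data separation (not Uno's real API):
# only the FileMetadata record would be uploaded to the cloud; the raw
# bytes remain on the user's own devices.
import hashlib
from dataclasses import dataclass, asdict

@dataclass
class FileMetadata:
    name: str
    size: int
    sha256: str              # integrity check without exposing contents
    replica_devices: list    # user-controlled devices holding the bytes

def make_metadata(name: str, data: bytes, devices: list) -> FileMetadata:
    """Build the cloud-safe record; `data` itself is never uploaded."""
    return FileMetadata(name, len(data), hashlib.sha256(data).hexdigest(), devices)

record = make_metadata("photo.jpg", b"...raw bytes...", ["phone-1", "home-server"])
print(asdict(record))  # this record, not the data, goes to the cloud
```

A replication algorithm like the one described would then update `replica_devices` over time based on observed access and device-state patterns.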
Peer-To-Peer Backup for Personal Area Networks
FlashBack is a peer-to-peer backup algorithm designed for power-constrained devices running in a personal area network (PAN). Backups are performed transparently as local updates initiate the spread of backup data among a subset of the currently available peers. FlashBack limits power usage by avoiding flooding and keeping neighbor sets small. FlashBack has also been designed to utilize powered infrastructure when possible to further extend device lifetime. We propose our architecture and algorithms, and present initial experimental results that illustrate FlashBack's performance characteristics.
Design and Implementation of the L-Bone and Logistical Tools
The purpose of this paper is to outline the design criteria and implementation of the Logistical Backbone (L-Bone) and the Logistical Tools. These tools, along with IBP and the exNode Library, allow storage to be used as a network resource. These are components of the Network Storage Stack, a design by the Logistical Computing and Internetworking Lab at the University of Tennessee. Having storage as a network resource enables users to do many things that are either difficult or not possible today, such as moving and sharing very large files across administrative domains, improving performance through caching and improving fault-tolerance through replication and striping.
Next, this paper reviews the L-Bone, a directory service for Internet Backplane Protocol (IBP) storage servers (depots), which stores information about the depots and allows clients to query the service for depots matching specific requirements. The L-Bone has three major components: a client API, a stateless RPC server, and a database backend. Because the L-Bone is intended to be a service available to anyone on the wide-area network, response time is critical. The current implementation provides a reliable, fast service: average response times from remote clients are less than half a second.
Lastly, this paper examines the Logistical Tools. The Logistical Tools are a set of command line tools wrapped around a C API. They provide a higher level of functionality built on top of the exNode Library as well as the L-Bone library, IBP library, and the Network Weather Service (NWS) library. This set of tools allows a user to upload a file into an exNode, download the data from that exNode, add replicas to or remove replicas from the exNode, check the status of the exNode, and modify the expiration times of the IBP allocations. To highlight the capabilities of these tools and the overall benefits of using exNodes, I perform tests that look at the performance improvements through local replication (caching) as well as tests that look at the higher levels of fault-tolerance through replication. These tests show that using replication for caching can improve access time from 2 to 16 times and that using simple replication can provide nearly 100% availability.
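The "nearly 100% availability" claim follows from standard replication arithmetic: if each depot is independently reachable with probability p, a file stored on k depots is unavailable only when all k are down. The sketch below uses an assumed per-depot availability, not a figure measured in the paper.

```python
# Illustrative replication-availability arithmetic (assumed p, not the
# paper's measurements): with independent depot failures, availability
# of k replicas is 1 - (1 - p)^k.
p = 0.90  # assumed probability a single IBP depot is reachable
for replicas in (1, 2, 3, 4):
    availability = 1 - (1 - p) ** replicas
    print(f"{replicas} replica(s): {availability:.4%}")
```

Even with modest per-depot availability, a handful of replicas drives the probability that no copy is reachable toward zero, which is why simple replication suffices for near-100% availability.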
Sift: Achieving Resource-Efficient Consensus with RDMA
Sift is a new consensus protocol for replicating state machines. It disaggregates CPU and memory consumption by creating a novel system architecture enabled by one-sided RDMA operations. We show that this system architecture allows us to develop a consensus protocol that centralizes the replication logic. The result is a simplified protocol design with less complex interactions between the participants of the consensus group compared to traditional protocols. The disaggregated design also enables Sift to reduce deployment costs by sharing backup computational nodes across consensus groups deployed within the same cloud environment. The required storage resources can be further reduced by integrating erasure codes without making significant changes to the protocol. Evaluation results show that in a cloud environment with 100 groups, where each group can support up to 2 simultaneous failures, Sift can reduce the cost by 56% compared to an RDMA-based Raft deployment.
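A rough sketch of where the savings come from, under simplifying assumptions (the shared-pool size and node breakdown below are hypothetical illustrations, not Sift's actual resource model): a Raft group tolerating f simultaneous failures needs 2f + 1 full replicas, each consuming both CPU and memory, whereas a disaggregated design keeps only one compute node active per group and shares backup compute nodes across all groups.

```python
# Hypothetical deployment-cost arithmetic (not the paper's exact model):
# compare full Raft replicas against a disaggregated, shared-backup layout.
GROUPS = 100
F = 2  # simultaneous failures tolerated per group

# Raft: every group needs 2f + 1 nodes, each with CPU *and* memory.
raft_full_nodes = GROUPS * (2 * F + 1)

# Disaggregated layout: one active compute node per group plus a small
# shared pool of backup compute nodes (pool size assumed here); memory
# nodes are passive and cheaper than full replicas.
active_cpu_nodes = GROUPS * 1
shared_backup_cpu = 10  # assumed shared pool size

print("Raft full replicas:        ", raft_full_nodes)
print("Disaggregated CPU nodes:   ", active_cpu_nodes + shared_backup_cpu)
```

Because backup compute capacity is pooled rather than provisioned per group, the compute cost grows far more slowly with the number of groups, which is the intuition behind the reported 56% cost reduction.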