Storage Codes with Flexible Number of Nodes
This paper presents flexible storage codes, a class of error-correcting codes
that can recover information from a flexible number of storage nodes. As a
result, one can make better use of the available storage nodes in the
presence of unpredictable node failures and reduce the data access latency.
Consider a storage system that encodes information symbols over a finite
field into a set of nodes, each storing a fixed number of symbols. The code is
parameterized by a set of tuples, subject to constraints on their entries, each
specifying a number of nodes and a number of symbols to access per node, such
that the information symbols can be reconstructed from any such number of
nodes, each node accessing the corresponding number of symbols. In other words, the code
allows a flexible number of nodes for decoding to accommodate the variance in
the data access time of the nodes. Code constructions are presented for
different storage scenarios, including LRC (locally recoverable) codes, PMDS
(partial MDS) codes, and MSR (minimum storage regenerating) codes. We analyze
the latency of accessing information and perform simulations on Amazon clusters
to show the efficiency of the presented codes.
Bandwidth-efficient Video Streaming with Network Coding on Peer-to-Peer Networks
PhD thesis. Over the last decade, live video streaming applications have gained great popularity among users but have put great pressure on video servers and the Internet. To satisfy the growing demand for live video streaming, Peer-to-Peer (P2P) networking has been developed to relieve video servers of bandwidth bottlenecks and computational load. Furthermore, Network Coding (NC) has been proposed and established as a significant breakthrough in information theory and coding theory. According to previous research, NC not only brings substantial improvements in throughput and delay in data transmission, but also provides innovative solutions to multiple issues related to resource allocation, such as the coupon-collection problem and allocation and scheduling procedures. However, the complex NC-driven P2P streaming network poses substantial challenges to the packet scheduling algorithm.
This thesis focuses on the packet scheduling algorithm for video multicast in NC-driven P2P streaming networks. It determines how the upload bandwidth resources of peer nodes are allocated in different transmission scenarios to achieve a better Quality of Service (QoS).
First, an optimized rate allocation algorithm is proposed for scalable video transmission (SVT) in the NC-based lossy streaming network. This algorithm is developed to balance the tradeoff between average video distortion and average bandwidth redundancy in each generation. It determines how senders allocate their upload bandwidth to the different classes of scalable data so that the sum of the distortion and the weighted redundancy ratio is minimized.
Second, in the NC-based non-scalable video transmission system, the bandwidth inefficiency caused by asynchronous communication among peers is reduced. First, a scalable compensation model and an adaptive push algorithm are proposed to reduce the unrecoverable transmission caused by network loss and insufficient bandwidth resources. Then a centralized packet scheduling algorithm is proposed to reduce the uninformative transmission caused by asynchronous communication among sender nodes. Subsequently, we further propose a distributed packet scheduling algorithm, which adds a critical scalability property to the packet scheduling model.
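To make the notion of an uninformative transmission concrete, the following is a minimal sketch and not the thesis's algorithm. It assumes that "uninformative" means a coded packet whose coefficient vector is linearly dependent on the vectors the receiver already holds, and it assumes coding over GF(2) for brevity (deployed systems commonly use a larger field). The names `Receiver`, `reduce_against_basis`, and `generation_size` are hypothetical.

```python
def reduce_against_basis(vec, basis):
    """Reduce a GF(2) coefficient vector (stored as an int bitmask)
    against a basis keyed by leading-bit position.
    Returns 0 if vec is a linear combination of the basis vectors."""
    while vec:
        lead = vec.bit_length() - 1
        if lead not in basis:
            return vec          # new leading bit: vec is independent
        vec ^= basis[lead]      # XOR is addition in GF(2); clears the leading bit
    return 0


class Receiver:
    """Tracks which coded packets of one generation are innovative."""

    def __init__(self, generation_size):
        self.generation_size = generation_size  # number of source packets (assumed)
        self.basis = {}                          # leading bit -> reduced vector
        self.uninformative = 0                   # count of dependent packets

    def receive(self, coeff_vector):
        """Return True if the packet adds new information, else False."""
        residual = reduce_against_basis(coeff_vector, self.basis)
        if residual == 0:
            self.uninformative += 1
            return False
        self.basis[residual.bit_length() - 1] = residual
        return True

    def can_decode(self):
        # Full rank means the generation can be recovered by elimination.
        return len(self.basis) == self.generation_size


rx = Receiver(generation_size=3)
results = [rx.receive(v) for v in (0b001, 0b011, 0b010, 0b111)]
```

In this run the third packet (0b010) equals the XOR of the first two, so it is counted as uninformative. A scheduler that coordinates senders aims to keep that count low.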
Third, the bandwidth resource scheduling for SVT is further studied. A novel multiple-generation scheduling algorithm is proposed to determine the quality classes that the receiver node can subscribe to so that the overall perceived video quality can be maximized. A single-generation scheduling algorithm for SVT is also proposed to provide a faster and easier solution to the video quality maximization function.
Thorough theoretical analysis is conducted in the development of all proposed algorithms, and their performance is evaluated via comprehensive simulations. We have demonstrated that, by adjusting the conventional transmission model and introducing new packet scheduling models, the overall QoS and bandwidth efficiency are dramatically improved. In the non-scalable video streaming system, the maximum video quality gain can be around 5 dB compared with the random push method, and the overall uninformative transmission ratio is reduced to 1%-2%. In the scalable video streaming system, the maximum video quality gain can be around 7 dB, and the overall uninformative transmission ratio is reduced to 2%-3%.
Exploring the design space of cooperative streaming multicast
Video streaming over the Internet is rapidly rising in popularity, but the availability and quality of video content are currently limited by the high bandwidth costs and infrastructure needs of server-based solutions. Recently, however, cooperative end-system multicast (CEM) has emerged as a promising paradigm for content distribution in the Internet, because the bandwidth overhead of disseminating content is shared among the participants of the CEM overlay network. In this thesis, we identify the dimensions in the design space of CEMs, explore the design space, and seek to understand the inherent tradeoffs of different design choices.
In the first part of the thesis, we study the control mechanisms for CEM overlay maintenance. We demonstrate that the control task of neighbor acquisition in CEMs can be factored out into a separate control overlay that provides a single primitive: a configurable anycast for peer selection. The separation of control from data overlay avoids the efficiency tradeoffs that afflict some of the current systems. The anycast primitive can be used to build and maintain different data overlay organizations like single-tree, multi-tree, mesh-based, and hybrids, by expressing appropriate policies. We built SAAR, a reusable, shared control overlay for CEMs, that efficiently implements this anycast primitive, and thereby, efficiently serves the control needs for CEMs.
In the second part of the thesis, we focus on techniques for data dissemination. We built a common framework in which different CEM data delivery techniques can be faithfully compared. A systematic empirical comparison of CEM design choices demonstrates that there is no single approach that is best in all scenarios. In fact, our results suggest that every CEM protocol is inherently limited in certain aspects of its performance. We distill our observations into a novel model that explains the inherent tradeoffs of CEM design choices and provides bounds on the practical performance limits of any future CEM protocol. In particular, the model asserts that no CEM design can simultaneously achieve all three of low overhead, low lag, and high streaming quality.
Coding for Security and Reliability in Distributed Systems
This dissertation studies the use of coding techniques to improve the reliability and security of distributed systems. The first three parts focus on distributed storage systems, and study schemes that encode a message into n shares, assigned to n nodes, such that any n - r nodes can decode the message (reliability) and any colluding z nodes cannot infer any information about the message (security). The objective is to optimize the computational, implementation, communication and access complexity of the schemes during the process of encoding, decoding and repair. These are the key metrics of the schemes so that when they are applied in practical distributed storage systems, the systems are not only reliable and secure, but also fast and cost-effective.
Schemes with highly efficient computation and implementation are studied in Part I. For the practical high-rate case of r ≤ 3 and z ≤ 3, we construct schemes that require only r + z XORs to encode and z XORs to decode each message bit, based on practical erasure codes including the B, EVENODD and STAR codes. This encoding and decoding complexity is shown to be optimal. For general r and z, we design schemes over a special ring from Cauchy matrices and Vandermonde matrices. Both schemes can be efficiently encoded and decoded due to the structure of the ring. We also discuss methods to shorten the proposed schemes.
Part II studies schemes that are efficient in terms of communication and access complexity. We derive a lower bound on the decoding bandwidth, and design schemes achieving the optimal decoding bandwidth and access. We then design schemes that achieve the optimal bandwidth and access not only for decoding, but also for repair. Furthermore, we present a family of Shamir's schemes with asymptotically optimal decoding bandwidth.
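For readers unfamiliar with Shamir's scheme, the following is a minimal sketch of the textbook version, not the bandwidth-optimized family from the dissertation. In the notation above it corresponds to the special case where any n - r shares decode and any z = n - r - 1 shares reveal nothing. The field size `PRIME` and the names `make_shares` and `recover` are assumptions chosen for illustration.

```python
import secrets

# Assumed field: integers modulo the Mersenne prime 2**61 - 1.
PRIME = 2**61 - 1


def make_shares(secret, n, threshold):
    """Split `secret` into n shares so that any `threshold` shares recover it."""
    if not (1 <= threshold <= n < PRIME):
        raise ValueError("need 1 <= threshold <= n < PRIME")
    if not (0 <= secret < PRIME):
        raise ValueError("secret must be a field element")
    # Random polynomial of degree threshold - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):   # Horner evaluation at x
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover(shares):
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * (-xj)) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


# n = 5 nodes, any 3 decode (r = 2), any 2 colluding nodes learn nothing (z = 2).
shares = make_shares(123456789, n=5, threshold=3)
recovered = recover([shares[0], shares[2], shares[4]])
```

Note that decoding here downloads one full share from each of the contacted nodes. The decoding bandwidth studied in Part II concerns how much of that communication is actually necessary.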
Part III studies the problem of secure repair, i.e., reconstructing the share of a (failed) node without leaking any information about the message. We present generic secure repair protocols that can securely repair any linear schemes. We derive a lower bound on the secure repair bandwidth and show that the proposed protocols are essentially optimal in terms of bandwidth.
In the final part of the dissertation, we study the use of coding techniques to improve the reliability and security of network communication.
Specifically, in Part IV we draw connections between several important problems in network coding. We present reductions that map an arbitrary multiple-unicast network coding instance to a unicast secure network coding instance in which at most one link is eavesdropped, or a unicast network error correction instance in which at most one link is erroneous, such that a rate tuple is achievable in the multiple-unicast network coding instance if and only if a corresponding rate is achievable in the unicast secure network coding instance, or in the unicast network error correction instance. Conversely, we show that an arbitrary unicast secure network coding instance in which at most one link is eavesdropped can be reduced back to a multiple-unicast network coding instance. Additionally, we show that the capacity of a unicast network error correction instance in general is not (exactly) achievable. We derive upper bounds on the secrecy capacity for the secure network coding problem, based on cut-sets and the connectivity of links. Finally, we study optimal coding schemes for the network error correction problem, in the setting that the network and adversary parameters are not known a priori.
Harnessing Simulated Data with Graphs
Physically accurate simulations allow for unlimited exploration of arbitrarily crafted environments. From a scientific perspective, digital representations of the real world are useful because they make it easy to validate ideas. Virtual sandboxes allow observations to be collected at will, without intricate setup for measurements or the need to wait on the manufacturing, shipping, and assembly of physical resources. Simulation techniques can also be used over and over again to test the problem without expending costly materials or producing any waste.
Remarkably, this freedom to both experiment and generate data becomes even more powerful when considering the rising adoption of data-driven techniques across engineering disciplines. These are systems that aggregate over available samples to model behavior, and thus are better informed when exposed to more data. Naturally, the ability to synthesize limitless data promises to make approaches that benefit from datasets all the more robust and desirable.
However, the ability to readily and endlessly produce synthetic examples also introduces several new challenges. Data must be collected in an adaptive format that can capture the complete diversity of states achievable in arbitrary simulated configurations while also remaining amenable to downstream applications. The quantity and zoology of observations must also straddle a range that prevents overfitting but is descriptive enough to produce a robust approach. Pipelines that naively measure virtual scenarios can easily be overwhelmed by trying to sample an infinite set of available configurations. Variations observed across multiple dimensions can quickly lead to a daunting expansion of states, all of which must be processed and solved. These and several other concerns must first be addressed in order to safely leverage the potential of boundless simulated data.
In response to these challenges, this thesis proposes to wield graphs in order to instill structure over digitally captured data and curb the growth of variables. The paradigm of pairing data with graphs introduced in this dissertation serves to enforce consistency, localize operators, and, crucially, factor out any combinatorial explosion of states. Results demonstrate the effectiveness of this methodology in three distinct areas, each individually offering unique challenges and practical constraints, and together showcasing the generality of the approach. Namely, studies observing state-of-the-art contributions in design for additive manufacturing, side-channel security threats, and large-scale physics-based contact simulations are collectively achieved by harnessing simulated datasets with graph algorithms.
LIPIcs, Volume 244, ESA 2022, Complete Volume