159 research outputs found

    Approximation and Compression Techniques to Enhance Performance of Graphics Processing Units

    A key challenge in modern computing systems is to access data fast enough to fully utilize the computing elements in the chip. In Graphics Processing Units (GPUs), performance is often constrained by register file size, memory bandwidth, and the capacity of the main memory. One important technique for alleviating this challenge is data compression. By reducing the amount of data that needs to be communicated or stored, memory resources crucial for performance can be efficiently utilized. This thesis provides a set of approximation and compression techniques for GPUs, with the goal of efficiently utilizing the computational fabric and thereby increasing performance. The thesis shows that these techniques can substantially lower the amount of information the system has to process, and are thus important tools for meeting challenges in memory utilization. This thesis makes contributions within three areas: controlled floating-point precision reduction, lossless and lossy memory compression, and distributed training of neural networks. In the first area, the thesis shows that through automated and controlled floating-point approximation, the register file can be more efficiently utilized. This is achieved through a framework which establishes a cross-layer connection between the application and the microarchitecture layer, and a novel register file organization capable of leveraging low-precision floating-point values and narrow integers for increased capacity and performance. Within the area of compression, this thesis aims at increasing the effective bandwidth of GPUs by presenting a lossless and lossy memory compression algorithm to reduce the amount of transferred data. In contrast to state-of-the-art compression techniques such as Base-Delta-Immediate and Bitplane Compression, which use intra-block bases for compression, the proposed algorithm leverages multiple global base values to reach a higher compression ratio.
The algorithm includes an optional approximation step for floating-point values which offers a higher compression ratio at a given low error rate. Finally, within the area of distributed training of neural networks, this thesis proposes a subgraph approximation scheme for graph data which mitigates accuracy loss in a distributed setting. The scheme allows neural network models that use graphs as inputs to converge at single-machine accuracy, while minimizing synchronization overhead between the machines.
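
As a rough illustration of the base-plus-delta idea mentioned in the abstract above, the following Python sketch encodes each integer in a block as an index into a small table of shared base values plus a narrow signed delta. It is not the thesis's algorithm: the function names `compress_block` and `decompress_block`, the 8-bit delta width, and the way the bases are supplied are all assumptions made for this example.

```python
from typing import List, Optional, Tuple


def compress_block(values: List[int], bases: List[int],
                   delta_bits: int = 8) -> Optional[List[Tuple[int, int]]]:
    """Encode each value as (base index, signed delta), or return None.

    Hypothetical helper: `bases` is a table of shared base values and
    `delta_bits` is the assumed width of each stored delta.
    """
    lo = -(1 << (delta_bits - 1))
    hi = (1 << (delta_bits - 1)) - 1
    encoded = []
    for v in values:
        # Pick the base closest to the value, so the delta is smallest.
        idx = min(range(len(bases)), key=lambda i: abs(v - bases[i]))
        delta = v - bases[idx]
        if not lo <= delta <= hi:
            return None  # block is not compressible with these bases
        encoded.append((idx, delta))
    return encoded


def decompress_block(encoded: List[Tuple[int, int]],
                     bases: List[int]) -> List[int]:
    """Invert compress_block by adding each delta back to its base."""
    return [bases[idx] + delta for idx, delta in encoded]
```

With two bases, a block that mixes values near 1000 and near 5000 still compresses, whereas a single base per block would need wide deltas for one of the two groups.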

    Bilayer Low-Density Parity-Check Codes for Decode-and-Forward in Relay Channels

    This paper describes an efficient implementation of binning for the relay channel using low-density parity-check (LDPC) codes. We devise bilayer LDPC codes to approach the theoretically promised rate of the decode-and-forward relaying strategy by incorporating relay-generated information bits in specially designed bilayer graphical code structures. While conventional LDPC codes are sensitively tuned to operate efficiently at a certain channel parameter, the proposed bilayer LDPC codes are capable of working at two different channel parameters and two different rates: those at the relay and at the destination. To analyze the performance of bilayer LDPC codes, bilayer density evolution is devised as an extension of the standard density evolution algorithm. Based on bilayer density evolution, a design methodology is developed for the bilayer codes in which the degree distribution is iteratively improved using linear programming. Further, in order to approach the theoretical decode-and-forward rate for a wide range of channel parameters, this paper proposes two different forms of bilayer codes, the bilayer-expurgated and bilayer-lengthened codes. It is demonstrated that a properly designed bilayer LDPC code can achieve an asymptotic infinite-length threshold within a 0.24 dB gap to the Shannon limits of two different channels simultaneously for a wide range of channel parameters. By practical code construction, finite-length bilayer codes are shown to be able to approach within a 0.6 dB gap to the theoretical decode-and-forward rate of the relay channel at a block length of $10^5$ and a bit-error probability (BER) of $10^{-4}$. Finally, it is demonstrated that a generalized version of the proposed bilayer code construction is applicable to relay networks with multiple relays. Comment: Submitted to IEEE Trans. Info. Theor
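
The standard density evolution algorithm that the abstract above extends can be sketched in a few lines for the simplest textbook case. This is not the paper's bilayer density evolution: the sketch assumes a regular (dv, dc) LDPC ensemble on the binary erasure channel, where the erasure probability of a message evolves as x ← ε(1 − (1 − x)^(dc−1))^(dv−1). The function names, iteration cap, and tolerance are assumptions chosen for illustration.

```python
def bec_converges(eps: float, dv: int = 3, dc: int = 6,
                  max_iter: int = 5000, tol: float = 1e-10) -> bool:
    """Run density evolution on the BEC and report whether erasures die out."""
    x = eps  # erasure probability of variable-to-check messages
    for _ in range(max_iter):
        # One decoding iteration: check-node update, then variable-node update.
        x = eps * (1.0 - (1.0 - x) ** (dc - 1)) ** (dv - 1)
        if x < tol:
            return True
    return False


def bec_threshold(dv: int = 3, dc: int = 6, steps: int = 30) -> float:
    """Bisect for the largest channel erasure rate at which decoding converges."""
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        mid = (lo + hi) / 2
        if bec_converges(mid, dv, dc):
            lo = mid
        else:
            hi = mid
    return lo
```

For the regular (3, 6) ensemble, `bec_threshold()` lands near 0.43, which can be compared with the channel capacity limit of 0.5 for a rate-1/2 code.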

    Collaborative Communication And Storage In Energy-Synchronized Sensor Networks

    In a battery-less sensor network, all operations of the sensor nodes are strictly constrained by and synchronized with the fluctuations of harvested energy, which intermittently disconnects nodes from the network and makes network connectivity unstable. Such a wireless sensor network is called an energy-synchronized sensor network. The unpredictable network disruptions and challenging communication environments make traditional communication protocols inefficient and require a paradigm shift in design. In this thesis, I propose a set of algorithms for collaborative data communication and storage in energy-synchronized sensor networks. The solutions are based on erasure codes and probabilistic network coding. The proposed algorithms significantly improve data communication throughput and persistency, and they are inherently amenable to the probabilistic nature of transmission in wireless networks. The technical contributions explore collaborative communication both without coding and with network coding. First, I propose a collaborative data delivery protocol to exploit the optimal performance of multiple energy-synchronized paths without network coding, i.e., a new max-flow min-variance algorithm. In concert with this data delivery protocol, a localized TDMA MAC protocol is designed to synchronize nodes' duty cycles and mitigate media access contention. However, the energy supply can change dynamically over time, making synchronization of predetermined duty cycles difficult in practice. A probabilistic approach is therefore investigated: I present the Opportunistic Network Erasure Coding protocol (ONEC) to collect data collaboratively. ONEC derives the probability distribution of the coding degree in each node, enables opportunistic in-network recoding, and guarantees that the original sensor data can be recovered with high probability upon receiving any sufficient number of encoded packets.
Next, OnCode, an opportunistic in-network data coding and delivery protocol, is proposed to further improve data communication under the constraints of energy synchronization. It is resilient to packet loss and network disruptions, and does not require explicit end-to-end feedback messages. Moreover, I present a network Erasure Coding with randomized Power Control (ECPC) mechanism for collaborative data storage in disruptive sensor networks. ECPC only requires each node to perform a single broadcast at each of its several randomly selected power levels, so it incurs very low communication overhead. Finally, I propose an integrated algorithm and middleware (Ravine Stream) to improve data delivery throughput as well as data persistency in energy-synchronized sensor networks.

    Topology control and data handling in wireless sensor networks

    Our work in this thesis has provided two distinctive contributions to WSNs in the areas of data handling and topology control. In the area of data handling, we have demonstrated a solution that improves power efficiency whilst preserving the important data features through data compression and an adaptive sampling strategy, both applicable to the oceanography monitoring application required by the SECOAS project. Our work on oceanographic data analysis is important for understanding the data we are dealing with, so that suitable strategies can be deployed and system performance can be analysed. The Basic Adaptive Sampling Scheduler (BASS) algorithm uses the statistics of the data to adjust the sampling behaviour in a sensor node according to the environment in order to conserve energy and minimise detection delay. The motivation of topology control (TC) is to maintain the connectivity of the network, to reduce node degree to ease congestion in a collision-based medium access scheme, and to reduce power consumption in the sensor nodes. We have developed an algorithm, Subgraph Topology Control (STC), that is distributed and does not require additional equipment to be implemented on the SECOAS nodes. STC uses a metric called subgraph number, which measures the 2-hop connectivity in the neighbourhood of a node. It is found that STC consistently forms topologies that have lower node degrees and higher probabilities of connectivity, as compared to k-Neighbours, an alternative algorithm that does not rely on special hardware on the sensor node. Moreover, STC also gives better results in terms of the minimum degree in the network, which implies that the network structure is more robust to a single point of failure. As STC is an iterative algorithm, it is very scalable and adaptive and is well suited for the SECOAS applications.

    Applications of graph-based codes in networks: analysis of capacity and design of improved algorithms

    The conception of turbo codes by Berrou et al. has created a renewed interest in modern graph-based codes. Several encouraging results that have come to light since then have fortified the role these codes shall play as potential solutions for present and future communication problems. This work focuses on both practical and theoretical aspects of graph-based codes. The thesis can be broadly categorized into three parts. The first part of the thesis focuses on the design of practical graph-based codes of short lengths. While both low-density parity-check codes and rateless codes have been shown to be asymptotically optimal under the message-passing (MP) decoder, the performance of short-length codes from these families under MP decoding is starkly sub-optimal. This work first addresses the structural characterization of stopping sets to understand this sub-optimality. Using this characterization, a novel improved decoder that offers several orders of magnitude improvement in bit-error rates is introduced. Next, a novel scheme for the design of a good rate-compatible family of punctured codes is proposed. The second part of the thesis aims at establishing these codes as a good tool to develop reliable, energy-efficient and low-latency data dissemination schemes in networks. The problems of broadcasting in wireless multihop networks and that of unicast in delay-tolerant networks are investigated. In both cases, rateless coding is seen to offer an elegant means of achieving the goals of the chosen communication protocols. It was noticed that the ratelessness and the randomness in encoding process make this scheme specifically suited to such network applications. The final part of the thesis investigates an application of a specific class of codes called network codes to finite-buffer wired networks. This part of the work aims at establishing a framework for the theoretical study and understanding of finite-buffer networks. 
The proposed Markov chain-based method extends existing results to develop an iterative Markov chain-based technique for general acyclic wired networks. The framework not only estimates the capacity of such networks, but also provides a means to monitor network traffic and packet drop rates on various links of the network. Ph.D. Committee Chair: Fekri, Faramarz; Committee Member: Li, Ye; Committee Member: McLaughlin, Steven; Committee Member: Sivakumar, Raghupathy; Committee Member: Tetali, Prasa

    A Tutorial on Clique Problems in Communications and Signal Processing

    Since its first use by Euler on the problem of the seven bridges of Königsberg, graph theory has shown excellent abilities in solving and unveiling the properties of multiple discrete optimization problems. The study of the structure of some integer programs reveals equivalence with graph theory problems, making a large body of the literature readily available for solving and characterizing the complexity of these problems. This tutorial presents a framework for utilizing a particular graph theory problem, known as the clique problem, for solving communications and signal processing problems. In particular, the paper aims to illustrate the structural properties of integer programs that can be formulated as clique problems through multiple examples in communications and signal processing. To that end, the first part of the tutorial provides various optimal and heuristic solutions for the maximum clique, maximum weight clique, and $k$-clique problems. The tutorial further illustrates the use of the clique formulation through numerous contemporary examples in communications and signal processing, mainly in maximum access for non-orthogonal multiple access networks, throughput maximization using index and instantly decodable network coding, collision-free radio frequency identification networks, and resource allocation in cloud-radio access networks. Finally, the tutorial sheds light on the recent advances of such applications, and provides technical insights on ways of dealing with mixed discrete-continuous optimization problems.
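
For readers unfamiliar with the maximum clique problem named in the abstract above, one classical exact method is the Bron-Kerbosch enumeration with pivoting. The sketch below is a generic textbook version, not a solution taken from the tutorial; the function name `max_clique` and the adjacency-dictionary input format are assumptions for this example.

```python
from typing import Dict, Set


def max_clique(adj: Dict[int, Set[int]]) -> Set[int]:
    """Return one maximum clique of an undirected graph.

    `adj` maps each vertex to the set of its neighbours (no self-loops).
    """
    best: Set[int] = set()

    def expand(r: Set[int], p: Set[int], x: Set[int]) -> None:
        nonlocal best
        if not p and not x:
            # r is a maximal clique; keep it if it is the largest so far.
            if len(r) > len(best):
                best = set(r)
            return
        # Pivot on the vertex with most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}  # v has been fully explored
            x = x | {v}  # exclude v from later cliques in this branch

    expand(set(), set(adj), set())
    return best
```

The running time is exponential in the worst case, which is consistent with the problem being NP-hard; heuristic variants trade optimality for speed.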

    Connected Dominating Set Based Topology Control in Wireless Sensor Networks

    Wireless Sensor Networks (WSNs) are now widely used for monitoring and controlling systems where human intervention is not desirable or possible. Connected Dominating Set (CDS) based topology control in WSNs is a hierarchical method that ensures sufficient coverage while reducing redundant connections in a relatively crowded network. Moreover, the Minimum-sized Connected Dominating Set (MCDS) has become a well-known and extensively used approach for constructing a Virtual Backbone (VB) to alleviate the broadcast storm for efficient routing in WSNs. However, no work considers the load-balance factor of CDSs in WSNs. In this dissertation, we first propose a new concept, the Load-Balanced CDS (LBCDS), and a new problem, the Load-Balanced Allocate Dominatee (LBAD) problem. Consequently, we propose a two-phase method to solve LBCDS and LBAD one by one and a one-phase Genetic Algorithm (GA) to solve the problems simultaneously. Secondly, since there is no performance ratio analysis in the previously mentioned work, three problems are subsequently investigated and analyzed: the MinMax Degree Maximal Independent Set (MDMIS) problem, the Load-Balanced Virtual Backbone (LBVB) problem, and the MinMax Valid-Degree non Backbone node Allocation (MVBA) problem. Approximation algorithms and comprehensive theoretical analysis of the approximation factors are presented in the dissertation. On the other hand, in the current related literature, networks are deterministic, where two nodes are assumed to be either connected or disconnected. In most real applications, however, there are many intermittently connected wireless links called lossy links, which only provide probabilistic connectivity. For WSNs with lossy links, we propose a Stochastic Network Model (SNM). Under this model, we measure the quality of CDSs using CDS reliability. In this dissertation, we construct an MCDS whose reliability is above a preset application-specified threshold, called a Reliable MCDS (RMCDS).
We propose a novel Genetic Algorithm (GA) with immigrant schemes called RMCDS-GA to solve the RMCDS problem. Finally, we apply the constructed LBCDS to a practical application under the realistic SNM model, namely data aggregation. To be specific, a new problem, the Load-Balanced Data Aggregation Tree (LBDAT), is introduced. Our simulation results show that the proposed algorithms outperform the existing state-of-the-art approaches significantly.
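
To make the notion of a connected dominating set concrete, the sketch below grows one greedily from a highest-degree node, always adding the already-covered node that covers the most new nodes. It is a plain baseline heuristic and does not implement the dissertation's load-balanced, reliability-aware, or genetic algorithms; the function name `greedy_cds` and the adjacency-dictionary input are assumptions for this example.

```python
from typing import Dict, Set


def greedy_cds(adj: Dict[int, Set[int]]) -> Set[int]:
    """Build a connected dominating set of a connected undirected graph."""
    if not adj:
        return set()
    # Seed the backbone with a node of maximum degree.
    start = max(adj, key=lambda u: len(adj[u]))
    cds = {start}
    covered = {start} | adj[start]
    while len(covered) < len(adj):
        # Candidates are covered nodes outside the backbone; each is adjacent
        # to the backbone, so adding one keeps the backbone connected.
        frontier = covered - cds
        if not frontier:
            raise ValueError("graph is not connected")
        best = max(frontier, key=lambda u: len(adj[u] - covered))
        if not adj[best] - covered:
            raise ValueError("graph is not connected")
        cds.add(best)
        covered |= adj[best]
    return cds
```

On a five-node path 0-1-2-3-4 this selects the three interior nodes, which is also the minimum CDS for that graph; in general the greedy result is not guaranteed to be minimum.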

    The 1993 Space and Earth Science Data Compression Workshop

    The Earth Observing System Data and Information System (EOSDIS) is described in terms of its data volume, data rate, and data distribution requirements. Opportunities for data compression in EOSDIS are discussed.