
    Buffer Overflows of Merging Streams


    SALSA: Self-Adjusting Lean Streaming Analytics

    Counters are the fundamental building block of many data sketching schemes, which hash items to a small number of counters and account for collisions to provide good approximations for frequencies and other measures. Most existing methods rely on fixed-size counters, which may be wasteful in terms of space, as counters must be large enough to eliminate any risk of overflow. Instead, some solutions use small, fixed-size counters that may overflow into secondary structures. This paper takes a different approach. We propose a simple and general method called SALSA for dynamic re-sizing of counters, and show its effectiveness. SALSA starts with small counters, and overflowing counters simply merge with their neighbors. SALSA can thereby fit more counters into a given space, expanding them as necessary to represent large numbers. Our evaluation demonstrates that, at the cost of a small overhead for its merging logic, SALSA significantly improves the accuracy of popular schemes (such as Count-Min Sketch and Count Sketch) over a variety of tasks. Our code is released as open source.
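    A minimal sketch of the core idea, under heavy simplification: a single row of narrow counters in which an overflowing cell merges with a fixed neighbor to form one wider counter, taking the maximum on merge so estimates remain upper bounds in the Count-Min style. The class and constant names (TinyCounterRow, CAP_8, CAP_16) are invented for this illustration and are not taken from the paper's released code.

```python
# Illustrative, simplified single-row sketch of SALSA-style counter merging.
import hashlib


class TinyCounterRow:
    CAP_8 = (1 << 8) - 1      # max value of an unmerged 8-bit cell
    CAP_16 = (1 << 16) - 1    # max value once a pair of cells has merged

    def __init__(self, width):
        self.width = width             # number of 8-bit cells (assumed even)
        self.cells = [0] * width       # logical counter values
        self.merged = [False] * width  # True if cell i shares a counter with its pair

    def _index(self, item):
        h = hashlib.blake2b(item.encode(), digest_size=8).digest()
        return int.from_bytes(h, "big") % self.width

    @staticmethod
    def _pair(i):
        return i ^ 1  # fixed neighbor: 0<->1, 2<->3, ...

    def add(self, item, count=1):
        i = self._index(item)
        j = self._pair(i)
        if self.merged[i]:
            # Pair already merged: update the shared 16-bit counter.
            val = min(self.cells[i] + count, self.CAP_16)
            self.cells[i] = self.cells[j] = val
        elif self.cells[i] + count > self.CAP_8:
            # Overflow: merge with the neighbor. Taking the max keeps the
            # estimate an upper bound for every item hashed to either cell.
            val = min(max(self.cells[i] + count, self.cells[j]), self.CAP_16)
            self.cells[i] = self.cells[j] = val
            self.merged[i] = self.merged[j] = True
        else:
            self.cells[i] += count

    def estimate(self, item):
        return self.cells[self._index(item)]


row = TinyCounterRow(width=1024)
for _ in range(300):
    row.add("heavy-hitter")
print(row.estimate("heavy-hitter"))  # >= 300, now held in a merged 16-bit pair
```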

    Adaptive Multicast of Multi-Layered Video: Rate-Based and Credit-Based Approaches

    Network architectures that can efficiently transport high-quality, multicast video are rapidly becoming a basic requirement of emerging multimedia applications. The main problem complicating multicast video transport is variation in network bandwidth constraints. An attractive solution to this problem is to use an adaptive, multi-layered video encoding mechanism. In this paper, we consider two such mechanisms for supporting video multicast: a rate-based mechanism that relies on explicit-rate congestion feedback from the network, and a credit-based mechanism that relies on hop-by-hop congestion feedback. The responsiveness, bandwidth utilization, scalability, and fairness of the two mechanisms are evaluated through simulations. Results suggest that while the two mechanisms exhibit performance trade-offs, both are capable of providing a high-quality video service in the presence of varying bandwidth constraints. Comment: 11 pages.
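    As a rough illustration of the two feedback styles contrasted above, the sketch below maps a signalled explicit rate, or a downstream credit balance, to the number of video layers to carry. The cumulative-rate table and both function names are invented for this example and are not from the paper.

```python
# Hypothetical layer-selection policies for a layered video multicast.
LAYER_RATES_KBPS = [128, 384, 768, 1500]  # cumulative rate needed for layers 1..4

def layers_for_rate(explicit_rate_kbps):
    """Rate-based adaptation: keep the highest layer whose cumulative rate
    fits within the explicit rate signalled by the network."""
    layers = 0
    for cumulative in LAYER_RATES_KBPS:
        if cumulative <= explicit_rate_kbps:
            layers += 1
        else:
            break
    return layers

def layers_for_credit(credits_available, credits_per_layer_cell):
    """Credit-based adaptation: a hop forwards only as many layers as its
    downstream credits (buffer space) can absorb."""
    return min(len(LAYER_RATES_KBPS),
               credits_available // credits_per_layer_cell)

print(layers_for_rate(800))        # 3
print(layers_for_credit(10, 4))    # 2
```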

    A Survey of Techniques for Improving Security of GPUs

    The graphics processing unit (GPU), although a powerful performance booster, also has many security vulnerabilities. Because of these, the GPU can act as a safe haven for stealthy malware and as the weakest "link" in the security "chain". In this paper, we present a survey of techniques for analyzing and improving GPU security. We classify the works on key attributes to highlight their similarities and differences. Beyond informing users and researchers about GPU security techniques, this survey aims to increase their awareness of GPU security vulnerabilities and potential countermeasures.

    Dynamic buffer tuning: an ambience-intelligent way for digital ecosystem success

    Ambient intelligence is an important element for the success of digital ecosystems, which are usually made up of many collaborating distributed nodes. The operations of these nodes affect one another in chain reactions: when one node fails, it can bring down the whole ecosystem. Dynamic buffer tuning is an ambience-intelligent mechanism because it can sense ambient changes and make the necessary proactive adjustments on the fly to avoid buffer overflow. As a result, the end-to-end communication channel is more dependable, leading to shorter response times and happier clients. Dynamic buffer tuning should therefore be generally beneficial to overall digital ecosystem performance. In this paper we demonstrate this point by using the FLC (Fuzzy Logic Controller) dynamic buffer tuner to speed up the pervasive medical consultation response of the TCM (Traditional Chinese Medicine) Pervasive Digital HealthCare System as an example.
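    To make the mechanism concrete, the sketch below shows a crisp, threshold-based approximation of a proactive buffer tuner driven by two sensed inputs, occupancy and its rate of change; an actual FLC tuner would replace the hard thresholds with fuzzy membership functions and rules. All thresholds and scaling factors here are invented for the illustration.

```python
# Illustrative dynamic buffer tuner: grow the buffer before it overflows,
# shrink it when demand falls. Not the paper's FLC controller.
def tune_buffer(buffer_len, queue_len, prev_queue_len):
    occupancy = queue_len / buffer_len                  # how full the buffer is
    trend = (queue_len - prev_queue_len) / buffer_len   # how fast it is filling

    # Coarse bands standing in for fuzzy input sets.
    if occupancy > 0.8 or (occupancy > 0.5 and trend > 0.1):
        return int(buffer_len * 1.5)            # grow aggressively before overflow
    if occupancy > 0.6 and trend > 0:
        return int(buffer_len * 1.2)            # grow gently
    if occupancy < 0.2 and trend <= 0:
        return max(64, int(buffer_len * 0.8))   # shrink to reclaim memory
    return buffer_len                           # leave unchanged

print(tune_buffer(1024, 900, 700))  # 1536: queue is nearly full and still rising
```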

    With Great Speed Come Small Buffers: Space-Bandwidth Tradeoffs for Routing

    We consider the Adversarial Queuing Theory (AQT) model, where packet arrivals are subject to a maximum average rate $0 \le \rho \le 1$ and burstiness $\sigma \ge 0$. In this model, we analyze the size of buffers required to avoid overflows in the basic case of a path. Our main results characterize the space required by the average rate and the number of distinct destinations: we show that $O(k d^{1/k})$ space suffices, where $d$ is the number of distinct destinations and $k = \lfloor 1/\rho \rfloor$; and we show that $\Omega(\frac{1}{k} d^{1/k})$ space is necessary. For directed trees, we describe an algorithm whose buffer space requirement is at most $1 + d' + \sigma$, where $d'$ is the maximum number of destinations on any root-leaf path.
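    For a quick sense of how the sufficient bound for paths scales, the snippet below evaluates $k \, d^{1/k}$ with $k = \lfloor 1/\rho \rfloor$ for example parameters, ignoring the constants hidden by the $O(\cdot)$.

```python
# Numeric illustration of the path buffer bound k * d**(1/k), k = floor(1/rho).
import math

def path_buffer_bound(rho, d):
    """Evaluate k * d**(1/k) for average rate rho (0 < rho <= 1) and
    d distinct destinations; big-O constants are ignored."""
    k = math.floor(1 / rho)
    return k * d ** (1 / k)

print(path_buffer_bound(0.5, 100))  # 20.0  (k = 2: bound grows like 2 * sqrt(d))
print(path_buffer_bound(1.0, 100))  # 100.0 (k = 1: bound is linear in d)
```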

    Design and Optimization of Residual Neural Network Accelerators for Low-Power FPGAs Using High-Level Synthesis

    Residual neural networks are widely used in computer vision tasks. They enable the construction of deeper and more accurate models by mitigating the vanishing gradient problem. Their main innovation is the residual block, which allows the output of one layer to bypass one or more intermediate layers and be added to the output of a later layer. Their complex structure and the buffering required by the residual block make them difficult to implement on resource-constrained platforms. We present a novel design flow for implementing deep learning models on field-programmable gate arrays (FPGAs), optimized for ResNets, using a strategy to reduce their buffering overhead and obtain a resource-efficient implementation of the residual layer. Our high-level synthesis (HLS)-based flow encompasses a thorough set of design principles and optimization strategies, exploiting in novel ways standard techniques such as temporal reuse and loop merging to efficiently map ResNet models, and potentially other skip-connection-based NN architectures, onto FPGAs. The models are quantized to 8-bit integers for both weights and activations, 16 bits for biases, and 32 bits for accumulations. The experimental results are obtained on the CIFAR-10 dataset using ResNet8 and ResNet20 implemented with Xilinx FPGAs using HLS on the Ultra96-V2 and Kria KV260 boards. Compared to the state of the art on the Kria KV260 board, our ResNet20 implementation achieves a 2.88X speedup with 0.5% higher accuracy (91.3%), while ResNet8 accuracy improves by 2.8% to 88.7%. The throughputs of ResNet8 and ResNet20 are 12971 FPS and 3254 FPS on the Ultra96 board, and 30153 FPS and 7601 FPS on the Kria KV260, respectively. They Pareto-dominate state-of-the-art solutions in accuracy, throughput, and energy.
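    The quantization scheme described above (8-bit weights and activations, 16-bit biases, 32-bit accumulators) and the skip connection of the residual block can be sketched in a few lines of numpy. This is only an arithmetic illustration with invented shapes and a fixed requantization shift, not the paper's HLS design.

```python
# Toy int8 residual block: accumulate in int32, add int16 bias, requantize to int8.
import numpy as np

def quantized_dense(x_q, w_q, b_q, out_scale_shift=8):
    """One int8 layer with 32-bit accumulation and a crude shift-based requantization."""
    acc = x_q.astype(np.int32) @ w_q.astype(np.int32)  # 32-bit accumulators
    acc += b_q.astype(np.int32)                         # 16-bit bias, widened
    out = acc >> out_scale_shift                        # hypothetical requantization scale
    return np.clip(out, -128, 127).astype(np.int8)

def residual_block(x_q, w1, b1, w2, b2):
    """Skip connection: the block input bypasses two layers and is added back
    to their output before clipping back to int8."""
    y = quantized_dense(x_q, w1, b1)
    y = quantized_dense(y, w2, b2)
    out = y.astype(np.int32) + x_q.astype(np.int32)     # residual add
    return np.clip(out, -128, 127).astype(np.int8)

rng = np.random.default_rng(0)
x = rng.integers(-128, 128, size=(1, 16), dtype=np.int8)
w1, w2 = (rng.integers(-128, 128, size=(16, 16), dtype=np.int8) for _ in range(2))
b1, b2 = (rng.integers(-32768, 32768, size=16, dtype=np.int16) for _ in range(2))
print(residual_block(x, w1, b1, w2, b2).shape)  # (1, 16)
```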