SALSA: Self-Adjusting Lean Streaming Analytics
Counters are the fundamental building block of many data sketching schemes, which hash items to a small number of counters and account for collisions to provide good approximations of frequencies and other measures. Most existing methods rely on fixed-size counters, which may be wasteful in terms of space, as counters must be large enough to eliminate any risk of overflow. Alternatively, some solutions use small, fixed-size counters that may overflow into secondary structures. This paper takes a different approach. We propose a simple and general method called SALSA for dynamic re-sizing of counters, and show its effectiveness. SALSA starts with small counters, and overflowing counters simply merge with their neighbors. SALSA can thereby fit more counters in a given space, expanding them as necessary to represent large numbers. Our evaluation demonstrates that, at the cost of a small overhead for its merging logic, SALSA significantly improves the accuracy of popular schemes (such as Count-Min Sketch and Count Sketch) over a variety of tasks. Our code is released as open source.
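The merging idea can be sketched in a few lines. Below is a toy, hypothetical illustration of one sketch row whose 8-bit counters merge with an equally sized "buddy" neighbor on overflow; the paper's actual encoding and merge rules are more refined, but the max-merge shown here preserves the Count-Min overestimate guarantee.

```python
class SalsaRow:
    """Toy illustration of SALSA-style counter merging for one sketch row
    (not the paper's exact scheme). Counters start small; when one
    overflows, it merges with its buddy to form a counter of twice
    the width."""
    BASE = 8  # initial counter width in bits

    def __init__(self, n):
        assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
        self.vals = [0] * n          # value held at each group's start slot
        self.bits = [self.BASE] * n  # width of the counter covering slot i

    def _rep(self, i):
        span = self.bits[i] // self.BASE
        return (i // span) * span    # start slot of i's merged group

    def add(self, i, c=1):
        r = self._rep(i)
        self.vals[r] += c
        # on overflow, merge with the equally sized buddy group
        while self.vals[r] >= 1 << self.bits[r]:
            span = self.bits[r] // self.BASE
            if 2 * span > len(self.vals):
                break                # whole row merged: saturate (toy behavior)
            start = (r // (2 * span)) * (2 * span)
            group = range(start, start + 2 * span)
            # taking the max over the doubled region keeps every item's
            # Count-Min estimate an overestimate
            merged = max(self.vals[self._rep(j)] for j in group)
            new_bits = 2 * self.bits[r]
            for j in group:
                self.bits[j] = new_bits
                self.vals[j] = 0
            self.vals[start] = merged
            r = start

    def query(self, i):
        return self.vals[self._rep(i)]
```

For example, incrementing one slot past 255 merges it with its neighbor into a single 16-bit counter; items that hashed to the neighbor now share the merged (over-)estimate, which is exactly the accuracy-for-space trade the scheme exploits.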
Adaptive Multicast of Multi-Layered Video: Rate-Based and Credit-Based Approaches
Network architectures that can efficiently transport high quality, multicast
video are rapidly becoming a basic requirement of emerging multimedia
applications. The main problem complicating multicast video transport is
variation in network bandwidth constraints. An attractive solution to this
problem is to use an adaptive, multi-layered video encoding mechanism. In this
paper, we consider two such mechanisms for the support of video multicast; one
is a rate-based mechanism that relies on explicit rate congestion feedback from
the network, and the other is a credit-based mechanism that relies on
hop-by-hop congestion feedback. The responsiveness, bandwidth utilization,
scalability and fairness of the two mechanisms are evaluated through
simulations. Results suggest that while the two mechanisms exhibit performance
trade-offs, both are capable of providing a high quality video service in the
presence of varying bandwidth constraints.
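The rate-based idea can be made concrete with a hypothetical helper (not part of the paper's protocol): given per-layer bitrates and the rate reported by explicit congestion feedback, a receiver subscribes to the largest prefix of layers that fits.

```python
def layers_to_join(layer_rates, available_bw):
    """Hypothetical helper for rate-based adaptation: layer_rates lists
    per-layer bitrates (base layer first), available_bw is the rate the
    explicit network feedback reports. Returns how many cumulative
    layers fit within that rate."""
    total, k = 0.0, 0
    for rate in layer_rates:
        if total + rate > available_bw:
            break                    # next enhancement layer would exceed the budget
        total += rate
        k += 1
    return k
```

A credit-based receiver would instead react to hop-by-hop buffer credits rather than an end-to-end rate figure, but the layer-selection step is analogous.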
A Survey of Techniques for Improving Security of GPUs
Graphics processing unit (GPU), although a powerful performance-booster, also
has many security vulnerabilities. Due to these, the GPU can act as a
safe haven for stealthy malware and the weakest 'link' in the security 'chain'.
In this paper, we present a survey of techniques for analyzing and improving
GPU security. We classify the works on key attributes to highlight their
similarities and differences. More than informing users and researchers about
GPU security techniques, this survey aims to increase their awareness about GPU
security vulnerabilities and potential countermeasures.
Dynamic buffer tuning: an ambience-intelligent way for digital ecosystem success
Ambient intelligence is an important element for the success of digital ecosystems, which are usually made up of many collaborating distributed nodes. The operations of these nodes affect one another in chain reactions; when one node fails, it can bring down the whole ecosystem. Dynamic buffer tuning is an ambience-intelligent mechanism because it has the ability to sense ambient changes and then make the necessary proactive adjustments on the fly to avoid buffer overflow. As a result, the end-to-end communication channel is more dependable, leading to shorter response times and happier clients. Therefore, dynamic buffer tuning should be generally beneficial to digital ecosystem performance. In this paper we demonstrate this point by using the FLC (Fuzzy Logic Controller) dynamic buffer tuner to quicken the pervasive medical consultation response of the TCM (Traditional Chinese Medicine) Pervasive Digital HealthCare System as an example.
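The sense-and-adjust idea can be sketched with a simple threshold rule. This is a hypothetical stand-in, not the paper's FLC: the actual tuner applies fuzzy rules rather than hard thresholds, but the proactive grow-before-overflow behavior is the same.

```python
def tune_buffer(buf_len, queue_len, grow_at=0.8, shrink_at=0.3, step=0.25):
    """Hypothetical threshold-based buffer tuner (the paper's tuner is a
    fuzzy logic controller). Grows the buffer proactively when occupancy
    nears overflow and shrinks it when the channel is mostly idle."""
    occupancy = queue_len / buf_len
    if occupancy > grow_at:
        return int(buf_len * (1 + step))                  # avert overflow early
    if occupancy < shrink_at:
        return max(queue_len, int(buf_len * (1 - step)))  # reclaim unused space
    return buf_len
```

The key property, as in the abstract, is that resizing happens before overflow occurs rather than after packets are lost.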
With Great Speed Come Small Buffers: Space-Bandwidth Tradeoffs for Routing
We consider the Adversarial Queuing Theory (AQT) model, where packet arrivals
are subject to a maximum average rate and burstiness. In this model, we
analyze the size of buffers required to avoid overflows in the basic case of a
path. Our main results characterize the space required in terms of the average
rate and the number of distinct destinations: we show an upper bound on the
buffer space that suffices, as a function of the number of distinct
destinations, and we show a corresponding lower bound on the space that is
necessary. For directed trees, we describe an algorithm whose buffer space
requirement is bounded in terms of the maximum number of destinations on any
root-leaf path.
Design and Optimization of Residual Neural Network Accelerators for Low-Power FPGAs Using High-Level Synthesis
Residual neural networks are widely used in computer vision tasks. They
enable the construction of deeper and more accurate models by mitigating the
vanishing gradient problem. Their main innovation is the residual block which
allows the output of one layer to bypass one or more intermediate layers and be
added to the output of a later layer. Their complex structure and the buffering
required by the residual block make them difficult to implement on
resource-constrained platforms. We present a novel design flow, optimized for
ResNets, for implementing deep learning models on field-programmable gate
arrays (FPGAs); it uses a strategy that reduces buffering overhead to obtain a
resource-efficient implementation of the residual layer. Our high-level
synthesis (HLS)-based flow encompasses a thorough set of design principles and
optimization strategies, exploiting in novel ways standard techniques such as
temporal reuse and loop merging to efficiently map ResNet models, and
potentially other skip connection-based NN architectures, onto FPGAs. The models
are quantized to 8-bit integers for both weights and activations, 16-bit for
biases, and 32-bit for accumulations. The experimental results are obtained on
the CIFAR-10 dataset using ResNet8 and ResNet20 implemented with Xilinx FPGAs
using HLS on the Ultra96-V2 and Kria KV260 boards. Compared to the
state-of-the-art on the Kria KV260 board, our ResNet20 implementation achieves
2.88X speedup with 0.5% higher accuracy of 91.3%, while ResNet8 accuracy
improves by 2.8% to 88.7%. The throughputs of ResNet8 and ResNet20 are 12971
FPS and 3254 FPS on the Ultra96 board, and 30153 FPS and 7601 FPS on the Kria
KV260, respectively. They Pareto-dominate state-of-the-art solutions in
accuracy, throughput, and energy.
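The residual block described above, and the kind of integer quantization the flow applies, can be sketched in plain Python. Both functions are generic illustrations with assumed shapes: the paper's fixed-point formats and layer types (convolutions rather than the arbitrary callables used here) differ in detail.

```python
def quantize(xs, bits=8):
    """Generic symmetric uniform quantization to signed integers
    (illustrative; the paper's fixed-point scheme may differ)."""
    qmax = 2 ** (bits - 1) - 1
    peak = max(abs(x) for x in xs)
    scale = peak / qmax if peak > 0 else 1.0
    return [round(x / scale) for x in xs], scale

def residual_block(x, layer1, layer2):
    """Minimal residual block: the input bypasses two layers and is
    added back to their output before the final activation."""
    relu = lambda v: [max(e, 0.0) for e in v]
    h = relu(layer1(x))                          # first layer + ReLU
    h = layer2(h)                                # second layer
    return relu([a + b for a, b in zip(h, x)])   # skip path: add input back
```

The skip path is what forces the buffering the abstract mentions: the input activation must be kept alive until the second layer's output is ready, which is exactly the overhead the proposed flow works to reduce.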