3,848 research outputs found
Modeling and Evaluation of Multisource Streaming Strategies in P2P VoD Systems
In recent years, multimedia content distribution has largely been moved to the Internet, inducing broadcasters, operators and service providers to upgrade with large expenses their infrastructures. In this context, streaming solutions that rely on user devices such as set-top boxes (STBs) to offload dedicated streaming servers are particularly appropriate. In these systems, contents are usually replicated and scattered over the network established by STBs placed at users' home, and the video-on-demand (VoD) service is provisioned through streaming sessions established among neighboring STBs following a Peer-to-Peer fashion. Up to now the majority of research works have focused on the design and optimization of content replicas mechanisms to minimize server costs. The optimization of replicas mechanisms has been typically performed either considering very crude system performance indicators or analyzing asymptotic behavior. In this work, instead, we propose an analytical model that complements previous works providing fairly accurate predictions of system performance (i.e., blocking probability). Our model turns out to be a highly scalable, flexible, and extensible tool that may be helpful both for designers and developers to efficiently predict the effect of system design choices in large scale STB-VoD system
Catalog Dynamics: Impact of Content Publishing and Perishing on the Performance of a LRU Cache
The Internet heavily relies on Content Distribution Networks and transparent
caches to cope with the ever-increasing traffic demand of users. Content,
however, is essentially versatile: once published at a given time, its
popularity vanishes over time. All requests for a given document are then
concentrated between the publishing time and an effective perishing time.
In this paper, we propose a new model for the arrival of content requests,
which takes into account the dynamical nature of the content catalog. Based on
two large traffic traces collected on the Orange network, we use the
semi-experimental method and determine invariants of the content request
process. This allows us to define a simple mathematical model for content
requests; by extending the so-called "Che approximation", we then compute the
performance of a LRU cache fed with such a request process, expressed by its
hit ratio. We numerically validate the good accuracy of our model by comparison
to trace-based simulation.Comment: 13 Pages, 9 figures. Full version of the article submitted to the ITC
2014 conference. Small corrections in the appendix from the previous versio
Computing server power modeling in a data center: survey,taxonomy and performance evaluation
Data centers are large scale, energy-hungry infrastructure serving the
increasing computational demands as the world is becoming more connected in
smart cities. The emergence of advanced technologies such as cloud-based
services, internet of things (IoT) and big data analytics has augmented the
growth of global data centers, leading to high energy consumption. This upsurge
in energy consumption of the data centers not only incurs the issue of surging
high cost (operational and maintenance) but also has an adverse effect on the
environment. Dynamic power management in a data center environment requires the
cognizance of the correlation between the system and hardware level performance
counters and the power consumption. Power consumption modeling exhibits this
correlation and is crucial in designing energy-efficient optimization
strategies based on resource utilization. Several works in power modeling are
proposed and used in the literature. However, these power models have been
evaluated using different benchmarking applications, power measurement
techniques and error calculation formula on different machines. In this work,
we present a taxonomy and evaluation of 24 software-based power models using a
unified environment, benchmarking applications, power measurement technique and
error formula, with the aim of achieving an objective comparison. We use
different servers architectures to assess the impact of heterogeneity on the
models' comparison. The performance analysis of these models is elaborated in
the paper
Unravelling the Impact of Temporal and Geographical Locality in Content Caching Systems
To assess the performance of caching systems, the definition of a proper
process describing the content requests generated by users is required.
Starting from the analysis of traces of YouTube video requests collected inside
operational networks, we identify the characteristics of real traffic that need
to be represented and those that instead can be safely neglected. Based on our
observations, we introduce a simple, parsimonious traffic model, named Shot
Noise Model (SNM), that allows us to capture temporal and geographical locality
of content popularity. The SNM is sufficiently simple to be effectively
employed in both analytical and scalable simulative studies of caching systems.
We demonstrate this by analytically characterizing the performance of the LRU
caching policy under the SNM, for both a single cache and a network of caches.
With respect to the standard Independent Reference Model (IRM), some
paradigmatic shifts, concerning the impact of various traffic characteristics
on cache performance, clearly emerge from our results.Comment: 14 pages, 11 Figures, 2 Appendice
A unified approach to the performance analysis of caching systems
We propose a unified methodology to analyse the performance of caches (both
isolated and interconnected), by extending and generalizing a decoupling
technique originally known as Che's approximation, which provides very accurate
results at low computational cost. We consider several caching policies, taking
into account the effects of temporal locality. In the case of interconnected
caches, our approach allows us to do better than the Poisson approximation
commonly adopted in prior work. Our results, validated against simulations and
trace-driven experiments, provide interesting insights into the performance of
caching systems.Comment: in ACM TOMPECS 20016. Preliminary version published at IEEE Infocom
201
When parallel speedups hit the memory wall
After Amdahl's trailblazing work, many other authors proposed analytical
speedup models but none have considered the limiting effect of the memory wall.
These models exploited aspects such as problem-size variation, memory size,
communication overhead, and synchronization overhead, but data-access delays
are assumed to be constant. Nevertheless, such delays can vary, for example,
according to the number of cores used and the ratio between processor and
memory frequencies. Given the large number of possible configurations of
operating frequency and number of cores that current architectures can offer,
suitable speedup models to describe such variations among these configurations
are quite desirable for off-line or on-line scheduling decisions. This work
proposes new parallel speedup models that account for variations of the average
data-access delay to describe the limiting effect of the memory wall on
parallel speedups. Analytical results indicate that the proposed modeling can
capture the desired behavior while experimental hardware results validate the
former. Additionally, we show that when accounting for parameters that reflect
the intrinsic characteristics of the applications, such as degree of
parallelism and susceptibility to the memory wall, our proposal has significant
advantages over machine-learning-based modeling. Moreover, besides being
black-box modeling, our experiments show that conventional machine-learning
modeling needs about one order of magnitude more measurements to reach the same
level of accuracy achieved in our modeling.Comment: 24 page
MLPerf Inference Benchmark
Machine-learning (ML) hardware and software system demand is burgeoning.
Driven by ML applications, the number of different ML inference systems has
exploded. Over 100 organizations are building ML inference chips, and the
systems that incorporate existing models span at least three orders of
magnitude in power consumption and five orders of magnitude in performance;
they range from embedded devices to data-center solutions. Fueling the hardware
are a dozen or more software frameworks and libraries. The myriad combinations
of ML hardware and ML software make assessing ML-system performance in an
architecture-neutral, representative, and reproducible manner challenging.
There is a clear need for industry-wide standard ML benchmarking and evaluation
criteria. MLPerf Inference answers that call. In this paper, we present our
benchmarking method for evaluating ML inference systems. Driven by more than 30
organizations as well as more than 200 ML engineers and practitioners, MLPerf
prescribes a set of rules and best practices to ensure comparability across
systems with wildly differing architectures. The first call for submissions
garnered more than 600 reproducible inference-performance measurements from 14
organizations, representing over 30 systems that showcase a wide range of
capabilities. The submissions attest to the benchmark's flexibility and
adaptability.Comment: ISCA 202
An Experimental Study of Reduced-Voltage Operation in Modern FPGAs for Neural Network Acceleration
We empirically evaluate an undervolting technique, i.e., underscaling the
circuit supply voltage below the nominal level, to improve the power-efficiency
of Convolutional Neural Network (CNN) accelerators mapped to Field Programmable
Gate Arrays (FPGAs). Undervolting below a safe voltage level can lead to timing
faults due to excessive circuit latency increase. We evaluate the
reliability-power trade-off for such accelerators. Specifically, we
experimentally study the reduced-voltage operation of multiple components of
real FPGAs, characterize the corresponding reliability behavior of CNN
accelerators, propose techniques to minimize the drawbacks of reduced-voltage
operation, and combine undervolting with architectural CNN optimization
techniques, i.e., quantization and pruning. We investigate the effect of
environmental temperature on the reliability-power trade-off of such
accelerators. We perform experiments on three identical samples of modern
Xilinx ZCU102 FPGA platforms with five state-of-the-art image classification
CNN benchmarks. This approach allows us to study the effects of our
undervolting technique for both software and hardware variability. We achieve
more than 3X power-efficiency (GOPs/W) gain via undervolting. 2.6X of this gain
is the result of eliminating the voltage guardband region, i.e., the safe
voltage region below the nominal level that is set by FPGA vendor to ensure
correct functionality in worst-case environmental and circuit conditions. 43%
of the power-efficiency gain is due to further undervolting below the
guardband, which comes at the cost of accuracy loss in the CNN accelerator. We
evaluate an effective frequency underscaling technique that prevents this
accuracy loss, and find that it reduces the power-efficiency gain from 43% to
25%.Comment: To appear at the DSN 2020 conferenc
- …