1,780 research outputs found
MLPerf Inference Benchmark
Machine-learning (ML) hardware and software system demand is burgeoning.
Driven by ML applications, the number of different ML inference systems has
exploded. Over 100 organizations are building ML inference chips, and the
systems that incorporate existing models span at least three orders of
magnitude in power consumption and five orders of magnitude in performance;
they range from embedded devices to data-center solutions. Fueling the hardware
are a dozen or more software frameworks and libraries. The myriad combinations
of ML hardware and ML software make assessing ML-system performance in an
architecture-neutral, representative, and reproducible manner challenging.
There is a clear need for industry-wide standard ML benchmarking and evaluation
criteria. MLPerf Inference answers that call. In this paper, we present our
benchmarking method for evaluating ML inference systems. Driven by more than 30
organizations as well as more than 200 ML engineers and practitioners, MLPerf
prescribes a set of rules and best practices to ensure comparability across
systems with wildly differing architectures. The first call for submissions
garnered more than 600 reproducible inference-performance measurements from 14
organizations, representing over 30 systems that showcase a wide range of
capabilities. The submissions attest to the benchmark's flexibility and
adaptability.Comment: ISCA 202
Elevating commodity storage with the SALSA host translation layer
To satisfy increasing storage demands in both capacity and performance,
industry has turned to multiple storage technologies, including Flash SSDs and
SMR disks. These devices employ a translation layer that conceals the
idiosyncrasies of their mediums and enables random access. Device translation
layers are, however, inherently constrained: resources on the drive are scarce,
they cannot be adapted to application requirements, and lack visibility across
multiple devices. As a result, performance and durability of many storage
devices is severely degraded.
In this paper, we present SALSA: a translation layer that executes on the
host and allows unmodified applications to better utilize commodity storage.
SALSA supports a wide range of single- and multi-device optimizations and,
because is implemented in software, can adapt to specific workloads. We
describe SALSA's design, and demonstrate its significant benefits using
microbenchmarks and case studies based on three applications: MySQL, the Swift
object store, and a video server.Comment: Presented at 2018 IEEE 26th International Symposium on Modeling,
Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS
Satellite Imagery Multiscale Rapid Detection with Windowed Networks
Detecting small objects over large areas remains a significant challenge in
satellite imagery analytics. Among the challenges is the sheer number of pixels
and geographical extent per image: a single DigitalGlobe satellite image
encompasses over 64 km2 and over 250 million pixels. Another challenge is that
objects of interest are often minuscule (~pixels in extent even for the highest
resolution imagery), which complicates traditional computer vision techniques.
To address these issues, we propose a pipeline (SIMRDWN) that evaluates
satellite images of arbitrarily large size at native resolution at a rate of >
0.2 km2/s. Building upon the tensorflow object detection API paper, this
pipeline offers a unified approach to multiple object detection frameworks that
can run inference on images of arbitrary size. The SIMRDWN pipeline includes a
modified version of YOLO (known as YOLT), along with the models of the
tensorflow object detection API: SSD, Faster R-CNN, and R-FCN. The proposed
approach allows comparison of the performance of these four frameworks, and can
rapidly detect objects of vastly different scales with relatively little
training data over multiple sensors. For objects of very different scales (e.g.
airplanes versus airports) we find that using two different detectors at
different scales is very effective with negligible runtime cost.We evaluate
large test images at native resolution and find mAP scores of 0.2 to 0.8 for
vehicle localization, with the YOLT architecture achieving both the highest mAP
and fastest inference speed.Comment: 8 pages, 7 figures, 2 tables, 1 appendix. arXiv admin note:
substantial text overlap with arXiv:1805.0951
SimpleSSD: Modeling Solid State Drives for Holistic System Simulation
Existing solid state drive (SSD) simulators unfortunately lack hardware
and/or software architecture models. Consequently, they are far from capturing
the critical features of contemporary SSD devices. More importantly, while the
performance of modern systems that adopt SSDs can vary based on their numerous
internal design parameters and storage-level configurations, a full system
simulation with traditional SSD models often requires unreasonably long
runtimes and excessive computational resources. In this work, we propose
SimpleSSD, a highfidelity simulator that models all detailed characteristics of
hardware and software, while simplifying the nondescript features of storage
internals. In contrast to existing SSD simulators, SimpleSSD can easily be
integrated into publicly-available full system simulators. In addition, it can
accommodate a complete storage stack and evaluate the performance of SSDs along
with diverse memory technologies and microarchitectures. Thus, it facilitates
simulations that explore the full design space at different levels of system
abstraction.Comment: This paper has been accepted at IEEE Computer Architecture Letters
(CAL
HVSTO: Efficient Privacy Preserving Hybrid Storage in Cloud Data Center
In cloud data center, shared storage with good management is a main structure
used for the storage of virtual machines (VM). In this paper, we proposed
Hybrid VM storage (HVSTO), a privacy preserving shared storage system designed
for the virtual machine storage in large-scale cloud data center. Unlike
traditional shared storage, HVSTO adopts a distributed structure to preserve
privacy of virtual machines, which are a threat in traditional centralized
structure. To improve the performance of I/O latency in this distributed
structure, we use a hybrid system to combine solid state disk and distributed
storage. From the evaluation of our demonstration system, HVSTO provides a
scalable and sufficient throughput for the platform as a service
infrastructure.Comment: 7 pages, 8 figures, in proceeding of The Second International
Workshop on Security and Privacy in Big Data (BigSecurity 2014
- …