1,780 research outputs found

    MLPerf Inference Benchmark

    Machine-learning (ML) hardware and software system demand is burgeoning. Driven by ML applications, the number of different ML inference systems has exploded. Over 100 organizations are building ML inference chips, and the systems that incorporate existing models span at least three orders of magnitude in power consumption and five orders of magnitude in performance; they range from embedded devices to data-center solutions. Fueling the hardware are a dozen or more software frameworks and libraries. The myriad combinations of ML hardware and ML software make assessing ML-system performance in an architecture-neutral, representative, and reproducible manner challenging. There is a clear need for industry-wide standard ML benchmarking and evaluation criteria. MLPerf Inference answers that call. In this paper, we present our benchmarking method for evaluating ML inference systems. Driven by more than 30 organizations as well as more than 200 ML engineers and practitioners, MLPerf prescribes a set of rules and best practices to ensure comparability across systems with wildly differing architectures. The first call for submissions garnered more than 600 reproducible inference-performance measurements from 14 organizations, representing over 30 systems that showcase a wide range of capabilities. The submissions attest to the benchmark's flexibility and adaptability. Comment: ISCA 202

    Elevating commodity storage with the SALSA host translation layer

    To satisfy increasing storage demands in both capacity and performance, industry has turned to multiple storage technologies, including Flash SSDs and SMR disks. These devices employ a translation layer that conceals the idiosyncrasies of their media and enables random access. Device translation layers are, however, inherently constrained: resources on the drive are scarce, the layers cannot be adapted to application requirements, and they lack visibility across multiple devices. As a result, the performance and durability of many storage devices are severely degraded. In this paper, we present SALSA: a translation layer that executes on the host and allows unmodified applications to better utilize commodity storage. SALSA supports a wide range of single- and multi-device optimizations and, because it is implemented in software, can adapt to specific workloads. We describe SALSA's design and demonstrate its significant benefits using microbenchmarks and case studies based on three applications: MySQL, the Swift object store, and a video server. Comment: Presented at 2018 IEEE 26th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS

    Satellite Imagery Multiscale Rapid Detection with Windowed Networks

    Detecting small objects over large areas remains a significant challenge in satellite imagery analytics. Among the challenges is the sheer number of pixels and geographical extent per image: a single DigitalGlobe satellite image encompasses over 64 km² and over 250 million pixels. Another challenge is that objects of interest are often minuscule (~pixels in extent even for the highest resolution imagery), which complicates traditional computer vision techniques. To address these issues, we propose a pipeline (SIMRDWN) that evaluates satellite images of arbitrarily large size at native resolution at a rate of > 0.2 km²/s. Building upon the TensorFlow object detection API paper, this pipeline offers a unified approach to multiple object detection frameworks that can run inference on images of arbitrary size. The SIMRDWN pipeline includes a modified version of YOLO (known as YOLT), along with the models of the TensorFlow object detection API: SSD, Faster R-CNN, and R-FCN. The proposed approach allows comparison of the performance of these four frameworks, and can rapidly detect objects of vastly different scales with relatively little training data over multiple sensors. For objects of very different scales (e.g. airplanes versus airports), we find that using two different detectors at different scales is very effective with negligible runtime cost. We evaluate large test images at native resolution and find mAP scores of 0.2 to 0.8 for vehicle localization, with the YOLT architecture achieving both the highest mAP and fastest inference speed. Comment: 8 pages, 7 figures, 2 tables, 1 appendix. arXiv admin note: substantial text overlap with arXiv:1805.0951

    SimpleSSD: Modeling Solid State Drives for Holistic System Simulation

    Existing solid state drive (SSD) simulators unfortunately lack hardware and/or software architecture models. Consequently, they are far from capturing the critical features of contemporary SSD devices. More importantly, while the performance of modern systems that adopt SSDs can vary based on their numerous internal design parameters and storage-level configurations, a full system simulation with traditional SSD models often requires unreasonably long runtimes and excessive computational resources. In this work, we propose SimpleSSD, a high-fidelity simulator that models all detailed characteristics of hardware and software, while simplifying the nondescript features of storage internals. In contrast to existing SSD simulators, SimpleSSD can easily be integrated into publicly available full system simulators. In addition, it can accommodate a complete storage stack and evaluate the performance of SSDs along with diverse memory technologies and microarchitectures. Thus, it facilitates simulations that explore the full design space at different levels of system abstraction. Comment: This paper has been accepted at IEEE Computer Architecture Letters (CAL

    HVSTO: Efficient Privacy Preserving Hybrid Storage in Cloud Data Center

    In cloud data centers, well-managed shared storage is a main structure used for storing virtual machines (VMs). In this paper, we propose Hybrid VM Storage (HVSTO), a privacy-preserving shared storage system designed for virtual machine storage in large-scale cloud data centers. Unlike traditional shared storage, HVSTO adopts a distributed structure to preserve the privacy of virtual machines, which is under threat in a traditional centralized structure. To improve I/O latency in this distributed structure, we use a hybrid system that combines solid state disks and distributed storage. Our evaluation of a demonstration system shows that HVSTO provides scalable and sufficient throughput for platform-as-a-service infrastructure. Comment: 7 pages, 8 figures, in Proceedings of The Second International Workshop on Security and Privacy in Big Data (BigSecurity 2014