Search CORE

14,820 research outputs found

PRETZEL: Opening the Black Box of Machine Learning Prediction Serving Systems

Author: Chun Byung-Gon
Interlandi Matteo
Lee Yunseong
Santambrogio Marco Domenico
Scolari Alberto
Weimer Markus
Publication venue
Publication date: 01/01/2018
Field of study

Machine Learning models are often composed of pipelines of transformations. While this design allows to efficiently execute single model components at training time, prediction serving has different requirements such as low latency, high throughput and graceful performance degradation under heavy load. Current prediction serving systems consider models as black boxes, whereby prediction-time-specific optimizations are ignored in favor of ease of deployment. In this paper, we present PRETZEL, a prediction serving system introducing a novel white box architecture enabling both end-to-end and multi-model optimizations. Using production-like model pipelines, our experiments show that PRETZEL is able to introduce performance improvements over different dimensions; compared to state-of-the-art approaches PRETZEL is on average able to reduce 99th percentile latency by 5.5x while reducing memory footprint by 25x, and increasing throughput by 4.7x.Comment: 16 pages, 14 figures, 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 201

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Politecnico di Milano

Precise Request Tracing and Performance Debugging for Multi-tier Services of Black Boxes

Author: Li Yong
Meng Dan
Sang Bo
Wang Lei
Zhan Jianfeng
Zhang Zhihong
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 04/03/2010
Field of study

As more and more multi-tier services are developed from commercial components or heterogeneous middleware without the source code available, both developers and administrators need a precise request tracing tool to help understand and debug performance problems of large concurrent services of black boxes. Previous work fails to resolve this issue in several ways: they either accept the imprecision of probabilistic correlation methods, or rely on knowledge of protocols to isolate requests in pursuit of tracing accuracy. This paper introduces a tool named PreciseTracer to help debug performance problems of multi-tier services of black boxes. Our contributions are two-fold: first, we propose a precise request tracing algorithm for multi-tier services of black boxes, which only uses application-independent knowledge; secondly, we present a component activity graph abstraction to represent causal paths of requests and facilitate end-to-end performance debugging. The low overhead and tolerance of noise make PreciseTracer a promising tracing tool for using on production systems

arXiv.org e-Print Archive

CiteSeerX

Crossref

Hierarchical programming language for modal multi-rate real-time stream processing applications

Author: Bekooij Marco J.G.
Geuns Stefan J.
Hausmans Joost P.H.M.
Publication venue: IEEE Computer Society
Publication date: 12/09/2014
Field of study

Modal multi-rate stream processing applications with real-time constraints which are executed on multi-core embedded systems often cannot be conveniently specified using current programming languages. An important issue is that sequential programming languages do not allow for convenient programming of multi-rate behavior, whereas parallel programming languages are insufficiently analyzable such that deadlock-freedom and a sufficient throughput cannot be guaranteed.\ud \ud In this paper a programming language is proposed by which a sequential specification of the behavior of an application can be nested in a concurrent specification. Multi-rate behavior can be conveniently expressed using concurrent modules which have well-defined, but restricted interfaces. Complex control behavior can be expressed in the sequential specification of the body of a module. The language is not Turing complete such that a Compositional Temporal Analysis (CTA) model can be derived. It is shown that the CTA model can be used despite the presence of control statements and that the composition of black-box components is possible. Algorithms with a polynomial time complexity can be used to verify whether throughput and latency constraints are met and to determine sufficient buffer capacities.\ud \ud A Phase Alternating Line (PAL) video decoder application is used to demonstrate the applicability of the presented language and analysis approach

University of Twente Research Information

Automatic Throughput and Critical Path Analysis of x86 and ARM Assembly Kernels

Author: Hager Georg
Hammer Julian
Laukemann Jan
Wellein Gerhard
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 21/10/2019
Field of study

Useful models of loop kernel runtimes on out-of-order architectures require an analysis of the in-core performance behavior of instructions and their dependencies. While an instruction throughput prediction sets a lower bound to the kernel runtime, the critical path defines an upper bound. Such predictions are an essential part of analytic (i.e., white-box) performance models like the Roofline and Execution-Cache-Memory (ECM) models. They enable a better understanding of the performance-relevant interactions between hardware architecture and loop code. The Open Source Architecture Code Analyzer (OSACA) is a static analysis tool for predicting the execution time of sequential loops. It previously supported only x86 (Intel and AMD) architectures and simple, optimistic full-throughput execution. We have heavily extended OSACA to support ARM instructions and critical path prediction including the detection of loop-carried dependencies, which turns it into a versatile cross-architecture modeling tool. We show runtime predictions for code on Intel Cascade Lake, AMD Zen, and Marvell ThunderX2 micro-architectures based on machine models from available documentation and semi-automatic benchmarking. The predictions are compared with actual measurements.Comment: 6 pages, 3 figure

arXiv.org e-Print Archive

Crossref