16 research outputs found

    Enhancing performance prediction robustness by combining analytical modeling and machine learning

    Classical approaches to performance prediction rely on two typically antithetic techniques: Machine Learning (ML) and Analytical Modeling (AM). ML takes a black-box approach whose accuracy strongly depends on the representativeness of the dataset used during the initial training phase: it can achieve very good accuracy in areas of the feature space that have been sufficiently explored during training. Conversely, AM techniques require little or no training, so they can promptly instantiate a performance model of the target system; however, to remain tractable they typically rely on a set of simplifying assumptions, and their accuracy can degrade seriously in scenarios (e.g., workload conditions) in which those assumptions do not hold. In this paper we explore several hybrid/gray-box techniques that exploit AM and ML in synergy to get the best of both worlds. We evaluate the proposed techniques in case studies targeting two complex and widely adopted middleware systems: a NoSQL distributed key-value store and a Total Order Broadcast (TOB) service. Copyright © 2015 ACM
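    A common gray-box recipe of the kind this abstract describes is to let the ML component learn only the residual error of the analytical model, so it corrects the AM precisely where the simplifying assumptions break down. The Python sketch below illustrates that idea; the M/M/1-style analytical_model, the feature layout, and the synthetic "measurements" are illustrative assumptions, not the paper's actual models or data.

        import numpy as np
        from sklearn.ensemble import RandomForestRegressor

        def analytical_model(arrival_rate, service_rate):
            # Illustrative AM: M/M/1 mean response time, R = 1 / (mu - lambda).
            # Its Poisson/exponential assumptions are exactly the kind of
            # simplification a gray-box corrector is meant to compensate for.
            return 1.0 / np.maximum(service_rate - arrival_rate, 1e-9)

        # Hypothetical features: (arrival rate, service rate). The "measured"
        # latencies are synthetic stand-ins for observations of a real system.
        rng = np.random.default_rng(0)
        X = rng.uniform(low=[10.0, 100.0], high=[90.0, 120.0], size=(500, 2))
        y_measured = analytical_model(X[:, 0], X[:, 1]) * (1.0 + 0.3 * rng.standard_normal(500))

        # Gray-box step: train the ML model on the AM's residual error only,
        # so the learned part corrects the AM where its assumptions fail.
        residuals = y_measured - analytical_model(X[:, 0], X[:, 1])
        corrector = RandomForestRegressor(n_estimators=100, random_state=0)
        corrector.fit(X, residuals)

        def hybrid_predict(X_new):
            # Hybrid prediction = analytical baseline + learned correction.
            return analytical_model(X_new[:, 0], X_new[:, 1]) + corrector.predict(X_new)

        print(hybrid_predict(X[:5]))

    Because the corrector is trained on residuals rather than raw latencies, the AM already captures the gross trend, so the ML component typically needs far fewer training samples than a pure black-box predictor.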

    Managing Tail Latency in Datacenter-Scale File Systems Under Production Constraints

    Distributed file systems often exhibit high tail latencies, especially in large-scale datacenters and in the presence of competing (and possibly higher-priority) workloads. This paper introduces techniques for managing tail latencies in these systems while addressing the practical challenges inherent in production datacenters (e.g., hardware heterogeneity, interference from other workloads, and the need to maximize simplicity and maintainability). We implement our techniques in a scalable distributed file system (an extension of HDFS) used in production at Microsoft. Our evaluation uses 70k servers in 3 datacenters and shows that our techniques reduce tail latency significantly for production workloads.
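    The abstract does not spell out the mechanisms, but a classic building block for curbing tail latency in replicated storage is the hedged read: send the request to one replica, and if it has not completed within a delay tuned to a high percentile of observed latency, duplicate it to a second replica and take whichever answer arrives first. The Python sketch below illustrates that generic pattern only; read_replica, the replica list, and the 50 ms hedge threshold are hypothetical placeholders, not necessarily the paper's exact mechanism or tuning.

        import concurrent.futures as cf
        import random
        import time

        def read_replica(replica_id, block_id):
            # Hypothetical replica read; stands in for an RPC to a storage node.
            # Mostly fast, with an occasional slow straggler to create a tail.
            time.sleep(0.5 if random.random() < 0.05 else 0.01)
            return f"block {block_id} from replica {replica_id}"

        def hedged_read(block_id, replicas, hedge_after_s=0.05):
            # Send the read to the primary replica; if it is still pending
            # after hedge_after_s (e.g., the observed p95 latency), duplicate
            # it to a second replica and return whichever arrives first.
            pool = cf.ThreadPoolExecutor(max_workers=2)
            futures = [pool.submit(read_replica, replicas[0], block_id)]
            done, _ = cf.wait(futures, timeout=hedge_after_s)
            if not done and len(replicas) > 1:
                futures.append(pool.submit(read_replica, replicas[1], block_id))
            done, _ = cf.wait(futures, return_when=cf.FIRST_COMPLETED)
            result = done.pop().result()
            pool.shutdown(wait=False)  # a production client would also cancel the loser
            return result

        print(hedged_read("blk_1001", replicas=[1, 2]))

    Hedging caps a straggler's cost at roughly the hedge delay plus one fast read, at the price of some duplicated work; production systems typically bound the hedging rate so duplicates cannot amplify an overload.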