
    A Holistic Approach to Log Data Analysis in High-Performance Computing Systems: The Case of IBM Blue Gene/Q

    The complexity and cost of managing high-performance computing infrastructures are on the rise. Automating management and repair through predictive models, so as to minimize human intervention, is one way to increase system availability and contain these costs. Predictive models accurate enough to be useful for automatic management cannot be built from the restricted logs of individual subsystems; they require a holistic approach to analyzing data from disparate sources. Here we provide a detailed multi-scale characterization study based on four datasets reporting power consumption, temperature, workload, and hardware/software events for an IBM Blue Gene/Q installation. We show that the system runs a rich parallel workload, with low correlation among its components in terms of temperature and power, but higher correlation in terms of events. As expected, power and temperature correlate strongly, while events display negative correlations with load and power. Power and workload show moderate correlations, and only at the scale of individual components. The aim of the study is a systematic, integrated characterization of the computing infrastructure and the discovery of correlation sources and levels, to serve as a basis for future predictive modeling efforts.
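
    A minimal sketch of the kind of multi-scale correlation analysis described above, assuming per-component time series of load, power, temperature, and event counts; the column names, the synthetic data, and the aggregation windows are illustrative assumptions, not the paper's actual pipeline.

```python
# Multi-scale Pearson correlation over per-component time series (illustrative).
import numpy as np
import pandas as pd

def correlation_at_scale(df: pd.DataFrame, window: str) -> pd.DataFrame:
    """Aggregate the time series to the given window and return the Pearson
    correlation matrix across the measured quantities."""
    return df.resample(window).mean().corr(method="pearson")

# Synthetic stand-in for one component's measurements, at one-minute resolution.
idx = pd.date_range("2015-01-01", periods=7 * 24 * 60, freq="min")
rng = np.random.default_rng(0)
load = rng.uniform(0.0, 1.0, len(idx))
df = pd.DataFrame({
    "load": load,
    "power": 40 + 60 * load + rng.normal(0, 5, len(idx)),
    "temperature": 25 + 20 * load + rng.normal(0, 2, len(idx)),
    "events": rng.poisson(0.2, len(idx)),
}, index=idx)

for window in ["5min", "1h", "1d"]:        # three aggregation scales
    print(window)
    print(correlation_at_scale(df, window).round(2))
```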

    ShenZhen transportation system (SZTS): a novel big data benchmark suite

    Data analytics is at the core of the supply chain for both products and services in modern economies and societies. Big data workloads, however, are placing unprecedented demands on computing technologies, calling for a deep understanding and characterization of these emerging workloads. In this paper, we propose the ShenZhen Transportation System (SZTS) suite, a novel big data Hadoop benchmark suite composed of real-life transportation analysis applications operating on real-life input data sets from Shenzhen, China. SZTS uniquely focuses on a specific, real-life application domain, whereas existing Hadoop benchmark suites, such as HiBench and CloudRank-D, consist of generic algorithms with synthetic inputs. We perform a cross-layer workload characterization at the microarchitecture level, the operating system (OS) level, and the job level, revealing unique characteristics of SZTS compared to existing Hadoop benchmarks as well as the general-purpose multi-core PARSEC benchmarks. We also study the sensitivity of workload behavior to input data size, and we propose a methodology for identifying representative input data sets.
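
    A hedged sketch of one way such representative input data sets could be identified from cross-layer characterization metrics, by clustering (workload, input size) pairs and keeping the point nearest each centroid; the metric names, the numbers, and the cluster count are assumptions, and the paper's actual methodology may differ.

```python
# Pick representative inputs by clustering characterization vectors (illustrative).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def representative_inputs(metrics: np.ndarray, labels: list, k: int) -> list:
    """metrics: one row per (workload, input size) pair, one column per
    characterization metric (e.g. IPC, cache miss rate, job duration).
    Returns, for each cluster, the label whose row lies closest to the centroid."""
    X = StandardScaler().fit_transform(metrics)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    reps = []
    for c in range(k):
        members = np.flatnonzero(km.labels_ == c)
        dists = np.linalg.norm(X[members] - km.cluster_centers_[c], axis=1)
        reps.append(labels[members[np.argmin(dists)]])
    return reps

# Illustrative characterization vectors for five input sizes of one workload.
labels = ["1GB", "2GB", "4GB", "8GB", "16GB"]
metrics = np.array([[1.2, 0.05, 10], [1.1, 0.06, 12], [0.8, 0.12, 30],
                    [0.7, 0.14, 55], [0.7, 0.15, 110]])
print(representative_inputs(metrics, labels, k=2))
```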

    Activity Based Costing techniques for workload characterization.

    This paper addresses the problem of non-captured service demands in workload monitoring data. Capture ratios are the coefficients that correct measured workload service demands so that they are consistent with global system monitoring data. This paper proposes new techniques for determining capture ratios by means of Activity Based Costing. The techniques are demonstrated on a case study, which also illustrates the non-trivial nature of capture ratios in practical performance analysis.
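
    A minimal sketch of the capture-ratio idea, assuming the simplest definition (captured service demand divided by globally measured busy time) and an ABC-style apportionment of the uncaptured demand via activity drivers; the driver choice and the numbers are illustrative, not the paper's actual procedure.

```python
# Capture ratio and driver-based correction of service demands (illustrative).

def capture_ratio(captured_per_class: dict, global_busy: float) -> float:
    """Capture ratio = captured CPU seconds / globally measured busy CPU seconds."""
    return sum(captured_per_class.values()) / global_busy

def corrected_demands(captured_per_class: dict, global_busy: float, drivers: dict) -> dict:
    """Apportion the uncaptured demand over the workload classes in proportion
    to their activity drivers, so the corrected demands sum to the global figure."""
    uncaptured = global_busy - sum(captured_per_class.values())
    total_driver = sum(drivers.values())
    return {c: d + uncaptured * drivers[c] / total_driver
            for c, d in captured_per_class.items()}

# Example: per-class monitors attribute 70 of 100 globally measured busy seconds.
demands = {"oltp": 50.0, "batch": 20.0}
print(capture_ratio(demands, 100.0))                                   # 0.7
print(corrected_demands(demands, 100.0, {"oltp": 800, "batch": 200}))  # oltp 74, batch 26
```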

    Mechanistic modeling of architectural vulnerability factor

    Reliability against soft errors is a significant design challenge in modern microprocessors, owing to the exponential increase in the number of transistors on chip and the reduction in operating voltages with each process generation. Architectural Vulnerability Factor (AVF) modeling using microarchitectural simulators enables architects to make informed performance, power, and reliability trade-offs. However, such simulators are time-consuming and do not reveal the microarchitectural mechanisms that influence AVF. In this article, we present an accurate first-order mechanistic analytical model to compute AVF, developed from the first principles of out-of-order superscalar execution. The model provides insight into the fundamental interactions between the workload and the microarchitecture that together determine AVF. We use the model to perform design space exploration, parametric sweeps, and workload characterization for AVF.
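
    A hedged sketch of the standard AVF definition (the time-averaged fraction of a structure's bits that hold Architecturally Correct Execution, or ACE, state) together with a first-order occupancy estimate based on Little's law; the simplified formula and the numbers are illustrative assumptions, not the article's actual model.

```python
# AVF definition plus a first-order occupancy-based estimate (illustrative).

def avf(ace_bit_cycles: float, structure_bits: int, total_cycles: int) -> float:
    """AVF = (sum of ACE-bit residency cycles) / (bits in structure * cycles)."""
    return ace_bit_cycles / (structure_bits * total_cycles)

def avf_first_order(dispatch_rate: float, residence_cycles: float,
                    ace_fraction: float, entries: int) -> float:
    """Little's law: average occupancy = arrival rate * residence time.
    Only the ACE fraction of the occupied entries contributes to AVF."""
    occupancy = dispatch_rate * residence_cycles        # entries occupied on average
    return min(1.0, occupancy * ace_fraction / entries)

# Example: 4 instructions dispatched per cycle, 20-cycle average residence,
# 60% of instructions ACE, in a 128-entry reorder buffer.
print(avf_first_order(4.0, 20.0, 0.6, 128))             # 0.375
```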

    Workload characterization of JVM languages

    Although developed with a single language in mind, namely Java, the Java Virtual Machine (JVM) is nowadays targeted by numerous programming languages. Automatic memory management, Just-In-Time (JIT) compilation, and the adaptive optimizations provided by the JVM make it an attractive target for different language implementations. Even though it is targeted by so many languages, the JVM has been tuned with respect to the characteristics of Java programs only: the heuristics of the garbage collector and of the compiler optimizations are geared towards Java programs. In this dissertation, we aim to contribute to the understanding of the workloads imposed on the JVM by both dynamically-typed and statically-typed JVM languages. We introduce a new set of dynamic metrics and an easy-to-use toolchain for collecting them. We apply our toolchain to applications written in six JVM languages: Java, Scala, Clojure, Jython, JRuby, and JavaScript. We identify differences and commonalities between the examined languages and discuss their implications. Moreover, we take a close look at one of the most effective compiler optimizations, method inlining: we present the decision tree of the HotSpot JVM's JIT compiler and analyze how well the JVM performs at inlining the workloads written in different JVM languages.
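
    As an illustration of the kind of dynamic metric such a toolchain might collect, the sketch below computes receiver-type polymorphism per call site from a hypothetical trace, a quantity relevant to inlining decisions; the trace format is an assumption, and the dissertation's toolchain operates inside the JVM rather than offline.

```python
# Per-call-site receiver-type polymorphism from a hypothetical trace (illustrative).
from collections import defaultdict

def callsite_polymorphism(trace):
    """Return, per call site, the number of distinct receiver types observed.
    Monomorphic sites (count == 1) are the easiest targets for inlining."""
    targets = defaultdict(set)
    for call_site, receiver_type in trace:
        targets[call_site].add(receiver_type)
    return {site: len(types) for site, types in targets.items()}

# Hypothetical trace of (call site, receiver type) pairs.
trace = [("Seq.map@42", "ArrayList"), ("Seq.map@42", "LinkedList"),
         ("Str.length@7", "String"), ("Str.length@7", "String")]
print(callsite_polymorphism(trace))   # {'Seq.map@42': 2, 'Str.length@7': 1}
```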

    Exact asymptotics for fluid queues fed by multiple heavy-tailed on-off flows

    We consider a fluid queue fed by multiple On-Off flows with heavy-tailed (regularly varying) On periods. Under fairly mild assumptions, we prove that the workload distribution is asymptotically equivalent to that in a reduced system. The reduced system consists of a "dominant" subset of the flows, with the original service rate reduced by the mean rate of the other flows. We describe how a dominant set may be determined from a simple knapsack formulation. The dominant set consists of a "minimally critical" set of On-Off flows with regularly varying On periods. In case the dominant set contains just a single On-Off flow, the exact asymptotics for the reduced system follow from known results. For the case of several On-Off flows, we exploit a powerful intuitive argument to obtain the exact asymptotics. Combined with the reduced-load equivalence, the results for the reduced system provide a characterization of the tail of the workload distribution for a wide range of traffic scenarios.
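
    A hedged sketch of the knapsack-style search for a "minimally critical" (dominant) set, under the reading that a set is critical when its combined peak rates plus the mean rates of the remaining flows exceed the service capacity, and minimally so when no proper subset is critical; the flow parameters and the brute-force enumeration are illustrative, whereas the paper treats this analytically.

```python
# Brute-force enumeration of minimally critical flow sets (illustrative).
from itertools import combinations

def minimally_critical_sets(peak, mean, capacity):
    """peak[i], mean[i]: peak and mean rate of flow i; capacity: service rate."""
    n = len(peak)
    total_mean = sum(mean)

    def critical(S):
        # Flows in S at peak rate, all other flows contributing their mean rate.
        return sum(peak[i] for i in S) + (total_mean - sum(mean[i] for i in S)) > capacity

    result = []
    for k in range(1, n + 1):
        for S in combinations(range(n), k):
            if critical(S) and not any(critical(T)
                                       for r in range(1, k)
                                       for T in combinations(S, r)):
                result.append(S)
    return result

# Three flows; every pair turns out to be minimally critical here.
print(minimally_critical_sets(peak=[3.0, 2.0, 2.0], mean=[1.0, 0.5, 0.5], capacity=4.0))
```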

    Workload Characterization of AFS File Servers

    This paper describes the workload characterization of AFS file servers, based on traces collected by the file servers at CITI over a 2-4 week period. These workload characteristics have been used to compare the performance of servers running on different hardware. There are two parts to this paper. In the first part, we describe a server model that is used to drive a synthetic workload. In the second part, we build a client model consisting of the requests made by different user types. We show that the user community can be broken into distinct types, where all the users of a specific type exhibit similar request patterns. This clustering information is used in conjunction with the workload characteristics at the server to predict loads on servers, given the community that needs to be served. Finally, we give an example to show how this can be done.
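
    A minimal sketch of grouping users into request-pattern types by clustering per-user request-mix vectors; the AFS request categories and the numbers are illustrative assumptions rather than the report's actual features.

```python
# Cluster users by the mix of requests they issue (illustrative).
import numpy as np
from sklearn.cluster import KMeans

# One row per user: relative frequencies of request types
# (e.g. FetchStatus, FetchData, StoreData, RemoveFile); numbers are made up.
request_mix = np.array([
    [0.70, 0.20, 0.05, 0.05],   # lookup-heavy users
    [0.65, 0.25, 0.05, 0.05],
    [0.30, 0.30, 0.35, 0.05],   # write-heavy users
    [0.25, 0.35, 0.35, 0.05],
])
user_types = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(request_mix)
print(user_types)   # users with similar request patterns share a cluster label
```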