    Contextual Bandit Modeling for Dynamic Runtime Control in Computer Systems

    Modern operating systems and microarchitectures provide a myriad of mechanisms for monitoring and affecting system operation and resource utilization at runtime. Dynamic runtime control of these mechanisms can tailor system operation to the characteristics and behavior of the current workload, resulting in improved performance. However, developing effective models for system control can be challenging. Existing methods often require extensive manual effort, computation time, and domain knowledge to identify relevant low-level performance metrics, to relate those metrics and high-level control decisions to workload performance, and to evaluate the resulting control models. This dissertation develops a general framework, based on the contextual bandit, for describing and learning effective models for runtime system control. Random profiling is used to characterize the relationship between workload behavior, system configuration, and performance. The framework is evaluated in the context of two applications of progressive complexity: first, the selection of paging modes (Shadow Paging, Hardware-Assisted Paging) in the Xen virtual machine memory manager; second, the utilization of hardware memory prefetching for multi-core, multi-tenant workloads with cross-core contention for shared memory resources, such as the last-level cache and memory bandwidth. The resulting models for both applications are competitive with existing runtime control approaches. For paging mode selection, the resulting model provides performance equivalent to the state of the art while substantially reducing the computational requirements of profiling. For hardware memory prefetcher utilization, the resulting models are the first to provide dynamic control of hardware prefetchers using workload statistics. Finally, a correlation-based feature selection method is evaluated for identifying relevant low-level performance metrics related to hardware memory prefetching.
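    As a concrete illustration of the control loop this abstract describes, the sketch below implements a minimal linear contextual bandit that picks between two system configurations (arms) from a vector of low-level performance counters. This is an illustrative sketch, not the dissertation's model: the counter names, the epsilon-greedy exploration (standing in for the random profiling phase), and the simulated reward are all assumptions.

```python
# Minimal linear contextual bandit for runtime configuration selection.
# Hypothetical setup: arm 0 = Shadow Paging, arm 1 = Hardware-Assisted Paging;
# the context features and reward model are illustrative stand-ins.
import numpy as np

class EpsilonGreedyLinearBandit:
    """One ridge-regression reward model per arm; epsilon-greedy exploration."""
    def __init__(self, n_arms, n_features, epsilon=0.1, reg=1.0):
        self.epsilon = epsilon
        self.A = [reg * np.eye(n_features) for _ in range(n_arms)]  # X^T X + reg*I
        self.b = [np.zeros(n_features) for _ in range(n_arms)]      # X^T y

    def select(self, context):
        if np.random.rand() < self.epsilon:           # explore (cf. random profiling)
            return np.random.randint(len(self.A))
        scores = [context @ np.linalg.solve(A, b)     # predicted reward per arm
                  for A, b in zip(self.A, self.b)]
        return int(np.argmax(scores))

    def update(self, arm, context, reward):
        self.A[arm] += np.outer(context, context)
        self.b[arm] += reward * context

bandit = EpsilonGreedyLinearBandit(n_arms=2, n_features=3)
for step in range(1000):
    # Hypothetical counters, e.g. TLB-miss rate, page-fault rate, IPC.
    ctx = np.random.rand(3)
    arm = bandit.select(ctx)
    # Simulated reward as a stand-in for measured workload performance:
    # here arm 1 is better when the first counter is high.
    reward = (ctx[0] if arm == 1 else 1 - ctx[0]) + 0.1 * np.random.randn()
    bandit.update(arm, ctx, reward)

print(bandit.select(np.array([0.9, 0.5, 0.5])))  # usually prints 1
```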

    Faster inference from state space models via GPU computing

    Funding: C.F.-J. is funded via a doctoral scholarship from the University of St Andrews, School of Mathematics and Statistics.
    Inexpensive Graphics Processing Units (GPUs) offer the potential to greatly speed up computation by employing their massively parallel architecture to perform arithmetic operations more efficiently. Population dynamics models are important tools in ecology and conservation. Modern Bayesian approaches allow biologically realistic models to be constructed and fitted to multiple data sources in an integrated modelling framework based on a class of statistical models called state space models. However, model fitting is often slow, requiring hours to weeks of computation. We demonstrate the benefits of GPU computing using a model for the population dynamics of British grey seals, fitted with a particle Markov chain Monte Carlo algorithm. Speed-ups of two orders of magnitude were obtained for estimation of the log-likelihood, compared to a traditional ‘CPU-only’ implementation, allowing an accurate method of inference to be used where it was previously too computationally expensive to be viable. GPU computing has enormous potential, but one barrier to further adoption is a steep learning curve, due to GPUs' unique hardware architecture. We provide a detailed description of the hardware and software setup, and our case study provides a template for other similar applications. We also provide a detailed tutorial-style description of GPU hardware architectures, and examples of important GPU-specific programming practices.
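    To make the source of the speed-up concrete, here is a toy bootstrap particle filter estimating the log-likelihood of a generic linear-Gaussian state space model; it is a hedged sketch, not the grey-seal model from the paper, and the parameters and names are illustrative. Every line inside the time loop is an elementwise operation over the whole particle array, which is exactly the massively parallel arithmetic GPUs accelerate; because CuPy mirrors the NumPy API, swapping the import moves the same code onto a GPU.

```python
# Toy bootstrap particle filter: log-likelihood of a linear-Gaussian state
# space model. Illustrative only -- not the grey-seal population model.
import math
import numpy as xp   # `import cupy as xp` runs the same code on a GPU

def pf_loglik(y, n=100_000, phi=0.9, sigma=1.0, tau=0.5):
    x = xp.zeros(n)                       # particle states
    loglik = 0.0
    for yt in y:
        # Propagate all particles in parallel: x_t = phi*x_{t-1} + noise.
        x = phi * x + sigma * xp.random.standard_normal(n)
        # Log-weights from the Gaussian observation density N(x_t, tau^2).
        logw = -0.5 * ((yt - x) / tau) ** 2 - 0.5 * math.log(2 * math.pi * tau * tau)
        m = float(logw.max())
        w = xp.exp(logw - m)              # numerically stabilised weights
        loglik += m + math.log(float(w.mean()))
        # Multinomial resampling by inverse-CDF lookup (parallel searchsorted).
        cdf = xp.cumsum(w) / float(w.sum())
        idx = xp.searchsorted(cdf, xp.random.uniform(size=n))
        x = x[xp.minimum(idx, n - 1)]
    return loglik

y_obs = [0.5, -0.3, 1.2, 0.8, -1.0]       # placeholder observations
print("log-likelihood estimate:", pf_loglik(y_obs))
```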

    Computation of risk measures in finance and parallel real-time scheduling

    Many application areas employ risk measures, such as a quantile, to assess risk. For example, in finance, risk managers use a quantile to help determine the level of capital needed to absorb (with high probability) large unexpected losses in credit portfolios comprising loans, bonds, and other financial instruments subject to default. This dissertation discusses the computation of risk measures in finance and parallel real-time scheduling. First, two estimation approaches are compared for one risk measure, a quantile, via randomized quasi-Monte Carlo (RQMC) in an asymptotic setting where the number of randomizations grows large but the size of the low-discrepancy point set remains fixed. The first method computes, for each randomization, an estimator of the cumulative distribution function (CDF), inverts it to obtain a quantile estimator, and takes the overall quantile estimator as the sample average of the per-randomization quantile estimators. The second approach instead computes a single quantile estimator by inverting one CDF estimator built across all randomizations. Because quantile estimators are generally biased, the first method yields an estimator that does not converge to the true quantile as the number of randomizations goes to infinity; the second estimator does, and a central limit theorem is established for it. To improve further, conditional Monte Carlo (CMC) is used to obtain a smoother estimate of the distribution function, and this is combined with the second RQMC approach to further reduce the variance. The result is a much more accurate quantile estimator, whose mean squared error can converge even faster than the canonical O(1/n) rate. Second, another risk measure is estimated, namely economic capital (EC), defined as the difference between a quantile and the mean of the loss distribution, given a stochastic model for a portfolio's loss over a given time horizon. This work applies measure-specific importance sampling to estimate the two components of the EC separately, which can lead to much smaller variance than estimating both terms simultaneously. Finally, for parallel real-time tasks, the federated scheduling paradigm, which assigns each parallel task a set of dedicated cores, achieves good theoretical bounds by ensuring exclusive use of processing resources to reduce interference. However, because cores share the last-level cache and memory bandwidth, tasks may in practice still interfere with each other despite executing on dedicated cores. To tackle this issue, this work presents a holistic resource allocation framework for parallel real-time tasks under federated scheduling, in which each task is assigned dedicated cache and memory bandwidth resources in addition to dedicated cores. The work also studies the characteristics of parallel tasks under different resource allocations using a measurement-based approach, and proposes a technique to handle the prohibitive cost of profiling all resource allocation combinations. Further, it proposes a holistic resource allocation algorithm that balances the allocation of the different resources to achieve good schedulability. Additionally, this work provides a full implementation of the framework by extending the federated scheduling system with Intel's Cache Allocation Technology and MemGuard. It also demonstrates the practicality of the proposed framework via extensive numerical evaluations and empirical experiments using real benchmark programs. In the end, the application of risk measures to real-time scheduling is discussed as future work.
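    The contrast between the two RQMC quantile estimators can be seen in a few lines. The toy below estimates the 99% quantile of a standard normal loss (so the true quantile is known) using a scrambled Sobol' point set of fixed size n and r independent randomizations: method 1 averages per-randomization quantiles and so retains the fixed-n bias of each, while method 2 inverts the CDF estimator pooled across randomizations. The loss model and all parameters are illustrative assumptions, and neither the CMC smoothing nor the EC importance sampling is shown.

```python
# Toy comparison of the two RQMC quantile estimators: fixed point-set size
# n, growing number of randomizations r. Loss = standard normal, so the
# true p-quantile is norm.ppf(p). Illustrative parameters throughout.
import numpy as np
from scipy.stats import norm, qmc

p, n, r = 0.99, 256, 200
rows = []
for k in range(r):
    u = qmc.Sobol(d=1, scramble=True, seed=k).random(n).ravel()
    rows.append(norm.ppf(np.clip(u, 1e-12, 1 - 1e-12)))  # inverse transform
losses = np.array(rows)                                   # shape (r, n)

# Method 1: invert each randomization's empirical CDF, then average the
# r quantile estimates -- the fixed-n bias of each estimate survives.
q1 = np.mean([np.quantile(row, p) for row in losses])

# Method 2: invert the CDF estimator pooled across all randomizations
# (equivalently, the quantile of all r*n points) -- consistent as r grows.
q2 = np.quantile(losses.ravel(), p)

print(f"true={norm.ppf(p):.4f}  method 1={q1:.4f}  method 2={q2:.4f}")
```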

    Pac-Sim: Simulation of Multi-threaded Workloads using Intelligent, Live Sampling

    High-performance, multi-core processors are the key to accelerating workloads in several application domains. To continue to scale performance at the limit of Moore's Law and Dennard scaling, software and hardware designers have turned to dynamic solutions that adapt to the needs of applications in a transparent, automatic way. For example, modern hardware improves its performance and power efficiency by changing the hardware configuration, like the frequency and voltage of cores, according to a number of parameters such as the technology used, the workload running, etc. With this level of dynamism, it is essential to simulate next-generation multi-core processors in a way that can both respond to system changes and accurately determine system performance metrics. Currently, no sampled simulation platform can achieve these goals of dynamic, fast, and accurate simulation of multi-threaded workloads. In this work, we propose a solution that allows for fast, accurate simulation in the presence of both hardware and software dynamism. To accomplish this goal, we present Pac-Sim, a novel methodology for fast, accurate sampled simulation that requires no upfront analysis of the workload. With our proposed methodology, it is now possible to simulate long-running, dynamically scheduled multi-threaded programs with significant simulation speedups, even in the presence of dynamic hardware events. We evaluate Pac-Sim using the multi-threaded SPEC CPU2017, NPB, and PARSEC benchmarks with both static and dynamic thread scheduling. The experimental results show that Pac-Sim achieves a very low sampling error of 1.63% and 3.81% on average for statically and dynamically scheduled benchmarks, respectively. Pac-Sim also demonstrates significant simulation speedups, as high as 523.5× (210.3× on average), for the train input set of SPEC CPU2017.
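    Pac-Sim's actual contribution (online region classification with no upfront profiling) is far richer than a few lines can show, but the basic trade-off of sampled simulation that makes such speedups possible can be sketched: simulate a small fraction of execution regions in detail and extrapolate the metric to the rest. The synthetic CPI trace, the fixed sampling rate, and all names below are illustrative assumptions, not Pac-Sim's algorithm.

```python
# Toy sampled simulation: detailed-simulate ~5% of fixed-size execution
# regions and extrapolate CPI to the whole run. Synthetic data only.
import random
random.seed(1)

# Synthetic workload: each region's "true" CPI, with alternating phases.
regions = [1.0 + 0.3 * ((i // 50) % 2) + random.gauss(0, 0.02)
           for i in range(1000)]

def detailed_cpi(r):        # stand-in for slow cycle-accurate simulation
    return regions[r]

sample_rate = 0.05          # simulate 5% of regions in detail
sampled = [detailed_cpi(r) for r in range(len(regions))
           if random.random() < sample_rate]
est = sum(sampled) / len(sampled)        # extrapolate to the whole run
true = sum(regions) / len(regions)       # what full simulation would give
print(f"estimated CPI={est:.3f}  true CPI={true:.3f}  "
      f"error={abs(est - true) / true:.2%}  "
      f"detailed regions={len(sampled)}/{len(regions)}")
```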

    Astrophysical Supercomputing with GPUs: Critical Decisions for Early Adopters

    General purpose computing on graphics processing units (GPGPU) is dramatically changing the landscape of high performance computing in astronomy. In this paper, we identify and investigate several key decision areas, with a goal of simplifying the early adoption of GPGPU in astronomy. We consider the merits of OpenCL as an open standard in order to reduce the risks associated with coding in a native, vendor-specific programming environment, and present a GPU programming philosophy based on using brute force solutions. We assert that effective use of new GPU-based supercomputing facilities will require a change in approach from astronomers. This will likely include improved programming training, an increased need for software development best practice through the use of profiling and related optimisation tools, and a greater reliance on third-party code libraries. As with any new technology, those willing to take the risks, and to make the investment of time and effort to become early adopters of GPGPU in astronomy, stand to reap great benefits.

    A review of High Performance Computing foundations for scientists

    The increase of existing computational capabilities has made simulation emerge as a third discipline of science, lying midway between the experimental and purely theoretical branches [1, 2]. Simulation enables the evaluation of quantities which would otherwise not be accessible, helps to improve experiments, and provides new insights into the systems being analysed [3-6]. Knowing the fundamentals of computation can be very useful for scientists, for it can help them improve the performance of their theoretical models and simulations. This review includes some technical essentials that can be useful to this end, and it is devised as a complement for researchers whose education is focused on scientific issues rather than on technological aspects. In this document we attempt to discuss the fundamentals of High Performance Computing (HPC) [7] in a way that is easy to understand without much prior background. We sketch the way standard computers and supercomputers work, discuss distributed computing, and cover essential aspects to take into account when running scientific calculations on computers.