41 research outputs found

    Predicting Software Performance with Divide-and-Learn

    Full text link
    Predicting the performance of highly configurable software systems is the foundation for performance testing and quality assurance. To that end, recent work has relied on machine/deep learning to model software performance. However, a crucial yet unaddressed challenge is how to cater for the sparsity inherited from the configuration landscape: the influence of configuration options (features) and the distribution of data samples are highly sparse. In this paper, we propose an approach based on the concept of 'divide-and-learn', dubbed DaL. The basic idea is that, to handle sample sparsity, we divide the samples from the configuration landscape into distant divisions, for each of which we build a regularized Deep Neural Network as the local model to deal with the feature sparsity. A newly given configuration is then assigned to the right division's model for the final prediction. Experimental results from eight real-world systems and five sets of training data reveal that, compared with state-of-the-art approaches, DaL performs no worse than the best counterpart on 33 out of 40 cases (26 of which are significantly better) with up to 1.94× improvement in accuracy; requires fewer samples to reach the same or better accuracy; and incurs acceptable training overhead. Practically, DaL also considerably improves different global models when using them as the underlying local models, which further strengthens its flexibility. To promote open science, all the data, code, and supplementary figures of this work can be accessed at our repository: https://github.com/ideas-labo/DaL. Comment: This paper has been accepted by the ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE), 2023.
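    The divide-and-learn idea lends itself to a compact illustration. The sketch below is not the authors' DaL implementation (their division strategy and regularized deep networks differ); it merely assumes scikit-learn's KMeans to form the divisions and an MLPRegressor as each local model.

```python
# Minimal sketch of divide-and-learn: split training samples into divisions,
# fit one local regressor per division, and route each new configuration to
# its division's model for prediction. Assumes scikit-learn is available.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPRegressor

class DivideAndLearn:
    """Divide samples into divisions; train one local model per division."""

    def __init__(self, n_divisions=4):
        self.divider = KMeans(n_clusters=n_divisions, n_init=10, random_state=0)
        self.local_models = {}

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
        labels = self.divider.fit_predict(X)           # step 1: divide the samples
        for d in np.unique(labels):
            local = MLPRegressor(hidden_layer_sizes=(64, 64),
                                 alpha=1e-3,            # L2 regularization
                                 max_iter=2000, random_state=0)
            local.fit(X[labels == d], y[labels == d])   # step 2: one local model per division
            self.local_models[d] = local
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        labels = self.divider.predict(X)                # step 3: route each config to its division
        return np.array([self.local_models[d].predict(x[None, :])[0]
                         for d, x in zip(labels, X)])

# Usage: X is an (n_samples, n_options) matrix of encoded configurations,
# y the measured performance of each configuration.
# model = DivideAndLearn(n_divisions=4).fit(X_train, y_train)
# y_pred = model.predict(X_test)
```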

    POSE: A mathematical and visual modelling tool to guide energy aware code optimisation

    Get PDF
    Performance engineers are beginning to explore software-level optimisation as a means to reduce the energy consumed when running their codes. This paper presents POSE, a mathematical and visual modelling tool which highlights the relationship between runtime and power consumption. POSE allows developers to assess whether power optimisation is worth pursuing for their codes. We demonstrate POSE by studying the power optimisation characteristics of applications from the Mantevo and Rodinia benchmark suites. We show that LavaMD has the most scope for CPU power optimisation, with improvements in Energy Delay Squared Product (ED2P) of up to 30.59%. Conversely, MiniMD offers the least scope, with improvements to the same metric limited to 7.60%. We also show that no power-optimised version of MiniMD operating below 2.3 GHz can match the ED2P performance of the original code running at 3.2 GHz. For LavaMD this limit is marginally less restrictive at 2.2 GHz.
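    For context, the Energy Delay Squared Product used above weights runtime cubically relative to power, so it penalises optimisations that save power by running much slower. The snippet below only illustrates the metric itself; it is not part of POSE, and the operating points are made up.

```python
# Energy Delay Squared Product: ED2P = E * t^2 = P * t^3, where P is average
# power (W), t is runtime (s), and E = P * t is energy (J). Lower is better.
def ed2p(power_watts: float, runtime_secs: float) -> float:
    return power_watts * runtime_secs ** 3

# Hypothetical example: an optimisation that halves power but runs 30% longer
# is not worthwhile under ED2P (numbers are illustrative only).
baseline  = ed2p(power_watts=100.0, runtime_secs=10.0)   # 100 * 1000 = 100000
optimised = ed2p(power_watts=50.0,  runtime_secs=13.0)   # 50 * 2197  = 109850
print(optimised < baseline)  # False: the slowdown outweighs the power saving
```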

    Simulation Modelling of Cloud Mini and Mega Data Centers Using Cloud Analyst

    Get PDF
    Cloud computing has become a base technology for many other technologies, including the Internet of Things and Big Data, so the responsibility placed on the cloud becomes critical for real-time applications, where cloud services are required in real time. A delay in the response from the cloud may lead to serious consequences, even loss of life, when the processed data from the cloud must arrive within a predefined time interval. Cloud performance has suffered delays with the current infrastructure due to multiple issues in the traditional cloud network model. This paper proposes an architecture of cloud mini data centers, simulated using Cloud Analyst, to minimize the delays in cloud service delivery. The paper also simulates the traditional cloud network model using Cloud Analyst and provides a comparative study of both models.

    Revisiting the high-performance reconfigurable computing for future datacenters

    Get PDF
    Modern datacenters are reinforcing their computational power and energy efficiency by assimilating field programmable gate arrays (FPGAs). The sustainability of this large-scale integration depends on enabling multi-tenant FPGAs. This requisite amplifies the importance of the communication architecture and the virtualization method, with the required features, in order to meet this high-end objective. Consequently, in the last decade, academia and industry have proposed several virtualization techniques and hardware architectures addressing resource management, scheduling, adoptability, segregation, scalability, performance overhead, availability, programmability, time-to-market, security, and, mainly, multi-tenancy. This paper provides an extensive survey covering three important aspects: a discussion of non-standard terms used in the existing literature, network-on-chip evaluation choices as a means to explore the communication architecture, and virtualization methods under the latest classification. The purpose is to emphasize the importance of choosing an appropriate communication architecture, virtualization technique, and standard language to evolve multi-tenant FPGAs in datacenters. None of the previous surveys encapsulated these aspects in a single work. Open problems are also indicated for the scientific community.

    Energy-efficient computing for HPC workloads on heterogeneous manycore chips

    Full text link
    Power and energy efficiency are among the major challenges to achieving exascale computing in the next several years. While chips operating at low voltages have been shown to be highly energy-efficient, low-voltage operation leads to heterogeneity across cores within the microprocessor chip. In this work, we study chips with low-voltage operation and discuss programming systems and performance modeling in the presence of heterogeneity. We propose an integer linear programming based approach for selecting the optimal configuration of a chip that minimizes its energy consumption. We obtain average energy savings of 26% and 10.7% for two HPC mini-applications, miniMD and Jacobi, respectively. We also evaluate the energy savings under execution time constraints using the proposed approach. These energy savings are significantly larger than the savings from sub-optimal configurations obtained from heuristics.
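    As a rough illustration of how such a selection problem can be posed as an integer linear program (this is not the paper's formulation), the sketch below uses the PuLP modelling library; the candidate configurations, their energy and runtime figures, and the time budget are all hypothetical.

```python
# Pick exactly one chip configuration that minimizes energy while meeting an
# execution-time budget. Requires the PuLP package (pip install pulp).
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary, value

# Hypothetical candidate configurations: (label, energy in J, runtime in s)
configs = [
    ("16 cores @ 1.2 GHz", 420.0, 95.0),
    ("16 cores @ 2.0 GHz", 510.0, 70.0),
    ("32 cores @ 1.2 GHz", 480.0, 60.0),
    ("32 cores @ 2.0 GHz", 640.0, 45.0),
]
time_budget = 75.0  # seconds

prob = LpProblem("chip_config_selection", LpMinimize)
pick = [LpVariable(f"pick_{i}", cat=LpBinary) for i in range(len(configs))]

prob += lpSum(pick[i] * configs[i][1] for i in range(len(configs)))       # objective: total energy
prob += lpSum(pick) == 1                                                  # choose exactly one config
prob += lpSum(pick[i] * configs[i][2] for i in range(len(configs))) <= time_budget  # deadline

prob.solve()
chosen = next(configs[i][0] for i in range(len(configs)) if pick[i].value() == 1)
print(chosen, "with energy", value(prob.objective), "J")
```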

    Multithreading opportunities for program optimizations

    Get PDF
    The introduction of chip multiprocessors (CMPs) led to a substantial reformulation of Moore's law, now stating that the number of cores on a single chip doubles every year and a half. The tech boom related to CMPs gave a strong impulse to parallel program design, narrowing its gap with parallel architectures. Nowadays, a leading trend in high-performance products is CMPs with multithreaded CPU nodes. Basically, the CPU multithreading feature tries to overcome the underutilization of superscalar processors, due to the lack of exploitable instruction-level parallelism (ILP), by allowing the simultaneous processing of different programs during the same time slot. In multithreading architectures, a thread is a concurrent computational entity supported directly at the firmware level (these threads are usually called hardware threads). Multithreading technology opens a broad range of possible optimizations that can be applied to improve the performance of sequential and parallel applications. This thesis treats four possible optimizations targeted at multithreading architectures: Speculative Precomputation (helper threads), Threaded Multipath Execution, Speculative Multithreading, and Communication Threads.

    Maximizing Throughput of Overprovisioned HPC Data Centers Under a Strict Power Budget

    Full text link
    Building future-generation supercomputers while constraining their power consumption is one of the biggest challenges faced by the HPC community. For example, the US Department of Energy has set a goal of 20 MW for an exascale (10^18 flops) supercomputer. To realize this goal, a lot of research is being done to revolutionize hardware design to build power-efficient computers and network interconnects. In this work, we propose a software-based online resource management system that leverages a hardware-facilitated capability to constrain the power consumption of each node in order to optimally allocate power and nodes to a job. Our scheme uses this hardware capability in conjunction with an adaptive runtime system that can dynamically change the resource configuration of a running job, allowing our resource manager to re-optimize allocation decisions for running jobs as new jobs arrive or a running job terminates. We also propose a performance modeling scheme that estimates the essential power characteristics of a job at any scale. The proposed online resource manager uses these performance characteristics for making scheduling and resource allocation decisions that maximize the job throughput of the supercomputer under a given power budget. We demonstrate the benefits of our approach by using a mix of jobs with different power-response characteristics. We show that with a power budget of 4.75 MW, we can obtain up to a 5.2X improvement in job throughput when compared with the power-unaware SLURM scheduling policy. We corroborate our results with real experiments on a relatively small-scale cluster, in which we obtain a 1.7X improvement.
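    To make the idea of allocating power by power-response characteristics concrete, here is a toy greedy sketch. It is not the paper's resource manager, which also reallocates nodes and relies on a performance model; the jobs, speedup curves, and budget below are hypothetical.

```python
# Toy power-aware allocation: given a cluster power budget and a per-job
# "power response" (estimated speedup at each power cap), greedily hand out
# power in fixed increments to whichever job gains the most from the extra watts.
def allocate_power(jobs, budget_watts, step_watts=50):
    # jobs: {name: {cap_watts: estimated_speedup}}, caps in step_watts increments
    alloc = {name: min(curve) for name, curve in jobs.items()}  # every job gets its minimum cap
    remaining = budget_watts - sum(alloc.values())
    while remaining >= step_watts:
        best, best_gain = None, 0.0
        for name, curve in jobs.items():
            nxt = alloc[name] + step_watts
            if nxt in curve:
                gain = curve[nxt] - curve[alloc[name]]
                if gain > best_gain:
                    best, best_gain = name, gain
        if best is None:          # no job benefits from more power
            break
        alloc[best] += step_watts
        remaining -= step_watts
    return alloc

jobs = {
    "miniMD": {100: 1.0, 150: 1.6, 200: 1.9, 250: 2.0},   # compute-bound: big early gains
    "Jacobi": {100: 1.0, 150: 1.1, 200: 1.15, 250: 1.2},  # memory-bound: power-insensitive
}
print(allocate_power(jobs, budget_watts=350))  # the extra power goes to miniMD
```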

    Analytical Modeling is Enough for High Performance BLIS

    Get PDF
    We show how the BLAS-like Library Instantiation Software (BLIS) framework, which provides a more detailed layering of the GotoBLAS (now maintained as OpenBLAS) implementation, allows one to analytically determine tuning parameters for high-end instantiations of matrix-matrix multiplication. This is of both practical and scientific importance, as it greatly reduces the development effort required for the implementation of the level-3 BLAS while also advancing our understanding of how hierarchically layered memories interact with high-performance software. It allows the community to move on from valuable engineering solutions (empirical autotuning) to scientific understanding (analytical insight). This research was sponsored in part by NSF grants ACI-1148125/1340293 and CCF-0917167. Enrique S. Quintana-Ortí was supported by project TIN2011-23283 of the Ministerio de Ciencia e Innovación and FEDER. Francisco D. Igual was supported by project TIN2012-32180 of the Ministerio de Ciencia e Innovación.
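    As a flavour of what "analytically determined tuning parameters" means here, the sketch below derives GEMM blocking sizes purely from cache capacities. It is a heavy simplification of the BLIS analytical model, which additionally reasons about associativity, replacement policy, and double buffering; the cache sizes and micro-kernel dimensions used are only examples.

```python
# Simplified illustration of deriving GEMM blocking parameters from cache
# sizes: each packed operand should fit in its target cache level, with a
# factor of 2 of headroom left for the other operands and the output.
def blocking_params(l1_bytes, l2_bytes, l3_bytes, mr=8, nr=4, elem=8):
    # k_c: an mr x k_c micro-panel of A plus a k_c x nr micro-panel of B
    # should occupy at most about half of the L1 cache.
    kc = l1_bytes // (2 * (mr + nr) * elem)
    # m_c: the packed m_c x k_c block of A should fit in about half of L2.
    mc = l2_bytes // (2 * kc * elem)
    # n_c: the packed k_c x n_c panel of B should fit in about half of L3.
    nc = l3_bytes // (2 * kc * elem)
    # Round the outer blocking sizes down to multiples of the micro-kernel dims.
    return kc, mc - mc % mr, nc - nc % nr

# Example cache hierarchy: 32 KiB L1, 256 KiB L2, 8 MiB L3, double precision.
print(blocking_params(l1_bytes=32 * 1024, l2_bytes=256 * 1024, l3_bytes=8 * 1024 * 1024))
```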