
    Metaheuristic approaches to virtual machine placement in cloud computing: a review


    Reputation-guided Evolutionary Scheduling Algorithm for Independent Tasks in inter-Clouds Environments

    Self-adaptation gives software the flexibility to support different behaviours (configurations) and the (semi-)autonomous ability to switch between them in response to change. To empower clouds with the ability to capture and respond to quality feedback provided by users at runtime, we propose a reputation-guided genetic scheduling algorithm for independent tasks. Current resource management services apply evolutionary strategies to improve resource allocation procedures or task scheduling algorithms, but they fail to consider the user as part of the scheduling process. Evolutionary computing offers different methods for finding a near-optimal solution. In this paper we extend previous work with new optimisation heuristics for the scheduling problem. We show how reputation can serve as an optimisation metric, and analyse how our metrics can act as upper bounds for others in the optimisation algorithm. Through experimental comparison, we show that our techniques lead to optimised results.
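
    To make the evolutionary formulation concrete, here is a minimal sketch of a genetic scheduler whose fitness blends makespan with a reputation penalty. The chromosome encoding, the fitness weighting, and all task and reputation numbers below are illustrative assumptions, not the paper's actual algorithm or metrics.

```python
import random

# Hypothetical problem instance (illustrative values, not from the paper).
TASKS = [4.0, 2.5, 7.0, 1.0, 3.5, 6.0]   # runtime of each independent task
REPUTATION = [0.9, 0.6, 0.8]             # user-feedback score per resource, in [0, 1]
N_RESOURCES = len(REPUTATION)

def makespan(schedule):
    """Completion time of the busiest resource for a task -> resource mapping."""
    load = [0.0] * N_RESOURCES
    for task, res in zip(TASKS, schedule):
        load[res] += task
    return max(load)

def fitness(schedule, alpha=0.7):
    """Blend makespan and reputation; lower is better. Reputation enters as a
    penalty on work placed on low-reputation resources (an assumed weighting)."""
    rep_penalty = sum(t * (1.0 - REPUTATION[r]) for t, r in zip(TASKS, schedule))
    return alpha * makespan(schedule) + (1 - alpha) * rep_penalty

def evolve(pop_size=40, generations=200, mutation_rate=0.1):
    # Chromosome: list mapping task index -> resource index.
    pop = [[random.randrange(N_RESOURCES) for _ in TASKS] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        survivors = pop[: pop_size // 2]          # elitist selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, len(TASKS))  # one-point crossover
            child = a[:cut] + b[cut:]
            for i in range(len(child)):            # per-gene mutation
                if random.random() < mutation_rate:
                    child[i] = random.randrange(N_RESOURCES)
            children.append(child)
        pop = survivors + children
    return min(pop, key=fitness)

best = evolve()
print(best, round(fitness(best), 2), round(makespan(best), 2))
```

    Raising alpha favours raw makespan; lowering it shifts work toward well-reputed resources, which is one simple way a user-supplied signal can enter the optimisation.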

    Reducing Cache Contention On GPUs

    The usage of Graphics Processing Units (GPUs) as application accelerators has become increasingly popular because, compared to traditional CPUs, they are more cost-effective, more energy efficient, and their highly parallel nature complements a CPU. With this popularity, many GPU-based compute-intensive applications (GPGPUs) show significant performance improvements over traditional CPU-based implementations. Caches, which significantly improve CPU performance, have been introduced to GPUs to further enhance application performance. However, caches are often ineffective in GPUs, and in some cases even detrimental: the massive parallelism of the GPU execution model, and the memory accesses it generates, cause the GPU memory hierarchy to suffer significant memory resource contention among threads.

    One cause of cache contention is the column-strided memory access pattern that many data-intensive GPU applications generate. When such access patterns are mapped to hardware thread groups, they become memory-divergent instructions whose memory requests are not hardware friendly, resulting in serialized accesses and performance degradation. Cache contention also arises from cache pollution by lines with low reuse: for a cache to be effective, a cached line must be reused before its eviction, but the streaming character of GPGPU workloads and the massively parallel execution model increase the reuse distance (equivalently, reduce the reuse frequency) of data, so pollution by long-reuse-distance data is significant. Memory request stalls are a third contention factor: a stalled Load/Store (LDST) unit cannot execute memory requests from any ready warps in the issue stage, forfeiting potential cache hits for those warps.

    This dissertation proposes three novel architectural modifications to reduce this contention: 1) contention-aware selective caching detects memory-divergent instructions caused by column-strided access patterns, calculates the contending cache sets and locality information, and then caches selectively; 2) locality-aware selective caching dynamically calculates reuse frequency with efficient hardware and caches based on that frequency; and 3) memory request scheduling queues memory requests from the warp issue stage, relieving the LDST unit stall, and schedules items from the queue to the LDST unit by probing the cache multiple times.

    Through systematic experiments and comprehensive comparisons with existing state-of-the-art techniques, this dissertation demonstrates the effectiveness of these techniques and the viability of reducing cache contention through architectural support. Finally, it suggests other promising opportunities for future research on GPU architecture.
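
    To illustrate the first of these mechanisms, the sketch below models in software how a contention-aware policy might classify one warp's memory instruction as coalesced or memory-divergent by counting the distinct cache lines it touches. The line size and warp size follow common GPU parameters; the divergence threshold and all names are illustrative assumptions, not the dissertation's hardware design.

```python
# A minimal software model of divergence-based selective caching.
LINE_SIZE = 128            # bytes per L1 cache line (typical GPU value)
WARP_SIZE = 32             # threads per warp
DIVERGENCE_THRESHOLD = 4   # assumed cutoff: more distinct lines => bypass cache

def lines_touched(addresses):
    """Distinct cache lines requested by one warp memory instruction."""
    return {addr // LINE_SIZE for addr in addresses}

def should_cache(addresses):
    """Coalesced accesses hit few lines and are worth caching; a column-strided
    pattern touches roughly one line per thread and pollutes the cache."""
    return len(lines_touched(addresses)) <= DIVERGENCE_THRESHOLD

# Row-major 2D array of 4-byte elements, row width 1024 elements.
# Adjacent threads reading along a row coalesce into a single line...
row_access = [4 * i for i in range(WARP_SIZE)]
# ...while adjacent threads reading down a column each touch a different line.
col_access = [4 * 1024 * i for i in range(WARP_SIZE)]

print(should_cache(row_access))  # True  -> cache normally
print(should_cache(col_access))  # False -> bypass / cache selectively
```

    In hardware, the same classification would be made per instruction from the per-thread addresses already available at coalescing time, so the policy adds decision logic rather than new address computation.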

    High performance computing applications: Inter-process communication, workflow optimization, and deep learning for computational nuclear physics

    Various aspects of high performance computing (HPC) are addressed in this thesis. The main focus is on analyzing and suggesting novel ideas to improve an application's performance and scalability on HPC systems and to make the most of the available computational resources.

    The choice of inter-process communication is one of the main factors influencing an application's performance. This study investigates alternative communication paradigms, such as one-sided communication, known to improve the efficiency of current implementation methods. We compare the performance and scalability of SHMEM and the corresponding MPI-3 routines for five benchmark tests on a Cray XC30. The performance of the MPI-3 get and put operations was evaluated using both fence synchronization and lock-unlock synchronization. The five tests used communication patterns ranging from light to heavy data traffic: accessing distant messages, circular right shift, gather, broadcast, and all-to-all. Each implementation was run using message sizes of 8 bytes, 10 Kbytes, and 1 Mbyte, on up to 768 processes. For nearly all tests, the SHMEM get and put implementations outperformed the MPI-3 get and put implementations. We also observed a significant performance increase using MPI-3 instead of MPI-2 when compared with results from previous studies. This performance and scalability analysis can be used to choose the implementation method best suited to a particular application on a specific HPC machine.

    Today's HPC machines are complex and constantly evolving, making it important to be able to easily evaluate the performance and scalability of HPC applications on both existing and new computers. Such evaluation can be time consuming and tedious. HPC-Bench is a general-purpose tool that optimizes the benchmarking workflow, enabling efficient performance evaluation of multiple applications on an HPC machine "with only a click of a button". HPC-Bench allows multiple applications written in different languages, with multiple parallel versions, using multiple numbers of processes/threads, to be evaluated. Performance results are stored in a database, which is queried for the desired performance data; the R statistical software package is then used to generate the desired graphs and tables. The use of HPC-Bench is illustrated with complex applications run on the National Energy Research Scientific Computing Center's (NERSC) Edison Cray XC30 HPC computer.
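
    As a concrete illustration of the one-sided communication compared above, here is a minimal sketch of the circular-right-shift pattern using MPI-3 Put with fence synchronization. It is written with mpi4py for brevity; that binding, and all buffer names, are illustrative assumptions, since the study benchmarked SHMEM and MPI-3 natively on the Cray XC30.

```python
# Run with, e.g.: mpiexec -n 4 python shift.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
nprocs = comm.Get_size()

# Each rank exposes a one-element window that a neighbour can write into.
recv = np.zeros(1, dtype='d')
win = MPI.Win.Create(recv, comm=comm)

# Circular right shift, one of the five benchmark patterns: rank r writes
# its value into the window of rank (r + 1) % nprocs with a one-sided Put.
msg = np.full(1, float(rank), dtype='d')

win.Fence()                       # open the access/exposure epoch (fence sync)
win.Put(msg, (rank + 1) % nprocs)
win.Fence()                       # close the epoch; remote data is now visible

print(f"rank {rank} received {recv[0]:.0f}")
win.Free()
```

    Replacing the two fences with win.Lock(dest) and win.Unlock(dest) around the Put gives the lock-unlock synchronization variant also evaluated in the study; the SHMEM equivalent uses shmem_put followed by a barrier.
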
    With the advancement of HPC machines, one needs efficient algorithms and new tools to make the most of the available computational resources. This work therefore also presents a novel application of deep learning to a nuclear physics problem. In recent years, successful applications of artificial neural networks (ANNs) have emerged in nuclear physics and high-energy physics, as well as in biology, chemistry, meteorology, and other fields of science. A major goal of nuclear theory is to predict nuclear structure and nuclear reactions from the underlying theory of the strong interaction, Quantum Chromodynamics (QCD). The nuclear quantum many-body problem is computationally hard to solve. With access to powerful HPC systems, several ab initio approaches, such as the No-Core Shell Model (NCSM), have been developed for approximately solving finite nuclei with realistic strong interactions.

    However, accurately solving for the properties of atomic nuclei poses immense theoretical and computational challenges. To obtain nuclear physics observables as close as possible to the exact results, one seeks NCSM solutions in the largest feasible basis spaces. These finite-basis results are then extrapolated to the infinite-basis limit, yielding results corresponding to the complete basis within evaluated uncertainties. Each observable requires a separate extrapolation, and most observables have no proven extrapolation method. We propose a feed-forward ANN method as an extrapolation tool to obtain the ground-state energy and the ground-state point-proton root-mean-square (rms) radius, along with their extrapolation uncertainties. The designed ANNs suffice to produce results for these two very different observables in ⁶Li from ab initio NCSM calculations in small basis spaces, satisfying the theoretical physics condition that the results be independent of the basis-space parameters in the limit of extremely large matrices. Comparisons of the ANN results with other extrapolation methods are also provided.
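
    A toy sketch of the extrapolation idea follows. It trains a small feed-forward network on synthetic "small-basis" energies that converge exponentially in the basis truncation Nmax (an assumed convergence form standing in for real NCSM output), then queries the network far beyond the training range, where the prediction should flatten toward the complete-basis limit. The network shape, the synthetic data, and every constant are illustrative assumptions, not the paper's actual design or results.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for NCSM output (NOT real data): assume the ground-state
# energy converges exponentially in Nmax, with an hw-dependent offset;
# E_inf = -32 MeV is an arbitrary illustrative complete-basis limit.
E_inf, b = -32.0, 0.35
nmax = np.arange(2, 12, 2)               # small, computationally feasible bases
hw = np.arange(10.0, 32.0, 2.0)          # harmonic-oscillator frequency (MeV)
N, H = np.meshgrid(nmax, hw)
a = 5.0 + 0.1 * (H - 20.0) ** 2          # slower convergence away from optimal hw
E = E_inf + a * np.exp(-b * N)

X = np.column_stack([N.ravel(), H.ravel()])   # inputs: (Nmax, hw)
y = E.ravel()                                 # target: energy in that basis

# A small feed-forward network, loosely analogous to the paper's tool.
net = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(16, 16), activation='tanh',
                 max_iter=20000, tol=1e-9, random_state=0),
)
net.fit(X, y)

# "Extrapolate": query far beyond the training range. For a converged fit the
# predictions should be nearly flat in hw and close to the complete-basis
# limit; a real study would also quantify the extrapolation uncertainty.
query = np.column_stack([np.full_like(hw, 40.0), hw])
print(net.predict(query))
```

    The physics condition quoted in the abstract corresponds here to the prediction becoming independent of both Nmax and hw at large Nmax; checking that flatness is one simple diagnostic of whether the network has learned the convergence trend rather than memorized the grid.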