219 research outputs found

    Load Balancing in Heterogeneous Cloud Environments by Using PROMETHEE Method

    Abstract: Efficient scheduling of tasks in a cloud environment improves resource utilization and thereby meets users' requirements. One of the most important objectives of a scheduling algorithm in a cloud environment is a balanced load distribution over the various resources, which enhances the overall performance of the cloud. Such scheduling is complex due to the dynamic nature of the resources and of the incoming application specifications. In this paper, we employ the PROMETHEE decision-making model to design a scheduling algorithm called PROMETHEE Load Balancing (PLB). The paper formulates load balancing as a multi-criteria decision-making problem and aims to achieve a well-balanced load across virtual machines, maximizing the overall throughput of the cloud. Extensive simulation results in the CloudSim environment show that the proposed algorithm outperforms existing algorithms in terms of load balancing index (LBI), VM load variation, makespan, average execution time, and waiting time.
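    The abstract does not list the exact criteria or preference functions used by PLB, but the core of PROMETHEE II is pairwise outranking of alternatives by their net flows. A minimal sketch, assuming an illustrative pair of criteria (current VM load and VM speed) and the simple "usual" preference function, neither of which is confirmed by the paper:

```python
# Illustrative sketch of PROMETHEE II ranking used to pick a target VM.
# The criteria, weights, and usual-criterion preference function below are
# assumptions for illustration; the paper's PLB algorithm may differ.

def promethee_rank(vms, weights, maximize):
    """vms: list of dicts mapping criterion name -> value.
    weights: dict criterion -> weight (summing to 1).
    maximize: dict criterion -> True if larger values are preferred."""
    names = list(weights)
    n = len(vms)

    def pref(a, b, c):
        # Usual criterion: preference is 1 if a strictly beats b on c, else 0.
        diff = vms[a][c] - vms[b][c]
        if not maximize[c]:
            diff = -diff
        return 1.0 if diff > 0 else 0.0

    # Weighted preference index for every ordered pair of VMs.
    pi = [[sum(weights[c] * pref(a, b, c) for c in names) if a != b else 0.0
           for b in range(n)] for a in range(n)]

    # Net outranking flow = leaving flow - entering flow.
    return [(sum(pi[a]) - sum(pi[b][a] for b in range(n))) / (n - 1)
            for a in range(n)]

# Toy example: three VMs described by current load (minimize) and MIPS (maximize).
vms = [{"load": 0.8, "mips": 2000}, {"load": 0.3, "mips": 1500}, {"load": 0.5, "mips": 2500}]
weights = {"load": 0.6, "mips": 0.4}
maximize = {"load": False, "mips": True}
net_flows = promethee_rank(vms, weights, maximize)
target_vm = max(range(len(vms)), key=net_flows.__getitem__)  # VM to receive the next task
```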

    A Framework for Approximate Optimization of BoT Application Deployment in Hybrid Cloud Environment

    We adopt a systematic approach to investigate the efficiency of near-optimal deployment of large-scale, CPU-intensive Bag-of-Tasks (BoT) applications running on cloud resources with non-proportional cost-to-performance ratios. Our analytical solutions apply both when the running time of the given application is known and when it is unknown, and they optimize the user's utility by choosing the most desirable trade-off between makespan and total incurred expense. We propose a schema that provides a near-optimal deployment of a BoT application with respect to the user's preferences: the user is presented with a set of Pareto-optimal solutions and may select one of the possible scheduling points based on her internal utility function. Our framework also copes with uncertainty in the tasks' execution times using two methods. First, an estimation method based on Monte Carlo sampling, called the AA algorithm, is presented; it uses the minimum possible number of samples to predict the average task running time. Second, assuming access to code analyzers, code profilers, or other estimation tools, a hybrid method evaluates the accuracy of each estimation tool over given time intervals to improve resource allocation decisions. We propose approximate deployment strategies that run on a hybrid cloud. In essence, the proposed strategies first determine either an estimated or an exact optimal schema based on the information provided by the user and on environmental parameters. Then, dynamic methods assign tasks to resources so as to approach the optimal schema as closely as possible, using two techniques: a fast yet simple method based on the First Fit Decreasing algorithm, and a more complex approach based on an approximate solution of the problem transformed into a subset-sum problem. Extensive experimental results on a hybrid cloud platform confirm that our framework can deliver a near-optimal solution respecting the user's utility function.
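    As a rough illustration of the dynamic assignment step, the sketch below combines a Monte Carlo estimate of the mean task runtime with a First Fit Decreasing packing of tasks onto instances under a per-instance time budget. The runtime distribution, sample count, and budget are illustrative assumptions, not the paper's AA algorithm or exact formulation:

```python
# Hedged sketch: Monte Carlo runtime estimation plus FFD-style packing of
# BoT tasks onto VMs, each given the same (illustrative) time budget.
import random

def estimate_mean_runtime(sample_task, n_samples=30):
    # Monte Carlo estimate of the average task running time from a small
    # number of sample executions (simulated here with a random draw).
    samples = [sample_task() for _ in range(n_samples)]
    return sum(samples) / len(samples)

def ffd_assign(task_runtimes, budget_per_vm):
    """Sort tasks by decreasing runtime and place each one on the first VM whose
    remaining budget can still hold it; open a new VM if none fits."""
    vms = []          # remaining budget of each opened VM
    assignment = []   # (task index, vm index)
    order = sorted(range(len(task_runtimes)), key=lambda i: -task_runtimes[i])
    for i in order:
        t = task_runtimes[i]
        for v, remaining in enumerate(vms):
            if t <= remaining:
                vms[v] -= t
                assignment.append((i, v))
                break
        else:
            vms.append(budget_per_vm - t)
            assignment.append((i, len(vms) - 1))
    return assignment, len(vms)

# Usage: estimate the mean runtime by sampling, then pack 50 tasks with a
# per-VM budget of ten mean task lengths (an arbitrary illustrative target).
mean_rt = estimate_mean_runtime(lambda: random.uniform(5, 15))
runtimes = [random.uniform(5, 15) for _ in range(50)]
plan, vm_count = ffd_assign(runtimes, budget_per_vm=10 * mean_rt)
```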

    Master/worker parallel discrete event simulation

    The execution of parallel discrete event simulation across metacomputing infrastructures is examined. A master/worker architecture for parallel discrete event simulation is proposed, providing robust execution under a dynamic set of services with system-level support for fault tolerance, semi-automated client-directed load balancing, portability across heterogeneous machines, and the ability to run codes on idle or time-sharing clients without significant interaction by users. Research questions and challenges associated with the work distribution paradigm, the targeted computational domain, performance metrics, and the intended class of applications are analyzed and discussed. A portable web services approach to master/worker parallel discrete event simulation is proposed and evaluated, with subsequent optimizations that increase the efficiency of large-scale simulation execution through distributed master service design and intrinsic overhead reduction. New techniques are proposed and examined for addressing the challenges of optimistic parallel discrete event simulation across metacomputing infrastructures, such as rollbacks and message unsending, using an inherently different computation paradigm based on master services and time windows. Results indicate that a master/worker approach utilizing loosely coupled resources is a viable means for high-throughput parallel discrete event simulation, enhancing existing computational capacity or providing an alternate execution capability for less time-critical codes.

    Ph.D. Committee Chair: Fujimoto, Richard; Committee Members: Bader, David; Perumalla, Kalyan; Riley, George; Vuduc, Richard
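    A conceptual sketch of the master/worker pattern with time windows described above, using threads and in-memory queues as stand-ins for the web-service-based master and the metacomputing workers; the per-logical-process state and window sizes are stubs, not the dissertation's actual design:

```python
# Hedged sketch: a master releases one simulation time window at a time and
# collects worker results before advancing, so no logical process (LP) runs
# past the window boundary.  Threads and queues stand in for remote workers.
import queue, threading

def worker(work_q, result_q):
    while True:
        item = work_q.get()
        if item is None:          # poison pill: no more work
            break
        lp_state, window_end = item
        # Advance this LP up to the end of the time window (stubbed computation).
        result_q.put({"lvt": window_end, "events_done": lp_state["events_done"] + 1})

work_q, result_q = queue.Queue(), queue.Queue()
threads = [threading.Thread(target=worker, args=(work_q, result_q)) for _ in range(4)]
for t in threads:
    t.start()

lps = [{"lvt": 0.0, "events_done": 0} for _ in range(8)]
for window_end in (10.0, 20.0, 30.0):   # master loop over successive windows
    for lp in lps:
        work_q.put((lp, window_end))
    lps = [result_q.get() for _ in lps]

for _ in threads:
    work_q.put(None)
for t in threads:
    t.join()
```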

    Efficient Synchronization for GPGPU

    High-performance general-purpose graphics processing units (GPGPUs) have exposed bottlenecks in the synchronization of threads and cores. The massively parallel computing cores and complex thread hierarchies present new challenges for synchronization at different granularities, and GPU performance is hindered by inefficient global and local synchronization. I propose hardware-software cooperative frameworks for efficient GPGPU synchronization that address the following issues. To provide efficient global synchronization (Gsync), an API with direct hardware support is proposed: the GPU cores are synchronized by an on-chip Gsync controller, and partial context switching is employed to guarantee deadlock-free execution. The proposed Gsync avoids expensive API calls and alleviates data thrashing; prioritized warp scheduling is used to increase the overlap of context switching with kernel execution. To efficiently exploit the inherent parallelism of producer-consumer problems, a flexible wait-signal scheme is proposed at the thread-block level, with dedicated APIs to express fine-grained static and dynamic dependencies with hardware support. The proposed scheme can accelerate wavefront, graph, and machine learning applications, and the architectural design of the on-chip wait-signal controller eliminates busy-wait loops and long-latency memory operations. I also propose thread block dispatch scheduling to address load imbalance and large context switch overhead. To reduce stalls due to synchronization, synchronization-aware warp scheduling is proposed to coordinate multiple warp schedulers upon synchronization events. Both performance and hardware utilization are improved by resolving barriers sooner.

    QoS-aware predictive workflow scheduling

    This research lays the foundation for QoS-aware predictive workflow scheduling. Its novel contributions open up prospects for future research on handling complex, large-scale workflow applications with high uncertainty and dynamism. The results from the proposed workflow scheduling algorithm show significant improvement in the performance and reliability of workflow applications.

    Mitigation of failures in high performance computing via runtime techniques

    As machines increase in scale, the failure rates of supercomputers are predicted to increase correspondingly. Even though the mean time to failure (MTTF) of an individual component is high, the large number of components significantly decreases the system MTTF. Meanwhile, the decreasing size of transistors has been critical to the increase in capacity of supercomputers, but the smaller the transistors are, the more frequently silent data corruptions (SDCs) are likely to occur. SDCs do not inhibit execution, but may silently lead to incorrect results. In this thesis, we leverage runtime system and compiler techniques to mitigate a significant fraction of failures automatically with low overhead. The main goals of the system-level fault tolerance strategies designed in this thesis are: reducing the extra cost added to application execution while improving system reliability; automatically adjusting fault tolerance decisions, without user intervention, based on environmental changes; and protecting applications not only from fail-stop failures but also from silent data corruptions. The main contributions of this thesis are the development of a semi-blocking checkpoint protocol that overlaps application execution with fault tolerance operations to reduce the overhead of checkpointing, a runtime system technique for automatic checkpoint and restart without user intervention, a holistic framework (ACR) for automatically detecting and recovering from silent data corruptions, and a framework called FlipBack that provides targeted protection against silent data corruption at low cost.
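    To make the idea of overlapping checkpointing with execution concrete, the sketch below blocks the application only for an in-memory snapshot and performs serialization to stable storage in a background thread. The checkpoint interval, state layout, and file naming are illustrative assumptions and not the thesis's semi-blocking protocol itself:

```python
# Hedged sketch: overlap checkpoint I/O with computation.  The application
# pauses only to copy its state; the write to disk happens concurrently.
import copy, pickle, threading

def write_checkpoint(snapshot, path):
    with open(path, "wb") as f:          # background: serialize to stable storage
        pickle.dump(snapshot, f)

state = {"iteration": 0, "data": [0.0] * 100_000}
writer = None
for step in range(1, 101):
    state["iteration"] = step
    state["data"][step % len(state["data"])] += 1.0   # stand-in for real computation

    if step % 25 == 0:                   # checkpoint interval (illustrative)
        if writer is not None:
            writer.join()                # make sure the previous checkpoint finished
        snapshot = copy.deepcopy(state)  # short blocking phase: in-memory copy
        writer = threading.Thread(target=write_checkpoint,
                                  args=(snapshot, f"ckpt_{step}.pkl"))
        writer.start()                   # I/O overlaps with the next iterations

if writer is not None:
    writer.join()
```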

    Efficient data reconfiguration for today's cloud systems

    The performance of big data systems largely relies on efficient data reconfiguration techniques. Data reconfiguration operations change configuration parameters that affect the data layout in a system. They can be user-initiated, such as changing the shard key or block size in NoSQL databases, or system-initiated, such as changing replication in a distributed interactive analytics engine. Current data reconfiguration schemes are heuristics at best and often do not scale well as data volume grows; as a result, system performance suffers. In this thesis, we show that data reconfiguration can be done in the background by using new optimal or near-optimal algorithms coupled with performant system designs. We explore four different data reconfiguration operations affecting three popular types of systems: storage, real-time analytics, and batch analytics. In NoSQL databases (storage), we explore new strategies for changing table-level configuration and for compaction, as they improve read/write latencies. In distributed interactive analytics engines, a good replication algorithm can save costs by judiciously using just enough memory to provide the highest throughput and low latency for queries. Finally, in batch processing systems, we explore prefetching and caching strategies that increase the number of production jobs meeting their SLOs. All these operations happen in the background without affecting the fast path. Our contributions to each of these problems are two-fold: 1) we model the problem and design algorithms inspired by well-known theoretical abstractions, and 2) we design and build a system on top of popular open-source systems used in companies today. Finally, using real-life workloads, we evaluate the efficacy of our solutions. Morphus and Parqua provide several 9s of availability while changing table-level configuration parameters in databases. By halving memory usage in a distributed interactive analytics engine, Getafix reduces the cost of deploying the system by 10 million dollars annually while improving query throughput. We are the first to model the compaction problem and provide formal bounds on its runtime. Finally, NetCachier helps 30% more production jobs meet their SLOs compared to the existing state of the art.
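    The common pattern behind such background reconfigurations (illustrated here for a Morphus-style shard-key change) is to build the new layout off the fast path, replay writes that arrived during migration, and then switch over atomically. The store, shard function, and write log in this sketch are simplified assumptions rather than the systems named above:

```python
# Hedged sketch of background re-sharding: serve reads/writes from the old
# layout while the new layout is built, then replay in-flight writes and switch.
import threading

class Reshardable:
    def __init__(self, records, old_key, new_key, n_shards=4):
        self.n_shards = n_shards
        self.new_key = new_key
        self.active_key = old_key
        self.shards = self._place(records, old_key)
        self.write_log = []              # writes arriving during migration
        self.lock = threading.Lock()

    def _place(self, records, key):
        shards = [dict() for _ in range(self.n_shards)]
        for r in records:
            shards[hash(r[key]) % self.n_shards][r["id"]] = r
        return shards

    def write(self, record):
        with self.lock:
            self.shards[hash(record[self.active_key]) % self.n_shards][record["id"]] = record
            self.write_log.append(record)   # captured for catch-up during migration

    def reconfigure(self):
        # Background phase: build the new layout from a snapshot of the data.
        snapshot = [r for shard in self.shards for r in shard.values()]
        new_shards = self._place(snapshot, self.new_key)
        # Catch-up phase: apply writes that arrived meanwhile, then switch atomically.
        with self.lock:
            for r in self.write_log:
                new_shards[hash(r[self.new_key]) % self.n_shards][r["id"]] = r
            self.shards, self.active_key, self.write_log = new_shards, self.new_key, []

# Usage: start sharded by "user", accept a write, then switch to "region".
store = Reshardable([{"id": i, "user": i % 7, "region": i % 3} for i in range(100)],
                    old_key="user", new_key="region")
store.write({"id": 100, "user": 5, "region": 1})
store.reconfigure()
```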

    Doctor of Philosophy

    The increase in computational power of supercomputers is enabling complex scientific phenomena to be simulated at ever-increasing resolution and fidelity. With these simulations routinely producing large volumes of data, performing efficient I/O at this scale has become a very difficult task. Large-scale parallel writes are challenging due to the complex interdependencies between I/O middleware and hardware, while analysis-appropriate reads are traditionally hindered by bottlenecks in I/O access. Moreover, the two components of I/O, data generation from simulations (writes) and data exploration for analysis and visualization (reads), have substantially different data access requirements. Parallel writes, performed on supercomputers, often deploy aggregation strategies to permit large contiguous accesses, whereas analysis and visualization tasks, usually performed on computationally modest resources, require fast access to localized subsets or multiresolution representations of the data. This dissertation tackles the problem of parallel I/O while bridging the gap between large-scale writes and analytics-appropriate reads. The focus of this work is to develop an end-to-end adaptive-resolution data movement framework that provides efficient I/O while supporting the full spectrum of modern HPC hardware. This is achieved by developing technology for highly scalable and tunable parallel I/O, applicable both to traditional parallel data formats and to multiresolution data formats, which are directly appropriate for analysis and visualization. To demonstrate the efficacy of the approach, a novel library (PIDX) is developed that is highly tunable and capable of adaptive-resolution parallel I/O to a multiresolution data format. Adaptive-resolution storage and I/O, which allow subsets of a simulation to be accessed at varying spatial resolutions, can yield significant improvements in both storage performance and I/O time. The library provides a set of parameters that control the storage format and the nature of data aggregation across the network; further, a machine-learning-based model is constructed that tunes these parameters for maximum throughput. This work is empirically demonstrated by showing parallel I/O scaling up to 768K cores within a framework flexible enough to handle adaptive-resolution I/O.
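    The sketch below conveys the general idea of multiresolution, adaptive-resolution storage: coarser subsampled levels are written alongside the full-resolution field so a region can later be read at reduced resolution without touching the full dataset. The file layout and strided subsampling are illustrative assumptions and are not the PIDX/IDX format:

```python
# Hedged sketch of multiresolution storage and region reads at a chosen level.
import numpy as np

def write_multires(field, path_prefix, levels=3):
    """Write `field` at full resolution plus `levels` coarser versions,
    each subsampled by a factor of two per level."""
    for level in range(levels + 1):
        stride = 2 ** level
        np.save(f"{path_prefix}_L{level}.npy", field[::stride, ::stride])

def read_region(path_prefix, level, y0, y1, x0, x1):
    """Read a sub-region at a chosen resolution level; coordinates are given
    in full-resolution index space and mapped onto the coarser grid."""
    stride = 2 ** level
    data = np.load(f"{path_prefix}_L{level}.npy", mmap_mode="r")
    return np.array(data[y0 // stride:y1 // stride, x0 // stride:x1 // stride])

# Usage: a simulation step writes its field; a visualization tool later pulls
# a coarse preview of one region instead of the full-resolution array.
field = np.random.rand(1024, 1024).astype(np.float32)
write_multires(field, "timestep_0042")
preview = read_region("timestep_0042", level=2, y0=0, y1=512, x0=0, x1=512)
```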

    Energy-aware scheduling in distributed computing systems

    Distributed computing systems, such as data centers, are key to supporting modern computing demands. However, the energy consumption of data centers has become a major concern over the last decade: worldwide data center energy consumption in 2012 was estimated at around 270 TWh, and grim forecasts predict it will quadruple by 2030. Maximizing energy efficiency while also maximizing computing efficiency is a major challenge for modern data centers. This work addresses that challenge by scheduling the operation of modern data centers using a multi-objective approach that simultaneously optimizes both efficiency objectives. Multiple data center scenarios are studied, from scheduling a single data center to scheduling a federation of several geographically distributed data centers. Mathematical models are formulated for each scenario, covering their most relevant components, such as computing resources, computing workload, the cooling system, networking, and green energy generators, among others. A set of accurate heuristic and metaheuristic algorithms is designed to address the scheduling problem. These scheduling algorithms are comprehensively studied and compared with each other, using statistical tools to evaluate their efficacy on realistic workloads and scenarios. Experimental results show that the designed scheduling algorithms are able to significantly increase the energy efficiency of data centers compared to traditional scheduling methods, while providing a diverse set of trade-off solutions with respect to the computing efficiency of the data center. These results confirm the effectiveness of the proposed algorithmic approaches for data center infrastructures.
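    As a toy illustration of the multi-objective character of the problem, the sketch below uses a greedy list scheduler that places each task on the server minimizing a weighted combination of added energy and added makespan; sweeping the weight yields a small set of trade-off schedules. The server power figures and the scalarization are assumptions; the thesis itself uses richer models and metaheuristic algorithms:

```python
# Hedged sketch: energy/makespan trade-offs via a weighted greedy scheduler.

def greedy_schedule(tasks, servers, w_energy):
    """tasks: list of workloads (core-seconds).
    servers: list of (relative speed, power in watts) tuples.
    Returns (makespan in seconds, total energy in joules)."""
    finish = [0.0] * len(servers)        # current completion time per server
    energy = 0.0
    for work in sorted(tasks, reverse=True):
        best, best_cost = None, None
        for s, (speed, power) in enumerate(servers):
            run = work / speed
            d_makespan = max(finish[s] + run, max(finish)) - max(finish)
            d_energy = run * power
            cost = w_energy * d_energy + (1 - w_energy) * d_makespan
            if best_cost is None or cost < best_cost:
                best, best_cost = s, cost
        speed, power = servers[best]
        finish[best] += work / speed
        energy += (work / speed) * power
    return max(finish), energy

servers = [(3.0, 200.0), (2.0, 120.0), (1.0, 60.0)]   # (speed, power), illustrative
tasks = [5.0, 8.0, 3.0, 7.0, 2.0, 6.0, 4.0]
tradeoffs = [(w, *greedy_schedule(tasks, servers, w)) for w in (0.0, 0.25, 0.5, 0.75, 1.0)]
```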