
    High-Performance Cloud Computing: A View of Scientific Applications

    Scientific computing often requires the availability of a massive number of computers for performing large-scale experiments. Traditionally, these needs have been addressed with high-performance computing solutions and installed facilities such as clusters and supercomputers, which are difficult to set up, maintain, and operate. Cloud computing provides scientists with a completely new model for utilizing the computing infrastructure: compute resources, storage resources, and applications can be dynamically provisioned (and integrated within the existing infrastructure) on a pay-per-use basis and released when they are no longer needed. Such services are often offered within the context of a Service Level Agreement (SLA), which ensures the desired Quality of Service (QoS). Aneka, an enterprise Cloud computing solution, harnesses the power of compute resources by relying on private and public Clouds and delivers the desired QoS to users. Its flexible, service-based infrastructure supports multiple programming paradigms, allowing Aneka to address a variety of scenarios, from finance applications to computational science. As examples of scientific computing in the Cloud, we present a preliminary case study on using Aneka for the classification of gene expression data and the execution of an fMRI brain-imaging workflow. (13 pages, 9 figures, conference paper)

    A Framework for Approximate Optimization of BoT Application Deployment in Hybrid Cloud Environment

    We adopt a systematic approach to investigating the efficiency of near-optimal deployment of large-scale, CPU-intensive Bag-of-Tasks applications on cloud resources with non-proportional cost-to-performance ratios. Our analytical solutions apply whether the running time of the given application is known or unknown, and they optimize the user's utility by choosing the most desirable trade-off between makespan and total incurred expense. We propose a scheme that provides a near-optimal deployment of a BoT application with respect to the user's preferences: the user is presented with a set of Pareto-optimal solutions and may select one of the possible scheduling points based on her internal utility function. Our framework also copes with uncertainty in task execution times using two methods. First, an estimation method based on Monte Carlo sampling, called the AA algorithm, predicts the average task running time using the minimum possible number of samples. Second, assuming access to code analysis, code profiling, or estimation tools, a hybrid method evaluates the accuracy of each estimation tool over given time intervals to improve resource allocation decisions. We propose approximate deployment strategies that run on a hybrid cloud. In essence, the proposed strategies first determine either an estimated or an exact optimal schema based on the information provided by the user and on environmental parameters. Then, dynamic methods assign tasks to resources so as to approach the optimal schema as closely as possible, using two approaches: a fast yet simple method based on the First Fit Decreasing algorithm, and a more complex approach based on an approximate solution of the problem transformed into a subset-sum problem. Extensive experimental results obtained on a hybrid cloud platform confirm that our framework can deliver a near-optimal solution respecting the user's utility function.
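
    As a rough illustration of the fast assignment strategy mentioned above, the sketch below applies a First Fit Decreasing heuristic to pack estimated task runtimes onto a fixed set of cloud instances. The task durations, instance count, and capacity threshold are hypothetical placeholders, not values or interfaces from the paper.

# Minimal First Fit Decreasing sketch for assigning BoT tasks to cloud instances.
# Task durations, instance count, and capacity are illustrative placeholders.

def first_fit_decreasing(task_durations, num_instances, capacity):
    """Assign each task to the first instance whose load stays under `capacity`."""
    loads = [0.0] * num_instances          # accumulated runtime per instance
    assignment = {}                        # task index -> instance index
    # Visit tasks in decreasing order of duration, keeping their original indices.
    for task_id in sorted(range(len(task_durations)),
                          key=lambda i: task_durations[i], reverse=True):
        duration = task_durations[task_id]
        for inst in range(num_instances):
            if loads[inst] + duration <= capacity:
                loads[inst] += duration
                assignment[task_id] = inst
                break
        else:
            # No instance fits: fall back to the least-loaded one (best effort).
            inst = min(range(num_instances), key=lambda j: loads[j])
            loads[inst] += duration
            assignment[task_id] = inst
    return assignment, loads

if __name__ == "__main__":
    tasks = [3.0, 7.5, 1.2, 4.8, 2.9, 6.1]   # estimated task runtimes (hypothetical)
    mapping, loads = first_fit_decreasing(tasks, num_instances=3, capacity=10.0)
    print(mapping, loads)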

    Virtual Organization Clusters: Self-Provisioned Clouds on the Grid

    Virtual Organization Clusters (VOCs) provide a novel architecture for overlaying dedicated cluster systems on existing grid infrastructures. VOCs provide customized, homogeneous execution environments on a per-Virtual Organization basis, without the cost of physical cluster construction or the overhead of per-job containers. Administrative access and overlay network capabilities are granted to Virtual Organizations (VOs) that choose to implement VOC technology, while the system remains completely transparent to end users and non-participating VOs. Unlike alternative systems that require explicit leases, VOCs are autonomically self-provisioned according to configurable usage policies. As a grid computing architecture, VOCs are designed to be technology agnostic and are implementable by any combination of software and services that follows the Virtual Organization Cluster Model. As demonstrated through simulation testing and evaluation of an implemented prototype, VOCs are a viable mechanism for increasing end-user job compatibility on grid sites. On existing production grids, where jobs are frequently submitted to a small subset of sites and thus experience high queuing delays relative to average job length, the grid-wide addition of VOCs does not adversely affect mean job sojourn time. By load-balancing jobs among grid sites, VOCs can reduce the total amount of queuing on a grid to a level sufficient to counteract the performance overhead introduced by virtualization.
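
    The autonomic self-provisioning described above can be pictured with a simple watermark-style policy: grow the virtual cluster when the per-VO queue outstrips current capacity and shrink it when nodes sit idle. The thresholds and function below are hypothetical illustrations, not the actual interfaces of the Virtual Organization Cluster Model.

# Hypothetical watermark policy sketch for autonomic VOC provisioning.
def adjust_cluster(queued_jobs, idle_nodes, busy_nodes,
                   jobs_per_node=4, max_nodes=32):
    """Return how many virtual nodes to start (+) or retire (-)."""
    wanted = -(-queued_jobs // jobs_per_node)   # ceiling division: demand in nodes
    current = idle_nodes + busy_nodes
    if wanted > current:
        return min(wanted, max_nodes) - current  # grow toward demand, capped
    if queued_jobs == 0 and idle_nodes > 0:
        return -idle_nodes                       # shrink when the queue is empty
    return 0

# Example: 20 queued jobs, 0 idle and 2 busy nodes -> start 3 more nodes.
print(adjust_cluster(20, 0, 2))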

    Using simple PID-inspired controllers for online resilient resource management of distributed scientific workflows

    Scientific workflows have become mainstream for conducting large-scale scientific research. As a result, many workflow applications and Workflow Management Systems (WMSs) have been developed as part of the cyberinfrastructure to allow scientists to execute their applications seamlessly on a range of distributed platforms. Although the scientific community has addressed this challenge from both theoretical and practical angles, failure prediction, detection, and recovery still raise many research questions. In this paper, we propose an approach, inspired by the control theory developed as part of autonomic computing, to predict failures before they happen and mitigate them when possible. The proposed approach builds on the proportional–integral–derivative (PID) controller loop mechanism, widely used in industrial control systems, in which the controller adjusts its output to mitigate faults. PID controllers aim to detect the possibility of a non-steady state far enough in advance that an action can be taken to prevent it. To demonstrate the feasibility of the approach, we tackle two common execution faults of large-scale data-intensive workflows: data storage overload and memory overflow. We developed a simulator that implements and evaluates simple standalone PID-inspired controllers to autonomously manage the data and memory usage of a data-intensive bioinformatics workflow that consumes/produces over 4.4 TB of data and requires over 24 TB of memory to run all tasks concurrently. Experimental results obtained via simulation indicate that workflow executions may significantly benefit from the controller-inspired approach, in particular under online and unknown conditions. Simulation results show that near-optimal executions (slowdown of 1.01) can be attained with the proposed method, and faults are detected and mitigated well before they occur.
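
    To make the control loop concrete, here is a minimal, generic PID-style controller in Python. The gains, the setpoint, and the idea of interpreting a negative output as "throttle task submissions" are illustrative assumptions, not the controllers or tunings evaluated in the paper.

# Generic PID controller sketch; gains and setpoint are illustrative, not the
# values used for the workflow storage/memory controllers described above.
class PIDController:
    def __init__(self, kp, ki, kd, setpoint):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint          # e.g. target fraction of disk in use
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, measurement, dt):
        """Return a control signal from the current measurement."""
        error = self.setpoint - measurement
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Example: react to rising disk usage; a negative signal would mean "slow down".
pid = PIDController(kp=2.0, ki=0.1, kd=0.5, setpoint=0.8)
for usage in (0.70, 0.78, 0.85, 0.92):     # simulated disk-usage samples
    signal = pid.update(usage, dt=1.0)
    print(f"usage={usage:.2f} control={signal:+.3f}")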

    Programming distributed and adaptable autonomous components--the GCM/ProActive framework

    Component-oriented software has become a useful tool for building larger and more complex systems by describing the application in terms of encapsulated, loosely coupled entities called components. At the same time, asynchronous programming patterns allow for the development of efficient distributed applications. While several component models and frameworks have been proposed, most of them tightly integrate the component model with the middleware they run upon. This intertwining is generally implicit and not discussed, leading to entangled, hard-to-maintain code. This article describes our efforts in developing the GCM/ProActive framework for providing distributed and adaptable autonomous components. GCM/ProActive integrates a component model designed for execution on large-scale environments with a programming model based on active objects, allowing a high degree of distribution and concurrency. This integrated model provides a more powerful development, composition, and execution environment than other distributed component frameworks. We illustrate that GCM/ProActive is particularly well suited to the programming of autonomic component systems and to integration into a service-oriented environment.
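
    The active-object style that GCM/ProActive builds on can be approximated as follows: each component serves requests from its own thread of control, and callers immediately receive futures instead of blocking. The sketch below is a generic Python approximation of that pattern using threads and queues; it is not the ProActive API.

# Rough active-object sketch: requests are queued and served by a private
# thread, and callers receive futures. This approximates the programming
# style only, not the actual GCM/ProActive interfaces.
import queue
import threading
from concurrent.futures import Future

class ActiveObject:
    def __init__(self):
        self._requests = queue.Queue()
        threading.Thread(target=self._serve, daemon=True).start()

    def _serve(self):
        while True:
            func, args, future = self._requests.get()
            try:
                future.set_result(func(*args))
            except Exception as exc:
                future.set_exception(exc)

    def send(self, func, *args):
        """Asynchronously invoke `func`; returns a future immediately."""
        future = Future()
        self._requests.put((func, args, future))
        return future

# Usage: the call returns at once; the result is claimed only when needed.
worker = ActiveObject()
result = worker.send(sum, range(1_000_000))
print(result.result())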

    Scientific Workflow Scheduling for Cloud Computing Environments

    The scheduling of workflow applications consists of assigning their tasks to computing resources to fulfill a final goal such as minimizing total workflow execution time. For this reason, workflow scheduling plays a crucial role in efficiently running experiments. Workflows often have many discrete tasks, and the number of possible task distributions, together with the time required to evaluate each configuration, quickly becomes prohibitively large. A proper solution to the scheduling problem requires the analysis of tasks and resources, the production of an accurate environment model and, most importantly, the adaptation of optimization techniques. This study is a major step toward solving the scheduling problem by not only addressing these issues but also optimizing runtime and reducing monetary cost, two of the most important variables. This study proposes three scheduling algorithms that answer key issues in the scheduling problem. First, it presents BaRRS, a scheduling solution that exploits parallelism and optimizes runtime and monetary cost. Second, it proposes GA-ETI, a scheduler capable of determining the number of resources that a given workflow requires for execution. Finally, it describes PSO-DS, a scheduler based on particle swarm optimization that efficiently schedules large workflows. To test the algorithms, five well-known benchmarks representing different scientific applications are selected. The experiments show that the proposed algorithms substantially improve efficiency, reducing makespan by 11% to 78%. The proposed frameworks open a path toward building a complete system that encompasses the capabilities of a workflow manager, a scheduler, and a cloud resource broker, offering scientists a single tool to run computationally intensive applications.
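
    As a toy illustration of the runtime/cost trade-off such schedulers optimize, the snippet below evaluates a candidate task-to-VM mapping by computing its makespan and a rough hourly-billed cost, then combines them into a single weighted objective. The runtimes, prices, and weighting are hypothetical and do not reproduce the BaRRS, GA-ETI, or PSO-DS formulations.

import math

# Toy evaluation of a candidate schedule: makespan plus monetary cost.
# Runtimes, prices, and the weighting are hypothetical placeholders.
def evaluate(schedule, runtimes, price_per_hour, weight=0.5):
    """schedule[i] = VM index for task i; returns a weighted objective."""
    vm_busy = {}
    for task, vm in enumerate(schedule):
        vm_busy[vm] = vm_busy.get(vm, 0.0) + runtimes[task]    # tasks run serially per VM
    makespan = max(vm_busy.values())
    cost = sum(math.ceil(busy / 3600.0) * price_per_hour[vm]   # hourly billing
               for vm, busy in vm_busy.items())
    return weight * makespan + (1 - weight) * cost

runtimes = [1200, 3400, 800, 2600]          # task runtimes in seconds
prices = [0.10, 0.05]                       # $/hour for two VM types
print(evaluate([0, 1, 0, 1], runtimes, prices))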

    Dispatch: distributed peer-to-peer simulations

    Recently there has been an increasing demand for efficient mechanisms for carrying out computations that exhibit coarse-grained parallelism. Examples of this class of problems include simulations involving Monte Carlo methods, computations where numerous similar but independent tasks are performed to solve a large problem, and any solution relying on ensemble averages, where a simulation is run under a variety of initial conditions whose results are then combined. With the ever-increasing complexity of such applications, large amounts of computational power are required over long periods of time, and economic constraints make it costly to deploy specialized hardware to satisfy this demand. We address this issue in Dispatch, a peer-to-peer framework for sharing computational power. In contrast to grid computing and other institution-based CPU-sharing systems, Dispatch targets an open environment, one that is accessible to all users and does not require any membership or accounts, i.e. any machine connected to the Internet can be part of the framework. Dispatch allows dynamic and decentralized organization of these computational resources. It empowers users to utilize heterogeneous computational resources spread across geographic and administrative boundaries to run their tasks in parallel. As a first step, we address a number of challenging issues involved in designing such distributed systems: forming a decentralized and scalable network of computational resources, finding a sufficient number of idle CPUs in the network for participants, allocating simulation tasks optimally so as to reduce computation time, allowing new participants to join the system and run their tasks irrespective of their geographical location, enabling users to interact with their tasks (pausing, resuming, stopping) in real time, and implementing security features to prevent malicious users from compromising the network and remote machines. As a second step, we evaluate the performance of Dispatch on a large-scale network consisting of 10-130 machines. For one particular simulation, we were able to achieve up to 1500 million iterations per second, compared to 10 million iterations per second on one machine. We also test Dispatch over a wide-area network, where it is deployed on machines that are geographically apart and belong to different domains.
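
    A minimal flavor of the coarse-grained parallelism Dispatch targets is sketched below, using local processes on one machine rather than a peer-to-peer network: independent Monte Carlo batches are farmed out and their results combined into an ensemble estimate. The workload (estimating pi) and the batch sizes are illustrative only.

# Coarse-grained Monte Carlo parallelism, illustrated with local processes
# rather than Dispatch's peer-to-peer network. The workload and batch sizes
# are illustrative placeholders.
import random
from multiprocessing import Pool

def run_batch(num_samples):
    """One independent simulation batch: count points inside the unit circle."""
    hits = 0
    for _ in range(num_samples):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits

if __name__ == "__main__":
    batches = [200_000] * 8                  # independent tasks for the workers
    with Pool() as pool:
        hits = pool.map(run_batch, batches)  # farm the batches out in parallel
    total = sum(batches)
    print("pi ~=", 4.0 * sum(hits) / total)  # ensemble-averaged result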