102 research outputs found

    State-of-the-Art in Parallel Computing with R

    R is a mature open-source programming language for statistical computing and graphics. Many areas of statistical research are experiencing rapid growth in the size of data sets. Methodological advances drive increased use of simulations. A common approach is to use parallel computing. This paper presents an overview of techniques for parallel computing with R on computer clusters, on multi-core systems, and in grid computing. It reviews sixteen different packages, comparing them on their state of development, the parallel technology used, as well as on usability, acceptance, and performance. Two packages (snow, Rmpi) stand out as particularly useful for general use on computer clusters. Packages for grid computing are still in development, with only one package currently available to the end user. For multi-core systems four different packages exist, but a number of issues pose challenges to early adopters. The paper concludes with ideas for further developments in high performance computing with R. Example code is available in the appendix.

    Virtual Organization Clusters: Self-Provisioned Clouds on the Grid

    Virtual Organization Clusters (VOCs) provide a novel architecture for overlaying dedicated cluster systems on existing grid infrastructures. VOCs provide customized, homogeneous execution environments on a per-Virtual Organization basis, without the cost of physical cluster construction or the overhead of per-job containers. Administrative access and overlay network capabilities are granted to Virtual Organizations (VOs) that choose to implement VOC technology, while the system remains completely transparent to end users and non-participating VOs. Unlike alternative systems that require explicit leases, VOCs are autonomically self-provisioned according to configurable usage policies. As a grid computing architecture, VOCs are designed to be technology agnostic and are implementable by any combination of software and services that follows the Virtual Organization Cluster Model. As demonstrated through simulation testing and evaluation of an implemented prototype, VOCs are a viable mechanism for increasing end-user job compatibility on grid sites. On existing production grids, where jobs are frequently submitted to a small subset of sites and thus experience high queuing delays relative to average job length, the grid-wide addition of VOCs does not adversely affect mean job sojourn time. By load-balancing jobs among grid sites, VOCs can reduce the total amount of queuing on a grid to a level sufficient to counteract the performance overhead introduced by virtualization

    Survey and Analysis of Production Distributed Computing Infrastructures

    This report has two objectives. First, we describe a set of the production distributed infrastructures currently available, so that the reader has a basic understanding of them. This includes explaining why each infrastructure was created and made available and how it has succeeded and failed. The set is not complete, but we believe it is representative. Second, we describe the infrastructures in terms of their use, which is a combination of how they were designed to be used and how users have found ways to use them. Applications are often designed and created with specific infrastructures in mind, with both an appreciation of the existing capabilities provided by those infrastructures and an anticipation of their future capabilities. Here, the infrastructures we discuss were often designed and created with specific applications in mind, or at least specific types of applications. The reader should understand how the interplay between the infrastructure providers and the users leads to such usages, which we call usage modalities. These usage modalities are really abstractions that exist between the infrastructures and the applications; they influence the infrastructures by representing the applications, and they influence the applications by representing the infrastructures.

    Integrating multiple clusters for compute-intensive applications

    Multicluster grids provide one promising solution to satisfying the growing computational demands of compute-intensive applications. However, it is challenging to seamlessly integrate all participating clusters in different domains into a single virtual computational platform. In order to fully utilize the capabilities of multicluster grids, computer scientists need to deal with the issue of joining together participating autonomic systems practically and efficiently to execute grid-enabled applications. Driven by several compute-intensive applications, this thesis develops a multicluster grid management toolkit called Pelecanus to bridge the gap between the user's needs and the system's heterogeneity. Application scientists will be able to conduct very large-scale execution across multiclusters with transparent QoS assurance. A novel model called DA-TC (Dynamic Assignment with Task Containers) is developed and integrated into Pelecanus. This model uses the concept of a task container, which allows one to decouple resource allocation from resource binding. It employs static load balancing for task container distribution and dynamic load balancing for task assignment. In this manner, the slowest resources become useful rather than becoming bottlenecks. A cluster abstraction is implemented, which not only provides various cluster information for the DA-TC execution model, but can also be used as a standalone toolkit to monitor and evaluate the clusters' functionality and performance. The performance of the proposed DA-TC model is evaluated both theoretically and experimentally. Results demonstrate the importance of reducing queuing time in decreasing the total turnaround time for an application. Experiments were conducted to understand the performance of various aspects of the DA-TC model. Experiments showed that our model could significantly reduce turnaround time and increase resource utilization for our targeted application scenarios.
Four applications are implemented as case studies to determine the applicability of the DA-TC model. In each case the turnaround time is greatly reduced, which demonstrates that the DA-TC model is efficient for assisting application scientists in conducting their research. In addition, virtual resources were integrated into the DA-TC model for application execution. Experiments show that the execution model proposed in this thesis can work seamlessly with multiple hybrid grid/cloud resources to achieve reduced turnaround time.
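
    The two load-balancing steps named in the abstract can be illustrated with a minimal sketch. This is not the Pelecanus implementation: the function names `distribute_containers` and `run_tasks`, the capacity-proportional split, and the speed model are all assumptions made for illustration. Containers are first spread statically across clusters, and tasks are then bound dynamically to whichever container becomes free first, so a slow container simply completes fewer tasks instead of holding up the run.

```python
import heapq
from collections import deque


def distribute_containers(capacities, n_containers):
    """Static step (assumed policy): split containers in proportion to capacity."""
    total = sum(capacities.values())
    shares = {name: (n_containers * cap) // total
              for name, cap in capacities.items()}
    # Hand any leftover containers to the largest clusters first.
    leftover = n_containers - sum(shares.values())
    for name in sorted(capacities, key=capacities.get, reverse=True)[:leftover]:
        shares[name] += 1
    return shares


def run_tasks(task_costs, container_speeds):
    """Dynamic step: each container pulls the next task as soon as it is free."""
    pending = deque(task_costs)
    # Heap of (time at which the container becomes free, container id).
    free_at = [(0.0, cid) for cid in range(len(container_speeds))]
    heapq.heapify(free_at)
    done_by = [0] * len(container_speeds)
    makespan = 0.0
    while pending:
        now, cid = heapq.heappop(free_at)
        cost = pending.popleft()
        finish = now + cost / container_speeds[cid]
        done_by[cid] += 1
        makespan = max(makespan, finish)
        heapq.heappush(free_at, (finish, cid))
    return done_by, makespan


if __name__ == "__main__":
    print(distribute_containers({"a": 2, "b": 1}, 4))
    print(run_tasks([1] * 6, [2.0, 1.0]))
```

    In this toy run, the container that is twice as fast ends up executing twice as many tasks, which is the behaviour the abstract attributes to dynamic task assignment.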

    Cloud Computing and Grid Computing 360-Degree Compared

    Cloud Computing has become another buzzword after Web 2.0. However, there are dozens of different definitions for Cloud Computing and there seems to be no consensus on what a Cloud is. On the other hand, Cloud Computing is not a completely new concept; it has an intricate connection to the relatively new but thirteen-year established Grid Computing paradigm, and other relevant technologies such as utility computing, cluster computing, and distributed systems in general. This paper strives to compare and contrast Cloud Computing with Grid Computing from various angles and give insights into the essential characteristics of both. Comment: IEEE Grid Computing Environments (GCE08) 200

    Scheduling in virtual infrastructure

    For the execution of scientific applications, different methods have been proposed to dynamically provide execution environments that hide the complexity of the underlying distributed and heterogeneous infrastructures. Recently, virtualization has emerged as a promising technology to provide such environments. Virtualization is a technology that abstracts away the details of physical hardware and provides virtualized resources for high-level scientific applications. Virtualization offers a cost-effective and flexible way to use and manage computing resources. Such an abstraction is appealing in Grid computing and Cloud computing for better matching jobs (applications) to computational resources. This work applies the virtualization concept to the Condor dynamic resource management system by using Condor Virtual Universe to harvest the existing virtual computing resources to their maximum utility. It allows existing computing resources to be dynamically provisioned at run-time by users based on application requirements, instead of statically at design-time, thereby laying the basis for efficient use of the available resources.

    Cloud Computing Service Selection Algorithm

    In the modern world, Cloud Computing is one of the most promising and evolving areas of computer science. As time passes, more and more cloud devices are being set up. Similarly, more companies and industries are opting for cloud services. The cloud has made up a virtual reality of the practical world. It offers online storage space, online infrastructure, online platforms, etc. to make our everyday computing experience easier and cheaper. One aspect of cloud computing is the provision of servers to execute our programs, which comes under Infrastructure as a Service (IaaS). In this project we have focused on devising an algorithm to schedule jobs and allocate servers in cloud systems. The algorithm is efficient as it provides optimal allocation. It maximizes the number of job requests that can be processed in unit time while conserving energy and keeping the costs low. The said optimal allocation is achieved by reducing the idle time of nodes of active servers and reducing the total number of servers used. We implemented our algorithm using random data sets of job requests with different attributes and generated simulations in the form of graphs. The graphs prove the efficiency of the job scheduling algorithm and the server allocation, for which we used the Best Fit algorithm of the Bin Packing problem. Finally, a detailed analysis is given and future work is stated.
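
    The Best Fit rule mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the function name `best_fit`, the single scalar job size, and the uniform server capacity are assumptions. Each job goes to the already active server whose remaining capacity is smallest while still fitting the job, and a new server is opened only when no active server fits. Note that Best Fit is a heuristic for bin packing in general, so the sketch does not by itself guarantee a minimum number of servers.

```python
from typing import List, Tuple


def best_fit(job_sizes: List[int], capacity: int) -> Tuple[List[int], List[int]]:
    """Return (server index per job, remaining capacity per server)."""
    remaining: List[int] = []   # free capacity of each active server
    assignment: List[int] = []  # server chosen for each job, in input order
    for size in job_sizes:
        if size > capacity:
            raise ValueError(f"job of size {size} exceeds server capacity {capacity}")
        best = -1
        for i, free in enumerate(remaining):
            # Pick the tightest active server that can still hold the job.
            if free >= size and (best == -1 or free < remaining[best]):
                best = i
        if best == -1:
            # No active server fits: activate a new one.
            remaining.append(capacity)
            best = len(remaining) - 1
        remaining[best] -= size
        assignment.append(best)
    return assignment, remaining


if __name__ == "__main__":
    print(best_fit([4, 8, 1, 4, 2, 1], 10))
```

    Packing jobs tightly onto active servers is what keeps both idle capacity and the number of servers in use low, the two quantities the abstract says the allocation reduces.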

    Virtualisation of Grid Resources and Prospects of the Measurement of Z Boson Production in Association with Jets at the LHC

    At the Large Hadron Collider, a large number of events containing Z bosons will be available, enabling the calibration of the absolute jet energy scale for the first time. In this thesis, such a calibration is deduced within the CMS experiment, including the investigation of effects from the underlying event and the jet size parameter. In addition, virtualisation of operating systems is applied to increase the load, stability and maintainability of local grid computing infrastructures.

    Foundations and Technological Landscape of Cloud Computing

    The cloud computing paradigm has brought the benefits of utility computing to a global scale. It has gained paramount attention in recent years. Companies are seriously considering adopting this new paradigm and expect to receive significant benefits. In fact, the concept of cloud computing is not a revolution in terms of technology; it has been established on the solid ground of virtualization, distributed systems, and web services. To comprehend cloud computing, its foundations and technological landscape need to be adequately understood. This paper provides a comprehensive review of the building blocks of cloud computing and relevant technological aspects. It focuses on four key areas: architecture, virtualization, data management, and security issues.