280 research outputs found

    An Experiment on Bare-Metal BigData Provisioning

    Many BigData customers use on-demand platforms in the cloud, where they can get a dedicated virtual cluster in a couple of minutes and pay only for the time they use. Increasingly, there is a demand for bare-metal BigData solutions for applications that cannot tolerate the unpredictability and performance degradation of virtualized systems. Existing bare-metal solutions can introduce delays of tens of minutes to provision a cluster by installing operating systems and applications on the local disks of servers. This has motivated recent research developing sophisticated mechanisms to optimize this installation. These approaches assume that using network-mounted boot disks incurs unacceptable run-time overhead. Our analysis suggests that while this assumption is true for application data, it is incorrect for operating systems and applications: network mounting the boot disk and applications results in negligible run-time impact while leading to faster provisioning. This research was supported in part by the MassTech Collaborative Research Matching Grant Program, NSF awards 1347525 and 1414119 and several commercial partners of the Massachusetts Open Cloud who may be found at http://www.massopencloud.or

    HIL: designing an exokernel for the data center

    We propose a new Exokernel-like layer that allows mutually untrusting, physically deployed services to efficiently share the resources of a data center. We believe that such a layer not only offers efficiency gains, but may also enable new economic models, new applications, and new security-sensitive uses. A prototype (currently in active use) demonstrates that the proposed layer is viable and can support a variety of existing provisioning tools and use cases. Partial support for this work was provided by the MassTech Collaborative Research Matching Grant Program, National Science Foundation awards 1347525 and 1149232 as well as the several commercial partners of the Massachusetts Open Cloud who may be found at http://www.massopencloud.or

    Assessment, Design and Implementation of a Private Cloud for MapReduce Applications

    Scientific computation and data-intensive analyses are ever more frequent. On the one hand, the MapReduce programming model has gained a lot of attention for its applicability in large parallel data analyses and Big Data applications. On the other hand, Cloud computing seems to be increasingly attractive for solving these computing problems, which demand a lot of resources. This paper explores the potential symbiosis between MapReduce and Cloud Computing, in order to create a robust and scalable environment to execute MapReduce workflows regardless of the underlying infrastructure. The main goal of this work is to provide an easy-to-install interface, so that non-expert scientists can deploy a suitable testbed for their MapReduce experiments on the local resources of their institution. Test cases were run to evaluate the time required for the whole execution process on a real cluster.

    Resilin: Elastic MapReduce over Multiple Clouds

    The MapReduce programming model, introduced by Google, offers a simple and efficient way of performing distributed computation over large data sets. Although Google's implementation is proprietary, MapReduce can be leveraged by anyone using the free and open-source Apache Hadoop framework. To simplify the usage of Hadoop in the cloud, Amazon Web Services offers Elastic MapReduce, a web service enabling users to run MapReduce jobs. Elastic MapReduce takes care of resource provisioning, Hadoop configuration and performance tuning, data staging, fault tolerance, etc. This service drastically reduces the entry barrier to perform MapReduce computations in the cloud, allowing users to concentrate on the problem to solve. However, Elastic MapReduce is restricted to Amazon EC2 resources, and is provided at an additional cost. In this paper, we present Resilin, a system implementing the Elastic MapReduce API with resources from clouds other than Amazon EC2, such as private and scientific clouds. Furthermore, we explore a feature going beyond the current Amazon Elastic MapReduce offering: performing MapReduce computations over multiple distributed clouds. The evaluation of Resilin shows the benefits of running computations on more than one cloud. While not being the most efficient way to perform Hadoop computations, it solves the problem of resource availability and adds more flexibility regarding the type/price of resource.

    M2: Malleable Metal as a Service

    Existing bare-metal cloud services that provide users with physical nodes have a number of serious disadvantages compared with their virtual alternatives, including slow provisioning times, difficulty for users to release nodes and then reuse them to handle changes in demand, and poor tolerance to failures. We introduce M2, a bare-metal cloud service that uses network-mounted boot drives to overcome these disadvantages. We describe the architecture and implementation of M2 and compare its agility, scalability, and performance to existing systems. We show that M2 can reduce provisioning time by over 50% while offering richer functionality and run-time performance comparable to that of tools that provision images onto local disks. M2 is open source and available at https://github.com/CCI-MOC/ims. Comment: IEEE International Conference on Cloud Engineering 201

    A platform to deploy customized scientific virtual infrastructures on the cloud

    This paper presents a software platform to dynamically deploy complex scientific virtual computing infrastructures on top of Infrastructure as a Service (IaaS) Clouds. The platform orchestrates different services to provision the virtual computing resources. It dynamically installs the appropriate software to satisfy the requirements of a researcher, both on public and on-premise Clouds. The platform provides a web interface that lets users easily manage the lifecycle of virtual infrastructures. It enables users to define infrastructures, share them with other users, deploy and relinquish them, add or remove resources dynamically, create and share application recipes, etc. The paper also describes three case studies to deploy complex infrastructures, namely a Hadoop cluster, a single node to perform NGS sequencing and a gateway for users to access the European Grid Infrastructure (EGI). This platform promotes a better use of the on-premise hardware resources of a research center by allocating the computing resources just-in-time for the specific lifetime of the virtual infrastructures, and it also supports deploying the very same infrastructures on a public Cloud. The authors would like to thank the Spanish "Ministerio de Economia y Competitividad" for the project "Clusters Virtuales Elasticos y Migrables sobre Infraestructuras Cloud Hibridas" with reference TIN2013-44390-R. Caballer Fernández, M.; Segrelles Quilis, JD.; Moltó, G.; Blanquer Espert, I. (2015). A platform to deploy customized scientific virtual infrastructures on the cloud. Concurrency and Computation: Practice and Experience. 27(16):4318-4329. https://doi.org/10.1002/cpe.3518

    Data Analytics as a Service: A look inside the PANACEA project

    MOLNs: A cloud platform for interactive, reproducible and scalable spatial stochastic computational experiments in systems biology using PyURDME

    Computational experiments using spatial stochastic simulations have led to important new biological insights, but they require specialized tools, a complex software stack, and large, scalable compute and data analysis resources because of the high computational cost of Monte Carlo computational workflows. The complexity of setting up and managing a large-scale distributed computation environment to support productive and reproducible modeling can be prohibitive for practitioners in systems biology. This results in a barrier to the adoption of spatial stochastic simulation tools, effectively limiting the type of biological questions addressed by quantitative modeling. In this paper, we present PyURDME, a new, user-friendly spatial modeling and simulation package, and MOLNs, a cloud computing appliance for distributed simulation of stochastic reaction-diffusion models. MOLNs is based on IPython and provides an interactive programming platform for development of sharable and reproducible distributed parallel computational experiments.