7 research outputs found

    A service-oriented architecture for scientific computing on cloud infrastructures

    This paper describes a service-oriented architecture that eases the deployment and execution of scientific applications in IaaS Clouds, with a focus on High Throughput Computing applications. The system integrates i) a catalogue and repository of Virtual Machine Images, ii) an application deployment and configuration tool, and iii) a meta-scheduler for job execution management and monitoring. The developed system significantly reduces the time required to port a scientific application to these computational environments. This is exemplified by a case study with a computationally intensive protein design application on both a private Cloud and a hybrid three-level infrastructure (Grid, private and public Cloud).

    The authors gratefully acknowledge the financial support received from the Generalitat Valenciana for the project GV/2012/076 and from the Ministerio de Economía y Competitividad for the project CodeCloud (TIN2010-17804).

    Moltó, G.; Calatrava Arroyo, A.; Hernández García, V. (2013). A service-oriented architecture for scientific computing on cloud infrastructures. In: High Performance Computing for Computational Science - VECPAR 2012. Springer Verlag (Germany), pp. 163–176. doi:10.1007/978-3-642-38718-0_18
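
    The three-level execution described above can be illustrated with a minimal dispatch sketch. It assumes a simple policy that is not necessarily the paper's meta-scheduler: each job goes to the first tier, in a fixed preference order (Grid, then private Cloud, then public Cloud), that still has a free slot. `Tier`, `dispatch`, and the slot counts are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Tier:
    """One level of a hybrid infrastructure (hypothetical model)."""
    name: str
    free_slots: int
    assigned: list = field(default_factory=list)


def dispatch(jobs, tiers):
    """Assign each job to the first tier, in preference order, with a free slot.

    Jobs that fit nowhere are returned as pending so they can be retried later.
    """
    pending = []
    for job in jobs:
        for tier in tiers:
            if tier.free_slots > 0:
                tier.free_slots -= 1
                tier.assigned.append(job)
                break
        else:  # no tier had capacity for this job
            pending.append(job)
    return pending


# Preference order is an assumption: owned resources first, paid public Cloud last.
tiers = [Tier("grid", 2), Tier("private-cloud", 1), Tier("public-cloud", 1)]
pending = dispatch([f"job{i}" for i in range(5)], tiers)
for tier in tiers:
    print(tier.name, tier.assigned)
print("pending:", pending)
```

    A real meta-scheduler would also monitor running jobs and resubmit failures; the sketch only covers the placement decision.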

    Fault Tolerance and Scaling in e-Science Cloud Applications: Observations from the Continuing Development of MODISAzure

    It can be natural to believe that many of the traditional issues of scale have been eliminated, or at least greatly reduced, by cloud computing. That is, if one can create a seemingly well-functioning cloud application that operates correctly on small or moderate-sized problems, then the very nature of cloud programming abstractions means that the same application will run as well on potentially significantly larger problems. In this paper, we present our experiences taking MODISAzure, our satellite data processing system built on the Windows Azure cloud computing platform, from the proof-of-concept stage to a point where it can run on significantly larger problem sizes (e.g., from national-scale to global-scale data sizes). To our knowledge, this is the longest-running eScience application on the nascent Windows Azure platform. We found that while many infrastructure-level issues were thankfully masked from us by the cloud infrastructure, it was valuable to design additional redundancy and fault-tolerance capabilities such as transparent idempotent task retry and logging to support debugging of user code that encounters unanticipated data issues. Further, we found that using a commercial cloud means anticipating inconsistent performance and black-box behavior of virtualized compute instances, as well as leveraging platform capabilities that change over time. We believe that the experiences presented in this paper can help future eScience cloud application developers on Windows Azure and other commercial cloud providers.
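
    Transparent idempotent task retry, as mentioned above, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not MODISAzure's implementation: each task carries a unique ID, finished results are recorded in a store (here an in-memory dict standing in for durable storage), and a re-delivered task whose result already exists is skipped rather than re-run. `run_idempotent`, `completed`, and the `tile-42` task are hypothetical names.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("tasks")

# task_id -> result; stands in for a durable result store (assumption)
completed = {}


def run_idempotent(task_id, fn, max_attempts=3):
    """Run fn until it succeeds, at most max_attempts times.

    If task_id already has a stored result (e.g. the task was re-delivered),
    the stored result is returned and fn is not called again.
    """
    if task_id in completed:
        log.info("task %s already done, skipping", task_id)
        return completed[task_id]
    for attempt in range(1, max_attempts + 1):
        try:
            result = fn()
        except Exception:
            # keep the traceback so unanticipated data issues can be debugged later
            log.exception("task %s failed on attempt %d", task_id, attempt)
        else:
            completed[task_id] = result
            return result
    raise RuntimeError(f"task {task_id} failed after {max_attempts} attempts")


# Usage: a hypothetical task that fails once on bad input, then succeeds.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] == 1:
        raise ValueError("bad input tile")
    return "ok"


print(run_idempotent("tile-42", flaky))  # logs one failure, then succeeds
print(run_idempotent("tile-42", flaky))  # served from the store; flaky not called
```

    Recording results by task ID is what makes a retry or duplicate delivery harmless: the second call above returns the stored value without repeating the work.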

    Evaluating Streaming Strategies for Event Processing across Infrastructure Clouds

    Infrastructure clouds have revolutionized the way we approach resource procurement by providing an easy way to lease compute and storage resources on short notice, for a short amount of time, and on a pay-as-you-go basis. This new opportunity, however, introduces new performance trade-offs. Making the right choices in leveraging the different types of storage available in the cloud is particularly important for applications that manage large amounts of data within and across clouds. An increasing number of such applications follow a pattern in which data processing relies on streaming the data to a compute platform where a set of similar operations is repeatedly applied to independent chunks of data. This pattern is evident in virtual observatories such as the Ocean Observatory Initiative, in cases where new data is evaluated against existing features in geospatial computations, and when experimental data is processed as a series of time events. In this paper, we propose two strategies for efficiently implementing such streaming in the cloud and evaluate them in the context of an ATLAS application processing experimental data. Our results show that choosing the right cloud configuration can improve overall application performance by up to a factor of three.
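
    The streaming pattern described above, in which independent chunks flow to a compute platform that applies the same operation to each, can be sketched with a bounded producer/consumer queue. This is a generic, single-machine illustration that assumes in-memory chunks and a thread pool; it is not either of the two strategies evaluated in the paper, and `stream_process` is a hypothetical name.

```python
import queue
import threading


def stream_process(chunks, operation, n_workers=4, buffer_size=8):
    """Stream independent chunks through a bounded buffer to worker threads
    that apply the same operation to each chunk; results keep input order."""
    buf = queue.Queue(maxsize=buffer_size)  # bounded: producer blocks if workers lag
    results = {}
    lock = threading.Lock()
    stop = None  # sentinel; real items are (index, chunk) tuples, never None

    def worker():
        while True:
            item = buf.get()
            if item is stop:
                break
            idx, chunk = item
            out = operation(chunk)
            with lock:
                results[idx] = out

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for idx, chunk in enumerate(chunks):  # producer side: the "stream"
        buf.put((idx, chunk))
    for _ in threads:
        buf.put(stop)  # one stop signal per worker
    for t in threads:
        t.join()
    return [results[i] for i in range(len(results))]


# Usage: three hypothetical event chunks, each reduced with the same operation.
print(stream_process([[0, 1, 2], [3, 4, 5], [6, 7, 8]], sum))
```

    The bounded buffer is the design choice that matters here: it applies back-pressure so that a fast data source cannot exhaust memory on the compute side.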

    Towards Multi-site Metadata Management for Geographically Distributed Cloud Workflows

    With their globally distributed datacenters, clouds now provide an opportunity to run complex large-scale applications on dynamically provisioned, networked and federated infrastructures. However, there is a lack of tools supporting data-intensive applications across geographically distributed sites. For instance, scientific workflows that handle many small files can easily saturate state-of-the-art distributed filesystems based on centralized metadata servers (e.g., HDFS, PVFS). In this paper, we explore several alternative design strategies to efficiently support the execution of existing workflow engines across multi-site clouds by reducing the cost of metadata operations. These strategies leverage workflow semantics in a two-level metadata partitioning hierarchy that combines distribution and replication. The system was validated on the Microsoft Azure cloud across four datacenters in the EU and the US. The experiments were conducted on 128 nodes using synthetic benchmarks and real-life applications. Compared to a baseline centralized configuration, we observe gains in execution time of up to 28% for a parallel, geo-distributed real-world application (Montage) and up to 50% for a metadata-intensive synthetic benchmark.
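
    A two-level hierarchy combining distribution and replication can be illustrated as follows. The sketch assumes one specific design that may differ from the paper's: metadata entries are hash-partitioned across sites (level 1), and each site keeps a local copy of entries it has fetched from a remote site (level 2), which is safe only if workflow files are write-once. The class, the site names, and the file path are hypothetical.

```python
import hashlib


class TwoLevelMetadata:
    """Hypothetical two-level metadata store.

    Level 1 (distribution): each entry lives on one home site chosen by hash.
    Level 2 (replication): a site keeps a local copy of entries it read remotely.
    Assumes write-once entries, so local copies never become stale.
    """

    def __init__(self, sites):
        self.sites = list(sites)
        self.partitions = {s: {} for s in self.sites}  # authoritative shards
        self.replicas = {s: {} for s in self.sites}    # per-site read copies
        self.remote_reads = 0

    def home_site(self, path):
        # stable hash (the built-in hash() of str is salted per process)
        digest = hashlib.sha1(path.encode()).digest()
        return self.sites[int.from_bytes(digest[:4], "big") % len(self.sites)]

    def put(self, path, meta):
        self.partitions[self.home_site(path)][path] = meta

    def get(self, path, from_site):
        local = self.replicas[from_site]
        if path in local:  # served without crossing sites
            return local[path]
        home = self.home_site(path)
        meta = self.partitions[home][path]
        if home != from_site:
            self.remote_reads += 1
            local[path] = meta  # replicate for later reads from this site
        return meta


# Usage: every site reads the same entry twice.
md = TwoLevelMetadata(["site-a", "site-b", "site-c", "site-d"])
md.put("workflow/tile_001.fits", {"size": 2048})
for site in md.sites:
    md.get("workflow/tile_001.fits", site)
    md.get("workflow/tile_001.fits", site)
print(md.remote_reads)  # 3: one first read from each non-home site
```

    Only the first read from each non-home site crosses the wide-area network; repeated reads, which are common when many workflow tasks consume the same small files, are served locally.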

    Affinity of application types in computational clouds

    Advisor: Prof. Dr. Bruno Schulze. Co-advisor: Prof. Dr. Luís Carlos Erpen de Bona. Doctoral thesis, Universidade Federal do Paraná, Setor de Ciências Exatas, Programa de Pós-Graduação em Informática. Defense: Curitiba, 2014. Includes references (leaves 140–153).

    Abstract: The increasing use of virtualized environments has motivated extensive research on the possibilities and limitations of their use in cloud computing or for resource consolidation. However, most of these studies are limited to performance analysis and do not examine in depth the effects of concurrency among multiple virtual environments, or how to mitigate those effects. The study presented here proposes the concept of affinity, which defines the degree of coexistence between classes of applications. These combinations are influenced by the classes and subclasses of algorithms used to implement the applications, together with the types of parallel libraries they use. The results obtained in this research show that the effects of these combinations of algorithm classes and parallelization libraries vary so widely that they must be studied and measured in detail, which justifies the proposed definition and analysis of the affinity concept. The aim is to contribute to a better use of resources, especially in massively parallel and distributed computing, with impact both on the development of new applications and on the development of new schedulers for these environments. Keywords: affinity, application classes, distributed computing, dwarfs, concurrency, management, cloud computing, virtualization.
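
    To make the notion of affinity concrete, the sketch below uses one possible metric, which is an assumption rather than the thesis's definition: the affinity of two co-located applications is the mean ratio of each one's solo runtime to its runtime when sharing the host, so 1.0 means no mutual interference. The functions, application names, and runtimes in the usage example are hypothetical placeholders, not measurements.

```python
def pair_affinity(solo_a, solo_b, paired_a, paired_b):
    """Hypothetical affinity of two co-located applications: the mean of each
    one's solo/co-located runtime ratio. 1.0 means no interference; values
    near 0 mean heavy contention for shared resources."""
    return (solo_a / paired_a + solo_b / paired_b) / 2


def best_partner(app, candidates, solo, paired):
    """Pick the candidate with the highest affinity for `app`.

    solo[x] is x's runtime alone; paired[(x, y)] is x's runtime next to y.
    """
    return max(
        candidates,
        key=lambda c: pair_affinity(solo[app], solo[c], paired[(app, c)], paired[(c, app)]),
    )


# Usage with placeholder runtimes (seconds), not data from the thesis.
solo = {"app_x": 100.0, "app_y": 50.0, "app_z": 80.0}
paired = {
    ("app_x", "app_y"): 110.0, ("app_y", "app_x"): 55.0,
    ("app_x", "app_z"): 160.0, ("app_z", "app_x"): 120.0,
}
print(best_partner("app_x", ["app_y", "app_z"], solo, paired))
```

    A scheduler using such a score would co-locate `app_x` with `app_y`, whose pairing slows each application by about 10%, rather than with `app_z`, where both slow down substantially.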

    Exploring programming-level elasticity in the development and execution of scientific applications

    Advisor: Prof. Dr. Luis Carlos Erpen de Bona. Doctoral thesis, Universidade Federal do Paraná, Setor de Ciências Exatas, Programa de Pós-Graduação em Informática. Defense: Curitiba, 28 April 2014. Includes references.

    Abstract: Elasticity is the ability of a system to dynamically scale the computational resources used by an application up and down in order to meet varying demands. Although several mechanisms providing this feature are offered by public cloud providers and proposed in academic work, we argue that these solutions have limitations in providing elasticity for scientific applications, since they were not developed for this purpose and cannot take the particularities of this class of applications into account. In this thesis we propose an approach for exploring elasticity in scientific applications in which elasticity control is embedded in the application's source code: the elasticity actions (allocation and deallocation of resources) are performed by the application itself, based on its runtime requirements or internal events, without depending on external mechanisms or user interaction. The development of embedded elasticity controllers is based on the concept of elasticity primitives, a set of basic functions that allow applications to communicate with the cloud to request or release resources and to collect information about the virtual environment and the cloud. Thus, it is possible to develop tailor-made elasticity controllers that enable applications to adjust their own resources according to their demands or to satisfy specific criteria, such as cost or performance. The approach also allows parallel programming libraries and frameworks to be built or adapted so that they provide elasticity transparently. To enable the construction of elastic applications using this approach, we developed the Cloudine framework. Cloudine provides the elasticity primitives and a runtime environment that supports the execution of elastic applications in the cloud. The proposed approach is validated by a set of experiments using Cloudine. The framework was successfully used to provide elasticity to a number of applications, among which we highlight a genome assembler (SAND) and a climate model (OLAM). Cloudine was also used to extend GCC's OpenMP library (libgomp) to provide automatic and transparent allocation of resources.
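
    An embedded elasticity controller built on elasticity primitives might look like the sketch below. The primitive names (`alloc`, `dealloc`, `count`), the `FakeCloud` stand-in, and the backlog thresholds are all hypothetical. They illustrate the idea that the application itself decides when to request or release resources; they are not Cloudine's actual API.

```python
class FakeCloud:
    """Stand-in for a cloud; these primitive names are hypothetical,
    not Cloudine's interface."""

    def __init__(self, vms=1):
        self.vms = vms

    def alloc(self, n=1):
        self.vms += n

    def dealloc(self, n=1):
        self.vms = max(1, self.vms - n)  # always keep at least one VM

    def count(self):
        return self.vms


def elastic_step(cloud, pending_tasks, high=10, low=2):
    """One iteration of a controller embedded in the application.

    It inspects the application's own backlog and calls the primitives
    directly: scale out when each VM has too much pending work, scale in
    when each VM has too little.
    """
    per_vm = pending_tasks / cloud.count()
    if per_vm > high:
        cloud.alloc()
    elif per_vm < low and cloud.count() > 1:
        cloud.dealloc()
    return cloud.count()


# Usage: a hypothetical backlog that grows and then drains.
cloud = FakeCloud()
print([elastic_step(cloud, backlog) for backlog in [50, 40, 30, 5, 0]])
```

    Because the controller runs inside the application, it can use internal signals such as the task backlog that an external, infrastructure-level autoscaler would not see.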

    An assessment model for Enterprise Clouds adoption

    Context: Enterprise Cloud Computing (or Enterprise Clouds) is the use of Cloud Computing services by a large-scale organisation to migrate its existing IT services or to adopt new Cloud-based services. Many issues and challenges act as barriers to the adoption of Enterprise Clouds, and these challenges must be addressed for Cloud-based services to be assimilated successfully within the organisation. Objective: The aim of this research was to develop an assessment model for the adoption of Enterprise Clouds. Method: Key challenges reported as barriers to the adoption of Cloud Computing were identified from the literature using the Systematic Literature Review methodology. A survey was carried out to elicit, from Cloud Computing experts, industrial approaches and practices that help overcome the key challenges. Both the key challenges and the practices were used to formulate the assessment model. Results: The results highlight that the key challenges in the adoption of Enterprise Clouds are security & reliability concerns, resistance to change, vendor lock-in, data privacy, and difficulties in application and service migration. The industrial practices to overcome these challenges are: planning and executing a pilot project, assessing IT needs, using open source APIs, involving the legal team in vendor selection, identifying the processes to change, involving a senior executive as change champion, using vendor partners to support application/service migration to Cloud Computing, and creating employee awareness about Cloud Computing services. Conclusion: Using the key challenges and practices, an assessment model was developed that assesses an organisation's readiness to adopt Enterprise Clouds. The model measures readiness in four dimensions: technical, legal & compliance, IT capabilities, and end user readiness. The model's results can help an organisation overcome adoption challenges and successfully assimilate newly deployed or migrated IT services on Enterprise Clouds.
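
    The four-dimension readiness measurement can be illustrated with a simple scoring sketch. The scoring rule is an assumption, not the published model: each dimension's score is the share of its recommended practices already in place, and the lowest-scoring dimension is flagged as the one to address first. The dimension keys and practice counts are hypothetical.

```python
# Dimension keys are hypothetical labels for the four dimensions named above.
DIMENSIONS = ("technical", "legal_compliance", "it_capabilities", "end_user")


def readiness(practices_in_place, practices_total):
    """Hypothetical scoring: each dimension's readiness is the share of its
    recommended practices already adopted. The weakest dimension is returned
    as well, on the assumption that one unaddressed barrier can stall adoption."""
    scores = {d: practices_in_place[d] / practices_total[d] for d in DIMENSIONS}
    weakest = min(scores, key=scores.get)
    return scores, weakest


# Usage with made-up counts for an imaginary organisation.
in_place = {"technical": 3, "legal_compliance": 1, "it_capabilities": 2, "end_user": 2}
total = {d: 4 for d in DIMENSIONS}
scores, weakest = readiness(in_place, total)
print(scores)
print("address first:", weakest)
```

    Reporting the weakest dimension, rather than only an average, keeps a strong technical score from masking, for example, an unresolved legal or compliance gap.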