7 research outputs found

    A service-oriented architecture for scientific computing on cloud infrastructures

    This paper describes a service-oriented architecture that eases the deployment and execution of scientific applications in IaaS Clouds, with a focus on High Throughput Computing applications. The system integrates (i) a catalogue and repository of Virtual Machine Images, (ii) an application deployment and configuration tool, and (iii) a meta-scheduler for job execution management and monitoring. The developed system significantly reduces the time required to port a scientific application to these computational environments. This is exemplified by a case study with a computationally intensive protein design application on both a private Cloud and a hybrid three-level infrastructure (Grid, private and public Cloud).
    The authors thank the Generalitat Valenciana for financial support of project GV/2012/076 and the Ministerio de Economía y Competitividad for project CodeCloud (TIN2010-17804).
    Moltó, G.; Calatrava Arroyo, A.; Hernández García, V. (2013). A service-oriented architecture for scientific computing on cloud infrastructures. In High Performance Computing for Computational Science - VECPAR 2012. Springer Verlag (Germany). 163-176. doi:10.1007/978-3-642-38718-0_18

    Découverte et allocation des ressources pour le traitement de requêtes dans les systèmes grilles [Resource discovery and allocation for query processing in grid systems]

    Grid systems are among today's most interesting computing environments because of their large computing and storage capabilities and their availability. Many different domains benefit from the facilities of grid environments. Distributed query processing is one such domain, with a large amount of ongoing research on porting the underlying environment from distributed and parallel systems to the grid. In this thesis, we focus on resource discovery and resource allocation algorithms for query processing in grid environments. First, we propose a resource discovery algorithm for query processing in grid systems by introducing self-stabilizing topology control and converge-cast based resource discovery. Then, we propose a resource allocation algorithm for single join operator queries that generates a reduced search space of candidate nodes and considers the proximity of candidates to the data sources. We also propose another resource allocation algorithm for queries with multiple join operators. Lastly, we propose a fault-tolerant resource allocation algorithm, which provides fault tolerance during query execution through passive replication of stateful operators. The general contribution of this thesis is twofold. First, we propose a new resource discovery algorithm that takes the characteristics of grid environments into account. We address scalability and dynamicity by constructing an efficient topology over the grid environment using the self-stabilization concept, and we deal with heterogeneity through the converge-cast based resource discovery algorithm. The second main contribution is a new resource allocation algorithm that also takes the characteristics of the grid environment into account. We tackle the scalability problem by reducing the search space for candidate resources. We decrease communication costs during query execution by allocating nodes closer to the data sources. Finally, we deal with the dynamicity of nodes, in terms of their presence in the system, by proposing the fault-tolerant resource allocation algorithm.
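    The single-join allocation idea in this abstract (shrink the candidate set, then favour nodes close to the data sources) can be illustrated with a minimal sketch. It is not the thesis's algorithm: the `Node` fields, the summed-latency distance, the cut-off `k` and the least-loaded tie-break are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    load: float        # hypothetical: fraction of capacity in use (0..1)
    latency_ms: dict   # hypothetical: latency from this node to each data source


def allocate_join_node(nodes, sources, k=3):
    """Pick a node to host one join operator.

    Step 1 reduces the search space to the k candidates closest to the
    data sources (sum of latencies). Step 2 picks the least-loaded of
    those, breaking ties by distance.
    """
    if not nodes:
        raise ValueError("no candidate nodes")

    def distance(node):
        return sum(node.latency_ms[s] for s in sources)

    reduced = sorted(nodes, key=distance)[:k]
    return min(reduced, key=lambda n: (n.load, distance(n)))


candidates = [
    Node("a", 0.9, {"s1": 1, "s2": 1}),
    Node("b", 0.2, {"s1": 5, "s2": 5}),
    Node("c", 0.5, {"s1": 3, "s2": 2}),
    Node("d", 0.0, {"s1": 50, "s2": 60}),
]
chosen = allocate_join_node(candidates, ["s1", "s2"], k=3)
```

    Node `d` is idle but far from both sources, so it is excluded before load is ever considered; that exclusion is the effect of the reduced search space.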

    Méthode de découverte de sources de données tenant compte de la sémantique en environnement de grille de données [A semantics-aware data source discovery method for data grid environments]

    Nowadays, data grid applications share a huge number of data sources in an unstable environment where a data source may join or leave the system at any time. These data sources are highly heterogeneous because they are independently developed and managed, and they are geographically scattered. In this environment, efficient discovery of relevant data sources for query execution is a complex problem because of source heterogeneity, the large scale of the environment and system instability. Early work on data source discovery was based on keyword search. These initial solutions are not sufficient because they do not take into account the semantic heterogeneity of data sources. The community has therefore proposed other solutions that consider semantic aspects. A first solution uses a global schema or global ontology. However, designing such a schema or ontology is a complex task because of the number of data sources. Other solutions provide mappings between data source schemas, or rely on domain ontologies and establish mapping relations between them. All these solutions impose a fixed topology on the correspondences or on the mapping relations. However, defining mapping relations between domain ontologies is a difficult task, and imposing a fixed topology is a major drawback. In this thesis we propose a method for discovering data sources that takes semantic heterogeneity into account in an unstable, large-scale environment. To that end, we associate a Virtual Organisation (VO) and a domain ontology with each domain, and we rely on the existing mapping relations between these ontologies. We impose no assumption on the topology of the mapping relations, except that the graph they form is connected. We define an addressing system that allows permanent access from any VOi to any other VOj (with i ≠ j) despite the dynamicity of peers. We also present a 'lazy' maintenance method that limits the number of messages required to maintain the addressing system when peers connect or disconnect. To study the feasibility and viability of our proposals, we carry out a performance evaluation.
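    The only topological requirement stated in this abstract is that the graph of mapping relations between domain ontologies is connected, which guarantees that some chain of mappings links any two VOs. The sketch below shows that property with a plain breadth-first search. It is not the thesis's addressing system: treating mappings as undirected pairs, and the function name `mapping_path`, are assumptions made for illustration.

```python
from collections import deque


def mapping_path(mappings, source_vo, target_vo):
    """Return the shortest chain of VOs linking source_vo to target_vo.

    `mappings` is an iterable of (vo_a, vo_b) pairs, each standing for an
    ontology mapping assumed to be usable in both directions. Returns None
    when no chain exists, i.e. when the mapping graph is not connected.
    """
    # Build an undirected adjacency map from the mapping pairs.
    neighbours = {}
    for a, b in mappings:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    previous = {source_vo: None}
    queue = deque([source_vo])
    while queue:
        vo = queue.popleft()
        if vo == target_vo:
            # Walk the predecessor links back to the source.
            path = []
            while vo is not None:
                path.append(vo)
                vo = previous[vo]
            return path[::-1]
        for nxt in sorted(neighbours.get(vo, ())):
            if nxt not in previous:
                previous[nxt] = vo
                queue.append(nxt)
    return None


mappings = [
    ("biology", "chemistry"),
    ("chemistry", "physics"),
    ("physics", "astronomy"),
]
route = mapping_path(mappings, "biology", "astronomy")
```

    A query issued in the hypothetical `biology` VO would be translated across each mapping along `route` before reaching sources in `astronomy`.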

    Optimisation of nonlinear photonic devices: design of optical fibre spectra and plasmonic systems

    The purpose of this thesis is to design and optimise photonic devices in the nonlinear regime. Two types of device have been chosen, classified according to the physical phenomena of interest. The first class comprises conventional or photonic crystal fibres, designed so that the temporal dynamics of the wave packets propagating inside them generate spectra with the desired characteristics, in the context of the supercontinuum. The second class exploits the spatial phenomenology associated with electromagnetic waves propagating along the surface of a metal. These waves make it possible both to design photonic-chip-type devices whose typical dimensions are far below the wavelength of light and to generate hybrid nonlinear states with singular dynamics. All these effects take place within the framework provided by the macroscopic Maxwell equations, which have been solved numerically. In some cases major theoretical approximations are used to study 1D systems, while in others the equations are integrated directly in 3D. Where optimising the device remains non-trivial even after a deep theoretical understanding of it has been acquired, a novel numerical tool is used that combines genetic algorithms with a Grid platform.
    Milián Enrique, C. (2012). Optimisation of nonlinear photonic devices: design of optical fibre spectra and plasmonic systems [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/14670

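    APLICACIÓN DE LAS TECNOLOGÍAS GRID Y DE LAS ARQUITECTURAS ORIENTADAS A SERVICIOS EN EL ANÁLISIS DE ESTRUCTURAS DE EDIFICACIÓN [Application of Grid technologies and service-oriented architectures to the analysis of building structures]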

    This thesis presents the implementation of a service-oriented platform that performs on-demand static and dynamic analysis of large-scale structures using a Grid infrastructure. The Grid Service has been developed to offer a multi-user online service to the community of structural analysts.
    López Herrero, R. (2007). APLICACIÓN DE LAS TECNOLOGÍAS GRID Y DE LAS ARQUITECTURAS ORIENTADAS A SERVICIOS EN EL ANÁLISIS DE ESTRUCTURAS DE EDIFICACIÓN. http://hdl.handle.net/10251/12536

    Uso de infraestructuras híbridas Grid y Cloud para la computación científica [Use of hybrid Grid and Cloud infrastructures for scientific computing]

    The advent of virtualization techniques in recent years has led to the emergence of Cloud Computing. This new technology has paved the way towards the use of hybrid computing infrastructures in science, based on powerful Grid resources combined with the dynamic and elastic virtual infrastructures that the Cloud provides. But combining these resources to support the execution of computationally intensive scientific applications is not trivial, and it gives rise to new challenges and opportunities in areas such as resource provisioning and meta-scheduling. This master's thesis first develops theoretical models of hybrid Grid/Cloud meta-scheduling that enable scientific HTC (High Throughput Computing) applications to integrate and use both infrastructures, in line with the current state of the art. These theoretical models have been put into practice through prototype tools that allow the deployment and concurrent execution of scientific applications on Grid and Cloud platforms (including private and public Clouds). Secondly, the thesis studies the overhead of virtualization with respect to a physical machine. Finally, to assess the effectiveness of the models, it includes a case study of a computationally intensive scientific application that designs proteins with target properties.
    Calatrava Arroyo, A. (2012). Uso de infraestructuras híbridas Grid y Cloud para la computación científica. http://hdl.handle.net/10251/27150
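    As a rough illustration of what hybrid Grid/Cloud meta-scheduling of independent HTC jobs involves, the sketch below assigns each job to whichever execution slot would finish it earliest, with a per-infrastructure slowdown factor standing in for virtualization overhead. This is a generic greedy heuristic, not the models developed in the thesis; the resource names, slot counts and overhead factors are assumptions made for illustration.

```python
def schedule(job_runtimes, resources):
    """Greedy earliest-finish-time assignment of independent jobs.

    job_runtimes: list of nominal runtimes (e.g. seconds on bare metal).
    resources: dict mapping a resource name to (slot_count, overhead_factor),
               where overhead_factor >= 1.0 models virtualization overhead.
    Returns (plan, makespan); plan holds (job_id, resource, finish_time).
    """
    # One mutable record per execution slot: [time it becomes free, name, factor].
    slots = [
        [0.0, name, factor]
        for name, (count, factor) in resources.items()
        for _ in range(count)
    ]
    if job_runtimes and not slots:
        raise ValueError("no execution slots available")

    plan = []
    # Place the longest jobs first, a common heuristic for reducing makespan.
    for job_id, runtime in sorted(enumerate(job_runtimes), key=lambda jr: -jr[1]):
        best = min(slots, key=lambda s: s[0] + runtime * s[2])
        best[0] += runtime * best[2]
        plan.append((job_id, best[1], best[0]))

    makespan = max((s[0] for s in slots), default=0.0)
    return plan, makespan


# Hypothetical setup: one Grid slot and one slightly slower private Cloud slot.
plan, makespan = schedule([10, 10, 10], {"grid": (1, 1.0), "private_cloud": (1, 1.1)})
```

    With these numbers the second job goes to the Cloud slot despite its overhead, because waiting for the Grid slot would take longer.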

    DRIVE: A Distributed Economic Meta-Scheduler for the Federation of Grid and Cloud Systems

    The computational landscape is littered with islands of disjoint resource providers including commercial Clouds, private Clouds, national Grids, institutional Grids, clusters, and data centers. These providers are independent and isolated due to a lack of communication and coordination; they are also often proprietary, without standardised interfaces, protocols, or execution environments. The lack of standardisation and global transparency has the effect of binding consumers to individual providers. With the increasing ubiquity of computation providers there is an opportunity to create federated architectures that span both Grid and Cloud computing providers, effectively creating a global computing infrastructure. In order to realise this vision, secure and scalable mechanisms to coordinate resource access are required. This thesis proposes a generic meta-scheduling architecture to facilitate federated resource allocation in which users can provision resources from a range of heterogeneous (service) providers. Efficient resource allocation is difficult in large scale distributed environments due to the inherent lack of centralised control. In a Grid model, local resource managers govern access to a pool of resources within a single administrative domain but have only a local view of the Grid and are unable to collaborate when allocating jobs. Meta-schedulers act at a higher level and are able to submit jobs to multiple resource managers; however, they are most often deployed on a per-client basis and are therefore concerned only with their own allocations, essentially competing against one another. In a federated environment the widespread adoption of utility computing models seen in commercial Cloud providers has re-motivated the need for economically aware meta-schedulers. Economies provide a way to represent the different goals and strategies that exist in a competitive distributed environment. The use of economic allocation principles effectively creates an open service market that provides efficient allocation and incentives for participation. The major contributions of this thesis are the architecture and prototype implementation of the DRIVE meta-scheduler. DRIVE is a Virtual Organisation (VO) based distributed economic meta-scheduler in which members of the VO collaboratively allocate services or resources. Providers joining the VO contribute obligation services to the VO. These contributed services are in effect membership "dues" and are used in running the VO's operations, for example allocation, advertising, and general management. DRIVE is independent of any particular class of provider (Service, Grid, or Cloud) and of any specific economic protocol. This independence enables allocation in federated environments composed of heterogeneous providers in vastly different scenarios. Protocol independence facilitates the use of arbitrary protocols based on specific requirements and infrastructural availability. For instance, within a single organisation where internal trust exists, users can achieve maximum allocation performance by choosing a simple economic protocol. In a global utility Grid no such trust exists. The same meta-scheduler architecture can be used with a secure protocol which ensures the allocation is carried out fairly in the absence of trust. DRIVE establishes contracts between participants as the result of allocation. A contract describes the individual requirements and obligations of each party. A unique two-stage contract negotiation protocol is used to minimise the effect of allocation latency. In addition, due to the co-operative nature of the architecture and the use of secure privacy-preserving protocols, DRIVE can be deployed in a distributed environment without requiring large scale dedicated resources. This thesis presents several other contributions related to meta-scheduling and open service markets. To overcome the perceived performance limitations of economic systems, four high utilisation strategies have been developed and evaluated. Each strategy is shown to improve occupancy, utilisation and profit using synthetic workloads based on a production Grid trace. The gRAVI service wrapping toolkit is presented to address the difficulty of web-enabling existing applications. The gRAVI toolkit has been extended for this thesis such that it creates economically aware (DRIVE-enabled) services that can be transparently traded in a DRIVE market without requiring developer input. The final contribution of this thesis is the definition and architecture of a Social Cloud: a dynamic Cloud computing infrastructure composed of virtualised resources contributed by members of a social network. The Social Cloud prototype is based on DRIVE and highlights the ease with which dynamic DRIVE markets can be created and used in different domains.
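    The abstract notes that, where participants trust one another, a simple economic protocol is enough for allocation. One textbook example of such a protocol is a first-price sealed-bid reverse auction, sketched below. It is offered only as an illustration of economic allocation in general: the abstract does not say which protocols DRIVE ships with, and the function name, provider names, prices and reserve-price rule here are all assumptions.

```python
def reverse_auction(bids, reserve_price):
    """First-price sealed-bid reverse auction.

    bids: dict mapping a provider name to its asking price for the job.
    reserve_price: the most the consumer is willing to pay.
    Returns (winner, price), or None if every ask exceeds the reserve.
    """
    # Discard asks above what the consumer will pay.
    valid = {provider: ask for provider, ask in bids.items() if ask <= reserve_price}
    if not valid:
        return None
    # Lowest ask wins; ties are broken alphabetically so the result is deterministic.
    winner = min(valid, key=lambda provider: (valid[provider], provider))
    return winner, valid[winner]


# Hypothetical asks from three providers for the same job.
outcome = reverse_auction({"grid_a": 5.0, "cloud_b": 3.5, "cluster_c": 4.0}, reserve_price=4.5)
```

    Because every provider's ask is visible to whoever runs the auction, a protocol like this presupposes a trusted auctioneer, which matches the single-organisation case the abstract describes rather than the untrusted global one.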