
    Virtual Cluster Management for Analysis of Geographically Distributed and Immovable Data

    Thesis (Ph.D.) - Indiana University, Informatics and Computing, 2015. Scenarios exist in the era of Big Data where computational analysis needs to utilize widely distributed and remote compute clusters, especially when the data sources are sensitive or extremely large and thus cannot be moved. A large dataset in Malaysia could be ecologically sensitive, for instance, and not permitted to leave the country's borders. Controlling an analysis experiment in this virtual cluster setting can be difficult on multiple levels: setting up and controlling the experiment, managing the behavior of the virtual cluster, and dealing with interoperability issues across the compute clusters. Further, datasets can be distributed among clusters, or even across data centers, so it becomes critical to use data locality information to optimize the performance of data-intensive jobs. Finally, datasets are increasingly sensitive and tied to certain administrative boundaries, though once the data has been processed, the aggregated or statistical results can be shared across those boundaries. This dissertation addresses the management and control of a widely distributed virtual cluster holding sensitive or otherwise immovable datasets through a controller. The Virtual Cluster Controller (VCC) gives control back to the researcher. It creates virtual clusters across multiple cloud platforms and, in recognition of sensitive data, can establish a single network overlay over widely distributed clusters. We define a novel class of data, immovable data that we call "pinned data", which is treated as a first-class citizen instead of being moved to where it is needed. We draw on our earlier work with a hierarchical data processing model, Hierarchical MapReduce (HMR), to process geographically distributed data, some of which is pinned data. Applications implemented in HMR use an extended MapReduce model in which computations are expressed as three functions: Map, Reduce, and GlobalReduce. Further, by facilitating information sharing among resources, applications, and data, overall performance is improved. Experimental results show that the overhead of VCC is minimal. HMR outperforms the traditional MapReduce model when processing a particular class of applications. The evaluations also show that information sharing between resources and applications through the VCC shortens the hierarchical data processing time while satisfying the constraints on the pinned data.
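    As an illustration of the extended MapReduce model described above, the sketch below runs Map and Reduce locally on each cluster's pinned data and applies GlobalReduce only to the aggregated results that are allowed to cross boundaries. The function names and the word-count example are assumptions for illustration; this is not the actual HMR API.

```python
from collections import defaultdict

# Illustrative sketch of a hierarchical MapReduce pass: each local cluster
# runs Map and Reduce over its own (pinned) data, and only the small reduced
# results cross administrative boundaries to a GlobalReduce step.
# Function and variable names are assumptions, not the actual HMR API.

def map_fn(record):
    # Emit (word, 1) pairs: a simple word-count example.
    for word in record.split():
        yield word, 1

def reduce_fn(key, values):
    # Local aggregation inside a single cluster.
    return key, sum(values)

def global_reduce_fn(key, values):
    # Final aggregation across clusters; only aggregated results are moved.
    return key, sum(values)

def run_local(cluster_records):
    groups = defaultdict(list)
    for record in cluster_records:
        for k, v in map_fn(record):
            groups[k].append(v)
    return dict(reduce_fn(k, vs) for k, vs in groups.items())

def run_hierarchical(clusters):
    # clusters: list of record lists, one per (possibly remote) cluster.
    partials = [run_local(records) for records in clusters]
    merged = defaultdict(list)
    for partial in partials:
        for k, v in partial.items():
            merged[k].append(v)
    return dict(global_reduce_fn(k, vs) for k, vs in merged.items())

if __name__ == "__main__":
    clusters = [["big data big"], ["data stays local", "big"]]
    print(run_hierarchical(clusters))  # {'big': 3, 'data': 2, 'stays': 1, 'local': 1}
```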

    Design and Evaluation of Opal2: A Toolkit for Scientific Software as a Service

    Grid computing provides mechanisms for making large-scale computing environments available to the masses. In recent times, with the advent of Cloud computing, the concepts of Software as a Service (SaaS), where vendors provide key software products as services over the internet that users can access to perform complex tasks, and Service as Software (SaS), where customizable and repeatable services are packaged as software products that dynamically meet the demands of individual users, have become increasingly popular. Both the SaaS and SaS models are highly applicable to scientific software and users alike. Opal2 is a toolkit for wrapping scientific applications as Web services on Grid and cloud computing resources. It provides a mechanism for scientific application developers to expose the functionality of their codes via simple Web service APIs, abstracting out the details of the back-end infrastructure. Services may be combined via customized workflows for specific research areas and distributed as virtual machine images. In this paper, we describe the overall philosophy and architecture of the Opal2 framework, including its new plug-in architecture and data handling capabilities. We analyze its performance in typical cluster and Grid settings, and in a cloud computing environment within virtual machines.
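    The service-wrapping idea can be pictured with a generic sketch: a command-line scientific code exposed behind a minimal HTTP endpoint so that clients invoke it remotely without knowing the back-end infrastructure. The endpoint path, parameters, and wrapped command below are illustrative assumptions and do not reflect the actual Opal2 API.

```python
# Generic illustration of "software as a service" wrapping: a command-line
# scientific code exposed through a minimal HTTP endpoint. This is NOT the
# Opal2 API; the path, parameters, and wrapped command are assumptions, and
# input validation/security are omitted for brevity.
import json
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

WRAPPED_COMMAND = ["echo"]  # stand-in for a real scientific executable

class JobHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/run":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        params = json.loads(self.rfile.read(length) or b"{}")
        # Launch the wrapped code with the caller-supplied arguments.
        result = subprocess.run(
            WRAPPED_COMMAND + params.get("args", []),
            capture_output=True, text=True,
        )
        body = json.dumps({"stdout": result.stdout, "exit": result.returncode}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), JobHandler).serve_forever()
```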

    Um Estudo aplicado de linha de produtos de software em um ambiente computacional distribuído

    Master's dissertation - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Ciência da Computação. Software projects increasingly pursue reuse and componentization, aiming to save time, cost, and resources when building new products. The need for techniques and tools to organize higher-quality projects in less time is thus one of the major challenges of Software Engineering. In this setting, the Software Product Line (SPL) approach systematically organizes and supports the serial development of new products within the same domain. This research applies the Software Product Line approach to a distributed computing environment, since in projects involving distributed environments new versions of a product whose features evolve within the same domain tend to repeat, rather than reuse, the main artifacts such as the architecture and components. A Software Product Line can make explicit, through variation points, which parts of the system will evolve and which will belong to the core architecture. The goal of the approach presented in this dissertation is to analyze a process currently used at the Laboratório de Pesquisa em Sistemas Distribuídos (LaPeSD) and to propose a new approach that uses a Software Product Line to develop projects. In this way, each project is developed by reusing an existing architecture, components, and documents, starting from a solid base and creating new products focused on the new functionality. As a result of this proposal, a reusable architecture and reusable components are presented, along with greater organization and visibility, showing that the approach successfully meets the challenge of applying a Software Product Line in a distributed computing environment.
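    A toy sketch of the product-line idea, assuming hypothetical core components and variation points (the actual LaPeSD artifacts are not shown): each product reuses the shared core and differs only in the features enabled at its variation points.

```python
# Toy sketch of a software product line: a shared core plus variation points
# (feature flags) from which individual products in the same domain are derived.
# Component and feature names are hypothetical, not those used at LaPeSD.
from dataclasses import dataclass, field

CORE_COMPONENTS = ["scheduler", "communication", "monitoring"]

@dataclass
class Product:
    name: str
    features: set = field(default_factory=set)  # variation points enabled

    def components(self):
        # Every product reuses the core; features only add variant components.
        optional = {"gpu_support": "gpu_backend", "mobile_ui": "mobile_frontend"}
        return CORE_COMPONENTS + [optional[f] for f in self.features if f in optional]

if __name__ == "__main__":
    print(Product("grid-basic").components())
    print(Product("grid-mobile", {"mobile_ui"}).components())
```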

    Uma Abordagem para reserva antecipada de recursos em ambientes de grades computacionais móveis

    Master's dissertation - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Ciência da Computação, Florianópolis, 2011. Grid computing is widely used to solve problems that require high computing power. Mobile devices have been used in grid environments due to their increasing resources and growing processing power. With increasing access to these environments at any time and from anywhere, users should need minimal knowledge of the environment, worrying only about the characteristics of their applications. In such environments, when no resources are available to meet a request, applications are placed in queues without execution guarantees. In this context, advance reservation is an important mechanism that enables better planning of grid usage and ensures better use of its resources. Through this mechanism, a user can request future use of resources in order to guarantee higher levels of QoS and QoE. This dissertation presents an architecture for advance resource reservation that considers the application characteristics as the determining factor for the reservation. Specifically, the proposed approach aims to improve reservation quality by seeking the best fit between resources and the specific requirements expressed by the user. Besides ensuring QoS, the reservations also aim to improve performance during job execution. The proposed architecture also offers a mobile access interface so that users can interact with the grid through mobile devices. In the experiments, the architecture proved efficient compared with other approaches: it distributed reservations well, achieving greater computational efficiency and ensuring good performance for applications running on the resources reserved in advance.
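    A minimal sketch of advance reservation driven by application characteristics, under assumed resource attributes and a simple best-fit rule (the dissertation's actual architecture is not reproduced): a request reserves a future time window on the free resource that most closely matches its needs.

```python
# Minimal sketch of advance resource reservation guided by application
# characteristics. Resource attributes, the matching rule, and class names
# are assumptions for illustration only.
from dataclasses import dataclass

@dataclass
class Resource:
    name: str
    cpus: int
    mem_gb: int
    reservations: list  # list of (start, end) tuples, in abstract time units

    def free(self, start, end):
        return all(end <= s or start >= e for s, e in self.reservations)

@dataclass
class Request:
    cpus: int
    mem_gb: int
    start: int
    end: int

def reserve(request, resources):
    # Pick the free resource that best matches the application's needs
    # (smallest sufficient capacity), rather than the first available one.
    candidates = [r for r in resources
                  if r.cpus >= request.cpus and r.mem_gb >= request.mem_gb
                  and r.free(request.start, request.end)]
    if not candidates:
        return None  # would be queued or rejected in a real scheduler
    best = min(candidates, key=lambda r: (r.cpus, r.mem_gb))
    best.reservations.append((request.start, request.end))
    return best.name

if __name__ == "__main__":
    pool = [Resource("node-a", 8, 32, []), Resource("node-b", 4, 16, [])]
    print(reserve(Request(cpus=2, mem_gb=8, start=10, end=20), pool))  # node-b
    print(reserve(Request(cpus=2, mem_gb=8, start=10, end=20), pool))  # node-a
```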

    Reserva dinâmica e antecipada de recursos para configurações multi-clusters utilizando ontologias e lógica difusa

    Master's dissertation - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Ciência da Computação, Florianópolis, 2010. Advance resource reservation is an important mechanism for making better use of the resources available in distributed multi-cluster environments. It allows a user, for example, to supply parameters intended to satisfy particular requirements when an application is executed. This predictability lets the system reach higher levels of QoS. However, the complexity of large-scale configurations and the dynamic changes observed in these systems limit the support for advance reservation. This research proposes an advance reservation approach that uses the paradigms of ontologies and fuzzy logic. The proposal allows an application to reserve more than one cluster per task and to request a wide variety of resources. In addition, the local availability of resources is checked dynamically, avoiding future conflicts at allocation time. Comparisons with related work were carried out, and experimental results from simulated multi-cluster configurations indicate that the proposed mechanism achieves flexibility for tasks requiring heavy processing, allowing an adequate balancing of processes across clusters.
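    A small sketch of fuzzy-style adequacy scoring for multi-cluster reservation, assuming illustrative membership functions and a threshold (the ontology and rule base of the dissertation are not shown): each cluster receives a degree of adequacy, and a task may span more than one sufficiently adequate cluster.

```python
# Illustrative sketch of fuzzy resource-adequacy scoring for advance
# reservation across clusters. Membership functions and the threshold are
# assumptions; the dissertation's actual ontology and rules are not shown.
def degree_sufficient(available, requested):
    # Fuzzy membership: 0 when nothing is available, 1 at or above the
    # request, linear in between.
    if requested <= 0:
        return 1.0
    return max(0.0, min(1.0, available / requested))

def adequacy(cluster, request):
    # Combine per-attribute memberships with a fuzzy AND (minimum).
    return min(
        degree_sufficient(cluster["cpus"], request["cpus"]),
        degree_sufficient(cluster["mem_gb"], request["mem_gb"]),
    )

def pick_clusters(clusters, request, threshold=0.7):
    # Rank clusters by adequacy; a task may be placed on more than one cluster.
    ranked = sorted(clusters, key=lambda c: adequacy(c, request), reverse=True)
    return [c["name"] for c in ranked if adequacy(c, request) >= threshold]

if __name__ == "__main__":
    clusters = [
        {"name": "cluster-1", "cpus": 64, "mem_gb": 256},
        {"name": "cluster-2", "cpus": 16, "mem_gb": 64},
    ]
    print(pick_clusters(clusters, {"cpus": 32, "mem_gb": 128}))  # ['cluster-1']
```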

    DRIVE: A Distributed Economic Meta-Scheduler for the Federation of Grid and Cloud Systems

    The computational landscape is littered with islands of disjoint resource providers, including commercial Clouds, private Clouds, national Grids, institutional Grids, clusters, and data centers. These providers are independent and isolated due to a lack of communication and coordination; they are also often proprietary, without standardised interfaces, protocols, or execution environments. This lack of standardisation and global transparency has the effect of binding consumers to individual providers. With the increasing ubiquity of computation providers there is an opportunity to create federated architectures that span both Grid and Cloud computing providers, effectively creating a global computing infrastructure. In order to realise this vision, secure and scalable mechanisms to coordinate resource access are required. This thesis proposes a generic meta-scheduling architecture to facilitate federated resource allocation in which users can provision resources from a range of heterogeneous (service) providers. Efficient resource allocation is difficult in large-scale distributed environments due to the inherent lack of centralised control. In a Grid model, local resource managers govern access to a pool of resources within a single administrative domain, but they have only a local view of the Grid and are unable to collaborate when allocating jobs. Meta-schedulers act at a higher level and are able to submit jobs to multiple resource managers; however, they are most often deployed on a per-client basis and are therefore concerned only with their own allocations, essentially competing against one another. In a federated environment, the widespread adoption of utility computing models, as seen in commercial Cloud providers, has re-motivated the need for economically aware meta-schedulers. Economies provide a way to represent the different goals and strategies that exist in a competitive distributed environment. The use of economic allocation principles effectively creates an open service market that provides efficient allocation and incentives for participation. The major contributions of this thesis are the architecture and prototype implementation of the DRIVE meta-scheduler. DRIVE is a Virtual Organisation (VO) based distributed economic meta-scheduler in which members of the VO collaboratively allocate services or resources. Providers joining the VO contribute obligation services to the VO. These contributed services are in effect membership "dues" and are used in the running of the VO's operations, for example allocation, advertising, and general management. DRIVE is independent of any particular class of provider (Service, Grid, or Cloud) and of any specific economic protocol. This independence enables allocation in federated environments composed of heterogeneous providers in vastly different scenarios. Protocol independence facilitates the use of arbitrary protocols based on specific requirements and infrastructural availability. For instance, within a single organisation where internal trust exists, users can achieve maximum allocation performance by choosing a simple economic protocol. In a global utility Grid no such trust exists, but the same meta-scheduler architecture can be used with a secure protocol that ensures the allocation is carried out fairly in the absence of trust. DRIVE establishes contracts between participants as the result of allocation. A contract describes the individual requirements and obligations of each party. A unique two-stage contract negotiation protocol is used to minimise the effect of allocation latency. In addition, due to the cooperative nature of the architecture and the use of secure, privacy-preserving protocols, DRIVE can be deployed in a distributed environment without requiring large-scale dedicated resources. This thesis presents several other contributions related to meta-scheduling and open service markets. To overcome the perceived performance limitations of economic systems, four high-utilisation strategies have been developed and evaluated. Each strategy is shown to improve occupancy, utilisation, and profit using synthetic workloads based on a production Grid trace. The gRAVI service-wrapping toolkit is presented to address the difficulty of web-enabling existing applications. The gRAVI toolkit has been extended for this thesis so that it creates economically aware (DRIVE-enabled) services that can be traded transparently in a DRIVE market without requiring developer input. The final contribution of this thesis is the definition and architecture of a Social Cloud: a dynamic Cloud computing infrastructure composed of virtualised resources contributed by members of a social network. The Social Cloud prototype is based on DRIVE and highlights the ease with which dynamic DRIVE markets can be created and used in different domains.
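    A simplified sketch of economically aware allocation in the spirit described above, with assumed provider attributes, bid rule, and contract fields; it is not the DRIVE protocol itself. Providers bid for a job, the cheapest capable bid wins, and a contract records the agreed terms.

```python
# Minimal sketch of economic meta-scheduling: providers bid for a job, the
# cheapest capable provider wins, and a contract records the agreed terms.
# Names, the bid rule, and the contract fields are assumptions, not DRIVE's
# actual protocol.
from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    free_cpus: int
    price_per_cpu_hour: float

    def bid(self, job):
        # Decline if the job does not fit; otherwise bid the total price.
        if job["cpus"] > self.free_cpus:
            return None
        return job["cpus"] * job["hours"] * self.price_per_cpu_hour

def allocate(job, providers):
    bids = [(p.bid(job), p) for p in providers]
    bids = [(b, p) for b, p in bids if b is not None]
    if not bids:
        return None
    price, winner = min(bids, key=lambda bp: bp[0])
    # Simplified "contract": both sides' agreed obligations for this job.
    return {"provider": winner.name, "job": job, "agreed_price": price}

if __name__ == "__main__":
    providers = [Provider("grid-a", 128, 0.12), Provider("cloud-b", 512, 0.10)]
    print(allocate({"cpus": 64, "hours": 3}, providers))
    # cloud-b wins with the lower total price
```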

    Applications Development for the Computational Grid


    Negotiated resource brokering for quality of service provision of grid applications

    Grid computing is a distributed computing paradigm in which many computers, often from different organisations, work together so that their computing power can be aggregated. Grids are often heterogeneous, and resources vary significantly in CPU power, available RAM, disk space, OS, architecture, installed software, and so on. Added to this lack of uniformity, best-effort services are usually offered, as opposed to services that guarantee completion time through Service Level Agreements (SLAs). This lack of guarantees stifles the uptake of Grids. The challenge tackled here is to add such guarantees, making users more willing to use the Grid, given their obvious reluctance to pay or contribute if the quality of the returned services carries no guarantees. Grid resources are also finite, so priorities need to be established in order to best meet any guarantees placed upon the limited resources available. An economic approach is hence adopted to ensure end users reveal their true priorities for jobs, whilst also adding an incentive for provisioning services via a service charge. An economically oriented model is therefore proposed that provides SLAs with bicriteria constraints upon time and cost. This model is tested via discrete-event simulation, and a simulator capable of exercising the model is presented. An architecture developed to utilise the economic model for negotiating SLAs is then described. Finally, experiments with the developed software deployed on a testbed are reported, covering admission control and steering of jobs within the Grid. Results show the interactions and relationship between the time and cost constraints within the model, including transitions in which one constraint dominates the other, as well as the effects of rescheduling upon the market.
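    A sketch of a bicriteria SLA admission check on time and cost, assuming an illustrative pricing and completion-time estimate rather than the dissertation's actual model: a job is admitted only if both the deadline and the budget constraints can be met.

```python
# Sketch of a bicriteria SLA admission check on time and cost, in the spirit
# of the model described above. The acceptance rule and the pricing formula
# are illustrative assumptions, not the dissertation's actual model.
from dataclasses import dataclass

@dataclass
class SlaRequest:
    deadline_hours: float   # completion-time constraint
    budget: float           # cost constraint

def quote(job_hours, load_factor, base_rate=1.0):
    # Busier providers finish later and charge more (a simple priority signal).
    estimated_completion = job_hours * (1.0 + load_factor)
    price = job_hours * base_rate * (1.0 + load_factor)
    return estimated_completion, price

def admit(request, job_hours, load_factor):
    completion, price = quote(job_hours, load_factor)
    # Admit only if both constraints of the SLA can be met.
    return completion <= request.deadline_hours and price <= request.budget

if __name__ == "__main__":
    sla = SlaRequest(deadline_hours=10.0, budget=8.0)
    print(admit(sla, job_hours=4.0, load_factor=0.5))  # True: 6 h, cost 6.0
    print(admit(sla, job_hours=4.0, load_factor=1.5))  # False: 10 h, cost 10.0
```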