    Fast and Cost-Effective Online Load-Balancing in Distributed Range-Queriable Systems

    Efficient adaptive query processing on large database systems available in the cloud environment

    Doctoral thesis in Informatics. Nowadays, many companies are migrating their applications and data to cloud service providers, mainly because the cloud lets them respond quickly to business requirements. Performance is therefore an important requirement for most customers who wish to migrate their applications to the cloud. In cloud environments, resources should be acquired and released automatically and quickly at runtime. Moreover, users and service providers expect timely responses that honor the service SLA (Service Level Agreement). Consequently, ensuring QoS (Quality of Service) is a major challenge, and it becomes harder when large amounts of data must be manipulated in this environment. To address these problems, several research efforts have aimed at shorter execution times through adaptive query processing and/or resource prediction based on the current system status. However, these efforts have important limitations. For example, most of them do not monitor the query during execution and/or present intrusive solutions, i.e., solutions tied to a particular context. The aim of this thesis is to develop new solutions/strategies for efficient adaptive query processing on large databases available in a cloud environment. The solution integrates adaptive re-optimization at query runtime, with costs based on the SRT (Service Response Time, an SLA QoS performance parameter). The proposed solution will be evaluated at large scale, with a large volume of data, many machines, and many queries, in a real cloud computing infrastructure. Finally, this work also proposes a new model to estimate the SRT for different request types (database access requests). This model will allow the cloud service provider and its customers to establish an appropriate SLA with respect to the expected performance of the services available in the cloud.
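The abstract combines two ideas: monitoring a query while it runs so it can be re-optimized when the SLA is at risk, and estimating the SRT per request type. The sketch below illustrates both under loudly stated assumptions. The names `SrtModel` and `should_reoptimize` are hypothetical, the moving-average estimator and the linear extrapolation of progress are simple stand-ins chosen for illustration, and neither is the thesis's actual cost model.

```python
from collections import deque
from dataclasses import dataclass, field
from statistics import mean
from typing import Deque, Dict


@dataclass
class SrtModel:
    """Hypothetical SRT estimator: moving average of recent SRTs per request type."""

    window: int = 20
    history: Dict[str, Deque[float]] = field(default_factory=dict)

    def observe(self, request_type: str, srt_ms: float) -> None:
        # Keep only the most recent `window` observations for this request type.
        if request_type not in self.history:
            self.history[request_type] = deque(maxlen=self.window)
        self.history[request_type].append(srt_ms)

    def estimate(self, request_type: str, default_ms: float = 0.0) -> float:
        # Fall back to a default when no SRT has been observed for this type yet.
        samples = self.history.get(request_type)
        return mean(samples) if samples else default_ms


def should_reoptimize(elapsed_ms: float, fraction_done: float, sla_srt_ms: float) -> bool:
    """Assumed monitoring rule: extrapolate the final SRT linearly from progress
    so far and ask for re-optimization if the projection exceeds the SLA target."""
    if fraction_done <= 0.0:
        return False  # no progress signal yet, so nothing to extrapolate from
    projected_ms = elapsed_ms / fraction_done
    return projected_ms > sla_srt_ms


if __name__ == "__main__":
    model = SrtModel(window=3)
    for srt in (120.0, 140.0, 130.0):
        model.observe("select", srt)
    print(model.estimate("select"))  # 130.0
    # Half done after 90 ms projects to 180 ms, which exceeds a 150 ms target.
    print(should_reoptimize(elapsed_ms=90.0, fraction_done=0.5, sla_srt_ms=150.0))  # True
```

A real system would replace the linear projection with the optimizer's own cost estimates for the remaining plan; the point here is only that the re-optimization decision is driven by a monitored, SLA-relative response time.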

    Processing Exact Results for Queries over Data Streams

    In a growing number of information-processing applications, such as network-traffic monitoring, sensor networks, financial analysis, and data mining for e-commerce, data takes the form of continuous data streams rather than traditional stored databases of relational tuples. These applications share common features: the need for real-time analysis, huge data volumes, and unpredictable, bursty arrivals of stream elements. In all of these applications, it is infeasible to process queries over data streams by loading the data into a traditional database management system (DBMS) or into main memory, because such an approach does not scale to high stream rates. As a consequence, systems that can manage streaming data have gained tremendous importance. The need to process a large number of continuous queries over bursty, high-volume online data streams, potentially in real time, makes it imperative to design algorithms that use limited resources. This dissertation focuses on producing exact results for join queries over high-speed data streams using limited resources, and proposes several novel techniques for processing join queries that incorporate secondary storage and non-dedicated computers. Existing approaches for stream joins either (a) deal with memory limitations by shedding load, and therefore cannot produce exact or highly accurate results for stream joins over data streams with time-varying tuple arrival rates, or (b) suffer from large I/O overheads due to random disk accesses. The proposed techniques exploit the high bandwidth of a disk subsystem by making the data access pattern largely sequential, eliminating small, random disk accesses. This dissertation proposes an I/O-efficient algorithm to process hybrid join queries, which join a fast, time-varying or bursty data stream with a persistent disk relation. Such a hybrid join is the crux of a number of common transformations in an active data warehouse.
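To make the idea of disk-I/O amortization for a stream/relation hybrid join concrete, here is a minimal sketch. It is a generic, MESHJOIN-style illustration swapped in for clarity, not the dissertation's algorithm, and it does not model the locality-aware optimizations the abstract mentions. The function name `amortized_hybrid_join`, the in-memory list of `disk_pages` standing in for sequential disk reads, and the `batch_size` parameter are all hypothetical. The relation is scanned sequentially and cyclically, and each page read is probed by every buffered stream tuple, so one read serves many tuples while the join stays exact.

```python
from collections import defaultdict, deque
from typing import Any, Callable, Deque, Dict, Hashable, Iterable, Iterator, List, Tuple

_END = object()  # sentinel marking stream exhaustion


def amortized_hybrid_join(
    stream: Iterable[Any],
    disk_pages: List[List[Any]],
    stream_key: Callable[[Any], Hashable],
    disk_key: Callable[[Any], Hashable],
    batch_size: int,
) -> Iterator[Tuple[Any, Any]]:
    """Exact equi-join of a stream with a paged relation (illustrative sketch)."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    n_pages = len(disk_pages)
    if n_pages == 0:
        return
    source = iter(stream)
    window: Deque[Tuple[int, Any]] = deque()  # (admission step, tuple), oldest first
    index: Dict[Hashable, Deque[Any]] = defaultdict(deque)  # key -> buffered tuples
    step = 0
    exhausted = False
    while not exhausted or window:
        # 1. Admit up to batch_size new stream tuples into memory.
        for _ in range(batch_size):
            t = next(source, _END)
            if t is _END:
                exhausted = True
                break
            window.append((step, t))
            index[stream_key(t)].append(t)
        # 2. Read the next page sequentially and probe it against the whole buffer,
        #    so the cost of this single read is shared by all buffered tuples.
        for d in disk_pages[step % n_pages]:
            for t in index.get(disk_key(d), ()):
                yield t, d
        step += 1
        # 3. A tuple admitted at step s has met pages s .. step-1; after n_pages
        #    steps it has seen the entire relation, so it can be expired.
        while window and step - window[0][0] == n_pages:
            _, t = window.popleft()
            k = stream_key(t)
            index[k].popleft()  # FIFO: the oldest tuple with key k is at the front
            if not index[k]:
                del index[k]
```

The trade-off is visible in step 3: a larger buffer spreads each page read over more stream tuples, but a tuple's result set is complete only after a full cycle over the relation.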
Experimental results demonstrate that the proposed hybrid-join scheme reduces the response time for output results by exploiting spatio-temporal locality within the input stream, and minimizes disk overhead through disk-I/O amortization. The dissertation also proposes an algorithm to parallelize a stream join operator over a shared-nothing system. The proposed algorithm distributes the processing load across a number of independent, non-dedicated nodes based on a fixed or predefined communication pattern; dynamically maintains the degree of declustering to minimize communication and processing overheads; and provides mechanisms for reducing storage and communication overheads while scaling to a large number of nodes. We present experimental results showing the efficacy of the proposed algorithms.
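The notion of a "degree of declustering" for a parallel stream join can be illustrated with a small single-process simulation. This is a hedged sketch, not the dissertation's algorithm: it assumes hash partitioning on the join key (a swapped-in choice), models nodes as plain objects with no real communication, keeps unbounded state instead of windows, and uses the hypothetical names `JoinNode`, `DeclusteredJoin`, and `redecluster`. Each node runs a symmetric hash join on its key partition, and changing the degree migrates stored state so that later tuples still find their matches.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, List, Tuple


class JoinNode:
    """One hypothetical shared-nothing worker: symmetric hash join on its partition."""

    def __init__(self) -> None:
        self.state: Dict[str, Dict[Hashable, List[Any]]] = {
            "R": defaultdict(list),
            "S": defaultdict(list),
        }

    def insert(self, side: str, key: Hashable, tup: Any) -> List[Tuple[Any, Any]]:
        # Store the new tuple, then probe the opposite side's table for matches.
        other = "S" if side == "R" else "R"
        self.state[side][key].append(tup)
        matches = self.state[other].get(key, [])
        # Always emit pairs as (R tuple, S tuple), whichever side arrived.
        return [(tup, m) if side == "R" else (m, tup) for m in matches]


class DeclusteredJoin:
    """Routes tuples to `degree` nodes by hash(key) % degree (assumed scheme)."""

    def __init__(self, degree: int) -> None:
        self.nodes = [JoinNode() for _ in range(degree)]

    def route(self, key: Hashable) -> JoinNode:
        return self.nodes[hash(key) % len(self.nodes)]

    def insert(self, side: str, key: Hashable, tup: Any) -> List[Tuple[Any, Any]]:
        return self.route(key).insert(side, key, tup)

    def redecluster(self, new_degree: int) -> None:
        """Change the degree of declustering by moving stored tuples to their new
        home nodes; no results are emitted, since these were already joined."""
        old_nodes = self.nodes
        self.nodes = [JoinNode() for _ in range(new_degree)]
        for node in old_nodes:
            for side, table in node.state.items():
                for key, tuples in table.items():
                    self.route(key).state[side][key].extend(tuples)
```

Because both sides of a given key always hash to the same node, each node can join independently; the cost of re-declustering is the state migration in `redecluster`, which is the kind of overhead the abstract says the degree of declustering is tuned to minimize.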