Efficient adaptive query processing on large database systems available in the cloud environment
Doctoral thesis in Informatics
Nowadays, many companies are migrating their applications and data to cloud service
providers, mainly because of their ability to respond quickly to business requirements.
Performance is therefore an important requirement for most customers when they
wish to migrate their applications to the cloud.
In cloud environments, resources should be acquired and released
automatically and quickly at runtime. Moreover, users and service providers expect
answers in time to meet the service SLA (Service Level Agreement).
Consequently, ensuring QoS (Quality of Service) is a great challenge, and it
grows when large amounts of data must be manipulated in this environment.
To address this kind of problem, several studies have focused on shortening
execution time using adaptive query processing and/or prediction of resources based
on the current system status. However, they present important limitations. For example,
most of these works do not use monitoring during query execution and/or present
intrusive solutions, i.e., solutions tied to a particular context.
The aim of this thesis is to develop new solutions/strategies for efficient
adaptive query processing on large databases available in a cloud environment. The
solution must integrate adaptive re-optimization at query runtime, with costs based on the
SRT (Service Response Time, an SLA QoS performance parameter). The proposed
solution will be evaluated at large scale, with large volumes of data, machines, and
queries, in a cloud computing infrastructure.
Finally, this work also proposes a new model to estimate the SRT for different request
types (database access requests). This model will allow the cloud service provider and
its customers to establish an appropriate SLA relative to the expected performance of
the services available in the cloud.
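The runtime re-optimization idea in this abstract can be illustrated with a minimal sketch. It is not the thesis's method: the `Plan` class, the `observe` callback, the `check_every` interval, and the `tolerance` factor are all hypothetical names chosen for illustration. The sketch assumes that the cost of a plan is its measured response time per row, and that a plan is replaced when monitoring during execution shows it running well above its estimate.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    name: str
    est_ms_per_row: float  # optimizer's estimated response time per row (assumed unit)


def run_adaptive(rows, plans, observe, check_every=100, tolerance=1.5):
    """Process rows with the cheapest estimated plan, re-optimizing at runtime.

    observe(plan_name, row) returns the measured time in ms for one row.
    Every `check_every` rows, the measured average is compared with the
    estimate of the active plan. If it exceeds the estimate by more than
    `tolerance` times, the estimate is corrected and the cheapest plan
    under the updated estimates takes over.
    """
    estimates = {p.name: p.est_ms_per_row for p in plans}
    current = min(estimates, key=estimates.get)
    switches, total_ms = [], 0.0
    window_ms, window_n = 0.0, 0
    for i, row in enumerate(rows, start=1):
        elapsed = observe(current, row)
        total_ms += elapsed
        window_ms += elapsed
        window_n += 1
        if window_n == check_every:
            measured = window_ms / window_n
            if measured > tolerance * estimates[current]:
                estimates[current] = measured  # replace the stale estimate
                best = min(estimates, key=estimates.get)
                if best != current:
                    switches.append((i, current, best))
                    current = best
            window_ms, window_n = 0.0, 0
    return total_ms, switches


# Hypothetical workload: the index scan was estimated at 1 ms/row but runs at 5 ms/row.
actual = {"index_scan": 5.0, "full_scan": 2.0}
plans = [Plan("index_scan", 1.0), Plan("full_scan", 2.0)]
total, switches = run_adaptive(range(300), plans, lambda name, row: actual[name])
```

In this toy run the monitor detects the misestimate after the first 100 rows and switches to the other plan for the remaining 200. A real system would also have to account for the cost of switching plans mid-query, which the sketch ignores.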
Processing Exact Results for Queries over Data Streams
In a growing number of information-processing applications, such as network-traffic monitoring, sensor networks, financial analysis, and data mining for e-commerce, data takes the form of continuous data streams rather than traditional stored databases or relational tuples. These applications share common features: the need for real-time analysis, huge volumes of data, and unpredictable, bursty arrivals of stream elements. In all of these applications, it is infeasible to process queries over data streams by loading the data into a traditional database management system (DBMS) or into main memory, because such an approach does not scale with high stream rates. As a consequence, systems that can manage streaming data have gained tremendous importance. The need to process a large number of continuous queries over bursty, high-volume online data streams, potentially in real time, makes it imperative to design algorithms that use limited resources.
This dissertation focuses on processing exact results for join queries over high-speed data streams using limited resources, and proposes several novel techniques for processing join queries that incorporate secondary storage and non-dedicated computers. Existing approaches for stream joins either (a) deal with memory limitations by shedding load, and therefore cannot produce exact or highly accurate results for joins over data streams with time-varying tuple arrivals, or (b) suffer from large I/O overheads due to random disk accesses. The proposed techniques exploit the high bandwidth of a disk subsystem by rendering the data access pattern largely sequential, eliminating small, random disk accesses. This dissertation proposes an I/O-efficient algorithm to process hybrid join queries, which join a fast, time-varying or bursty data stream with a persistent disk relation. Such a hybrid join is the crux of a number of common transformations in an active data warehouse. Experimental results demonstrate that the proposed scheme reduces the response time of output results by exploiting spatio-temporal locality within the input stream, and minimizes disk overhead through disk-I/O amortization.
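As a rough illustration of disk-I/O amortization in a hybrid stream-relation join, the sketch below buffers stream tuples and joins each batch with one sequential pass over the relation, instead of issuing one random lookup per tuple. It is a simplified stand-in, not the dissertation's algorithm: `hybrid_join`, `read_relation_pages`, and `batch_size` are hypothetical names, and the "disk" is modeled as an in-memory list of pages.

```python
from collections import defaultdict


def hybrid_join(stream, read_relation_pages, batch_size=4):
    """Join stream tuples (key, payload) with a disk relation, exactly.

    read_relation_pages() yields the relation's pages in storage order,
    each page being a list of (key, value) rows. Stream tuples are buffered,
    and every full batch is joined during one sequential pass over the
    relation, so the cost of the pass is shared by the whole batch.
    Returns (results, number_of_passes).
    """
    results, passes, batch = [], 0, []

    def flush():
        nonlocal passes
        if not batch:
            return
        # Index the buffered stream tuples by join key.
        pending = defaultdict(list)
        for key, payload in batch:
            pending[key].append(payload)
        # One sequential scan serves every tuple in the batch.
        for page in read_relation_pages():
            for key, value in page:
                for payload in pending.get(key, ()):
                    results.append((key, payload, value))
        passes += 1
        batch.clear()

    for tup in stream:
        batch.append(tup)
        if len(batch) == batch_size:
            flush()
    flush()  # join the final, possibly partial, batch
    return results, passes


# Hypothetical data: a two-page relation and five stream tuples.
pages = [[(1, "a"), (2, "b")], [(3, "c")]]
stream = [(1, "x"), (3, "y"), (1, "z"), (9, "w"), (2, "v")]
results, passes = hybrid_join(stream, lambda: iter(pages), batch_size=4)
```

No stream tuple is dropped, so the output is exact, and the five tuples here are served by two sequential passes. The trade-off is latency: a tuple waits until its batch is flushed, which is why batch size matters for response time.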
The dissertation also proposes an algorithm to parallelize a stream join operator over a shared-nothing system. The proposed algorithm distributes the processing load across a number of independent, non-dedicated nodes, based on a fixed or predefined communication pattern; dynamically maintains the degree of declustering in order to minimize communication and processing overheads; and presents mechanisms for reducing storage and communication overheads while scaling over a large number of nodes. We present experimental results showing the efficacy of the proposed algorithms.
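One common way to spread a stream join over shared-nothing nodes is to hash-partition both streams on the join key, so that matching tuples always meet on the same node. The sketch below simulates that idea sequentially in a single process. It is an assumption-laden illustration rather than the dissertation's algorithm: `JoinNode`, `route`, and `parallel_join` are hypothetical names, the routing rule is fixed, and the dynamic adjustment of the degree of declustering is not modeled. State is also unbounded here, whereas a real stream join would limit it, for example with windows.

```python
from collections import defaultdict


class JoinNode:
    """One shared-nothing node: a symmetric hash join over its key partition."""

    def __init__(self):
        self.tables = {"R": defaultdict(list), "S": defaultdict(list)}

    def insert(self, side, key, payload):
        """Store the tuple, then probe the opposite stream's table for matches."""
        other = "S" if side == "R" else "R"
        self.tables[side][key].append(payload)
        matches = self.tables[other].get(key, [])
        if side == "R":
            return [(key, payload, m) for m in matches]
        return [(key, m, payload) for m in matches]


def route(key, num_nodes):
    """Fixed routing rule: tuples with the same key always reach the same node."""
    return hash(key) % num_nodes


def parallel_join(events, num_nodes):
    """Feed (side, key, payload) events to the nodes and collect join results."""
    nodes = [JoinNode() for _ in range(num_nodes)]
    out = []
    for side, key, payload in events:
        out.extend(nodes[route(key, num_nodes)].insert(side, key, payload))
    return out


# Hypothetical interleaving of two streams, R and S.
events = [("R", 1, "r1"), ("S", 1, "s1"), ("S", 2, "s2"), ("R", 2, "r2"), ("R", 1, "r3")]
one_node = parallel_join(events, num_nodes=1)
three_nodes = parallel_join(events, num_nodes=3)
```

Because each key is owned by exactly one node, nodes never exchange state, and the join result does not depend on how many nodes are used.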