1,368 research outputs found
Arquitetura de elevada disponibilidade para bases de dados na cloud
Master's dissertation in Computer Science
With the constant expansion of computational systems, the amount of data that requires
durability increases exponentially. All persisted data must be replicated in order to provide high availability and fault tolerance, as required by the target application or use case.
Currently, there are numerous replication approaches and protocols, developed to support different use cases. There are two prominent families of replication protocols: generic
protocols and database-specific ones. The two main techniques associated with generic
replication protocols are active and passive replication. Although generic replication
techniques are mature and widely used, they have inherent problems,
namely saturation of the primary replica in passive replication
and the determinism required by active replication. Some of those disadvantages are
mitigated by database-specific replication protocols (e.g., using multi-master), but those
protocols do not allow a separation between replication logic and data, and they cannot be decoupled
from the database engine. Moreover, recent strategies consider highly scalable and fault-tolerant distributed logging mechanisms, allowing for newer designs based purely on logs
to power replication.
To mitigate the shortcomings found not only in active and passive replication mechanisms
but also in their variations, this dissertation presents a hybrid replication middleware, SQLware. The cornerstone of the approach lies in the decoupling between
the logical replication layer and the data store, together with the use of a highly scalable distributed log that provides fault tolerance and high availability. We validated the prototype
by conducting a benchmarking campaign to evaluate the overall system's performance on two distinct infrastructures, namely a private mid-range server and a private high-performance
computing cluster. Throughout the evaluation campaign, we used the TPC-C benchmark, an industry standard widely used to evaluate online transaction processing
(OLTP) database systems. Results show that SQLware was able to achieve 150 times more
throughput than the native replication mechanism of the underlying data
store considered as baseline, PostgreSQL.
This work was partially funded by FCT - Fundação para a Ciência e a Tecnologia, I.P.,
(Portuguese Foundation for Science and Technology) within project UID/EEA/50014/201
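The idea of separating the replication layer from the data store by means of a shared log can be illustrated with a minimal sketch. This is not SQLware's actual design, which the abstract does not detail; `SharedLog` and `Replica` are hypothetical names, and the in-memory list merely stands in for a distributed log.

```python
from dataclasses import dataclass, field


@dataclass
class SharedLog:
    """Hypothetical stand-in for a distributed log: an append-only list of writes."""
    entries: list = field(default_factory=list)

    def append(self, key, value):
        self.entries.append((key, value))
        return len(self.entries) - 1  # position of the new entry in the log


@dataclass
class Replica:
    """A data store that changes state only by replaying log entries in order."""
    state: dict = field(default_factory=dict)
    applied: int = 0  # number of log entries already applied

    def catch_up(self, log):
        # Apply every entry this replica has not seen yet, in log order.
        for key, value in log.entries[self.applied:]:
            self.state[key] = value
        self.applied = len(log.entries)


log = SharedLog()
fast, slow = Replica(), Replica()

log.append("stock:1", 10)
fast.catch_up(log)           # one replica applies the first write immediately
log.append("stock:1", 7)
log.append("order:1", "new")

fast.catch_up(log)
slow.catch_up(log)           # a lagging replica replays the whole log
```

Because every replica applies the same entries in the same order, a replica that falls behind or restarts can converge by replaying from its last applied position, without coordinating with the other replicas.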
Scalable and Highly Available Database Systems in the Cloud
Cloud computing allows users to tap into a massive pool of shared computing
resources such as servers, storage, and network. These resources are provided as a
service to the users allowing them to “plug into the cloud” similar to a utility grid.
The promise of the cloud is to free users from the tedious and often complex task of
managing and provisioning computing resources to run applications. At the same
time, the cloud brings several additional benefits including: a pay-as-you-go cost
model, easier deployment of applications, elastic scalability, high availability, and
a more robust and secure infrastructure.
One important class of applications that users are increasingly deploying in
the cloud is database management systems. Database management systems differ
from other types of applications in that they manage large amounts of state that
is frequently updated, and that must be kept consistent at all scales and in the
presence of failure. This makes it difficult to provide scalability and high availability
for database systems in the cloud. In this thesis, we show how we can exploit
cloud technologies and relational database systems to provide a highly available
and scalable database service in the cloud.
The first part of the thesis presents RemusDB, a reliable, cost-effective high
availability solution that is implemented as a service provided by the virtualization
platform. RemusDB can make any database system highly available with little or
no code modifications by exploiting the capabilities of virtualization. In the second
part of the thesis, we present two systems that aim to provide elastic scalability
for database systems in the cloud using two very different approaches. The three
systems presented in this thesis bring us closer to the goal of building a scalable
and reliable transactional database service in the cloud.
LogBase: A Scalable Log-structured Database System in the Cloud
Numerous applications such as financial transactions (e.g., stock trading)
are write-heavy in nature. The shift from reads to writes in web applications
has also been accelerating in recent years. Write-ahead-logging is a common
approach for providing recovery capability while improving performance in most
storage systems. However, the separation of log and application data incurs
write overheads observed in write-heavy environments and hence adversely
affects the write throughput and recovery time in the system. In this paper, we
introduce LogBase - a scalable log-structured database system that adopts
log-only storage for removing the write bottleneck and supporting fast system
recovery. LogBase is designed to be dynamically deployed on commodity clusters
to take advantage of the elastic scaling property of cloud environments. LogBase
provides in-memory multiversion indexes for supporting efficient access to data
maintained in the log. LogBase also supports transactions that bundle read and
write operations spanning across multiple records. We implemented the proposed
system and compared it with HBase and a disk-based log-structured
record-oriented system modeled after RAMCloud. The experimental results show
that LogBase is able to provide sustained write throughput, efficient data
access out of the cache, and effective system recovery.
Comment: VLDB201
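A minimal sketch of log-only storage with an in-memory multiversion index may help make the idea concrete. It is an illustration under simplifying assumptions (a single process, a Python list as the log), not LogBase's implementation; `LogOnlyStore` and its methods are hypothetical names.

```python
class LogOnlyStore:
    """Toy log-only store: the log is the only copy of the data."""

    def __init__(self):
        self.log = []      # append-only list of (key, version, value) records
        self.index = {}    # key -> list of (version, log offset), ascending by version
        self.version = 0   # monotonically increasing write counter

    def put(self, key, value):
        # A write is a single append; there is no separate data file to update.
        self.version += 1
        self.log.append((key, self.version, value))
        self.index.setdefault(key, []).append((self.version, len(self.log) - 1))
        return self.version

    def get(self, key, as_of=None):
        # Return the newest value whose version is <= as_of (or the latest one).
        for version, offset in reversed(self.index.get(key, [])):
            if as_of is None or version <= as_of:
                return self.log[offset][2]
        return None

    def rebuild_index(self):
        # Recovery: one scan over the log restores the in-memory index.
        self.index = {}
        for offset, (key, version, _value) in enumerate(self.log):
            self.index.setdefault(key, []).append((version, offset))
        self.version = self.log[-1][1] if self.log else 0


store = LogOnlyStore()
v1 = store.put("acct", 100)
v2 = store.put("acct", 80)
```

Here `store.get("acct", as_of=v1)` reads an older version through the index, and `rebuild_index()` shows why recovery reduces to scanning the log when the log is the only persistent structure.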
Clouder : a flexible large scale decentralized object store
Doctoral Programme in Informatics (MAP-i)
Large scale data stores were initially introduced to support a few concrete extreme
scale applications such as social networks. Their scalability and availability
requirements often justify sacrificing richer data and processing models, and even
elementary data consistency. In strong contrast with traditional relational databases
(RDBMS), large scale data stores present very simple data models and APIs, lacking
most of the established relational data management operations, and relax consistency
guarantees, providing eventual consistency.
With a number of alternatives now available and mature, there is an increasing
willingness to use them in a wider and more diverse spectrum of applications, by
skewing the current trade-off towards the needs of common business users, and easing
the migration from current RDBMS. This is particularly so when used in the context
of a Cloud solution such as a Platform as a Service (PaaS).
This thesis aims at reducing the gap between traditional RDBMS and large scale
data stores, by seeking mechanisms to provide additional consistency guarantees and
higher level data processing primitives in large scale data stores. The devised mechanisms
should not hinder the scalability and dependability of large scale data stores.
Regarding higher level data processing primitives, this thesis explores two complementary
approaches: extending data stores with additional operations such as general
multi-item operations, and coupling data stores with other existing processing
facilities without hindering scalability.
We address these challenges with a new architecture for large scale data stores, efficient
multi-item access for large scale data stores, and SQL processing atop large scale
data stores. The novel architecture makes it possible to find the right trade-offs among flexible
usage, efficiency, and fault tolerance. To efficiently support multi-item access, we extend the data models of first-generation large scale data stores with tags and a multi-tuple data
placement strategy, which allow large sets of related data to be stored and retrieved efficiently
at once. For efficient SQL support atop scalable data stores, we devise design modifications
to existing relational SQL query engines, allowing them to be distributed.
We demonstrate our approaches with running prototypes and extensive experimental
evaluation using appropriate workloads based on real applications.
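One way to picture how tags can support multi-item access is to place items by tag rather than by key, so that related items land on the same node. The sketch below is only an assumption-laden illustration, not Clouder's placement strategy: the node list, `node_for`, and `TaggedStore` are hypothetical.

```python
import hashlib

NODES = ["node-0", "node-1", "node-2"]  # hypothetical cluster members


def node_for(placement_key):
    """Map a placement key to a node with a stable hash."""
    digest = hashlib.sha256(placement_key.encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


class TaggedStore:
    """Toy store that co-locates all items sharing a tag on one node."""

    def __init__(self):
        self.nodes = {name: {} for name in NODES}  # node -> {(tag, key): value}

    def put(self, key, value, tag):
        # Placement depends on the tag, not on the item key.
        self.nodes[node_for(tag)][(tag, key)] = value

    def get_by_tag(self, tag):
        # A multi-item read touches exactly one node.
        local = self.nodes[node_for(tag)]
        return {key: value for (t, key), value in local.items() if t == tag}


store = TaggedStore()
store.put("post:1", "hello", tag="user:42")
store.put("post:2", "world", tag="user:42")
store.put("post:3", "other", tag="user:7")
```

With this layout, `store.get_by_tag("user:42")` retrieves both related items from a single node instead of contacting every node that might hold one of them.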
Partial replication in distributed software transactional memory
Dissertation submitted for the degree of Master in Informatics Engineering
Distributed software transactional memory (DSTM) is emerging as an interesting alternative for distributed concurrency control. Usually, DSTM systems resort to data distribution and full replication techniques in order to provide scalability and fault tolerance.
Nevertheless, distribution alone does not provide fault tolerance, and full
replication limits the system's total storage capacity. In this context, partial data replication arises as an intermediate solution that combines the best of the previous two while trying to mitigate their disadvantages. This strategy has been explored by the distributed databases research field, but has been little addressed in the context of transactional memory and, to the best of our knowledge, it has never before been incorporated into a DSTM system for a general-purpose programming language. Thus, we defend the claim that it is possible to combine both full and partial data replication in such systems.
Accordingly, we developed a prototype of a DSTM system combining full and partial data replication for Java programs. We built on an existing DSTM framework and extended it with support for partial data replication. With the proposed framework, we implemented a partially replicated DSTM.
We evaluated the proposed system using known benchmarks, and the evaluation showcases the existence of scenarios where partial data replication can be advantageous, e.g., scenarios with a small number of transactions modifying fully replicated data.
The results of this thesis show that we were able to sustain our claim by implementing
a prototype that effectively combines full and partial data replication in a DSTM system.
The modularity of the presented framework allows the easy implementation of its various
components, and it provides a non-intrusive interface to applications.
Fundação para a Ciência e Tecnologia - (FCT/MCTES) in the scope of the research project PTDC/EIA-EIA/113613/2009 (Synergy-VM
- …
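The difference between full and partial replication can be sketched in a few lines. This is a simplified illustration, not the thesis prototype (which targets Java and transactional memory): `Cluster`, `Node`, the `full` flag, and the placement rule are all hypothetical.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> value held by this node


class Cluster:
    """Toy cluster mixing fully and partially replicated items."""

    def __init__(self, names, partial_copies=2):
        self.nodes = [Node(n) for n in names]
        self.partial_copies = partial_copies  # copies kept of a partially replicated item
        self.fully_replicated = set()         # keys that every node must hold

    def replicas_of(self, key):
        if key in self.fully_replicated:
            return self.nodes
        # Deterministic placement: a contiguous group of nodes chosen from the key.
        start = sum(key.encode("utf-8")) % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)]
                for i in range(self.partial_copies)]

    def write(self, key, value, full=False):
        if full:
            self.fully_replicated.add(key)
        for node in self.replicas_of(key):
            node.data[key] = value


cluster = Cluster(["a", "b", "c", "d"])
cluster.write("config", {"mode": "fast"}, full=True)  # stored on every node
cluster.write("cart:17", ["book"])                    # stored on two nodes only
```

A write to a fully replicated item touches all four nodes, whereas a write to a partially replicated item touches only its replica group, which is consistent with the abstract's observation that partial replication helps when few transactions modify fully replicated data.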