Distributed simulation and the grid: Position statements
The Grid provides a new and unrivaled technology for large-scale distributed simulation, as it enables collaboration and the use of distributed computing resources. This panel paper presents the views of four researchers in the area of distributed simulation and the Grid. Together we try to identify the main research issues involved in applying Grid technology to distributed simulation and the key future challenges that need to be solved to achieve this goal. Such challenges include not only technical challenges, but also political ones, such as management methodology for the Grid and the development of standards. The benefits of the Grid to end-user simulation modelers are also discussed.
Cross-layer design through joint routing and link allocation in wireless sensor networks
Both energy and bandwidth are scarce resources in sensor networks. In the past, the energy-efficient routing problem has been extensively studied in efforts to maximize sensor network lifetimes, but link bandwidth has been optimistically assumed to be abundant. While the energy constraint affects how data should be routed, link bandwidth affects not only the routing topology but also the allowed data rate on each link, which in turn affects the lifetime. Previous research that focuses on energy-efficient operations in sensor networks with the sole objective of maximizing network lifetime considers only the energy constraint, ignoring the bandwidth constraint. This thesis shows how infeasible these solutions can be when bandwidth does present a constraint. It provides a new mathematical model that addresses both energy and bandwidth constraints and proposes two efficient heuristics for routing and rate allocation. Simulation results show that these heuristics provide more feasible routing solutions than previous work and significantly improve throughput. A method of assigning time slots based on the given link rates is also presented. The cross-layer design approach significantly improves channel utilization and completely solves the hidden-terminal and exposed-terminal problems. --Abstract, page iii
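The joint problem sketched above admits a linear-programming formulation. The following is a plausible reconstruction under assumed notation, not necessarily the thesis's exact model: f_ij is the total data node i sends to node j over the lifetime T, s_i is node i's sensing rate, E_i its energy budget, e^tx and e^rx per-unit transmit and receive costs, and I(l) the set of links sharing the channel (of capacity C) with link l.

```latex
% Plausible reconstruction of the joint lifetime-maximization LP
% (notation assumed; the thesis's exact model may differ).
\begin{align}
  \max\; & T \\
  \text{s.t.}\;
    & \sum_{j} f_{ij} - \sum_{j} f_{ji} = s_i \, T
        && \forall i \quad \text{(flow conservation)} \\
    & e^{tx} \sum_{j} f_{ij} + e^{rx} \sum_{j} f_{ji} \le E_i
        && \forall i \quad \text{(energy budget)} \\
    & \sum_{(u,v) \in I(\ell)} f_{uv} \le C \, T
        && \forall \ell \quad \text{(shared link bandwidth)} \\
    & f_{ij} \ge 0
\end{align}
```

Because all constraints are linear in the total flows f_ij and the lifetime T, both routing (which links carry flow) and rate allocation (how much) fall out of a single linear program, which the thesis's heuristics then approximate.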
Partial replication in the database state machine
Doctoral thesis in Informatics (Programming Technologies branch).
Enterprise information systems are nowadays commonly structured as multi-tier architectures and are invariably built on top of database management systems responsible for the storage and provision of the entire business data. Database management systems therefore play a vital role in today's organizations: the overall system dependability depends directly on their reliability and availability.
Replication is a well-known technique to improve dependability. By maintaining consistent replicas of a database, one can increase its fault tolerance and simultaneously improve the system's performance by splitting the workload among the replicas.
In this thesis we address these issues by exploiting partial database replication. We target large-scale systems where replicas are distributed across wide area networks, aiming at both fault tolerance and fast local access to data. In particular, we envision information systems of multinational organizations presenting strong access locality, in which fully replicated data should be kept to a minimum and a judicious placement of replicas should allow the full recovery of any site in case of failure.
Our research departs from work on database replication algorithms based on group communication protocols, specifically multi-master certification-based protocols. At the core of these protocols lies a total order multicast primitive responsible for establishing a total order of transaction execution.
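As a rough illustration of how certification-based protocols decide transaction outcomes, here is a minimal Python sketch; the data structures and names are assumptions for exposition, not the protocol's actual interface. A transaction passes certification if no transaction that committed after it started wrote an item it read:

```python
from dataclasses import dataclass, field

@dataclass
class Transaction:
    start_version: int                            # version the txn read from
    read_set: set = field(default_factory=set)    # item identifiers read
    write_set: set = field(default_factory=set)   # item identifiers written

class Certifier:
    """Certifies transactions in their total-multicast-delivery order."""

    def __init__(self):
        self.version = 0
        self.history = []  # (commit_version, write_set) of committed txns

    def certify(self, txn: Transaction) -> bool:
        # Abort on a read-write conflict: some transaction committed
        # after txn started and wrote an item that txn read.
        for commit_version, write_set in self.history:
            if commit_version > txn.start_version and write_set & txn.read_set:
                return False
        self.version += 1
        self.history.append((self.version, txn.write_set))
        return True

# Example: t2 read an item t1 wrote after t2 started, so t2 aborts.
c = Certifier()
t1 = Transaction(start_version=0, write_set={"x"})
t2 = Transaction(start_version=0, read_set={"x"})
assert c.certify(t1) and not c.certify(t2)
```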
A well-known performance optimization in local area networks exploits the fact that the definitive total order of messages often closely follows the spontaneous network order, making it possible to optimistically proceed in parallel with the ordering protocol. Unfortunately, this optimization is invalidated in wide area networks, precisely where the increased latency would make it most useful. To overcome this, we present a novel total order protocol with optimistic delivery for wide area networks. Our protocol uses local statistical estimates to independently compute a tentative order that closely matches the definitive one, thus allowing optimistic execution in real wide area networks.
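A minimal sketch of the idea, with hypothetical names and an exponential moving average standing in for the thesis's statistical estimator: each process predicts, from past observations, when the sequencer will order a message, and tentatively delivers in predicted order.

```python
import heapq
import itertools
import time

class OptimisticDelivery:
    """Tentatively delivers multicast messages in the order predicted
    from per-sender latency estimates (illustrative only)."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha      # weight of new samples in the estimate
        self.latency = {}       # sender -> estimated delay to the sequencer
        self.pending = []       # heap of (predicted_time, tiebreak, message)
        self._tie = itertools.count()

    def on_receive(self, sender, send_time, message):
        # Predict when the sequencer will order this message.
        predicted = send_time + self.latency.get(sender, 0.0)
        heapq.heappush(self.pending, (predicted, next(self._tie), message))

    def on_final_order(self, sender, send_time, sequencer_time):
        # Refine the per-sender estimate with the observed delay.
        sample = sequencer_time - send_time
        previous = self.latency.get(sender, sample)
        self.latency[sender] = (1 - self.alpha) * previous + self.alpha * sample

    def optimistic_deliveries(self, now=None):
        # Tentatively deliver messages whose predicted order time passed;
        # execution proceeds in parallel with the definitive ordering.
        now = time.monotonic() if now is None else now
        ready = []
        while self.pending and self.pending[0][0] <= now:
            ready.append(heapq.heappop(self.pending)[2])
        return ready
```

When the definitive order later disagrees with the tentative one, the optimistically executed work must be rolled back, so the payoff depends on how often the estimate is right.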
Handling partial replication within a certification-based protocol is also particularly challenging, as it directly impacts the certification procedure itself. Depending on the approach, the added complexity may actually defeat the purpose of partial replication. We devise, implement, and evaluate two variations of the Database State Machine protocol, discussing their benefits and adequacy under the workload of the standard TPC-C benchmark.
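Under partial replication, no single replica can certify a whole transaction on its own. One plausible shape of the resulting procedure, purely illustrative of the problem rather than of the two variants the thesis evaluates, is a local test plus a vote:

```python
def local_certify(read_set, start_version, local_items, history):
    """Certify only the locally replicated part of the read set.
    `history` holds (commit_version, write_set) pairs of committed
    transactions restricted to items this replica stores."""
    local_reads = read_set & local_items
    for commit_version, write_set in history:
        if commit_version > start_version and write_set & local_reads:
            return False  # read-write conflict on a locally held item
    return True

def commit_decision(votes):
    # Every replica responsible for some read item must certify;
    # the coordination this vote implies is exactly the added cost
    # that can defeat the purpose of partial replication.
    return all(votes)
```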
Funding: Fundação para a Ciência e a Tecnologia (FCT) - ESCADA (POSI/CHS/33792/2000)
Asynchronous Validity Resolution in Sequentially Consistent Shared Virtual Memory
Shared Virtual Memory (SVM) is an effort to provide a mechanism for a distributed system, such as a cluster, to execute shared memory parallel programs. Unfortunately, SVM has performance problems due to its underlying distributed architecture. Recent developments have increased the performance of SVM by reducing communication. However, this performance gain was only possible by increasing programming complexity and by restricting the types of programs allowed to execute in the system. Validity resolution is the process of resolving the validity of a memory object such as a page. Current SVM systems use synchronous or deferred validity resolution techniques in which user processing is blocked during the validity resolution process, even when resolving the validity of falsely shared variables. False sharing occurs when two or more processes access unrelated variables stored within the same shared block of memory and at least one of the processes is writing. False sharing unnecessarily reduces the overall performance of SVM systems because user processing is blocked during validity resolution although no actual data dependencies exist. This thesis presents Asynchronous Validity Resolution (AVR), a new approach to SVM which reduces the performance losses associated with false sharing while maintaining the ease of programming found with regular shared memory parallel programming methodology. Asynchronous validity resolution allows concurrent user process execution and data validity resolution. AVR is evaluated by comparing the performance of an application suite on both an AVR sequentially consistent SVM system and a traditional sequentially consistent (SC) SVM system. The results show that AVR can increase performance over traditional sequentially consistent SVM for programs which exhibit false sharing. Although AVR outperforms regular SC by as much as 26%, the performance of AVR depends on the ratio of false-sharing to true-sharing accesses, the number of pages in the program's working set, the amount of user computation completed per page request, and the internodal round-trip message time in the system. Overall, the results show that AVR could be an important member of the arsenal of tools available to parallel programmers.
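To make the false-sharing stall concrete, here is a toy Python sketch with hypothetical names: two unrelated variables share a page, so a remote write to one invalidates the page, and a synchronous system blocks the reader of the other, which is exactly the stall AVR aims to overlap with useful computation.

```python
PAGE_SIZE = 4096

class Page:
    """A shared page holding many unrelated variables."""
    def __init__(self):
        self.valid = True
        self.data = bytearray(PAGE_SIZE)

def remote_write(page, offset, value):
    # A write by another node invalidates the *whole* page here,
    # even though only one variable on it changed.
    page.data[offset] = value
    page.valid = False

def read_var(page, offset):
    if not page.valid:
        # Synchronous validity resolution: user processing blocks
        # here although no real data dependency exists (false sharing).
        resolve_validity(page)
    return page.data[offset]

def resolve_validity(page):
    # Fetch the fresh copy from its owner; AVR's contribution is to
    # run this step concurrently with user computation instead.
    page.valid = True

page = Page()
remote_write(page, 0, 1)   # writer touches variable at offset 0
print(read_var(page, 8))   # reader of offset 8 stalls regardless
```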
A pragmatic protocol for database replication in interconnected clusters
Multi-master, update-everywhere database replication, as achieved by protocols based on group communication such as DBSM and Postgres-R, addresses both performance and availability. By scaling it to wide area networks, one could save costly bandwidth and avoid large round-trips to a distant master server. Also, by ensuring that updates are safely stored at a remote site within transaction boundaries, disaster recovery is guaranteed. Unfortunately, scaling existing cluster-based replication protocols is troublesome. In this paper we present a database replication protocol based on group communication that targets interconnected clusters. In contrast with previous proposals, it uses a separate multicast group for each cluster and thus does not impose any additional requirements on group communication, easing implementation and deployment in a real setting. Nonetheless, the protocol ensures one-copy equivalence while allowing all sites to execute update transactions. Experimental evaluation using the workload of the industry-standard TPC-C benchmark confirms the advantages of the approach.
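A rough sketch of the paper's structural idea, with all names hypothetical: one multicast group per cluster, bridged by a delegate site, instead of a single group spanning every site.

```python
class Cluster:
    def __init__(self, name, sites):
        self.name = name
        self.sites = sites          # site ids within this cluster
        self.delegate = sites[0]    # site bridging to remote clusters

def replicate(writeset, origin, clusters):
    # Total order multicast inside the origin cluster's own group.
    local_multicast(origin, writeset)
    # Only the delegate crosses the wide area network; each remote
    # cluster re-multicasts within its own group, so no multicast
    # group ever spans more than one cluster.
    for cluster in clusters:
        if cluster is not origin:
            forward(origin.delegate, cluster, writeset)
            local_multicast(cluster, writeset)

def local_multicast(cluster, writeset):
    for site in cluster.sites:
        print(f"site {site} applies {writeset}")

def forward(delegate, cluster, writeset):
    print(f"delegate {delegate} forwards {writeset} to {cluster.name}")

eu = Cluster("eu", ["eu1", "eu2"])
us = Cluster("us", ["us1", "us2"])
replicate({"x": 1}, eu, [eu, us])
```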
Practical database replication
Doctoral thesis in Informatics.
Software-based replication is a cost-effective approach to fault tolerance when combined with commodity hardware. In particular, shared-nothing database clusters built upon commodity machines and synchronized through eager software-based replication protocols have been driven by the distributed systems community in the last decade.
The efforts on eager database replication, however, stem from the late 1970s, with initial proposals designed by the database community. From that time we have the distributed locking and atomic commitment protocols. Briefly speaking, before updating a data item, all copies are locked through a distributed lock and, upon commit, an atomic commitment protocol is responsible for guaranteeing that the transaction's changes are written to non-volatile storage at all replicas before committing it. Both of these processes contribute to poor performance.
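For reference, the atomic commitment step described above is classically realized by two-phase commit. The sketch below is a generic illustration, assuming each replica exposes prepare/commit/abort callbacks; it is not code from the thesis.

```python
def two_phase_commit(replicas, txn):
    # Phase 1: every replica forces the transaction's changes to
    # non-volatile storage and votes on whether it can commit.
    votes = [r.prepare(txn) for r in replicas]   # True means "ready"
    if all(votes):
        # Phase 2: all replicas made the changes durable; commit.
        for r in replicas:
            r.commit(txn)
        return True
    # Some replica could not prepare; roll back everywhere.
    for r in replicas:
        r.abort(txn)
    return False
```

The two blocking rounds of messages and forced disk writes per transaction are precisely the costs the group-communication approach below reduces.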
The distributed systems community improved these processes by reducing the number of interactions among replicas through the use of group communication, and by relaxing the durability requirements imposed by the atomic commitment protocol. The approach requires at most two interactions among replicas and disseminates updates without necessarily applying them before committing a transaction. It relies on a high number of machines to reduce the likelihood of failures and ensure data resilience. Clearly, the availability of commodity machines and their increasing processing power makes this feasible.
Proving the feasibility of this approach requires us to build several prototypes and evaluate them with different workloads and scenarios. Although simulation environments are a good starting point, mainly those that allow us to combine real code (e.g., replication protocols, group communication) and simulated code (e.g., database, network), full-fledged implementations should be developed and tested. Unfortunately, database vendors usually do not provide native support for the development of third-party replication protocols, forcing protocol developers either to change the database engines, when the source code is available, or otherwise to construct wrappers in the middleware server that intercept client requests. The former solution is hard to maintain as new database releases are constantly being produced, whereas the latter represents a strenuous development effort as it requires us to rebuild several database features at the middleware.
Unfortunately, the group-based replication protocols, optimistic or conservative, that have been proposed so far have drawbacks that present a major hurdle to their practicability. The optimistic protocols make it difficult to commit transactions in the presence of hot spots, whereas the conservative protocols perform poorly due to concurrency issues.
In this thesis, we propose a generic architecture and programming interface, titled GAPI, to facilitate the development of different replication strategies. The idea consists of providing key extensions to multiple DBMSs (Database Management Systems), thus enabling a replication strategy to be developed once and tested on several databases that have such extensions, i.e., those that are replication-friendly. To tackle the aforementioned problems in group-based replication protocols, we propose a novel protocol, titled AKARA. AKARA guarantees fairness, so that all transactions have a chance to commit, and ensures high performance while exploiting the parallelism provided by local database engines. Finally, we outline a simple but comprehensive set of components to build group-based replication protocols and discuss key points in their design and implementation.
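To illustrate the flavor of such an architecture, here is a hypothetical sketch of what replication-friendly DBMS extensions might look like; the actual GAPI interface is defined in the thesis, and every name below is an assumption.

```python
from abc import ABC, abstractmethod

class ReplicationHooks(ABC):
    """Extensions a DBMS would expose so that one replication protocol
    implementation can run unchanged on several engines (hypothetical)."""

    @abstractmethod
    def capture_writeset(self, txn_id) -> bytes:
        """Extract the updates a local transaction produced."""

    @abstractmethod
    def apply_writeset(self, writeset: bytes) -> None:
        """Install a remote transaction's updates locally."""

    @abstractmethod
    def block_commit(self, txn_id) -> None:
        """Suspend a transaction at commit until it is certified."""

    @abstractmethod
    def resume_commit(self, txn_id, outcome: bool) -> None:
        """Commit or abort a previously suspended transaction."""
```

A protocol such as AKARA would then be written once against this kind of interface, with each engine supplying its own implementation of the hooks.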
Methods to Improve Applicability and Efficiency of Distributed Data-Centric Compute Frameworks
The success of modern applications depends on the insights they collect from their data repositories. Data repositories for such applications currently exceed exabytes and are rapidly increasing in size, as they collect data from varied sources: web applications, mobile phones, sensors, and other connected devices. Distributed storage and data-centric compute frameworks have been invented to store and analyze these large datasets. This dissertation focuses on extending the applicability and improving the efficiency of distributed data-centric compute frameworks.
- …