A categorization scheme for concurrency control protocols in distributed databases
The problem of concurrency control in distributed databases is very complex. As a result, a great number of control algorithms have been proposed in recent years. This research is aimed at the development of a viable categorization scheme for these various algorithms. The scheme is based on the theoretical concept of serializability, but is qualitative in nature. An important class of serializable execution sequences, conflict-preserving-serializable, leads to the identification of fundamental attributes common to all algorithms included in this study. These attributes serve as the underlying philosophy for the categorization scheme. Combined with the two logical approaches of prevention and correction of nonserializability, the result is a flexible and extensive categorization scheme which accounts for all algorithms studied and suggests the possibility of new algorithms.
Partial replication with strong consistency
In response to the increasing expectations of their clients, cloud services exploit
geo-replication to provide fault-tolerance, availability and low latency when executing
requests. However, cloud platforms tend to adopt weak consistency semantics, in which
replicas may diverge in state independently. These systems offer good response times
but at the disadvantage of allowing potential data inconsistencies that may affect user
experience.
Some systems propose to adopt solutions with strong consistency, which are not as
efficient but simplify the development of correct applications by guaranteeing that all
replicas in the system maintain the same database state. Therefore, it is interesting to explore
a system that can offer strong consistency while minimizing its main disadvantage:
the impact on performance that results from coordinating every replica in the system. A
possible solution to reduce the cost of replica coordination is to support partial replication.
Partially replicating a database allows each server to be responsible for only a
subset of the data - a partition - which means that when updating the database only some
of the replicas have to be synchronized, improving response times.
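The partitioning idea above can be sketched as a simple key-to-partition mapping, so that a write only involves the replicas of one partition. All names below (`partition_of`, `replicas_to_sync`, the server labels) are illustrative, not from the dissertation:

```python
# Sketch: routing a write to only the replicas of the key's partition.
# Illustrative only; the dissertation's actual placement scheme may differ.

NUM_PARTITIONS = 4

# Each partition is replicated on a small, fixed set of servers.
partition_replicas = {
    p: [f"server-{p}-{r}" for r in range(3)] for p in range(NUM_PARTITIONS)
}

def partition_of(key: str) -> int:
    """Map a key to its partition (here: simple hash partitioning)."""
    return hash(key) % NUM_PARTITIONS

def replicas_to_sync(key: str) -> list[str]:
    """Only the replicas of one partition must coordinate on a write."""
    return partition_replicas[partition_of(key)]
```

A write to any single key thus synchronizes three servers instead of all twelve, which is the response-time benefit the abstract describes.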
In this dissertation, we propose an algorithm that implements a distributed replicated
database that offers strong consistency with support for partial replication. To achieve
strong consistency in a partially replicated scenario, our algorithm is partly based on
Clock-SI [10], which implements a multi-versioned database providing strong consistency
(snapshot isolation) and runs the Two-Phase Commit protocol to coordinate replicas
during updates. The algorithm is supported by
an architecture that simplifies distributing partitions among datacenters and efficiently
propagating operations across nodes in the same partition, thanks to the ChainPaxos[27]
algorithm.
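The commit step described in the abstract can be illustrated with a minimal two-phase commit: the coordinator asks every replica involved in the update to vote, and commits only if all vote yes. This sketch is ours, not the dissertation's code, and it omits the snapshot and commit timestamps that the Clock-SI-based protocol also assigns; `Replica` is a hypothetical stand-in:

```python
# Minimal two-phase commit sketch: commit only if every participant votes yes.
# Illustrative only; timestamp management from Clock-SI is omitted.

class Replica:
    def __init__(self, name: str, will_commit: bool = True):
        self.name = name
        self.will_commit = will_commit
        self.state = "idle"

    def prepare(self) -> bool:
        """Phase 1: vote on whether this replica can commit."""
        self.state = "prepared" if self.will_commit else "aborted"
        return self.will_commit

    def finish(self, commit: bool) -> None:
        """Phase 2: apply the coordinator's decision."""
        self.state = "committed" if commit else "aborted"

def two_phase_commit(replicas) -> bool:
    votes = [r.prepare() for r in replicas]  # phase 1: collect votes
    decision = all(votes)                    # commit iff unanimous
    for r in replicas:                       # phase 2: broadcast decision
        r.finish(decision)
    return decision
```

A single "no" vote aborts the transaction on every replica, which is what keeps all partitions mutually consistent.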
Scalable coordination of distributed in-memory transactions
PhD Thesis
Coordinating transactions involves ensuring serializability in the presence of concurrent data
accesses. Accomplishing this in a scalable manner for distributed in-memory transactions is the
aim of this thesis work. To this end, the work makes three contributions. It first experimentally
demonstrates that transaction latency and throughput scale considerably well when an atomic
multicast service is offered to transaction nodes by a crash-tolerant ensemble of dedicated nodes
and that using such a service is the most scalable approach compared to practices advocated in
the literature. Secondly, we design, implement and evaluate a crash-tolerant and non-blocking
atomic broadcast protocol, called ABcast, which is then used as the foundation for building the
aforementioned multicast service.
ABcast is a hybrid protocol, which consists of a pair of primary and backup protocols executing
in parallel. The primary protocol is a deterministic atomic broadcast protocol that provides
high performance when node crashes are absent, but blocks in their presence until a group
membership service detects such failures. The backup protocol, Aramis, is a probabilistic protocol
that does not block in the event of node crashes and allows message delivery to continue
post-crash until the primary protocol is able to resume. Aramis's design avoids blocking by assuming
that message delays remain within a known bound with a high probability that can be
estimated in advance, provided that recent delay estimates are used to (i) continually adjust
that bound and (ii) regulate flow control. Aramis's delivery of broadcasts preserves total order
with a probability that can be tuned to be close to 1. Comprehensive evaluations show that this
probability can be 99.99% or more.
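The delay-bound idea behind Aramis can be sketched as follows: keep a window of recent delay estimates, take a high percentile of them as the current bound, and hold each message until that bound has elapsed since it was sent. The window size and percentile below are our assumptions for illustration, not Aramis's actual parameters:

```python
# Sketch of an adaptively tuned delivery bound, in the spirit of Aramis:
# messages are held until a delay bound (a high percentile of recently
# observed delays) has passed, making out-of-order delivery unlikely.
# Window size and percentile are illustrative assumptions.

from collections import deque

class DelayBound:
    def __init__(self, window: int = 100, percentile: float = 0.999):
        self.samples = deque(maxlen=window)  # recent delay estimates
        self.percentile = percentile

    def observe(self, delay: float) -> None:
        """Record a recently measured one-way message delay."""
        self.samples.append(delay)

    def bound(self) -> float:
        """A bound that recent delays stay under with high probability."""
        ordered = sorted(self.samples)
        idx = min(len(ordered) - 1, int(self.percentile * len(ordered)))
        return ordered[idx]

    def deliver_at(self, send_time: float) -> float:
        """Earliest safe delivery time for a message sent at send_time."""
        return send_time + self.bound()
```

Raising the percentile trades latency for a higher probability of total order, which matches the tunable trade-off described in the abstract.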
Finally, we assess the effect of low-probability order violations on implementing various
isolation levels commonly considered in transaction systems. These three contributions together
advance the state of the art in two major ways: (i) identifying a service-based approach
to transactional scalability and (ii) establishing a practical alternative to the complex
Paxos-style approach to building such a service, by using novel but simple protocols and
open-source software frameworks.
Behind the Last Line of Defense -- Surviving SoC Faults and Intrusions
Today, leveraging the enormous modular power, diversity and flexibility of
manycore systems-on-a-chip (SoCs) requires careful orchestration of complex
resources, a task left to low-level software, e.g. hypervisors. In current
architectures, this software forms a single point of failure and worthwhile
target for attacks: once compromised, adversaries gain access to all
information and full control over the platform and the environment it controls.
This paper proposes Midir, an enhanced manycore architecture, effecting a
paradigm shift from SoCs to distributed SoCs. Midir changes the way platform
resources are controlled, by retrofitting tile-based fault containment through
well known mechanisms, while securing low-overhead quorum-based consensus on
all critical operations, in particular privilege management and, thus,
management of containment domains. Allowing versatile redundancy management,
Midir promotes resilience for all software levels, including at low level. We
explain this architecture, its associated algorithms and hardware mechanisms
and show, for the example of a Byzantine fault tolerant microhypervisor, that
it outperforms the highly efficient MinBFT by one order of magnitude.
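The quorum-based control of critical operations can be illustrated with a simple majority vote: a privileged operation, such as a privilege change, takes effect only if a quorum of independent voters approves it. The majority arithmetic below is the standard quorum rule, not Midir's actual hardware mechanism:

```python
# Sketch: a privileged operation is applied only when a majority quorum of
# independent voters approves it, so a single compromised or faulty voter
# cannot change platform state alone. Illustrative only; Midir implements
# this voting in hardware at the tile level.

def quorum_size(n: int) -> int:
    """Smallest majority among n voters."""
    return n // 2 + 1

def approve(votes: list) -> bool:
    """Apply the privileged operation iff a majority voted yes."""
    return sum(votes) >= quorum_size(len(votes))
```

With three voters, one compromised hypervisor replica can neither force nor block a privilege change on its own, which is the fault-containment property the paper targets.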
Methods for High-Throughput Comparative Genomics and Distributed Sequence Analysis
High-throughput sequencing has accelerated applications of genomics throughout the world. The increased production and decentralization of sequencing has also created bottlenecks in computational analysis. In this dissertation, I provide novel computational methods to improve analysis throughput in three areas: whole genome multiple alignment, pan-genome annotation, and bioinformatics workflows.
To aid in the study of populations, tools are needed that can quickly compare multiple genome sequences, millions of nucleotides in length. I present a new multiple alignment tool for whole genomes, named Mugsy, that implements a novel method for identifying syntenic regions. Mugsy is computationally efficient, does not require a reference genome, and is robust in identifying a rich complement of genetic variation including duplications, rearrangements, and large-scale gain and loss of sequence in mixtures of draft and completed genome data. Mugsy is evaluated on the alignment of several dozen bacterial chromosomes on a single computer and was the fastest program evaluated for the alignment of assembled human chromosome sequences from four individuals. A distributed version of the algorithm is also described and provides increased processing throughput using multiple CPUs.
Numerous individual genomes are sequenced to study diversity, evolution and classify pan-genomes. Pan-genome annotations contain inconsistencies and errors that hinder comparative analysis, even within a single species. I introduce a new tool, Mugsy-Annotator, that identifies orthologs and anomalous gene structure across a pan-genome using whole genome multiple alignments. Identified anomalies include inconsistently located translation initiation sites and disrupted genes due to draft genome sequencing or pseudogenes. An evaluation of pan-genomes indicates that such anomalies are common and alternative annotations suggested by the tool can improve annotation consistency and quality.
Finally, I describe the Cloud Virtual Resource, CloVR, a desktop application for automated sequence analysis that improves usability and accessibility of bioinformatics software and cloud computing resources. CloVR is installed on a personal computer as a virtual machine and requires minimal installation, addressing challenges in deploying bioinformatics workflows. CloVR also seamlessly accesses remote cloud computing resources for improved processing throughput. In a case study, I demonstrate the portability and scalability of CloVR and evaluate the costs and resources for microbial sequence analysis.