27 research outputs found
Improving transaction abort rates without compromising throughput through judicious scheduling
Althought optimistic concurrency control protocols have increasingly been used in distributed database management systems, they imply a trade-of between the number of transactions that can be executed concurrently, hence, the peak throughput, and transactions aborted due to conflicts.
We propose a novel optimistic concurrency control mechanism that controls transaction abort rate by minimizing the time during which transactions are vulnerable to abort, without compromising throughput. Briefly, we throttle transaction execution with an adaptive mechanism based on the state of the transaction queues while allowing out-of-order execution based on expected transaction latency. Prelimi- nary evaluation shows that this provides a substantial improvement in committed transaction throughput.(undefined
High Performance Transaction Processing on Non-Uniform Hardware Topologies
Transaction processing is a mission critical enterprise application that runs on high-end servers. Traditionally, transaction processing systems have been designed for uniform core-to-core communication latencies. In the past decade, with the emergence of multisocket multicores, for the first time we have Islands, i.e., groups of cores that communicate fast among themselves and slower with other groups. In current mainstream servers, each multicore processor corresponds to an Island. As the number of cores on a chip increases, however, we expect that multiple Islands will form within a single processor in the nearby future. In addition, the access latencies to the local memory and to the memory of another server over fast interconnect are converging, thus creating a hierarchy of Islands within a group of servers. Non-uniform hardware topologies pose a significant challenge to the scalability and the predictability of performance of transaction processing systems. Distributed transaction processing systems can alleviate this problem; however, no single deployment configuration is optimal for all workloads and hardware topologies. In order to fully utilize the available processing power, a transaction processing system needs to adapt to the underlying hardware topology and tune its configuration to the current workload. More specifically, the system should be able to detect any changes to the workload and hardware topology, and adapt accordingly without disrupting the processing. In this thesis, we first systematically quantify the impact of hardware Islands on deployment configurations of distributed transaction processing systems. We show that none of these configurations is optimal for all workloads, and the choice of the optimal configuration depends on the combination of the workload and hardware topology. In the cluster setting, on the other hand, the choice of optimal configuration additionally depends on the properties of the communication channel between the servers. We address this challenge by designing a dynamic shared-everything system that adapts its data structures automatically to hardware Islands. To ensure good performance in the presence of shifting workload patterns, we use a lightweight partitioning and placement mechanism to balance the load and minimize the synchronization overheads across Islands. Overall, we show that masking the non-uniformity of inter-core communication is critical for achieving predictably high performance for latency-sensitive applications, such as transaction processing. With clusters of a handful of multicore chips with large main memories replacing high-end many-socket servers, the deployment rules of thumb identified in our analysis have a potential to significantly reduce the synchronization and communication costs of transaction processing. As workloads become more dynamic and diverse, while still running on partitioned infrastructure, the lightweight monitoring and adaptive repartitioning mechanisms proposed in this thesis will be applicable to a wide range of designs for which traditional offline schemes are impractical
Practical database replication
Tese de doutoramento em InformáticaSoftware-based replication is a cost-effective approach for fault-tolerance when combined with
commodity hardware. In particular, shared-nothing database clusters built upon commodity machines
and synchronized through eager software-based replication protocols have been driven by
the distributed systems community in the last decade.
The efforts on eager database replication, however, stem from the late 1970s with initial
proposals designed by the database community. From that time, we have the distributed locking
and atomic commitment protocols. Briefly speaking, before updating a data item, all copies
are locked through a distributed lock, and upon commit, an atomic commitment protocol is
responsible for guaranteeing that the transaction’s changes are written to a non-volatile storage
at all replicas before committing it. Both these processes contributed to a poor performance.
The distributed systems community improved these processes by reducing the number of interactions
among replicas through the use of group communication and by relaxing the durability
requirements imposed by the atomic commitment protocol. The approach requires at most two
interactions among replicas and disseminates updates without necessarily applying them before
committing a transaction. This relies on a high number of machines to reduce the likelihood of
failures and ensure data resilience. Clearly, the availability of commodity machines and their
increasing processing power makes this feasible.
Proving the feasibility of this approach requires us to build several prototypes and evaluate
them with different workloads and scenarios. Although simulation environments are a good starting
point, mainly those that allow us to combine real (e.g., replication protocols, group communication)
and simulated-code (e.g., database, network), full-fledged implementations should be
developed and tested. Unfortunately, database vendors usually do not provide native support for
the development of third-party replication protocols, thus forcing protocol developers to either
change the database engines, when the source code is available, or construct in the middleware
server wrappers that intercept client requests otherwise. The former solution is hard to maintain
as new database releases are constantly being produced, whereas the latter represents a strenuous
development effort as it requires us to rebuild several database features at the middleware.
Unfortunately, the group-based replication protocols, optimistic or conservative, that had
been proposed so far have drawbacks that present a major hurdle to their practicability. The
optimistic protocols make it difficult to commit transactions in the presence of hot-spots, whereas
the conservative protocols have a poor performance due to concurrency issues.
In this thesis, we propose using a generic architecture and programming interface, titled
GAPI, to facilitate the development of different replication strategies. The idea consists of providing key extensions to multiple DBMSs (Database Management Systems), thus enabling a
replication strategy to be developed once and tested on several databases that have such extensions,
i.e., those that are replication-friendly. To tackle the aforementioned problems in groupbased
replication protocols, we propose using a novel protocol, titled AKARA. AKARA guarantees
fairness, and thus all transactions have a chance to commit, and ensures great performance
while exploiting parallelism as provided by local database engines. Finally, we outline a simple
but comprehensive set of components to build group-based replication protocols and discuss key
points in its design and implementation.A replicação baseada em software Ă© uma abordagem que fornece um bom custo benefĂcio para
tolerância a falhas quando combinada com hardware commodity. Em particular, os clusters de
base de dados “shared-nothing” construĂdos com hardware commodity e sincronizados atravĂ©s de
protocolos “eager” tĂŞm sido impulsionados pela comunidade de sistemas distribuĂdos na Ăşltima
década.
Os primeiros esforços na utilização dos protocolos “eager”, decorrem da década de 70 do
século XX com as propostas da comunidade de base de dados. Dessa época, temos os protocolos
de bloqueio distribuĂdo e de terminação atĂłmica (i.e. “two-phase commit”). De forma sucinta,
antes de actualizar um item de dados, todas as cópias são bloqueadas através de um protocolo
de bloqueio distribuĂdo e, no momento de efetivar uma transacção, um protocolo de terminação
atómica é responsável por garantir que as alterações da transacção são gravadas em todas as
réplicas num sistema de armazenamento não-volátil. No entanto, ambos os processos contribuem
para um mau desempenho do sistema.
A comunidade de sistemas distribuĂdos melhorou esses processos, reduzindo o nĂşmero de
interacções entre réplicas, através do uso da comunicação em grupo e minimizando a rigidez
os requisitos de durabilidade impostos pelo protocolo de terminação atómica. Essa abordagem
requer no máximo duas interacções entre as réplicas e dissemina actualizações sem necessariamente
aplicá-las antes de efectivar uma transacção. Para funcionar, a solução depende de um
elevado número de máquinas para reduzirem a probabilidade de falhas e garantir a resiliência de
dados. Claramente, a disponibilidade de hardware commodity e o seu poder de processamento
crescente tornam essa abordagem possĂvel.
Comprovar a viabilidade desta abordagem obriga-nos a construir vários protótipos e a avaliálos
com diferentes cargas de trabalho e cenários. Embora os ambientes de simulação sejam um
bom ponto de partida, principalmente aqueles que nos permitem combinar o cĂłdigo real (por
exemplo, protocolos de replicação, a comunicação em grupo) e o simulado (por exemplo, base
de dados, rede), implementações reais devem ser desenvolvidas e testadas. Infelizmente, os
fornecedores de base de dados, geralmente, nĂŁo possuem suporte nativo para o desenvolvimento
de protocolos de replicação de terceiros, forçando os desenvolvedores de protocolo a mudar o
motor de base de dados, quando o cĂłdigo fonte está disponĂvel, ou a construir no middleware
abordagens que interceptam as solicitações do cliente. A primeira solução Ă© difĂcil de manter já
que novas “releases” das bases de dados estão constantemente a serem produzidas, enquanto a
segunda representa um desenvolvimento árduo, pois obriga-nos a reconstruir vários recursos de
uma base de dados no middleware. Infelizmente, os protocolos de replicação baseados em comunicação em grupo, optimistas ou
conservadores, que foram propostos até agora apresentam inconvenientes que são um grande obstáculo
Ă sua utilização. Com os protocolos optimistas Ă© difĂcil efectivar transacções na presença
de “hot-spots”, enquanto que os protocolos conservadores têm um fraco desempenho devido a
problemas de concorrĂŞncia.
Nesta tese, propomos utilizar uma arquitetura genérica e uma interface de programação, intitulada
GAPI, para facilitar o desenvolvimento de diferentes estratégias de replicação. A ideia
consiste em fornecer extensões chaves para múltiplos SGBDs (Database Management Systems),
permitindo assim que uma estratégia de replicação possa ser desenvolvida uma única vez e testada
em várias bases de dados que possuam tais extensões, ou seja, aquelas que são “replicationfriendly”.
Para resolver os problemas acima referidos nos protocolos de replicação baseados
em comunicação em grupo, propomos utilizar um novo protocolo, intitulado AKARA. AKARA
garante a equidade, portanto, todas as operações têm uma oportunidade de serem efectivadas,
e garante um excelente desempenho ao tirar partido do paralelismo fornecido pelos motores
de base de dados. Finalmente, propomos um conjunto simples, mas abrangente de componentes
para construir protocolos de replicação baseados em comunicação em grupo e discutimos pontoschave
na sua concepção e implementação
ADDING PERSISTENCE TO MAIN MEMORY PROGRAMMING
Unlocking the true potential of the new persistent memories (PMEMs) requires eliminating
traditional persistent I/O abstractions altogether, by introducing persistent semantics directly into
main memory programming. Such a programming model elevates failure atomicity to a first-class
application property in addition to in-memory data layout, concurrency-control, and fault tolerance, and therefore requires redesign of programming abstractions for both program correctness and maximum performance gains. To address these challenges, this thesis proposes a set of system software designs that integrate persistence with main memory programming, and makes the following contributions.
First, this thesis proposes a PMEM-aware I/O runtime, NVStream, that supports fast durable
streaming I/O. NVStream uses a memory-based I/O interface that integrates with existing I/O data movement operations of an application to accelerate persistent data writes. NVStream carefully designs its persistent data storage layout and crash-consistent semantics to match both application and PMEM characteristics. Specifically, we leverage the streaming nature of I/O in HPC workflows, to benefit from using a log-structured PMEM storage engine design, that uses relaxed write orderings and append-only failure-atomic semantics to form strongly consistent application checkpoints. Furthermore, we identify that optimizing the I/O software stack exposes the PMEM bandwidth limitations as a bottleneck during parallel HPC I/O writes, and propose a novel data movement design – PHX. PHX uses alternative network data movement paths available in datacenters to ease up the bandwidth pressure on the PMEM memory interconnects, all while maintaining the correctness of the persistent data.
Next, the thesis explores the challenges and opportunities of using PMEM for true main memory persistent programming – a single data domain for both runtime and persistent applicationstate. Such a programming model includes maintaining ACID properties during each and every update to applications persistent structures. ACID-qualified persistent programming for multi-threaded applications is hard, as the programmer has to reason about both crash-consistency and
synchronization – crash-sync – semantics for programming correctness. The thesis contributes new understanding of the correctness requirements for mixing different crash-consistent and synchronization protocols, characterizes the performance of different crash-sync realizations for different applications and hardware architectures, and draws actionable insights for future designs of PMEM systems.
Finally, the application state stored on node-local persistent memory is still vulnerable to catastrophic node failures. The thesis proposes a replicated persistent memory runtime, Blizzard, that supports truly fault tolerant, concurrent and persistent data-structure programming. Blizzard carefully integrates userspace networking with byte addressable PMEM for a fast, persistent memory replication runtime. The design also incorporates a replication-aware crash-sync protocol that supports consistent and concurrent updates on persistent data-structures. Blizzard offers applications the flexibility to use the data structures that best match their functional requirements, while offering better performance, and providing crucial reliability guarantees lacking from existing persistent memory runtimes.Ph.D
Extending functional databases for use in text-intensive applications
This thesis continues research exploring the benefits of using functional
databases based around the functional data model for advanced database
applications-particularly those supporting investigative systems. This is a
growing generic application domain covering areas such as criminal and military
intelligence, which are characterised by significant data complexity, large data
sets and the need for high performance, interactive use. An experimental
functional database language was developed to provide the requisite semantic
richness. However, heavy use in a practical context has shown that language
extensions and implementation improvements are required-especially in the
crucial areas of string matching and graph traversal. In addition, an
implementation on multiprocessor, parallel architectures is essential to meet the
performance needs arising from existing and projected database sizes in the
chosen application area. [Continues.
NASA space station automation: AI-based technology review
Research and Development projects in automation for the Space Station are discussed. Artificial Intelligence (AI) based automation technologies are planned to enhance crew safety through reduced need for EVA, increase crew productivity through the reduction of routine operations, increase space station autonomy, and augment space station capability through the use of teleoperation and robotics. AI technology will also be developed for the servicing of satellites at the Space Station, system monitoring and diagnosis, space manufacturing, and the assembly of large space structures
Just-in-time Analytics Over Heterogeneous Data and Hardware
Industry and academia are continuously becoming more data-driven and data-intensive, relying on the analysis of a wide variety of datasets to gain insights. At the same time, data variety increases continuously across multiple axes. First, data comes in multiple formats, such as the binary tabular data of a DBMS, raw textual files, and domain-specific formats. Second, different datasets follow different data models, such as the relational and the hierarchical one. Data location also varies: Some datasets reside in a central "data lake", whereas others lie in remote data sources. In addition, users execute widely different analysis tasks over all these data types. Finally, the process of gathering and integrating diverse datasets introduces several inconsistencies and redundancies in the data, such as duplicate entries for the same real-world concept. In summary, heterogeneity significantly affects the way data analysis is performed. In this thesis, we aim for data virtualization: Abstracting data out of its original form and manipulating it regardless of the way it is stored or structured, without a performance penalty. To achieve data virtualization, we design and implement systems that i) mask heterogeneity through the use of heterogeneity-aware, high-level building blocks and ii) offer fast responses through on-demand adaptation techniques. Regarding the high-level building blocks, we use a query language and algebra to handle multiple collection types, such as relations and hierarchies, express transformations between these collection types, as well as express complex data cleaning tasks over them. In addition, we design a location-aware compiler and optimizer that masks away the complexity of accessing multiple remote data sources. Regarding on-demand adaptation, we present a design to produce a new system per query. The design uses customization mechanisms that trigger runtime code generation to mimic the system most appropriate to answer a query fast: Query operators are thus created based on the query workload and the underlying data models; the data access layer is created based on the underlying data formats. In addition, we exploit emerging hardware by customizing the system implementation based on the available heterogeneous processors â CPUs and GPGPUs. We thus pair each workload with its ideal processor type. The end result is a just-in-time database system that is specific to the query, data, workload, and hardware instance. This thesis redesigns the data management stack to natively cater for data heterogeneity and exploit hardware heterogeneity. Instead of centralizing all relevant datasets, converting them to a single representation, and loading them in a monolithic, static, suboptimal system, our design embraces heterogeneity. Overall, our design decouples the type of performed analysis from the original data layout; users can perform their analysis across data stores, data models, and data formats, but at the same time experience the performance offered by a custom system that has been built on demand to serve their specific use case
The Sixth Annual Workshop on Space Operations Applications and Research (SOAR 1992)
This document contains papers presented at the Space Operations, Applications, and Research Symposium (SOAR) hosted by the U.S. Air Force (USAF) on 4-6 Aug. 1992 and held at the JSC Gilruth Recreation Center. The symposium was cosponsored by the Air Force Material Command and by NASA/JSC. Key technical areas covered during the symposium were robotic and telepresence, automation and intelligent systems, human factors, life sciences, and space maintenance and servicing. The SOAR differed from most other conferences in that it was concerned with Government-sponsored research and development relevant to aerospace operations. The symposium's proceedings include papers covering various disciplines presented by experts from NASA, the USAF, universities, and industry
Control over the Cloud : Offloading, Elastic Computing, and Predictive Control
The thesis studies the use of cloud native software and platforms to implement critical closed loop control. It considers technologies that provide low latency and reliable wireless communication, in terms of edge clouds and massive MIMO, but also approaches industrial IoT and the services of a distributed cloud, as an extension of commercial-of-the-shelf software and systems.First, the thesis defines the cloud control challenge, as control over the cloud and controller offloading. This is followed by a demonstration of closed loop control, using MPC, running on a testbed representing the distributed cloud.The testbed is implemented using an IoT device, clouds, next generation wireless technology, and a distributed execution platform. Platform details are provided and feasibility of the approach is shown. Evaluation includes relocating an on-line MPC to various locations in the distributed cloud. Offloaded control is examined next, through further evaluation of cloud native software and frameworks. This is followed by three controller designs, tailored for use with the cloud. The first controller solves MPC problems in parallel, to implement a variable horizon controller. The second is a hierarchical design, in which rate switching is used to implement constrained control, with a local and a remote mode. The third design focuses on reliability. Here, the MPC problem is extended to include recovery paths that represent a fallback mode. This is used by a control client if it experiences connectivity issues.An implementation is detailed and examined.In the final part of the thesis, the focus is on latency and congestion. A cloud control client can experience long and variable delays, from network and computations, and used services can become overloaded. These problems are approached by using predicted control inputs, dynamically adjusting the control frequency, and using horizontal scaling of the cloud service. Several examples are shown through simulation and on real clouds, including admitting control clients into a cluster that becomes temporarily overloaded