Architectural Principles for Database Systems on Storage-Class Memory
Database systems have long been optimized to hide the higher latency of storage media, yielding complex persistence mechanisms. With the advent of large DRAM capacities, it became possible to keep a full copy of the data in DRAM. Systems that leverage this possibility, such as main-memory databases, keep two copies of the data in two different formats: one in main memory and the other one in storage. The two copies are kept synchronized using snapshotting and logging. This main-memory-centric architecture yields nearly two orders of magnitude faster analytical processing than traditional, disk-centric architectures. The rise of Big Data emphasized the importance of such systems with an ever-increasing need for more main memory. However, DRAM is hitting its scalability limits: it is intrinsically hard to further increase its density.
Storage-Class Memory (SCM) is a group of novel memory technologies that promise to alleviate DRAM’s scalability limits. They combine the non-volatility, density, and economic characteristics of storage media with the byte-addressability of DRAM and a latency close to it. Therefore, SCM can serve as persistent main memory, thereby bridging the gap between main memory and storage. In this dissertation, we explore the impact of SCM as persistent main memory on database systems. Assuming a hybrid SCM-DRAM hardware architecture, we propose a novel software architecture for database systems that places primary data in SCM and operates directly on it, eliminating the need for explicit I/O. This architecture yields many benefits: First, it obviates the need to reload data from storage to main memory during recovery, as data is discovered and accessed directly in SCM. Second, it allows replacing the traditional logging infrastructure with fine-grained, cheap micro-logging at the data-structure level. Third, secondary data can be stored in DRAM and reconstructed during recovery. Fourth, system runtime information can be stored in SCM to improve recovery time. Finally, the system may retain and continue in-flight transactions in case of system failures.
However, SCM is no panacea as it raises unprecedented programming challenges. Given its byte-addressability and low latency, processors can access, read, modify, and persist data in SCM using load/store instructions at a CPU cache line granularity. The path from CPU registers to SCM is long and mostly volatile, including store buffers and CPU caches, leaving the programmer with little control over when data is persisted. Therefore, there is a need to enforce the order and durability of SCM writes using persistence primitives, such as cache line flushing instructions. This in turn creates new failure scenarios, such as missing or misplaced persistence primitives.
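To make these failure modes concrete, here is a minimal sketch, assuming an x86 CPU and the standard cache-line-flush and store-fence intrinsics, of how a write to SCM must be explicitly flushed and ordered before a commit flag is persisted. Omitting or swapping either step is exactly the kind of missing or misplaced persistence primitive described above; the helper, struct, and field names are hypothetical illustrations, not code from the dissertation.

```cpp
// Minimal sketch of ordered durability on SCM using cache-line flushes and a store fence.
#include <immintrin.h>
#include <cstdint>
#include <cstring>

// Flush every cache line covering [addr, addr + size) and order the flushes.
static void persist(const void* addr, std::size_t size) {
    const std::size_t line = 64;  // cache-line granularity of CLFLUSH
    auto p   = reinterpret_cast<std::uintptr_t>(addr) & ~(line - 1);
    auto end = reinterpret_cast<std::uintptr_t>(addr) + size;
    for (; p < end; p += line)
        _mm_clflush(reinterpret_cast<const void*>(p));
    _mm_sfence();                 // make the flushes visible before later stores
}

struct Record {
    char    payload[56];
    uint8_t valid;                // must become durable strictly after the payload
};

// Correct ordering: persist the payload first, then set and persist the commit flag.
// Dropping either persist() call, or reordering the two steps, is a bug that only a
// post-crash recovery can expose.
void append(Record* rec, const char* data) {
    std::strncpy(rec->payload, data, sizeof(rec->payload));
    persist(rec->payload, sizeof(rec->payload));
    rec->valid = 1;
    persist(&rec->valid, sizeof(rec->valid));
}

int main() {
    static Record rec{};          // in a real system this record would live in mapped SCM
    append(&rec, "hello");
}
```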
We devise several building blocks to overcome these challenges. First, we identify the programming challenges of SCM and present a sound programming model that solves them. Then, we tackle memory management, as the first required building block to build a database system, by designing a highly scalable SCM allocator, named PAllocator, that fulfills the versatile needs of database systems. Thereafter, we propose the FPTree, a highly scalable hybrid SCM-DRAM persistent B+-Tree that bridges the gap between the performance of transient and persistent B+-Trees. Using these building blocks, we realize our envisioned database architecture in SOFORT, a hybrid SCM-DRAM columnar transactional engine. We propose an SCM-optimized MVCC scheme that eliminates write-ahead logging from the critical path of transactions. Since SCM-resident data is near-instantly available upon recovery, the new recovery bottleneck is rebuilding DRAM-based data. To alleviate this bottleneck, we propose a novel recovery technique that achieves nearly instant responsiveness of the database by accepting queries right after recovering SCM-based data, while rebuilding DRAM-based data in the background. Additionally, SCM brings new failure scenarios that existing testing tools cannot detect. Hence, we propose an online testing framework that is able to automatically simulate power failures and detect missing or misplaced persistence primitives. Finally, our proposed building blocks can serve to build more complex systems, paving the way for future database systems on SCM.
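As a rough illustration of the hybrid placement this architecture relies on, the following sketch (purely hypothetical names and structures, far simpler than the actual FPTree or SOFORT designs) keeps primary leaf data in an SCM-resident region that survives restarts, while the DRAM-only index over it is reconstructed during recovery.

```cpp
// Simplified sketch of hybrid SCM/DRAM data placement and recovery.
#include <cstdint>
#include <cstddef>
#include <map>
#include <vector>

// Persistent leaf entries, assumed to reside in an SCM-mapped region.
struct LeafEntry {
    uint64_t key;
    uint64_t value;
    bool     valid;   // set (and flushed) last, so half-written entries are ignored
};

struct Engine {
    const std::vector<LeafEntry>& scmLeaves;    // discovered, not reloaded, on restart
    std::map<uint64_t, std::size_t> dramIndex;  // secondary, DRAM-only structure

    explicit Engine(const std::vector<LeafEntry>& leaves) : scmLeaves(leaves) {}

    // Recovery: SCM data is immediately usable; only the DRAM index is rebuilt.
    // In SOFORT's scheme this rebuild can proceed in the background while the
    // engine already accepts queries.
    void recover() {
        dramIndex.clear();
        for (std::size_t i = 0; i < scmLeaves.size(); ++i)
            if (scmLeaves[i].valid) dramIndex[scmLeaves[i].key] = i;
    }
};

int main() {
    std::vector<LeafEntry> leaves{{42, 7, true}, {13, 0, false}};  // stand-in for SCM
    Engine engine(leaves);
    engine.recover();            // index now contains only the valid entry (key 42)
}
```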
Transactions and data management in NoSQL cloud databases
NoSQL databases have become the preferred option for storing and processing data in cloud computing as they are capable of providing high data availability, scalability and efficiency. But in order to achieve these attributes, NoSQL databases make certain trade-offs. First, NoSQL databases cannot guarantee strong consistency of data; they only guarantee a weaker form of consistency based on the eventual consistency model. Second, NoSQL databases adopt a simple data model which makes it easy for data to be scaled across multiple nodes. Third, NoSQL databases do not support table joins or referential integrity, which implies that they cannot implement complex queries. The combination of these factors implies that NoSQL databases cannot support transactions. Motivated by these crucial issues, this thesis investigates transactions and data management in NoSQL databases.
It presents a novel approach that implements transactional support for NoSQL databases in order to ensure stronger data consistency and provide an appropriate level of performance. The novelty lies in the design of a Multi-Key transaction model that guarantees the standard properties of transactions in order to ensure stronger consistency and integrity of data. The model is implemented in a novel loosely-coupled architecture that separates the implementation of transactional logic from the underlying data, thus ensuring transparency and abstraction in cloud and NoSQL databases. The proposed approach is validated through the development of a prototype system using a real MongoDB system. An extended version of the standard Yahoo! Cloud Serving Benchmark (YCSB) has been used to test and evaluate the proposed approach. Various experiments have been conducted and sets of results have been generated. The results show that the proposed approach meets the research objectives: it maintains stronger consistency of cloud data as well as an appropriate level of reliability and performance.
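As a generic illustration of what a loosely-coupled, client-side multi-key transaction layer over a key-document store can look like, here is an optimistic sketch with hypothetical names; it is not the thesis's actual Multi-Key model or its MongoDB integration.

```cpp
// Generic sketch of a client-side multi-key transaction layer over a versioned document store.
#include <map>
#include <mutex>
#include <optional>
#include <string>

struct VersionedDoc { std::string body; uint64_t version = 0; };

class DocumentStore {
public:
    std::optional<VersionedDoc> get(const std::string& key) {
        std::lock_guard<std::mutex> g(m_);
        auto it = docs_.find(key);
        if (it == docs_.end()) return std::nullopt;
        return it->second;
    }
    // Atomically apply buffered writes if every read document is still at the
    // version the transaction observed (optimistic validation).
    bool commit(const std::map<std::string, uint64_t>& readSet,
                const std::map<std::string, std::string>& writeSet) {
        std::lock_guard<std::mutex> g(m_);
        for (const auto& [key, version] : readSet)
            if (docs_.count(key) && docs_[key].version != version) return false;
        for (const auto& [key, body] : writeSet) {
            auto& d = docs_[key];
            d.body = body;
            ++d.version;
        }
        return true;
    }
private:
    std::mutex m_;
    std::map<std::string, VersionedDoc> docs_;
};

class MultiKeyTxn {
public:
    explicit MultiKeyTxn(DocumentStore& s) : store_(s) {}
    std::optional<std::string> read(const std::string& key) {
        if (writes_.count(key)) return writes_[key];   // read-your-own-writes
        auto d = store_.get(key);
        reads_[key] = d ? d->version : 0;              // remember the observed version
        return d ? std::optional<std::string>(d->body) : std::nullopt;
    }
    void write(const std::string& key, const std::string& body) { writes_[key] = body; }
    bool commit() { return store_.commit(reads_, writes_); }
private:
    DocumentStore& store_;
    std::map<std::string, uint64_t> reads_;
    std::map<std::string, std::string> writes_;
};

int main() {
    DocumentStore store;
    MultiKeyTxn t(store);
    t.read("order:1");                      // records the observed version (absent)
    t.write("order:1", "{\"status\":\"paid\"}");
    t.write("stock:7", "{\"qty\":41}");
    return t.commit() ? 0 : 1;              // both keys committed atomically, or neither
}
```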
Bringing modular concurrency control to the next level
Database users face a tension between ease-of-programming and high performance: ACID transactions can greatly simplify the programming effort of database applications by providing four useful properties—atomicity, consistency, isolation, and durability, but enforcing these properties can degrade performance.
This dissertation eases this tension by improving the performance of ACID transactions for scenarios where data contention is the bottleneck. The approach that we take is federating concurrency control (CC) mechanisms. It is based on the observation that any single CC mechanism is bound to make trade-offs that cause it to perform well in some cases but poorly in others. A federation opens the opportunity of applying each mechanism only to the set of transactions or workloads where it shines, while maintaining isolation.
In particular, this work builds upon Modular Concurrency Control (MCC), a recent technique that federates CCs by partitioning transactions into groups, and by applying different CC mechanisms in each group.
This dissertation addresses two critical shortcomings in the current embodiment of MCC. First, cross-group data conflicts are handled with a single, unoptimized CC mechanism that can significantly limit performance. Second, configuring MCC is a complex task, which runs counter to MCC’s purpose: to improve performance without sacrificing ease-of-programming.
To address these problems, this dissertation presents Tebaldi, a new transactional database that brings Modular Concurrency Control to the next level, both figuratively and literally. Tebaldi introduces a new, hierarchical model to MCC that partitions transactions recursively to compose CC mechanisms in a multi-level tree. This model increases flexibility in federating CC mechanisms, which is the key to realizing the performance potential of federation. Tebaldi reduces configuration complexity by managing the MCC federation automatically: it can detect performance issues in the current workload in real time and automatically adjust its configuration to improve performance.
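A hierarchical federation of this kind can be pictured as a tree of concurrency control modules, with a classifier routing each transaction to a group. The sketch below is an illustrative approximation under that assumption, with hypothetical names; it is not Tebaldi's implementation, and the parent-level mechanism that mediates cross-group conflicts appears only as a comment.

```cpp
// Illustrative sketch of hierarchical modular concurrency control.
#include <iostream>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Txn { std::string name; std::string group; };

class ConcurrencyControl {
public:
    virtual ~ConcurrencyControl() = default;
    virtual void execute(const Txn& t) = 0;
};

// Leaf: a concrete mechanism (e.g. two-phase locking, OCC), represented here by a label.
class LeafCC : public ConcurrencyControl {
public:
    explicit LeafCC(std::string mechanism) : mechanism_(std::move(mechanism)) {}
    void execute(const Txn& t) override {
        std::cout << t.name << " runs under " << mechanism_ << "\n";
    }
private:
    std::string mechanism_;
};

// Inner node: groups transactions and delegates each group to a child CC; a full
// system would additionally run a parent-level mechanism across groups.
class FederationCC : public ConcurrencyControl {
public:
    void addGroup(const std::string& group, std::unique_ptr<ConcurrencyControl> cc) {
        children_.emplace_back(group, std::move(cc));
    }
    void execute(const Txn& t) override {
        for (auto& [group, cc] : children_)
            if (group == t.group) { cc->execute(t); return; }
    }
private:
    std::vector<std::pair<std::string, std::unique_ptr<ConcurrencyControl>>> children_;
};

int main() {
    FederationCC root;
    root.addGroup("short-writes", std::make_unique<LeafCC>("two-phase locking"));
    root.addGroup("read-mostly",  std::make_unique<LeafCC>("optimistic CC"));
    root.execute({"newOrder", "short-writes"});
    root.execute({"stockScan", "read-mostly"});
}
```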
Blockchain-based Security Framework for Critical Industry 4.0 Cyber-physical System
There has been intense concern for security alternatives because of the recent rise of cyber attacks, mainly targeting critical systems such as industrial, medical, or energy ecosystems. Although the latest industry infrastructures largely depend on AI-driven maintenance, predictions based on corrupted data can result in loss of life and capital. Admittedly, an inadequate data-protection mechanism can readily compromise the security and reliability of the network. The shortcomings of conventional cloud- or trusted-certificate-driven techniques have motivated us to present a unique Blockchain-based framework for a secure and efficient Industry 4.0 system. The demonstrated framework obviates the long-established certificate authority by enhancing a consortium Blockchain that reduces data-processing delay and increases cost-effective throughput. Moreover, the distributed Industry 4.0 security model calls for cooperative trust rather than dependence on a single party, which would in essence incur the costs and the threat of a single point of failure. Therefore, the multi-signature technique of the proposed framework provides multi-party authentication, confirming its applicability to real-time, collaborative cyber-physical systems.
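As a toy illustration of the multi-party authentication idea, the sketch below enforces a k-of-n endorsement threshold. The names are hypothetical and real multi-signature schemes verify cryptographic signatures, which this sketch deliberately omits.

```cpp
// Toy sketch of threshold (k-of-n) multi-party authentication for a transaction.
#include <iostream>
#include <set>
#include <string>
#include <utility>

class MultiPartyApproval {
public:
    MultiPartyApproval(std::set<std::string> members, std::size_t threshold)
        : members_(std::move(members)), threshold_(threshold) {}

    // Record an endorsement only if the signer is an authorised consortium member.
    void endorse(const std::string& member) {
        if (members_.count(member)) endorsements_.insert(member);
    }
    bool accepted() const { return endorsements_.size() >= threshold_; }

private:
    std::set<std::string> members_;
    std::set<std::string> endorsements_;
    std::size_t threshold_;
};

int main() {
    MultiPartyApproval tx({"plantA", "plantB", "auditor"}, 2);  // 2-of-3 policy
    tx.endorse("plantA");
    std::cout << tx.accepted() << "\n";  // 0: one endorsement is not enough
    tx.endorse("auditor");
    std::cout << tx.accepted() << "\n";  // 1: threshold reached
}
```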
Ordering, timeliness and reliability for publish/subscribe systems over WAN
In the last few years, the increasing use of the Internet and the geo-political, sociological and financial changes induced by globalization are paving the way for a connected world where information is always available at the right place and the right time. As such, applications previously deployed for ``closed'' environments are now federating into geographically distributed systems connected through a Wide Area Network (WAN). By this evolution, in the near future no system will be isolated: every system will be composed of interconnected systems, i.e., it will be a System of Systems (SoS). Examples of SoS are Large-scale Complex Critical Infrastructures (LCCIs), such as power grids, transport infrastructures (airports and seaports), financial infrastructures and next-generation intelligence platforms, to cite a few. In these systems, multiple sources of information generate a high volume of events that need to be delivered to all intended destinations while respecting several Quality of Service (QoS) constraints imposed by the critical nature of LCCIs. As such, particular attention is devoted to the middleware used to disseminate information in the SoS. Due to the inherent scalability provided by its space, time and synchronization decoupling properties, the publish/subscribe paradigm is becoming attractive for the implementation of a middleware service for LCCIs. However, scalability is not the only requirement exhibited by SoS. Several services need to control a broader set of QoS requirements, such as timeliness, ordering and reliability. Unfortunately, current middleware solutions do not address the QoS constraints required by SoS. Current publish/subscribe middleware solutions for the WAN environment offer only best-effort event dissemination, with no additional control over QoS. Just a few implementations try to address some isolated QoS policy, making them unsuitable for an SoS scenario. The contribution of this thesis is to devise a QoS layer that can be placed on top of a generic publish/subscribe middleware and enriches its service by addressing (i) ordering, (ii) reliability and (iii) timeliness in event dissemination in SoS over WAN. Specifically, we first analyze several real case studies, highlighting their QoS requirements in terms of ordering, reliability and timeliness, and compare these requirements with both current research prototypes and commercial systems. Then, we fill the gap by proposing novel algorithms to address those requirements. The proposed protocols can also be combined in order to provide the QoS level required by the particular application. In this way, QoS issues do not need to be addressed at the application level, leaving applications to implement just their native functionalities.
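To give a concrete flavour of such a QoS layer, the sketch below wraps a generic best-effort publish/subscribe interface and adds per-topic FIFO ordering through sequence numbers and a subscriber-side reorder buffer. The interface and names are hypothetical; the thesis's own ordering, reliability and timeliness protocols are more elaborate.

```cpp
// Sketch of an ordering QoS layer placed on top of a best-effort publish/subscribe service.
#include <cstdint>
#include <functional>
#include <iostream>
#include <map>
#include <string>
#include <utility>

struct Event {
    std::string topic;
    uint64_t    seq = 0;     // assigned by the QoS layer at publish time
    std::string payload;
};

// The underlying best-effort middleware: may deliver events out of order.
struct PubSub {
    std::function<void(const Event&)> deliver;   // subscriber callback
    void publish(const Event& e) { if (deliver) deliver(e); }
};

class QosOrderingLayer {
public:
    explicit QosOrderingLayer(PubSub& net) : net_(net) {
        net_.deliver = [this](const Event& e) { onReceive(e); };
    }
    // Publisher side: stamp each event with a per-topic sequence number.
    void publish(const std::string& topic, const std::string& payload) {
        Event e{topic, nextSeq_[topic]++, payload};
        net_.publish(e);
    }
    // Applications subscribe through the QoS layer, not the raw middleware.
    void subscribe(std::function<void(const Event&)> cb) { app_ = std::move(cb); }

private:
    // Subscriber side: buffer out-of-order events and release them in sequence.
    void onReceive(const Event& e) {
        pending_[e.topic][e.seq] = e;
        auto& expected = nextDeliver_[e.topic];
        auto& buf = pending_[e.topic];
        while (!buf.empty() && buf.begin()->first == expected) {
            if (app_) app_(buf.begin()->second);
            buf.erase(buf.begin());
            ++expected;
        }
    }
    PubSub& net_;
    std::function<void(const Event&)> app_;
    std::map<std::string, uint64_t> nextSeq_, nextDeliver_;
    std::map<std::string, std::map<uint64_t, Event>> pending_;
};

int main() {
    PubSub net;
    QosOrderingLayer qos(net);
    qos.subscribe([](const Event& e) {
        std::cout << e.topic << " #" << e.seq << ": " << e.payload << "\n";
    });
    qos.publish("grid/alerts", "overload on line 7");
    qos.publish("grid/alerts", "load restored");
}
```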
The enterprise blockchain design framework and its application to an e-Procurement ecosystem
The research work of this paper has been partially funded by the project VORTAL INTER DATA (n° 038361), co-financed by Vortal and COMPETE Program P2020.
We would also like to thank UNIDEMI, DEMI, and LASI for providing us with the research infrastructure and resources to conduct this research.
Blockchain technologies have seen a steady growth in interest from industry as the technology gains maturity. Blockchain offers a novel way to establish trust amongst multiple stakeholders without relying on or trusting centralised authorities. While its use as a decentralised store of value has been validated through the emergence of cryptocurrencies, its use in industrial applications with multi-stakeholder ecosystems, such as industrial supply chain management, is still at an early stage of design and experimentation, where private blockchains are used as opposed to public blockchains. Many enterprise blockchain projects failed to gain traction after their initial launch, due to inefficient design, a lack of incentives for all stakeholders, or simply because the use of blockchain was not really necessary in the first place. There has been a need for a framework that allows blockchain designers and researchers to evaluate the scenarios in which a blockchain solution is useful and to design the key configurations of an enterprise blockchain solution. The literature on blockchain architectures is sparse and only applicable to specific use cases or functionalities. This paper proposes a comprehensive Enterprise Blockchain Design Framework (EBDF) that not only identifies the relevant use cases where a blockchain should be utilised, but also details the characteristics and configurations for designing an enterprise blockchain ecosystem, applicable to multiple industries. To validate the EBDF, we apply it to the Vortal e-Procurement ecosystem, allowing multiple platforms to interoperate with greater transparency and accountability over the proposed blockchain framework. In this use case, many vendors bid in procurement procedures, often for publicly managed funds, where it is vital that full transparency and accountability are ensured throughout the entire process. Ensuring that certain digital certification functions, such as timestamps, are independent from e-Procurement platform owners has been a challenge. Blockchain technology has emerged as a promising solution for not only ensuring transparency and immutability of records, but also providing interoperability across different platforms by acting as a trusted third party. The applied framework is used to design a Hyperledger-based blockchain solution with some of the key architectural elements that could fulfil these needs, while presenting the advantages of such a solution.
Enhancing Data Processing on Clouds with Hadoop/HBase
In the current information age, large amounts of data are being generated and accumulated rapidly in various industrial and scientific domains. This imposes important demands on data processing capabilities that can extract sensible and valuable information from the large amount of data in a timely manner. Hadoop, the open-source implementation of Google's data processing framework (MapReduce, Google File System and BigTable), is becoming increasingly popular and is being used to solve data processing problems in various application scenarios. However, being originally designed for handling very large data sets that can be divided easily into parts to be processed independently with limited inter-task communication, Hadoop lacks applicability to a wider range of use cases. As a result, many projects are under way to enhance Hadoop for different application needs, such as data warehouse applications, machine learning and data mining applications, etc. This thesis is one such research effort in this direction. The goal of the thesis research is to design novel tools and techniques to extend and enhance the large-scale data processing capability of Hadoop/HBase on clouds, and to evaluate their effectiveness in performance tests on prototype implementations. Two main research contributions are described. The first contribution is a light-weight computational workflow system called "CloudWF" for Hadoop. The second contribution is a client library called "HBaseSI" supporting transactional snapshot isolation (SI) in HBase, Hadoop's database component.
CloudWF addresses the problem of automating the execution of scientific workflows composed of both MapReduce and legacy applications on clouds with Hadoop/HBase. CloudWF is the first computational workflow system built directly using Hadoop/HBase. It uses novel methods in handling workflow directed acyclic graph decomposition, storing and querying dependencies in HBase sparse tables, transparent file staging, and decentralized workflow execution management relying on the MapReduce framework for task scheduling and fault tolerance.
HBaseSI addresses the problem of maintaining strong transactional data consistency in HBase tables. It is the first SI mechanism developed for HBase. HBaseSI uses novel methods for handling distributed transactional management autonomously by individual clients. These methods greatly simplify the design of HBaseSI and can be generalized to other column-oriented stores with an architecture similar to HBase's. As a result of this simplicity in design, HBaseSI adds low overhead to HBase performance and directly inherits many desirable properties of HBase. HBaseSI is non-intrusive to existing HBase installations and user data, and is designed to work with a large cloud in terms of both data size and the number of nodes in the cloud.
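For readers unfamiliar with snapshot isolation, the following self-contained sketch shows the client-side SI pattern in its simplest form over a generic versioned key-value store: reads see the latest version no newer than the transaction's start timestamp, writes are buffered, and the first committer wins on write-write conflicts. The store interface and names are hypothetical; this is not HBaseSI's actual API or algorithm.

```cpp
// Generic sketch of client-driven snapshot isolation over a versioned key-value store.
#include <cstdint>
#include <iostream>
#include <map>
#include <optional>
#include <string>

class VersionedStore {
public:
    uint64_t now() { return ++clock_; }   // monotonically increasing timestamps
    std::optional<std::string> readAt(const std::string& key, uint64_t ts) const {
        auto it = data_.find(key);
        if (it == data_.end()) return std::nullopt;
        auto v = it->second.upper_bound(ts);           // latest version with commitTs <= ts
        if (v == it->second.begin()) return std::nullopt;
        return std::prev(v)->second;
    }
    bool writtenAfter(const std::string& key, uint64_t ts) const {
        auto it = data_.find(key);
        return it != data_.end() && !it->second.empty() && it->second.rbegin()->first > ts;
    }
    void put(const std::string& key, const std::string& val, uint64_t ts) { data_[key][ts] = val; }
private:
    uint64_t clock_ = 0;
    std::map<std::string, std::map<uint64_t, std::string>> data_;  // key -> (commitTs -> value)
};

class SiTransaction {
public:
    explicit SiTransaction(VersionedStore& s) : store_(s), startTs_(s.now()) {}
    std::optional<std::string> read(const std::string& key) {
        auto w = writes_.find(key);
        if (w != writes_.end()) return w->second;       // read-your-own-writes
        return store_.readAt(key, startTs_);            // read from the snapshot
    }
    void write(const std::string& key, const std::string& val) { writes_[key] = val; }
    bool commit() {
        for (const auto& w : writes_)                   // write-write conflict check
            if (store_.writtenAfter(w.first, startTs_)) return false;  // abort
        uint64_t commitTs = store_.now();
        for (const auto& w : writes_) store_.put(w.first, w.second, commitTs);
        return true;
    }
private:
    VersionedStore& store_;
    uint64_t startTs_;
    std::map<std::string, std::string> writes_;
};

int main() {
    VersionedStore store;
    SiTransaction t1(store), t2(store);
    t1.write("row1", "a");
    t2.write("row1", "b");
    bool first = t1.commit();
    bool second = t2.commit();
    std::cout << first << " " << second << "\n";        // 1 0: the second committer aborts
}
```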
Management of concurrency in a reliable object-oriented computing system
Modern computing systems support concurrency as a means of increasing the performance of the system. However, the potential for increased performance is not without its problems. For example, lost updates and inconsistent retrievals are but two of the possible consequences of unconstrained concurrency. Many concurrency control techniques have been designed to combat these problems; this thesis considers the applicability of some of these techniques in the context of a reliable object-oriented system supporting atomic actions.
The object-oriented programming paradigm is one approach to handling the inherent complexity of modern computer programs. By modeling entities from the real world as objects with well-defined interfaces, the interactions in the system can be carefully controlled. By structuring sequences of such interactions as atomic actions, the consistency of the system is assured. Objects are encapsulated entities whose internal representation is not externally visible. This thesis postulates that this encapsulation should also include the capability for an object to be responsible for its own concurrency control.
Given this latter assumption, this thesis explores the means by which the property of type inheritance possessed by object-oriented languages can be exploited to allow programmers to explicitly control the level of concurrency an object supports. In particular, an object-oriented concurrency controller based upon the technique of two-phase locking is described and implemented using type inheritance, along the lines of the sketch below. The thesis also shows how this inheritance-based approach is highly flexible, such that the basic concurrency control capabilities can be adopted unchanged or overridden with more type-specific concurrency control if required.
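The sketch referenced above, using hypothetical class names rather than the thesis's actual code: a base class provides two-phase read/write locking, and a subclass overrides the lock-compatibility rule to permit more type-specific concurrency where the object's semantics allow it.

```cpp
// Illustrative sketch of inheritance-based concurrency control with two-phase locking.
#include <iostream>
#include <vector>

enum class LockMode { Read, Write };

class LockController {
public:
    virtual ~LockController() = default;
    // Growing phase: acquire a lock only if it is compatible with every lock already held.
    bool acquire(LockMode requested) {
        for (LockMode held : held_)
            if (!compatible(held, requested)) return false;   // caller must wait or abort
        held_.push_back(requested);
        return true;
    }
    // Shrinking phase: under strict two-phase locking, all locks are released together
    // when the enclosing atomic action commits or aborts.
    void releaseAll() { held_.clear(); }
protected:
    // Default rule: only read locks are compatible with each other.
    virtual bool compatible(LockMode held, LockMode requested) const {
        return held == LockMode::Read && requested == LockMode::Read;
    }
private:
    std::vector<LockMode> held_;
};

// A counter whose increments commute, so a subclass relaxes the write/write conflict:
// type-specific concurrency control obtained purely through inheritance.
class CommutativeCounterCC : public LockController {
protected:
    bool compatible(LockMode, LockMode) const override { return true; }
};

int main() {
    LockController plain;
    CommutativeCounterCC counter;
    std::cout << plain.acquire(LockMode::Write) << plain.acquire(LockMode::Write) << "\n";     // 10
    std::cout << counter.acquire(LockMode::Write) << counter.acquire(LockMode::Write) << "\n"; // 11
}
```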
UK Science and Engineering Research Council, Serc/Alve
Dwarna: a blockchain solution for dynamic consent in biobanking
Dynamic consent aims to empower research partners and facilitate active participation in the research process. Used within the context of biobanking, it gives individuals access to information and control to determine how and where their biospecimens and data should be used. We present Dwarna—a web portal for ‘dynamic consent’ that acts as a hub connecting the different stakeholders of the Malta Biobank: biobank managers, researchers, research partners, and the general public. The portal stores research partners’ consent in a blockchain to create an immutable audit trail of research partners’ consent changes. Dwarna’s structure also presents a solution to the European Union’s General Data Protection Regulation’s right to erasure—a right that is seemingly incompatible with the blockchain model. Dwarna’s transparent structure increases trustworthiness in the biobanking process by giving research partners more control over which research studies they participate in, by facilitating the withdrawal of consent, and by making it possible to request that the biospecimen and associated data are destroyed.