
    Data management in cloud environments: NoSQL and NewSQL data stores

    Advances in Web technology and the proliferation of mobile devices and sensors connected to the Internet have resulted in immense processing and storage requirements. Cloud computing has emerged as a paradigm that promises to meet these requirements. This work focuses on the storage aspect of cloud computing, specifically on data management in cloud environments. Traditional relational databases were designed in a different hardware and software era and face challenges in meeting the performance and scale requirements of Big Data. NoSQL and NewSQL data stores present themselves as alternatives that can handle huge volumes of data. Because of the large number and diversity of existing NoSQL and NewSQL solutions, it is difficult to comprehend the domain and even more challenging to choose an appropriate solution for a specific task. Therefore, this paper reviews NoSQL and NewSQL solutions with the objectives of: (1) providing a perspective on the field, (2) providing guidance for practitioners and researchers in choosing an appropriate data store, and (3) identifying challenges and opportunities in the field. Specifically, the most prominent solutions are compared with a focus on data models, querying, scaling, and security-related capabilities. Features driving the ability to scale read requests, write requests, and data storage are investigated, in particular partitioning, replication, consistency, and concurrency control. Furthermore, use cases and scenarios in which NoSQL and NewSQL data stores have been used are discussed, and the suitability of various solutions for different sets of applications is examined. Consequently, this study identifies challenges in the field, including the immense diversity and inconsistency of terminologies, limited documentation, sparse comparison and benchmarking criteria, and the nonexistence of standardized query languages.
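
    The paper's comparison is qualitative; purely as an illustration of one scaling feature it surveys (partitioning), the Python sketch below implements a minimal consistent-hashing ring of the kind many NoSQL stores build on. All class and function names are invented for this example.

```python
# Minimal consistent-hashing sketch: one common way NoSQL stores partition
# data across nodes so that read/write load and storage can scale out.
# Names are illustrative; real systems add virtual nodes, replication,
# and rebalancing on top of this idea.
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a key to a point on the hash ring."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes):
        # Sorted list of (ring position, node name) pairs.
        self._ring = sorted((_hash(n), n) for n in nodes)

    def node_for(self, key: str) -> str:
        """Return the node responsible for the given key."""
        positions = [pos for pos, _ in self._ring]
        idx = bisect.bisect(positions, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c"])
    for k in ("user:1", "user:2", "order:42"):
        print(k, "->", ring.node_for(k))
```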

    Study on performance modeling and assurance of cross/permissionless/permissioned chains

    This research addresses and resolves performance modeling and assurance issues across the full spectrum of blockchain protocols, from permissionless (Chapter II) and permissioned (Chapter III) to cross-chain (Chapter IV). In Chapter II, a queueing model for permissionless blockchains is proposed and validated with respect to specific yet practical characteristics of blockchains such as Bitcoin and Ethereum, primarily in terms of the block size and its waiting time. The variables considered in this model include the network traffic intensity, the maximum number of transactions in a block, the block time, and the transaction arrival rate, to mention a few. Numerical simulations are conducted, and the efficacy of the proposed model is validated in a quantitative yet practical manner against Bitcoin and Ethereum. In Chapter III, a set of queueing models is proposed for permissioned blockchains, which are considered an emerging technology for trustworthy decentralized networks. Hyperledger Fabric is a well-defined permissioned blockchain, constructed from various types of nodes, such as nodes for endorsement, ordering, and commitment, to realize the decentralized nature of trustworthy network operations. Each type of node is characterized in terms of transaction/block queue size and waiting time, and the transaction/block arrival and service rates are considered for simulation purposes. The models take into account how the arrival and service rates jointly influence performance and how the number of channels impacts performance, in order to ultimately facilitate more dynamic optimization. The efficacy of the proposed models is demonstrated by extensive numerical simulations and analyses. In Chapter IV, a cross-chain communication protocol and an M/Cox/1 queueing-model-based performance model are proposed. Cross-chain communication considers two distinct types of transactions, namely an atomic swap and an inter-ledger asset transfer. They are controlled by different communication mechanisms: the Hashed Time-Lock Contract (HTLC), based on a pre-image technique, and inter-ledger asset transfer, based on an asynchronous verification technique. In the performance model, a Poisson arrival process is assumed, and the service times of the two stages, pre-commit and verify-and-commit, are assumed to be exponentially distributed. Lastly, a selection ratio between the HTLC and inter-ledger asset transfer protocols is assumed. Extensive numerical simulations are conducted to study the performance impact of changing parameters such as the arrival rate, the service rate, and the protocol selection ratio. The proposed models provide a comprehensive yet fundamental basis for assuring and ultimately optimizing the design of blockchain-based applications, specifically in terms of performance.
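
    The thesis's models are analytical; purely to make the queueing assumptions concrete, the Python sketch below simulates a single-server queue with Poisson arrivals and exponential service, the simplest setting of the kind described above, and compares the simulated mean waiting time with the closed-form M/M/1 value. The arrival and service rates are illustrative and not taken from the chapters.

```python
# Toy M/M/1 simulation of a transaction queue with Poisson arrivals and
# exponential service, in the spirit of the block/transaction queueing
# models described above. Parameter values are illustrative only.
import random


def simulate_mm1(arrival_rate, service_rate, num_jobs=100_000, seed=1):
    rng = random.Random(seed)
    clock = 0.0              # arrival time of the current transaction
    server_free_at = 0.0     # time the server becomes idle
    total_wait = 0.0
    for _ in range(num_jobs):
        clock += rng.expovariate(arrival_rate)   # next arrival
        start = max(clock, server_free_at)       # waits if server is busy
        total_wait += start - clock
        server_free_at = start + rng.expovariate(service_rate)
    return total_wait / num_jobs


if __name__ == "__main__":
    lam, mu = 8.0, 10.0  # transactions/s arriving vs. served
    sim = simulate_mm1(lam, mu)
    theory = lam / (mu * (mu - lam))  # M/M/1 mean waiting time in queue
    print(f"simulated mean wait: {sim:.3f}s, theory: {theory:.3f}s")
```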

    Modeling a Consortium-based Distributed Ledger Network with Applications for Intelligent Transportation Infrastructure

    Emerging distributed-ledger networks are changing the landscape for environments of low trust among participating entities. Implementing such technologies in transportation infrastructure communications and operations would enable, in a secure fashion, decentralized collaboration among entities that do not fully trust each other. This work models a transportation records and events data collection system enabled by a Hyperledger Fabric blockchain network and simulated using a transportation environment modeling tool. A distributed vehicle records management use case is shown with the capability to detect and prevent unauthorized vehicle odometer tampering. Another use case studied is that of vehicular data collected during the event of an accident. It relies on broadcast data collected from the Vehicular Ad-hoc Network (VANET) and submitted as witness reports from nearby vehicles or roadside units that observed the event or detected misbehaving activity by vehicles involved in the accident. Mechanisms for the collection, validation, and corroboration of the reported data, which may prove crucial for vehicle accident forensics, are described, and their implementation is discussed. A performance analysis of the network under various loads is conducted, with results suggesting that tailored endorsement policies are an effective mechanism to improve overall network throughput for a given channel. The experimental testbed shows that Hyperledger Fabric and other distributed ledger technologies hold promise for the collection of transportation data and the collaboration of applications and services that consume it.
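
    The work's implementation is not reproduced in the abstract; as a conceptual sketch only, the Python snippet below models the odometer-rollback check described above. Real Hyperledger Fabric chaincode would implement this rule in Go or Node.js against ledger state; all names here are invented.

```python
# Conceptual sketch of the odometer tamper check: a new mileage reading is
# rejected if it is lower than the last accepted value for the same vehicle.
# This only models the validation rule, not the ledger or endorsement flow.
from dataclasses import dataclass


@dataclass
class OdometerRecord:
    vin: str          # vehicle identification number
    mileage: int      # reported odometer value
    timestamp: int    # Unix time of the reading


class OdometerLedger:
    def __init__(self):
        self._latest = {}  # vin -> last accepted OdometerRecord

    def submit(self, record: OdometerRecord) -> bool:
        """Accept the reading only if it never rolls the odometer back."""
        last = self._latest.get(record.vin)
        if last is not None and record.mileage < last.mileage:
            return False  # possible tampering: mileage decreased
        self._latest[record.vin] = record
        return True


if __name__ == "__main__":
    ledger = OdometerLedger()
    print(ledger.submit(OdometerRecord("VIN123", 54000, 1700000000)))  # True
    print(ledger.submit(OdometerRecord("VIN123", 42000, 1700500000)))  # False
```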

    DATA MIGRATION FROM STANDARD SQL TO NoSQL

    Currently, two major types of database management systems are in use for dealing with data: the Relational Database Management System (RDBMS), also known as the standard SQL database, and the NoSQL database. RDBMS databases deal with structured data and NoSQL databases with unstructured or semi-structured data. RDBMS databases have been popular for many years, but the NoSQL type is gaining popularity with the growth of the internet and social media. Data flow from SQL to NoSQL, or vice versa, is very likely in the near future due to the growing popularity of NoSQL databases. The goal of this thesis is to analyze the data structures of RDBMS and NoSQL databases and to suggest a Graphical User Interface (GUI) tool that migrates data from SQL to NoSQL databases. Relational databases have been in use and have dominated the industry for many years. In contrast, NoSQL databases were introduced with the increased usage of the internet, social media, and cloud computing. Traditional relational databases guarantee data integrity, whereas high availability and scalability are the main advantages of NoSQL databases. This thesis presents a comparison of these two technologies, comparing their data structures and data storage techniques. SQL databases store data differently than NoSQL databases due to their specific demands. The data stored in relational databases is highly structured and normalized in most environments, whereas the data in NoSQL databases is mostly unstructured. This difference in data structure helps meet the specific demands of the two systems. NoSQL databases are scalable with high availability due to their simpler data model but do not guarantee data consistency at all times. On the other hand, RDBMS systems are not easily scalable and available at the same time due to their more complex data model, but they guarantee data consistency. This thesis uses CouchDB and MySQL to represent the NoSQL and standard SQL databases, respectively. The aim of the research in this document is to suggest a methodology for data migration from RDBMS databases to document-based NoSQL databases. Data migration between RDBMS and NoSQL systems is anticipated because both are currently in use by many industry leaders. This thesis presents a Graphical User Interface as a starting point that enables data migration from RDBMS to NoSQL databases, using MySQL and CouchDB as the test databases for the relational and NoSQL systems, respectively, and presents an architecture and methodology to achieve this objective.
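
    The thesis describes a GUI tool rather than publishing its code; a minimal sketch of the underlying row-to-document migration from MySQL to CouchDB might look as follows in Python, using mysql-connector-python for the relational side and CouchDB's plain HTTP API via requests. The connection details, table name, and target database are placeholders, and type handling and relationship denormalization are omitted.

```python
# Minimal sketch of a MySQL-to-CouchDB migration: each relational row is
# converted into a JSON document and posted to CouchDB over its HTTP API.
# Connection details, the table name, and the target database are
# placeholders; error handling and foreign-key denormalization are omitted.
import mysql.connector
import requests

COUCH_URL = "http://admin:secret@localhost:5984"
TARGET_DB = "migrated_customers"   # hypothetical CouchDB database
SOURCE_TABLE = "customers"         # hypothetical MySQL table


def migrate():
    # Create the target CouchDB database (a second PUT simply fails, which is fine here).
    requests.put(f"{COUCH_URL}/{TARGET_DB}")

    conn = mysql.connector.connect(
        host="localhost", user="root", password="secret", database="shop"
    )
    cursor = conn.cursor()
    cursor.execute(f"SELECT * FROM {SOURCE_TABLE}")
    columns = [col[0] for col in cursor.description]

    for row in cursor.fetchall():
        # Map column names to values; this becomes one CouchDB document.
        doc = dict(zip(columns, (str(v) for v in row)))
        requests.post(f"{COUCH_URL}/{TARGET_DB}", json=doc)

    cursor.close()
    conn.close()


if __name__ == "__main__":
    migrate()
```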

    Implementing Azure Active Directory Integration with an Existing Cloud Service

    Training Simulator (TraSim) is an online, web-based platform for holding crisis management exercises. It simulates epidemics and other exceptional situations to test the functionality of an organization’s operating instructions in the hour of need. The main objective of this thesis is to further develop the service by delegating its existing authentication and user provisioning mechanisms to a centralized, cloud-based Identity and Access Management (IAM) service. Making use of a centralized access control service is widely known as a Single Sign-On (SSO) implementation, which comes with multiple benefits such as increased security, reduced administrative overhead, and improved user experience. The objective originates from a customer organization’s request to enable SSO for TraSim. Out of the wide range of IAM services, the research focuses on implementing SSO by integrating TraSim with Azure Active Directory (AD), since it is considered an industry standard and is already utilized by the customer. Nevertheless, the complexity of the integration is kept as low as possible to retain compatibility with services other than Azure AD. While every integration is a unique undertaking, given the endless variety of software stacks a service can build on and the multiple IAM services to choose from, this thesis aims to provide a general guideline for approaching a similar assignment. Conducting the study required an extensive search and evaluation of the available literature on topics such as IAM, client-server communication, SSO, cloud services, and AD. The literature review is combined with an introduction to the basic technologies TraSim is built with, in order to justify the choice of OpenID Connect as the authentication protocol and its implementation using the mozilla-django-oidc library. The literature consists of multiple online articles, publications, and the official documentation of the utilized technologies. The research uses a constructive approach, as it focuses on developing and testing a new feature that is merged into the source code of an existing piece of software.
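
    The thesis's actual configuration is not shown in the abstract; as a rough sketch, wiring a Django application to Azure AD with mozilla-django-oidc typically comes down to settings along the following lines, with the tenant ID, client credentials, and redirect targets as placeholders.

```python
# settings.py (sketch): pointing mozilla-django-oidc at an Azure AD tenant.
# Tenant ID, client ID/secret, and redirect targets are placeholders.
TENANT = "00000000-0000-0000-0000-000000000000"  # hypothetical tenant ID

AUTHENTICATION_BACKENDS = [
    "mozilla_django_oidc.auth.OIDCAuthenticationBackend",
    "django.contrib.auth.backends.ModelBackend",  # keep local logins working
]

OIDC_RP_CLIENT_ID = "your-app-registration-client-id"
OIDC_RP_CLIENT_SECRET = "your-client-secret"
OIDC_RP_SIGN_ALGO = "RS256"

OIDC_OP_AUTHORIZATION_ENDPOINT = (
    f"https://login.microsoftonline.com/{TENANT}/oauth2/v2.0/authorize"
)
OIDC_OP_TOKEN_ENDPOINT = f"https://login.microsoftonline.com/{TENANT}/oauth2/v2.0/token"
OIDC_OP_JWKS_ENDPOINT = f"https://login.microsoftonline.com/{TENANT}/discovery/v2.0/keys"
OIDC_OP_USER_ENDPOINT = "https://graph.microsoft.com/oidc/userinfo"

LOGIN_REDIRECT_URL = "/"
LOGOUT_REDIRECT_URL = "/"

# urls.py (sketch): expose the library's login and callback views.
# from django.urls import include, path
# urlpatterns = [path("oidc/", include("mozilla_django_oidc.urls")), ...]
```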

    Unified System on Chip RESTAPI Service (USOCRS)

    This thesis investigates the development of a Unified System on Chip RESTAPI Service (USOCRS) to enhance the efficiency and effectiveness of SoC verification reporting. The research aims to overcome the challenges associated with the transfer, utilization, and interpretation of SoC verification reports by creating a unified platform that integrates various tools and technologies. The research methodology follows a design science approach. A thorough literature review was conducted to explore existing approaches and technologies related to SoC verification reporting, automation, data visualization, and API development. The review revealed gaps in the current state of the field, providing a basis for further investigation. Using the insights gained from the literature review, a system design and implementation plan were developed. This plan makes use of cutting-edge technologies such as FastAPI, SQL and NoSQL databases, Azure Active Directory for authentication, and cloud services. The Verification Toolbox was employed to validate SoC reports against the organization’s standards. The system went through manual testing, and user satisfaction was evaluated to ensure its functionality and usability. The results of this study demonstrate the successful design and implementation of the USOCRS, offering SoC engineers a unified and secure platform for uploading, validating, storing, and retrieving verification reports. The USOCRS facilitates seamless communication between users and the API, granting easy access to vital information, including successes, failures, and test coverage derived from submitted SoC verification reports. By automating and standardizing the SoC verification reporting process, the USOCRS eliminates the manual and repetitive tasks usually done by developers, thereby enhancing productivity and establishing a robust and reliable framework for report storage and retrieval. Through the integration of diverse tools and technologies, the USOCRS presents a comprehensive solution that adheres to the required specifications of the SoC schema used within the organization. Furthermore, the USOCRS significantly improves the efficiency and effectiveness of SoC verification reporting: it facilitates the submission process, reduces latency through optimized data storage, and enables meaningful extraction and analysis of report data.
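
    The service's API is not detailed in the abstract; the FastAPI sketch below illustrates, with invented route names and report fields, what a minimal upload-and-retrieve endpoint pair in that spirit could look like. Such an app could be served locally with, for example, `uvicorn main:app --reload`.

```python
# Minimal FastAPI sketch of a verification-report upload/retrieval API in
# the spirit of the service described above. Route names, fields, and the
# in-memory store are illustrative; the real service adds validation against
# the organization's SoC schema, Azure AD authentication, and persistent
# SQL/NoSQL storage.
from uuid import uuid4

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI(title="USOCRS sketch")


class VerificationReport(BaseModel):
    design: str          # e.g. block or subsystem name
    passed: int          # number of passing tests
    failed: int          # number of failing tests
    coverage: float      # test coverage percentage


_reports: dict[str, VerificationReport] = {}  # stand-in for a database


@app.post("/reports")
def upload_report(report: VerificationReport) -> dict:
    report_id = str(uuid4())
    _reports[report_id] = report
    return {"id": report_id}


@app.get("/reports/{report_id}")
def get_report(report_id: str) -> VerificationReport:
    if report_id not in _reports:
        raise HTTPException(status_code=404, detail="report not found")
    return _reports[report_id]
```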

    Architectural Principles for Database Systems on Storage-Class Memory

    Get PDF
    Database systems have long been optimized to hide the higher latency of storage media, yielding complex persistence mechanisms. With the advent of large DRAM capacities, it became possible to keep a full copy of the data in DRAM. Systems that leverage this possibility, such as main-memory databases, keep two copies of the data in two different formats: one in main memory and the other one in storage. The two copies are kept synchronized using snapshotting and logging. This main-memory-centric architecture yields nearly two orders of magnitude faster analytical processing than traditional, disk-centric ones. The rise of Big Data emphasized the importance of such systems with an ever-increasing need for more main memory. However, DRAM is hitting its scalability limits: It is intrinsically hard to further increase its density. Storage-Class Memory (SCM) is a group of novel memory technologies that promise to alleviate DRAM’s scalability limits. They combine the non-volatility, density, and economic characteristics of storage media with the byte-addressability and a latency close to that of DRAM. Therefore, SCM can serve as persistent main memory, thereby bridging the gap between main memory and storage. In this dissertation, we explore the impact of SCM as persistent main memory on database systems. Assuming a hybrid SCM-DRAM hardware architecture, we propose a novel software architecture for database systems that places primary data in SCM and directly operates on it, eliminating the need for explicit IO. This architecture yields many benefits: First, it obviates the need to reload data from storage to main memory during recovery, as data is discovered and accessed directly in SCM. Second, it allows replacing the traditional logging infrastructure by fine-grained, cheap micro-logging at data-structure level. Third, secondary data can be stored in DRAM and reconstructed during recovery. Fourth, system runtime information can be stored in SCM to improve recovery time. Finally, the system may retain and continue in-flight transactions in case of system failures. However, SCM is no panacea as it raises unprecedented programming challenges. Given its byte-addressability and low latency, processors can access, read, modify, and persist data in SCM using load/store instructions at a CPU cache line granularity. The path from CPU registers to SCM is long and mostly volatile, including store buffers and CPU caches, leaving the programmer with little control over when data is persisted. Therefore, there is a need to enforce the order and durability of SCM writes using persistence primitives, such as cache line flushing instructions. This in turn creates new failure scenarios, such as missing or misplaced persistence primitives. We devise several building blocks to overcome these challenges. First, we identify the programming challenges of SCM and present a sound programming model that solves them. Then, we tackle memory management, as the first required building block to build a database system, by designing a highly scalable SCM allocator, named PAllocator, that fulfills the versatile needs of database systems. Thereafter, we propose the FPTree, a highly scalable hybrid SCM-DRAM persistent B+-Tree that bridges the gap between the performance of transient and persistent B+-Trees. Using these building blocks, we realize our envisioned database architecture in SOFORT, a hybrid SCM-DRAM columnar transactional engine. 
    We propose an SCM-optimized MVCC scheme that eliminates write-ahead logging from the critical path of transactions. Since SCM-resident data is near-instantly available upon recovery, the new recovery bottleneck is rebuilding DRAM-based data. To alleviate this bottleneck, we propose a novel recovery technique that achieves nearly instant responsiveness of the database by accepting queries right after recovering SCM-based data, while rebuilding DRAM-based data in the background. Additionally, SCM brings new failure scenarios that existing testing tools cannot detect. Hence, we propose an online testing framework that is able to automatically simulate power failures and detect missing or misplaced persistence primitives. Finally, our proposed building blocks can serve to build more complex systems, paving the way for future database systems on SCM.
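
    None of the dissertation's code is reproduced here; purely as a conceptual toy model, the Python sketch below mimics how a missing persistence primitive (flushing a payload before persisting its commit marker) leaves recoverable state inconsistent after a simulated power failure, the class of bug the testing framework above is designed to catch. The dictionaries stand in for SCM and CPU caches; real code would use cache-line flush and fence instructions.

```python
# Toy model of persistence ordering on SCM: writes land in a volatile
# "cache" and reach persistent memory only when explicitly flushed. A
# simulated crash drops everything not yet flushed, which is how a missing
# or misplaced flush (persistence primitive) corrupts recovery.
class SimulatedSCM:
    def __init__(self):
        self.persistent = {}   # what survives a power failure
        self.cache = {}        # volatile: CPU caches / store buffers

    def store(self, key, value):
        self.cache[key] = value                  # store instruction (volatile)

    def flush(self, key):
        self.persistent[key] = self.cache[key]   # persistence primitive

    def crash(self):
        self.cache.clear()                       # power failure: volatile state lost
        return dict(self.persistent)             # state seen at recovery


def commit_record(scm, value, flush_payload):
    scm.store("payload", value)
    if flush_payload:
        scm.flush("payload")                     # correct: persist payload first
    scm.store("committed", True)
    scm.flush("committed")                       # commit marker always persisted


if __name__ == "__main__":
    for flush_payload in (True, False):
        scm = SimulatedSCM()
        commit_record(scm, value=42, flush_payload=flush_payload)
        state = scm.crash()
        consistent = not state.get("committed") or "payload" in state
        print(f"flush_payload={flush_payload}: recovered={state}, "
              f"consistent={consistent}")
```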