
    Storage Solutions for Big Data Systems: A Qualitative Study and Comparison

    Big data systems development is full of challenges, given the variety of application areas and domains that this technology promises to serve. Fundamental decisions in big data systems design typically include choosing appropriate storage and computing infrastructures. In this age of heterogeneous systems that integrate different technologies to optimize the solution to a specific real-world problem, big data systems are no exception. As far as the storage aspect of a big data system is concerned, the primary facet is the storage infrastructure, and NoSQL appears to be the right technology to fulfill its requirements. However, every big data application has different data characteristics, so its data fits a different data model. This paper presents a feature and use-case analysis and comparison of the four main data models, namely document-oriented, key-value, graph, and wide-column. Moreover, it provides a feature analysis of 80 NoSQL solutions, elaborating on the criteria that a developer must consider when making a choice. Big data storage typically needs to communicate with the execution engine and with other processing and visualization technologies to create a comprehensive solution. This brings the second facet of big data storage, big data file formats, into the picture. The second half of the paper compares the advantages, shortcomings, and possible use cases of the big data file formats available for Hadoop, which is the foundation for most big data computing technologies. Decentralized storage and blockchain are seen as the next generation of big data storage, and their challenges and future prospects are also discussed.
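To make the four data models concrete, the sketch below stores the same small "who follows whom" dataset in each model using plain Python dictionaries and answers one query against each. The dataset, key names such as `user:u1`, and the helper `who_does_alice_follow` are hypothetical illustrations; real document, key-value, graph, and wide-column databases expose their own query APIs, which are not modelled here.

```python
import json

# Document-oriented: one self-contained, possibly nested document per entity.
document_store = {
    "users": {
        "u1": {"name": "Alice", "follows": ["u2"], "address": {"city": "Lisbon"}},
        "u2": {"name": "Bob", "follows": [], "address": {"city": "Porto"}},
    }
}

# Key-value: opaque values addressed only by key; the application (de)serializes them.
kv_store = {
    "user:u1": json.dumps({"name": "Alice", "city": "Lisbon"}),
    "user:u2": json.dumps({"name": "Bob", "city": "Porto"}),
    "follows:u1": json.dumps(["u2"]),
}

# Graph: nodes plus labelled edges; the relationship itself is stored data.
graph_nodes = {"u1": {"name": "Alice"}, "u2": {"name": "Bob"}}
graph_edges = [("u1", "FOLLOWS", "u2")]

# Wide-column: row key -> column family -> column -> value; rows may differ in columns.
wide_column = {
    "u1": {"profile": {"name": "Alice", "city": "Lisbon"}, "follows": {"u2": ""}},
    "u2": {"profile": {"name": "Bob", "city": "Porto"}},
}


def who_does_alice_follow():
    """Answer the same question against each of the four models."""
    users = document_store["users"]
    doc = [users[f]["name"] for f in users["u1"]["follows"]]
    kv = [json.loads(kv_store[f"user:{f}"])["name"] for f in json.loads(kv_store["follows:u1"])]
    graph = [
        graph_nodes[dst]["name"]
        for src, label, dst in graph_edges
        if src == "u1" and label == "FOLLOWS"
    ]
    wide = [wide_column[f]["profile"]["name"] for f in wide_column["u1"].get("follows", {})]
    return {"document": doc, "key_value": kv, "graph": graph, "wide_column": wide}


if __name__ == "__main__":
    print(who_does_alice_follow())
```

Even at this scale the differences the paper compares are visible: the key-value version must parse opaque values in application code, while the graph version treats the relationship as first-class data.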

    Relational into Non-Relational Database Migration with Multiple-Nested Schema Methods on Academic Data

    The rapid development of internet technology has increased the need for data storage and processing technology. One application is managing academic data records at educational institutions. With the massive growth of information, a decline in traditional database performance is inevitable. Hence, many companies choose to migrate to NoSQL, a technology able to overcome the shortcomings of traditional databases. However, existing SQL-to-NoSQL migration tools have not been able to represent SQL data relations in NoSQL without limiting query performance. In this paper, a system was developed that transforms a relational MySQL database into a non-relational MongoDB database using the Multiple Nested Schema method, applied to academic databases. Development began with the design of a transformation scheme, which was then implemented in the migration process using PDI/Kettle. Testing covered three aspects: query response time, data integrity, and storage requirements. The results showed that the system successfully represented SQL data relationships in NoSQL. Complex queries ran 13.32 times faster on the migrated database and basic queries involving SQL transaction tables ran 28.6 times faster on the migrated database, whereas basic queries not involving SQL transaction tables ran 3.91 times faster on the migration source. This confirms that the Multiple Nested Schema method achieves its aim of overcoming the poor performance of queries involving many JOIN operations. In addition, the system maintained data integrity in all tested queries. The storage tests showed that the database transformed with the Multiple Nested Schema method required 10.53 times as much storage as the source database.
This is due to the large amount of data redundancy produced by the transformation. However, storage efficiency is not currently a top priority in data processing technology, so larger storage requirements are an accepted cost of efficient query performance, which is still considered the first priority.
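The core idea behind nested-schema migration is denormalization: related rows are copied into the parent document so that a query reads one document instead of joining tables. The sketch below assumes, for illustration only, that both sides of a student/course relationship become collections that embed copies of their related rows; it is not the paper's PDI/Kettle pipeline, and the tables, columns, and the helpers `nest` and `transcript` are hypothetical. It reproduces both effects reported above in miniature: a JOIN-free lookup and a larger serialized size caused by redundancy.

```python
import json
from copy import deepcopy

# Hypothetical academic tables, as rows a MySQL SELECT might return.
students = [{"student_id": 1, "name": "Ana"}, {"student_id": 2, "name": "Budi"}]
courses = [{"course_id": "DB1", "title": "Databases"}, {"course_id": "AL1", "title": "Algorithms"}]
# Transaction (link) table relating students and courses many-to-many.
enrollments = [
    {"student_id": 1, "course_id": "DB1", "grade": "A"},
    {"student_id": 1, "course_id": "AL1", "grade": "B"},
    {"student_id": 2, "course_id": "DB1", "grade": "A"},
]


def nest(parents, pkey, children, ckey, links, field):
    """Embed in each parent a copy of every linked child row plus the link's own columns."""
    child_by_key = {c[ckey]: c for c in children}
    docs = []
    for parent in parents:
        doc = deepcopy(parent)
        doc[field] = []
        for link in links:
            if link[pkey] == parent[pkey]:
                extra = {k: v for k, v in link.items() if k not in (pkey, ckey)}
                doc[field].append({**extra, **deepcopy(child_by_key[link[ckey]])})
        docs.append(doc)
    return docs


# Nest in both directions, so each relationship is stored twice (assumption for illustration).
student_docs = nest(students, "student_id", courses, "course_id", enrollments, "courses")
course_docs = nest(courses, "course_id", students, "student_id", enrollments, "students")


def transcript(docs, student_id):
    """Single-document lookup: no JOIN is needed at query time."""
    doc = next(d for d in docs if d["student_id"] == student_id)
    return [(c["title"], c["grade"]) for c in doc["courses"]]


relational_size = len(json.dumps(students + courses + enrollments))
nested_size = len(json.dumps(student_docs + course_docs))

if __name__ == "__main__":
    print(transcript(student_docs, 1))
    print(f"nested/relational size ratio: {nested_size / relational_size:.2f}")
```

The ratio from this toy dataset says nothing about the 10.53x figure measured in the paper; it only shows why embedding copies of related rows trades storage for JOIN-free reads.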

    Implementation and test of transactional primitives over Cassandra

    Master's dissertation in Informatics Engineering. To achieve high levels of scalability and availability, NoSQL databases choose not to offer two important abstractions traditionally found in relational databases: transactional guarantees and strong data consistency. These limitations add considerable complexity to the development of client applications and are therefore an obstacle to broader adoption of the technology. In this work we propose a middleware layer over NoSQL databases that offers transactional guarantees with Snapshot Isolation. The solution is non-intrusive: it provides clients with the same interface as the NoSQL database, simply adding a transactional context. The transactional context is the focus of our contribution and is modularly built on a Non Persistent Version Store, which holds several versions of elements and interacts with an external transaction certifier. We present an implementation of the system over Apache Cassandra and, using two representative benchmarks, YCSB and TPC-C, measure the cost of adding transactional support with ACID guarantees.
Funding: Fundação para a Ciência e a Tecnologia (FCT), Project Stratus/FCOMP-01-0124-FEDER-015020, within project Pest/FCOMP-01-0124-FEDER-022701; ERDF (European Regional Development Fund) through the COMPETE Programme (operational programme for competitiveness); European Union Seventh Framework Programme (FP7) under grant agreement no. 257993 (CumuloNimbo).
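A minimal, single-process sketch of the middleware design described above, assuming a Python `dict` stands in for Cassandra: a non-persistent multi-version store serves reads at the transaction's snapshot timestamp, writes are buffered inside the transaction, and a separate certifier applies a first-committer-wins check on write sets, the usual way Snapshot Isolation rejects write-write conflicts. The class and method names (`VersionStore`, `Certifier`, `Transaction`) are hypothetical and not taken from the dissertation.

```python
import threading


class CertificationError(Exception):
    """Raised when a transaction loses a write-write conflict."""


class VersionStore:
    """Non-persistent multi-version map over a base store (a dict standing in for Cassandra)."""

    def __init__(self, base):
        self.base = base
        self.versions = {}  # key -> [(commit_ts, value), ...] in increasing commit_ts

    def read(self, key, ts):
        """Return the newest version of `key` committed at or before snapshot `ts`."""
        if key not in self.versions:
            return self.base.get(key)  # never written through this layer
        for commit_ts, value in reversed(self.versions[key]):
            if commit_ts <= ts:
                return value
        return None

    def install(self, writes, commit_ts):
        for key, value in writes.items():
            # Keep the pre-existing base value as version 0 so older snapshots still see it.
            self.versions.setdefault(key, [(0, self.base.get(key))]).append((commit_ts, value))
            self.base[key] = value  # write-through: the base store holds the latest value


class Certifier:
    """External certifier: assigns commit timestamps and enforces first-committer-wins."""

    def __init__(self):
        self._lock = threading.Lock()
        self._clock = 0
        self._last_commit = {}  # key -> timestamp of its latest committed write

    def snapshot(self):
        with self._lock:
            return self._clock

    def commit(self, start_ts, writes, store):
        with self._lock:
            if not writes:
                return start_ts  # read-only transactions never conflict
            for key in writes:
                if self._last_commit.get(key, 0) > start_ts:
                    raise CertificationError(f"write-write conflict on {key!r}")
            self._clock += 1
            store.install(writes, self._clock)  # install under the lock so snapshots stay consistent
            for key in writes:
                self._last_commit[key] = self._clock
            return self._clock


class Transaction:
    def __init__(self, store, certifier):
        self.store, self.certifier = store, certifier
        self.start_ts = certifier.snapshot()
        self.writes = {}  # buffered until commit

    def get(self, key):
        if key in self.writes:  # read your own writes
            return self.writes[key]
        return self.store.read(key, self.start_ts)

    def put(self, key, value):
        self.writes[key] = value

    def commit(self):
        return self.certifier.commit(self.start_ts, self.writes, self.store)


if __name__ == "__main__":
    store, cert = VersionStore({"x": 0}), Certifier()
    t1, t2 = Transaction(store, cert), Transaction(store, cert)
    t1.put("x", t1.get("x") + 1)
    t2.put("x", t2.get("x") + 10)
    t1.commit()
    try:
        t2.commit()
    except CertificationError as err:
        print("t2 aborted:", err)
    print(Transaction(store, cert).get("x"))  # a new snapshot sees t1's write
```

Because versions are installed while the certifier lock is held, and snapshots are taken under the same lock, no transaction can start from a timestamp whose versions are not yet visible. Durability and distribution, which the real system must address, are out of scope for this sketch.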

    Transactions and data management in NoSQL cloud databases

    NoSQL databases have become the preferred option for storing and processing data in cloud computing because they provide high data availability, scalability, and efficiency. To achieve these attributes, however, NoSQL databases make certain trade-offs. First, they cannot guarantee strong data consistency; they only guarantee a weaker consistency based on the eventual consistency model. Second, they adopt a simple data model that makes it easy to scale data across multiple nodes. Third, they do not support table joins or referential integrity, which implies that they cannot implement complex queries. Taken together, these factors mean that NoSQL databases do not natively support transactions. Motivated by these issues, this thesis investigates transactions and data management in NoSQL databases. It presents a novel approach that implements transactional support for NoSQL databases in order to ensure stronger data consistency while providing an appropriate level of performance. The novelty lies in the design of a Multi-Key transaction model that guarantees the standard properties of transactions, ensuring stronger consistency and integrity of data. The model is implemented in a novel loosely coupled architecture that separates the transactional logic from the underlying data, thus ensuring transparency and abstraction in cloud and NoSQL databases. The approach is validated by developing a prototype system on a real MongoDB deployment. An extended version of the standard Yahoo! Cloud Serving Benchmark (YCSB) was used to test and evaluate the approach. Various experiments were conducted, and the results show that the approach meets the research objectives: it maintains stronger consistency of cloud data with an appropriate level of reliability and performance.
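The loosely coupled architecture can be pictured as a coordinator sitting beside a store that offers only single-key operations. The sketch below assumes a `dict`-backed `DataStore` in place of MongoDB and uses a generic technique, per-key locks acquired in sorted order plus an in-memory undo log of before-images, to make multi-key updates all-or-nothing and isolated from each other. It is not the thesis's Multi-Key transaction model; `DataStore`, `TransactionCoordinator`, and `transfer` are hypothetical names, and durability (a persistent log) is deliberately omitted.

```python
import threading
from collections import defaultdict


class DataStore:
    """Stand-in for a NoSQL store that only offers single-key operations."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


class TransactionCoordinator:
    """Holds all transactional logic; the DataStore stays unaware of transactions."""

    def __init__(self, store):
        self.store = store
        self._locks = defaultdict(threading.Lock)  # one lock per key, created on demand
        self._guard = threading.Lock()  # protects the lock table itself

    def _lock_for(self, key):
        with self._guard:
            return self._locks[key]

    def run(self, keys, update):
        """Apply update(before) -> {key: new_value} to the declared keys, all-or-nothing."""
        ordered = sorted(set(keys))  # fixed global lock order prevents deadlock
        locks = [self._lock_for(k) for k in ordered]
        for lock in locks:
            lock.acquire()
        try:
            before = {k: self.store.get(k) for k in ordered}  # undo log of before-images
            new_values = update(dict(before))
            written = []
            try:
                for key, value in new_values.items():
                    if key not in before:
                        raise KeyError(f"{key!r} was not declared in keys")
                    self.store.put(key, value)
                    written.append(key)
            except Exception:
                for key in written:  # roll back any partial writes
                    if before[key] is None:  # None means the key was absent (sketch simplification)
                        self.store.delete(key)
                    else:
                        self.store.put(key, before[key])
                raise
            return new_values
        finally:
            for lock in reversed(locks):
                lock.release()


def transfer(amount):
    """Hypothetical multi-key update: move `amount` between two account documents."""
    def update(before):
        if before["acct:A"] < amount:
            raise ValueError("insufficient funds")
        return {"acct:A": before["acct:A"] - amount, "acct:B": before["acct:B"] + amount}
    return update


if __name__ == "__main__":
    store = DataStore()
    store.put("acct:A", 100)
    store.put("acct:B", 0)
    coordinator = TransactionCoordinator(store)
    coordinator.run(["acct:A", "acct:B"], transfer(30))
    print(store.get("acct:A"), store.get("acct:B"))
```

Acquiring locks in one global order is a common way to rule out deadlocks between transactions that touch overlapping keys; the isolation shown here only holds for clients that go through the coordinator rather than writing to the store directly.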