Storage Solutions for Big Data Systems: A Qualitative Study and Comparison
Developing big data systems is full of challenges, given the variety of
application areas and domains that this technology promises to serve.
Typically, the fundamental decisions in big data system design include choosing
appropriate storage and computing infrastructures. In this age of heterogeneous
systems that integrate different technologies to deliver optimized solutions to
specific real-world problems, big data systems are no exception. As far as the
storage aspect of a big data system is concerned, the primary facet is the
storage infrastructure, and NoSQL appears to be the technology that best
fulfills its requirements. However, every big data application has different
data characteristics, so its data fits a different data model. This paper
presents a feature and use case analysis and comparison of the four main data
models, namely document-oriented, key-value, graph and wide column. Moreover, a
feature analysis of 80 NoSQL solutions is provided, elaborating on the criteria
a developer must consider when choosing among them.
Typically, big data storage needs to communicate with the execution engine and
other processing and visualization technologies to create a comprehensive
solution. This brings the second facet of big data storage, big data file
formats, into the picture. The second half of the paper compares the
advantages, shortcomings and possible use cases of the big data file formats
available for Hadoop, which is the foundation of most big data computing
technologies. Decentralized storage and blockchain are seen as the next
generation of big data storage, and their challenges and future prospects are
also discussed.
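To make the four data models concrete, the following Python sketch stores the same hypothetical data (two users and a "follows" relationship) in plain-Python stand-ins for each model and answers one question against each. The sample records and the `followed_by` helper are illustrative assumptions; they are not taken from the paper and do not use any database's real API.

```python
import json

# Hypothetical sample data: two users, where user 1 follows user 2.
# Plain Python structures stand in for each NoSQL data model.

# Key-value: the store sees only opaque values, addressable by key alone.
key_value = {
    "user:1": json.dumps({"name": "Ana", "city": "Braga", "follows": [2]}),
    "user:2": json.dumps({"name": "Rui", "city": "Porto", "follows": []}),
}

# Document-oriented: self-describing (possibly nested) documents whose
# fields the database itself can index and query.
documents = [
    {"_id": 1, "name": "Ana", "city": "Braga", "follows": [2]},
    {"_id": 2, "name": "Rui", "city": "Porto", "follows": []},
]

# Wide column: row key -> column family -> column -> value.
# Rows need not share the same set of columns.
wide_column = {
    "1": {"profile": {"name": "Ana", "city": "Braga"}, "follows": {"2": ""}},
    "2": {"profile": {"name": "Rui", "city": "Porto"}},
}

# Graph: entities are nodes and relationships are first-class edges.
nodes = {1: {"name": "Ana", "city": "Braga"}, 2: {"name": "Rui", "city": "Porto"}}
edges = [(1, "FOLLOWS", 2)]


def followed_by(user_id):
    """Answer 'whom does user_id follow?' once per data model."""
    # Key-value: fetch the whole value by key, then decode it client-side.
    kv = json.loads(key_value[f"user:{user_id}"])["follows"]
    # Document: select the document by a field and read the nested array.
    doc = next(d for d in documents if d["_id"] == user_id)["follows"]
    # Wide column: column names in the "follows" family are the followed ids.
    wc = [int(c) for c in wide_column[str(user_id)].get("follows", {})]
    # Graph: traverse outgoing FOLLOWS edges.
    gr = [dst for src, rel, dst in edges if src == user_id and rel == "FOLLOWS"]
    return {"key_value": kv, "document": doc, "wide_column": wc, "graph": gr}
```

Every model returns the same answer here; what differs is where the work happens. The key-value store can only hand back a blob for the client to decode, the document and wide-column stores expose structure the database can query, and the graph model makes the relationship itself the unit of storage and traversal.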
Relational into Non-Relational Database Migration with Multiple-Nested Schema Methods on Academic Data
The rapid development of internet technology has increased the need for data storage and processing technologies. One application is managing academic data records at educational institutions. With the massive growth of information, a decline in the performance of traditional databases is inevitable. Hence, many companies choose to migrate to NoSQL, a technology able to overcome the shortcomings of traditional databases. However, existing SQL-to-NoSQL migration tools have not been able to represent SQL data relations in NoSQL without limiting query performance. In this paper, a system that transforms the relational database MySQL into the non-relational database MongoDB was developed, using the Multiple Nested Schema method for academic databases. Development began with the design of a transformation scheme, which was then implemented in the migration process using PDI/Kettle. Testing covered three aspects: query response time, data integrity, and storage requirements. The results showed that the developed system successfully represented SQL data relationships in NoSQL. Complex queries ran 13.32 times faster on the migrated database and basic queries involving SQL transaction tables ran 28.6 times faster on the migration result, whereas basic queries not involving SQL transaction tables ran 3.91 times faster on the migration source. This confirms the premise of the Multiple Nested Schema method, which aims to overcome the poor performance of queries involving many JOIN operations. In addition, the system maintained data integrity in all tested queries. The storage test showed that the database migrated with the Multiple Nested Schema method required 10.53 times as much storage as the source database.
This is due to the large amount of data redundancy produced by the transformation process. However, storage is currently not a top priority in data processing technology, so large storage requirements are an accepted consequence of obtaining efficient query performance, which is still considered the first priority.
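The trade-off described above (fewer JOINs at query time, more redundancy on disk) can be illustrated with a small Python sketch that nests relational rows into MongoDB-style documents at migration time. The table names, columns and the `nest_students` function are hypothetical, and this is a generic nested-embedding transformation, not the paper's exact Multiple Nested Schema design or its PDI/Kettle pipeline.

```python
# Hypothetical relational rows (table and column names are assumptions).
students = [
    {"student_id": 1, "name": "Ana"},
    {"student_id": 2, "name": "Rui"},
]
courses = [
    {"course_id": "DB101", "title": "Databases", "credits": 3},
    {"course_id": "NET201", "title": "Networks", "credits": 4},
]
# Transaction table linking students and courses.
enrollments = [
    {"student_id": 1, "course_id": "DB101", "grade": "A"},
    {"student_id": 1, "course_id": "NET201", "grade": "B"},
    {"student_id": 2, "course_id": "DB101", "grade": "B"},
]


def nest_students(students, courses, enrollments):
    """Embed each student's enrollments, with the full course row copied in.

    The JOINs are resolved once, at migration time, so later reads need no
    JOIN; the cost is that course data is duplicated in every document that
    references it.
    """
    course_by_id = {c["course_id"]: c for c in courses}
    docs = []
    for s in students:
        enrolled = [
            # Two levels of nesting: student -> enrollment -> course.
            {"grade": e["grade"], "course": dict(course_by_id[e["course_id"]])}
            for e in enrollments
            if e["student_id"] == s["student_id"]
        ]
        docs.append({"_id": s["student_id"], "name": s["name"], "enrollments": enrolled})
    return docs


docs = nest_students(students, courses, enrollments)

# "Which courses did Ana take?" is now answered from a single document.
ana_titles = [e["course"]["title"] for e in docs[0]["enrollments"]]

# The redundancy: the DB101 course row now exists once per enrolled student.
db101_copies = sum(
    1 for d in docs for e in d["enrollments"] if e["course"]["course_id"] == "DB101"
)
```

In the relational source the DB101 row is stored once and joined on demand; after nesting it is stored once per enrollment, which is the kind of redundancy that explains a larger storage footprint in exchange for JOIN-free reads.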
Implementation and test of transactional primitives over Cassandra
Master's dissertation in Informatics Engineering
In order to achieve high levels of scalability and availability, NoSQL
databases forgo two important abstractions traditionally found in relational
databases: transactional guarantees and strong data consistency. These
limitations add considerable complexity to the development of client
applications and are therefore an obstacle to the broader adoption of the
technology.
In this work we propose a middleware layer over NoSQL databases that
offers transactional guarantees with Snapshot Isolation. The solution is
non-intrusive: it presents clients with the same interface as a NoSQL database,
extended only with a transactional context. The transactional context is the
focus of our contribution and is modularly built on a Non Persistent Version
Store, which holds multiple versions of each element and interacts with an
external certifier of concurrent transactions.
In this work, we present an implementation of our system over Apache
Cassandra and, using two representative benchmarks, YCSB and TPC-C,
we measure the cost of adding transactional support with ACID guarantees.
Funding: Fundação para a Ciência e a Tecnologia (FCT), Project Stratus/FCOMP-01-0124-FEDER-015020, within project Pest/FCOMP-01-0124-FEDER-022701; ERDF (European Regional Development Fund) through the COMPETE Programme (operational programme for competitiveness); European Union Seventh Framework Programme (FP7) under grant agreement no. 257993 (CumuloNimbo).
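The abstract names the building blocks (a version store holding several versions of each element and an external certifier) without showing how they interact. The following is a minimal, single-threaded, in-memory Python sketch of the general Snapshot Isolation idea under those assumptions: each transaction reads from the snapshot current when it began, buffers its writes, and is certified at commit, where it aborts if another transaction committed a write to the same key after its snapshot (first committer wins). The class and method names are hypothetical, and writes to Cassandra itself are not modeled.

```python
class VersionStore:
    """In-memory multi-version store: key -> [(commit_ts, value), ...]."""

    def __init__(self):
        self.versions = {}

    def read(self, key, snapshot_ts):
        # Return the newest version committed at or before snapshot_ts.
        # Versions are appended in increasing commit_ts order.
        visible = [v for ts, v in self.versions.get(key, []) if ts <= snapshot_ts]
        return visible[-1] if visible else None

    def install(self, key, value, commit_ts):
        self.versions.setdefault(key, []).append((commit_ts, value))


class Certifier:
    """Detects write-write conflicts; the first committer wins."""

    def __init__(self):
        self.last_ts = 0      # timestamp of the most recent commit
        self.last_write = {}  # key -> commit_ts of its latest committed write

    def begin(self):
        # A new transaction sees every commit made so far.
        return self.last_ts

    def certify(self, snapshot_ts, write_keys):
        # Abort if any key was committed by another transaction after our snapshot.
        if any(self.last_write.get(k, 0) > snapshot_ts for k in write_keys):
            return None
        self.last_ts += 1
        for k in write_keys:
            self.last_write[k] = self.last_ts
        return self.last_ts


class Transaction:
    def __init__(self, store, certifier):
        self.store = store
        self.certifier = certifier
        self.snapshot_ts = certifier.begin()
        self.writes = {}  # buffered until commit

    def get(self, key):
        if key in self.writes:  # read your own writes
            return self.writes[key]
        return self.store.read(key, self.snapshot_ts)

    def put(self, key, value):
        self.writes[key] = value

    def commit(self):
        commit_ts = self.certifier.certify(self.snapshot_ts, list(self.writes))
        if commit_ts is None:
            return False  # aborted: buffered writes are discarded
        for key, value in self.writes.items():
            self.store.install(key, value, commit_ts)
        return True


# Usage: two concurrent transactions increment the same key.
store, certifier = VersionStore(), Certifier()
setup = Transaction(store, certifier)
setup.put("x", 0)
setup.commit()

t1, t2 = Transaction(store, certifier), Transaction(store, certifier)
t1.put("x", t1.get("x") + 10)
t2.put("x", t2.get("x") + 20)
first, second = t1.commit(), t2.commit()
```

Both transactions read `x = 0` from the same snapshot; `t1` commits first, so `t2` fails certification instead of silently overwriting `t1`'s update (a lost update). Older versions stay readable, so a transaction that began before `t1` committed still sees `x = 0`.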
Transactions and data management in NoSQL cloud databases
NoSQL databases have become the preferred option for storing and processing data in cloud computing because they provide high data availability, scalability and efficiency. To achieve these attributes, however, NoSQL databases make certain trade-offs. First, they cannot guarantee strong data consistency; they guarantee only a weaker consistency based on the eventual consistency model. Second, they adopt a simple data model that makes it easy to scale data across multiple nodes. Third, they do not support table joins or referential integrity, which, by implication, means they cannot implement complex queries. Together, these factors mean that NoSQL databases cannot support transactions. Motivated by these crucial issues, this thesis investigates transactions and data management in NoSQL databases.
It presents a novel approach that implements transactional support for NoSQL databases in order to ensure stronger data consistency while providing an appropriate level of performance. The novelty lies in the design of a Multi-Key transaction model that guarantees the standard properties of transactions, ensuring stronger consistency and integrity of data. The model is implemented in a novel loosely coupled architecture that separates the transactional logic from the underlying data, thus ensuring transparency and abstraction in cloud and NoSQL databases. The approach is validated through a prototype system built on a real MongoDB system. An extended version of the standard Yahoo! Cloud Serving Benchmark (YCSB) was used to test and evaluate the approach. The results of a series of experiments show that the proposed approach meets the research objectives: it maintains stronger consistency of cloud data as well as an appropriate level of reliability and performance.
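The abstract does not detail how the Multi-Key transaction model works, so the following is only a generic Python sketch of one common way a loosely coupled layer can make a multi-key update atomic over a store that offers only single-key operations: lock the keys in one fixed order (so concurrent transactions cannot deadlock), record the old values as an undo log, apply the writes, and restore the old values if a write fails. `KeyValueStore` and `MultiKeyTransactionManager` are hypothetical names, the store is an in-memory stand-in for MongoDB, and durability (for example, persisting the undo log) is not addressed.

```python
import threading


class KeyValueStore:
    """Hypothetical stand-in for a NoSQL store with single-key operations only."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


class MultiKeyTransactionManager:
    """Transaction logic kept outside the data store (loosely coupled)."""

    def __init__(self, store):
        self.store = store
        self._locks = {}
        self._registry = threading.Lock()  # guards creation of per-key locks

    def _lock_for(self, key):
        with self._registry:
            return self._locks.setdefault(key, threading.Lock())

    def execute(self, keys, update):
        """Atomically apply update(old_values) -> new_values to several keys.

        update may return None to abort without writing anything.
        """
        ordered = sorted(set(keys))  # one global lock order prevents deadlock
        locks = [self._lock_for(k) for k in ordered]
        for lock in locks:
            lock.acquire()
        try:
            before = {k: self.store.get(k) for k in ordered}  # undo log
            new_values = update(dict(before))
            if new_values is None:
                return False
            if not set(new_values) <= set(ordered):
                raise ValueError("update wrote a key it did not lock")
            try:
                for k, v in new_values.items():
                    self.store.put(k, v)
            except Exception:
                # Roll back any partial writes, then re-raise.
                for k, v in before.items():
                    if v is None:
                        self.store.delete(k)
                    else:
                        self.store.put(k, v)
                raise
            return True
        finally:
            for lock in reversed(locks):
                lock.release()


# Usage: transfer funds between two accounts as one transaction.
store = KeyValueStore()
store.put("acct:A", 100)
store.put("acct:B", 50)
manager = MultiKeyTransactionManager(store)


def transfer(amount):
    def update(values):
        if values["acct:A"] < amount:
            return None  # business rule violated: abort
        return {"acct:A": values["acct:A"] - amount,
                "acct:B": values["acct:B"] + amount}
    return update


ok = manager.execute(["acct:B", "acct:A"], transfer(30))
rejected = manager.execute(["acct:A", "acct:B"], transfer(500))
```

Keeping the locks and undo log in the manager rather than in the store mirrors the separation of transactional logic from the underlying data described above, but it only protects clients that go through the manager; direct writes to the store bypass it.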
- …