Storage Solutions for Big Data Systems: A Qualitative Study and Comparison

Authors: Alam Mansaf; Ali Syed Arshad; Khan Samiya; Liu Xiufeng
Publication date: 01/01/2019

Big data systems development is full of challenges, given the variety of application areas and domains that this technology promises to serve. Typically, the fundamental design decisions in big data systems design include choosing appropriate storage and computing infrastructures. In this age of heterogeneous systems that integrate different technologies into an optimized solution for a specific real-world problem, big data systems are no exception. As far as the storage aspect of a big data system is concerned, the primary facet is the storage infrastructure, and NoSQL seems to be the right technology to fulfill its requirements. However, every big data application has variable data characteristics, and thus the corresponding data fits a different data model. This paper presents a feature and use case analysis and comparison of the four main data models, namely document-oriented, key-value, graph and wide-column. Moreover, a feature analysis of 80 NoSQL solutions is provided, elaborating on the criteria and points that a developer must consider when making a choice. Typically, big data storage needs to communicate with the execution engine and other processing and visualization technologies to create a comprehensive solution. This brings the second facet of big data storage, big data file formats, into the picture. The second half of the paper compares the advantages, shortcomings and possible use cases of the available big data file formats for Hadoop, which is the foundation for most big data computing technologies. Decentralized storage and blockchain are seen as the next generation of big data storage, and their challenges and future prospects are also discussed.

arXiv.org e-Print Archive

Online Research Database In Technology

Umzi: Unified Multi-Zone Indexing for Large-Scale HTAP

Authors: Barber Ronald; Luo Chen; Raman Vijayshankar; Sidle Richard; Tian Yuanyuan; Tözün Pinar
Publication date: 01/01/2019

The IT University of Copenhagen's Repository

Relational into Non-Relational Database Migration with Multiple-Nested Schema Methods on Academic Data

Authors: Adji Teguh Bharata; Sari Dwi Retno Puspita; Setiawan Noor Akhmad
Publication venue: Universitas Gadjah Mada
Publication date: 13/09/2019

The rapid development of internet technology has increased the need for data storage and processing technology. One application is managing academic data records at educational institutions. With the massive growth of information, a decline in traditional database performance is inevitable. Hence, many companies choose to migrate to NoSQL, a technology that is able to overcome the shortcomings of traditional databases. However, existing SQL-to-NoSQL migration tools have not been able to represent SQL data relations in NoSQL without limiting query performance. In this paper, a transformation system that migrates a MySQL relational database into the non-relational database MongoDB was developed, using the Multiple Nested Schema method for academic databases. The development began with the design of a transformation scheme, which was then implemented in the migration process using PDI/Kettle. Testing covered three aspects: query response time, data integrity, and storage requirements. The test results showed that the developed system successfully represented the relationships of SQL data in NoSQL. Complex queries ran 13.32 times faster in the migrated database, and basic queries involving SQL transaction tables ran 28.6 times faster in the migrated database, while basic queries not involving SQL transaction tables ran 3.91 times faster in the migration source. This supports the premise of the Multiple Nested Schema method, which aims to overcome the poor performance of queries involving many JOIN operations. In addition, the system maintained data integrity in all tested queries. The space performance test results indicated that the database migrated with the Multiple Nested Schema method showed a storage requirement 10.53 times larger than that of the migration source database. This is due to the large amount of data redundancy resulting from the transformation process. However, at present, storage performance is not a top priority in data processing technology, so large storage requirements are accepted as a consequence of obtaining efficient query performance, which is still considered the first priority.
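
The trade-off reported in the abstract above (faster JOIN-heavy queries at the cost of redundant storage) can be illustrated with a minimal Python sketch of nesting relational rows into documents. This is not the authors' Multiple Nested Schema implementation or their PDI/Kettle pipeline; the table names (`students`, `courses`, `enrollments`), the field names, and the `nest_rows` helper are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def nest_rows(students, enrollments, courses):
    """Embed each student's enrollments (with course details) in one document."""
    course_by_id = {c["id"]: c for c in courses}
    by_student = defaultdict(list)
    for e in enrollments:
        course = course_by_id[e["course_id"]]
        # Course fields are copied into every enrollment. This redundancy
        # is what trades extra storage for JOIN-free reads.
        by_student[e["student_id"]].append(
            {
                "course": {"id": course["id"], "title": course["title"]},
                "grade": e["grade"],
            }
        )
    return [
        {"_id": s["id"], "name": s["name"], "enrollments": by_student.get(s["id"], [])}
        for s in students
    ]


# Hypothetical rows standing in for three relational tables.
students = [{"id": 1, "name": "Ani"}, {"id": 2, "name": "Budi"}]
courses = [{"id": 10, "title": "Databases"}]
enrollments = [
    {"student_id": 1, "course_id": 10, "grade": "A"},
    {"student_id": 2, "course_id": 10, "grade": "B"},
]

docs = nest_rows(students, enrollments, courses)
```

Each resulting document answers "which courses did this student take, and with what grade" in a single lookup, while the course title is stored once per enrollment rather than once overall, which is the kind of duplication the abstract attributes the larger storage footprint to.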

IJITEE (International Journal of Information Technology and Electrical Engineering)

Implementation and test of transactional primitives over Cassandra

Author: Coelho Fábio André Castanheira Luís
Publication date: 01/01/2013

Master's dissertation in Informatics Engineering (Dissertação de mestrado em Engenharia Informática).

NoSQL databases opt not to offer important abstractions traditionally found in relational databases in order to achieve high levels of scalability and availability: transactional guarantees and strong data consistency. These limitations bring considerable complexity to the development of client applications and are therefore an obstacle to the broader adoption of the technology. In this work we propose a middleware layer over NoSQL databases that offers transactional guarantees with Snapshot Isolation. The proposed solution is achieved in a non-intrusive manner, providing clients with the same interface as a NoSQL database and simply adding the transactional context. The transactional context is the focus of our contribution and is modularly based on a Non Persistent Version Store that holds several versions of elements and interacts with an external transaction certifier. In this work, we present an implementation of our system over Apache Cassandra and, using two representative benchmarks, YCSB and TPC-C, we measure the cost of adding transactional support with ACID guarantees.

Fundação para a Ciência e a Tecnologia (FCT) - Project Stratus/FCOMP-01-0124-FEDER-015020; within project Pest/FCOMP-01-0124-FEDER-022701. ERDF - European Regional Development Fund through the COMPETE Programme (operational programme for competitiveness). European Union Seventh Framework Programme (FP7) under grant agreement no 257993 (CumuloNimbo).
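
The design described in the abstract above, a non-persistent version store working with a transaction certifier to provide Snapshot Isolation, can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not the dissertation's middleware and not a Cassandra API: the `VersionStore` and `Txn` classes, their method names, and the first-committer-wins certification rule are assumptions made for this example, and everything lives in one process in memory.

```python
class VersionStore:
    """In-memory multi-version store with a first-committer-wins certifier."""

    def __init__(self):
        self.versions = {}  # key -> list of (commit_ts, value), ascending
        self.clock = 0      # last issued commit timestamp

    def begin(self):
        # A transaction reads from the state as of its start timestamp.
        return Txn(self, self.clock)

    def read(self, key, snapshot_ts):
        # Return the newest version committed at or before the snapshot.
        for ts, value in reversed(self.versions.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return None

    def certify_and_commit(self, txn):
        # Abort if any written key was committed after this txn's snapshot.
        for key in txn.writes:
            history = self.versions.get(key, [])
            if history and history[-1][0] > txn.snapshot_ts:
                return False
        self.clock += 1
        for key, value in txn.writes.items():
            self.versions.setdefault(key, []).append((self.clock, value))
        return True


class Txn:
    def __init__(self, store, snapshot_ts):
        self.store = store
        self.snapshot_ts = snapshot_ts
        self.writes = {}  # buffered writes, invisible to others until commit

    def get(self, key):
        if key in self.writes:
            return self.writes[key]
        return self.store.read(key, self.snapshot_ts)

    def put(self, key, value):
        self.writes[key] = value

    def commit(self):
        return self.store.certify_and_commit(self)


store = VersionStore()
setup = store.begin()
setup.put("x", 1)
setup.commit()

t1 = store.begin()
t2 = store.begin()          # t1 and t2 share the same snapshot
t1.put("x", 2)
t2.put("x", 3)
ok1 = t1.commit()           # first committer wins
ok2 = t2.commit()           # write-write conflict on "x", so it is rejected
latest = store.begin().get("x")
```

In this sketch the clients keep a plain get/put interface and only gain `begin` and `commit`, which mirrors the non-intrusive goal stated in the abstract; a real certifier would also have to handle durability, garbage collection of old versions, and distribution.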

Universidade do Minho: RepositoriUM

Transactions and data management in NoSQL cloud databases

Author: Ogunyadeka Adewole C.
Publication venue: Oxford Brookes University
Publication date: 01/01/2016

NoSQL databases have become the preferred option for storing and processing data in cloud computing, as they are capable of providing high data availability, scalability and efficiency. But in order to achieve these attributes, NoSQL databases make certain trade-offs. First, NoSQL databases cannot guarantee strong consistency of data; they only guarantee a weaker consistency based on the eventual consistency model. Second, NoSQL databases adopt a simple data model, which makes it easy for data to be scaled across multiple nodes. Third, NoSQL databases do not support table joins and referential integrity, which by implication means they cannot implement complex queries. The combination of these factors implies that NoSQL databases cannot support transactions. Motivated by these crucial issues, this thesis investigates transactions and data management in NoSQL databases. It presents a novel approach that implements transactional support for NoSQL databases in order to ensure stronger data consistency and provide an appropriate level of performance. The novelty lies in the design of a Multi-Key transaction model that guarantees the standard properties of transactions in order to ensure stronger consistency and integrity of data. The model is implemented in a novel loosely-coupled architecture that separates the implementation of transactional logic from the underlying data, thus ensuring transparency and abstraction in cloud and NoSQL databases. The proposed approach is validated through the development of a prototype system using a real MongoDB system. An extended version of the standard Yahoo! Cloud Serving Benchmark (YCSB) has been used to test and evaluate the proposed approach. Various experiments have been conducted and sets of results have been generated. The results show that the proposed approach meets the research objectives: it maintains stronger consistency of cloud data as well as an appropriate level of reliability and performance.

Oxford Brookes University: RADAR