DATA MIGRATION FROM STANDARD SQL TO NoSQL
Currently, two major kinds of database management systems are used for managing data: the Relational Database Management System (RDBMS), also known as the standard SQL database, and the NoSQL database. RDBMS databases deal with structured data, while NoSQL databases handle unstructured or semi-structured data. RDBMS databases have been popular for many years, but NoSQL systems are gaining popularity with the rise of the internet and social media. Data flow from SQL to NoSQL, or vice versa, is very likely in the near future given the growing popularity of NoSQL databases. The goal of this thesis is to analyze the data structures of RDBMS and NoSQL databases and to propose a Graphical User Interface (GUI) tool that migrates data from SQL to NoSQL databases. Relational databases have dominated the industry for many years; NoSQL databases, in contrast, were introduced with the increased use of the internet, social media, and cloud computing. Traditional relational databases guarantee data integrity, whereas high availability and scalability are the main advantages of NoSQL databases. This thesis presents a comparison of the two technologies, contrasting their data structures and data-storage techniques. SQL and NoSQL databases store data differently because of their different demands: data in relational databases is highly structured and, in most environments, normalized, whereas data in NoSQL databases is mostly unstructured. This difference in data structure helps each system meet its specific demands. NoSQL databases are scalable and highly available thanks to their simpler data model, but do not guarantee data consistency at all times. RDBMS systems, on the other hand, are not easily scalable and highly available at the same time because of their more complex data model, but they do guarantee data consistency.
This thesis uses CouchDB and MySQL to represent the NoSQL and standard SQL databases, respectively. The aim of the research in this document is to suggest a methodology for data migration from RDBMS databases to document-based NoSQL databases. Data migration between RDBMS and NoSQL systems is anticipated because both are currently in use by many industry leaders. This thesis presents a Graphical User Interface as a starting point that enables data migration from RDBMS to NoSQL databases, with MySQL and CouchDB as the test databases for the relational and NoSQL systems, respectively, and describes an architecture and methodology to achieve this objective.
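The core of such a migration is turning relational rows into self-describing JSON documents. The thesis does not publish its conversion code, so the following is only a minimal sketch of that row-to-document step; the table, column names, and the `type` field convention are invented for illustration.

```python
# Hypothetical sketch of the row-to-document step in an RDBMS-to-NoSQL
# migration; table and column names are invented for illustration.
import json

def rows_to_documents(table_name, columns, rows):
    """Convert relational rows into CouchDB-style JSON documents."""
    docs = []
    for row in rows:
        doc = dict(zip(columns, row))
        # CouchDB has no tables, so record the origin in a type field.
        doc["type"] = table_name
        docs.append(doc)
    return docs

columns = ["id", "name", "email"]
rows = [(1, "Ada", "ada@example.com"), (2, "Alan", "alan@example.com")]
docs = rows_to_documents("customers", columns, rows)
print(json.dumps(docs[0]))
```

In a real tool the resulting documents would be sent to CouchDB in bulk over its HTTP API rather than printed.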
Review of performance of various Big Databases
Relational databases have been the main model for data storage, retrieval, and administration. A relational database is a table-based data system with limited scalability, minimal data redundancy, computationally costly table joins, and difficulty managing complex data. The greatest motivation for NoSQL is scalability: NoSQL data stores are widely used to store and retrieve potentially very large amounts of data. In this paper, we evaluate some of the most popular NoSQL databases: Cassandra, MongoDB, and CouchDB.
Migrating From SQL to NoSQL Database: Practices and Analysis
Most enterprises that deal with big data are moving towards NoSQL data structures to represent their data. Converting existing SQL structures to NoSQL structures is an important task in which both better performance and accurate data must be guaranteed. The main objective of this thesis is to identify the most suitable NoSQL structure to migrate to from a relational database in terms of high read performance. Different combinations of NoSQL structures were tested and compared with an SQL structure to determine the best design. For the SQL structure, we used MySQL data stored in five tables with different types of relationships among them. For NoSQL, we implemented three different MongoDB structures, considering combinations of different levels of embedded documents and reference relationships between documents.
Our experiments showed that a mix of a one-level embedded document with a reference relationship to another document is the best structure to choose. We used a database containing five tables with a variety of relationships (many-to-one and many-to-many) and a large amount of data, about 2 million records/documents, stored in all the structures. The research compares the performance of retrieving data from the different MongoDB representations, and the results show that in some cases using more than one collection to represent huge data with complex relationships is better than keeping all the data in one document.
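The winning design above, one level of embedding plus a reference to another collection, can be sketched with plain dictionaries standing in for MongoDB documents. The collection layout and field names here are illustrative, not taken from the thesis.

```python
# Sketch contrasting one level of embedding with a reference relationship
# for the same order data; field names are invented for illustration.
order = {
    "_id": 1,
    "customer": {"name": "Ada", "city": "London"},  # one-level embedded document
    "item_ids": [101, 102],                          # references into "items"
}
items = {  # stand-in for a separate "items" collection
    101: {"_id": 101, "sku": "A-1", "price": 9.5},
    102: {"_id": 102, "sku": "B-7", "price": 4.0},
}

def order_total(order):
    # Embedded fields are read directly from the document; referenced items
    # need a lookup, which in MongoDB would be a second query or $lookup stage.
    return sum(items[i]["price"] for i in order["item_ids"])

print(order_total(order))
```

Fully embedding the items would avoid the extra lookup but, per the findings above, can hurt read performance once documents representing complex relationships grow large.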
Data Migration from RDBMS to Hadoop
Oracle, IBM, Microsoft, and Teradata own a large portion of the world's data; if you run a query anywhere in the world, chances are that you are reading data from a database owned by one of them. Moving large volumes of data from Oracle to DB2, or between similar systems, is a challenging task for businesses. The advent of Hadoop and NoSQL technology represented a seismic shift that shook the RDBMS market and offered organizations an alternative; database vendors moved rapidly to position themselves in Big Data, and vice versa, with nearly every vendor now offering its own big-data technology, such as Oracle NoSQL and MongoDB. There is a huge market for high-performance data migration that can copy data stored in RDBMS databases to Hadoop or NoSQL databases. Current data resides in RDBMS databases such as Oracle, SQL Server, MySQL, and Teradata. We plan to migrate RDBMS data to a big-data platform that supports NoSQL databases and holds a variety of data from the existing systems; migrating petabytes of data takes huge resources and time, and both may be constraints on the migration process.
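One common pattern for this kind of migration, which tools such as Apache Sqoop parallelize in practice, is exporting table rows as newline-delimited JSON for ingestion into HDFS or a NoSQL store. The sketch below is a hedged illustration of that batching step; the function name, batch size, and sample rows are invented, and the streaming design is what keeps petabyte-scale tables from needing to fit in memory.

```python
# Minimal sketch of one common RDBMS-to-Hadoop export pattern:
# emit rows as newline-delimited JSON, one bounded batch at a time.
import json

def export_batches(rows, columns, batch_size=2):
    """Yield newline-delimited JSON strings, one batch at a time,
    so arbitrarily large tables never need to fit in memory."""
    batch = []
    for row in rows:
        batch.append(json.dumps(dict(zip(columns, row))))
        if len(batch) == batch_size:
            yield "\n".join(batch)
            batch = []
    if batch:  # flush the final partial batch
        yield "\n".join(batch)

rows = [(1, "a"), (2, "b"), (3, "c")]
chunks = list(export_batches(rows, ["id", "val"]))
for chunk in chunks:
    print(chunk)
```

A real pipeline would write each chunk to HDFS and split the source table across workers to address the time and resource constraints noted above.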
EasyBDI: integração automática de big data e consultas analíticas de alto nível
The emergence of new areas, such as the Internet of Things, which require access to the latest data for analytics and decision-making, has created constraints on the execution of analytical queries over traditional data warehouse architectures.
In addition, the increase in semi-structured and unstructured data led to the creation of new databases, namely NoSQL databases, to deal with these types of data. As a result, information is now stored in several different systems, each with characteristics suited to different use cases, which makes it difficult to access data that is spread across various systems with different models and characteristics.
In this work, a system capable of performing analytical queries in real time
on distributed and heterogeneous data sources is proposed: EasyBDI. The
system is capable of integrating data logically, without materializing data,
creating an overview of the data, thus offering an abstraction over the distribution
and heterogeneity of data sources. Queries are executed interactively
on data sources, which means that the most recent data will always be used
in queries. This system presents a user interface that helps in the configuration
of data sources, and automatically proposes a global schema that
presents a generic and simplified view of the data, which can be modified
by the user. The system allows the creation of multiple star schemas from
the global schema. Finally, analytical queries are also made through a user
interface that uses drag-and-drop elements.
EasyBDI is able to solve recent problems using recent solutions, hiding the details of several data sources while allowing users with less database knowledge to perform real-time analytical
queries over distributed and heterogeneous data sources.
Master's dissertation in Informatics Engineering (Mestrado em Engenharia Informática).
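The logical, non-materialized integration that EasyBDI performs can be sketched as a global schema that maps each global attribute to per-source field names, with queries pushed to the sources at query time so results always reflect the latest data. The source names, sample records, and mapping below are invented for illustration and are not EasyBDI's actual API.

```python
# Rough sketch of logical (non-materialized) data integration: a global
# schema maps attributes to per-source fields, and every query contacts the
# live sources, so nothing is ever materialized. All names are illustrative.
sources = {
    "sales_sql":  lambda: [{"cust": "Ada", "amt": 10}],          # relational source
    "sales_docs": lambda: [{"customer": "Alan", "amount": 5}],   # NoSQL source
}
global_schema = {  # global attribute -> field name in each source
    "customer": {"sales_sql": "cust", "sales_docs": "customer"},
    "amount":   {"sales_sql": "amt",  "sales_docs": "amount"},
}

def query(attrs):
    rows = []
    for name, fetch in sources.items():      # contacted at query time
        for rec in fetch():
            rows.append({a: rec[global_schema[a][name]] for a in attrs})
    return rows

print(query(["customer", "amount"]))
```

In EasyBDI the global schema is proposed automatically and refined by the user; here it is hard-coded to keep the sketch self-contained.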
Integration and extension of a cloud data migration support tool
Since the growth of Cloud computing, the desire to adopt this novel computing approach has increased. It is not necessary to redevelop an existing application to target the Cloud and benefit from its advantages; sometimes it is more sensible to migrate existing applications running in a static environment. Since many of those applications have strict layers, and not every layer benefits from an elastic hosting environment, it is sometimes sufficient to migrate only individual layers. To cover as many use cases as possible, a broad range of migration scenarios, not uncommonly going beyond proprietary approaches, must be offered. Because a migration process is still a fairly complex matter, a guided migration is convenient to cover all requirements and achieve the desired result.
This bachelor thesis focuses on aspects of migrating data to the Cloud using a previously developed prototype of a Cloud Data Migration Support Tool. In particular, already existing modifications and evaluations of this tool, which had been developed independently, needed to be integrated into one stable prototype. A further objective of this thesis is to gain platform independence by extending the prototype with a plug-in mechanism that allows the use of native Java DataBase Connectivity (JDBC) drivers for exporting data from existing storage sources and subsequently importing this data into a target data environment, whose type may differ from that of the source data environment. Furthermore, applying proven concepts of architecture and design is part of this work as well.
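A driver plug-in mechanism of the kind described above typically boils down to a registry keyed by database type, so that source and target adapters can be mixed freely. The sketch below illustrates the idea in Python rather than the thesis's Java/JDBC setting; the names (`DRIVERS`, `register`, `migrate`) and the stub adapters are invented for illustration.

```python
# Hypothetical plug-in registry for database adapters: each database type
# registers a driver class, so source and target types may differ freely.
DRIVERS = {}

def register(db_type):
    """Class decorator that records an adapter under its database type."""
    def wrap(cls):
        DRIVERS[db_type] = cls
        return cls
    return wrap

@register("mysql")
class MySQLAdapter:
    def read(self):
        return [{"id": 1, "name": "Ada"}]   # stand-in for a JDBC result set

@register("couchdb")
class CouchDBAdapter:
    def write(self, records):
        return len(records)                  # stand-in for a bulk insert

def migrate(source_type, target_type):
    source, target = DRIVERS[source_type](), DRIVERS[target_type]()
    return target.write(source.read())

print(migrate("mysql", "couchdb"))
```

In the actual tool the registry would load native JDBC drivers at runtime instead of hard-coded stubs, which is what makes the prototype platform-independent.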
An introduction to Graph Data Management
A graph database is a database in which the data structures for the schema and/or instances are modeled as a (labeled) (directed) graph, or generalizations of it, and in which querying is expressed by graph-oriented operations and type constructors. In this article we present the basic notions of graph databases, give a historical overview of their main developments, and survey the main current systems that implement them.
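The labeled, directed graph model described above can be made concrete with a small example: edges are (source, label, target) triples, and a graph-oriented query traverses them rather than joining tables. The data and function name below are invented for illustration.

```python
# Toy labeled, directed graph and a graph-oriented query (neighbors by
# edge label); the triples are invented sample data.
edges = [
    ("alice", "knows", "bob"),
    ("alice", "likes", "pizza"),
    ("bob", "knows", "carol"),
]

def out_neighbors(node, label):
    """Return the targets of edges leaving `node` that carry `label`."""
    return [t for (s, l, t) in edges if s == node and l == label]

print(out_neighbors("alice", "knows"))  # ['bob']
```

Real graph databases index these adjacencies so multi-hop traversals avoid the full scan this sketch performs.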
Distributed query execution on a replicated and partitioned database
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010. Cataloged from the PDF version of the thesis. Includes bibliographical references (p. 63-64).
Web application developers partition and replicate their data amongst a set of SQL databases to achieve higher throughput. Given multiple copies of tables partitioned in different ways, developers must manually select different replicas in their application code. This work presents Dixie, a query planner and executor that automatically executes queries over replicas of partitioned data stored in a set of relational databases and optimizes for high throughput. The challenge in choosing a good query plan lies in predicting query cost, which Dixie does by balancing row-retrieval costs with the overhead of contacting many servers to execute a query. For web workloads, per-query overhead in the servers is a large part of the overall cost of execution. Dixie's cost calculation tends to minimize the number of servers used to satisfy a query, which is essential for minimizing this overhead and obtaining high throughput; this is in direct contrast to optimizers over large data sets that try to maximize parallelism by parallelizing the execution of a query over all the servers. Dixie automatically takes advantage of the addition or removal of replicas without requiring changes to the application code. We show that Dixie sometimes chooses plans that existing parallel database query optimizers might not consider. For certain queries, Dixie chooses a plan that gives a 2.3x improvement in overall system throughput over a plan that does not take per-server query overhead costs into account. Using table replicas, Dixie provides a throughput improvement of 35% over a naive execution without replicas on an artificial workload generated by Pinax, an open-source social web site.
by Neha Narula. S.M.
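The cost trade-off Dixie exploits can be illustrated with a toy model: total plan cost is a fixed per-server overhead times the number of servers contacted, plus a per-row retrieval cost. The constants and plan shapes below are invented for illustration and are not Dixie's actual cost formula.

```python
# Illustrative cost model: fewer servers means less fixed overhead, which is
# why a single-server plan can beat a parallel one for web workloads.
PER_SERVER_OVERHEAD = 50.0   # assumed fixed cost of contacting one server
PER_ROW_COST = 1.0           # assumed cost of retrieving one row

def plan_cost(servers_contacted, rows_fetched):
    return servers_contacted * PER_SERVER_OVERHEAD + rows_fetched * PER_ROW_COST

# One server scanning 200 rows vs. four servers fetching 50 rows each:
single = plan_cost(1, 200)    # same rows, one overhead charge
parallel = plan_cost(4, 200)  # same rows, four overhead charges
best = min([("single", single), ("parallel", parallel)], key=lambda p: p[1])
print(best[0])
```

With these constants the single-server plan wins; a large analytical scan, where row cost dwarfs overhead, would tip the balance toward the parallel plan, which is exactly the contrast with large-data optimizers drawn above.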
Deterministic, Mutable, and Distributed Record-Replay for Operating Systems and Database Systems
Application record and replay is the ability to record an application's execution and replay it at a later time. Record-replay has many use cases, including diagnosing and debugging applications by capturing and reproducing hard-to-find bugs, providing transparent application fault tolerance by maintaining a live replica of a running program, and offline instrumentation that would be too costly to run in a production environment. Different record-replay systems may offer different levels of replay faithfulness, the strongest being deterministic replay, which guarantees an identical reenactment of the original execution. Such a guarantee requires capturing all sources of nondeterminism during the recording phase. In the general case, such record-replay systems can dramatically hinder application performance, rendering them impractical in certain application domains. Furthermore, various use cases are incompatible with strictly replaying the original execution. For example, in a primary-secondary database scenario, the secondary database would be unable to serve additional traffic while being replicated. No record-replay system fits all use cases.
This dissertation shows how to make deterministic record-replay fast and efficient, how broadening replay semantics can enable powerful new use cases, and how choosing the right level of abstraction for record-replay can support distributed and heterogeneous database replication with little effort.
We explore four record-replay systems with different semantics enabling different use cases. We first present Scribe, an OS-level deterministic record-replay mechanism that supports multi-process applications on multi-core systems. One of the main challenges is to record the interaction of threads running on different CPU cores in an efficient manner. Scribe introduces two new lightweight OS mechanisms, rendezvous points and sync points, to efficiently record nondeterministic interactions such as related system calls, signals, and shared memory accesses. Scribe allows the capture and replication of hard-to-find bugs to facilitate debugging and serves as a solid foundation for the two systems that follow.
We then present RacePro, a process race detection system to improve software correctness. Process races occur when multiple processes access shared operating system resources, such as files, without proper synchronization. Detecting process races is difficult due to the elusive nature of these bugs, and the heterogeneity of frameworks involved in such bugs. RacePro is the first tool to detect such process races. RacePro records application executions in deployed systems, allowing offline race detection by analyzing the previously recorded log. RacePro then replays the application execution and forces the manifestation of detected races to check their effect on the application. Upon failure, RacePro reports potentially harmful races to developers.
Third, we present Dora, a mutable record-replay system which allows a recorded execution of an application to be replayed with a modified version of the application. Mutable record-replay provides a number of benefits for reproducing, diagnosing, and fixing software bugs. Given a recording and a modified application, finding a mutable replay is challenging, and undecidable in the general case. Despite the difficulty of the problem, we show a very simple but effective algorithm to search for suitable replays.
Lastly, we present Synapse, a heterogeneous database replication system designed for Web applications. Web applications are increasingly built using a service-oriented architecture that integrates services powered by a variety of databases. Often, the same data, needed by multiple services, must be replicated across different databases and kept in sync. Unfortunately, these databases use vendor-specific data replication engines that are not compatible with each other. To solve this challenge, Synapse operates at the application level, accessing a unified data representation through object-relational mappers. Additionally, Synapse leverages application semantics to replicate data with good consistency semantics, using mechanisms similar to Scribe.