DATA MIGRATION FROM STANDARD SQL TO NoSQL
Currently, two major kinds of database management systems are used for managing data: the Relational Database Management System (RDBMS), also known as the standard SQL database, and the NoSQL database. RDBMS databases deal with structured data, while NoSQL databases handle unstructured or semi-structured data. RDBMS databases have been popular for many years, but NoSQL systems are gaining popularity with the rise of the internet and social media. Data flow from SQL to NoSQL, or vice versa, is very likely in the near future given the growing popularity of NoSQL databases. The goal of this thesis is to analyze the data structures of RDBMS and NoSQL databases and to propose a Graphical User Interface (GUI) tool that migrates data from SQL to NoSQL databases. Relational databases have dominated the industry for many years; NoSQL databases, in contrast, were introduced with the increased use of the internet, social media, and cloud computing. Traditional relational databases guarantee data integrity, whereas high availability and scalability are the main advantages of NoSQL databases. This thesis presents a comparison of the two technologies, contrasting their data structures and data-storage techniques. SQL and NoSQL databases store data differently because of their different demands: data in relational databases is highly structured and, in most environments, normalized, whereas data in NoSQL databases is mostly unstructured. This difference in data structure helps each system meet its specific demands. NoSQL databases are scalable and highly available thanks to their simpler data model, but do not guarantee data consistency at all times. RDBMS systems, on the other hand, are not easily scalable and highly available at the same time because of their more complex data model, but they do guarantee data consistency.
This thesis uses CouchDB and MySQL to represent the NoSQL and standard SQL databases, respectively. The aim of the research in this document is to suggest a methodology for data migration from RDBMS databases to document-based NoSQL databases. Data migration between RDBMS and NoSQL systems is anticipated because both are currently in use by many industry leaders. This thesis presents a Graphical User Interface as a starting point that enables data migration from RDBMS to NoSQL databases, with MySQL and CouchDB as the test databases for the relational and NoSQL systems, respectively, and describes an architecture and methodology to achieve this objective.
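The core of such a migration is turning relational rows into self-describing JSON documents. The thesis does not publish its conversion code, so the following is only a minimal sketch of that row-to-document step; the table, column names, and the `type` field convention are invented for illustration.

```python
# Hypothetical sketch of the row-to-document step in an RDBMS-to-NoSQL
# migration; table and column names are invented for illustration.
import json

def rows_to_documents(table_name, columns, rows):
    """Convert relational rows into CouchDB-style JSON documents."""
    docs = []
    for row in rows:
        doc = dict(zip(columns, row))
        # CouchDB has no tables, so record the origin in a type field.
        doc["type"] = table_name
        docs.append(doc)
    return docs

columns = ["id", "name", "email"]
rows = [(1, "Ada", "ada@example.com"), (2, "Alan", "alan@example.com")]
docs = rows_to_documents("customers", columns, rows)
print(json.dumps(docs[0]))
```

In a real tool the resulting documents would be sent to CouchDB in bulk over its HTTP API rather than printed.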
Review of performance of various Big Databases
Relational databases have been the main model for data storage, retrieval, and administration. A relational database is a table-based data system with limited scalability, minimal data redundancy, computationally costly table joins, and difficulty managing complex data. The greatest motivation for NoSQL is scalability: NoSQL data stores are widely used to store and retrieve potentially very large amounts of data. In this paper, we evaluate some of the most popular NoSQL databases: Cassandra, MongoDB, and CouchDB.
Migrating From SQL to NoSQL Database: Practices and Analysis
Most enterprises that deal with big data are moving towards NoSQL data structures to represent their data. Converting existing SQL structures to NoSQL structures is an important task in which both better performance and accurate data must be guaranteed. The main objective of this thesis is to identify the most suitable NoSQL structure to migrate to from a relational database in terms of high read performance. Different combinations of NoSQL structures were tested and compared with an SQL structure to determine the best design. For the SQL structure, we used MySQL data stored in five tables with different types of relationships among them. For NoSQL, we implemented three different MongoDB structures, considering combinations of different levels of embedded documents and reference relationships between documents.
Our experiments showed that a mix of a one-level embedded document with a reference relationship to another document is the best structure to choose. We used a database containing five tables with a variety of relationships (many-to-one and many-to-many) and a large amount of data, about 2 million records/documents, stored in all the structures. The research compares the performance of retrieving data from the different MongoDB representations, and the results show that in some cases using more than one collection to represent huge data with complex relationships is better than keeping all the data in one document.
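The winning design above, one level of embedding plus a reference to another collection, can be sketched with plain dictionaries standing in for MongoDB documents. The collection layout and field names here are illustrative, not taken from the thesis.

```python
# Sketch contrasting one level of embedding with a reference relationship
# for the same order data; field names are invented for illustration.
order = {
    "_id": 1,
    "customer": {"name": "Ada", "city": "London"},  # one-level embedded document
    "item_ids": [101, 102],                          # references into "items"
}
items = {  # stand-in for a separate "items" collection
    101: {"_id": 101, "sku": "A-1", "price": 9.5},
    102: {"_id": 102, "sku": "B-7", "price": 4.0},
}

def order_total(order):
    # Embedded fields are read directly from the document; referenced items
    # need a lookup, which in MongoDB would be a second query or $lookup stage.
    return sum(items[i]["price"] for i in order["item_ids"])

print(order_total(order))
```

Fully embedding the items would avoid the extra lookup but, per the findings above, can hurt read performance once documents representing complex relationships grow large.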
Data Migration from RDBMS to Hadoop
Oracle, IBM, Microsoft, and Teradata own a large portion of the world's data; if you run a query anywhere in the world, chances are that you are reading data from a database owned by one of them. Moving large volumes of data from Oracle to DB2, or between similar systems, is a challenging task for businesses. The advent of Hadoop and NoSQL technology represented a seismic shift that shook the RDBMS market and offered organizations an alternative; database vendors moved rapidly to position themselves in Big Data, and vice versa, with nearly every vendor now offering its own big-data technology, such as Oracle NoSQL and MongoDB. There is a huge market for high-performance data migration that can copy data stored in RDBMS databases to Hadoop or NoSQL databases. Current data resides in RDBMS databases such as Oracle, SQL Server, MySQL, and Teradata. We plan to migrate RDBMS data to a big-data platform that supports NoSQL databases and holds a variety of data from the existing systems; migrating petabytes of data takes huge resources and time, and both may be constraints on the migration process.
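One common pattern for this kind of migration, which tools such as Apache Sqoop parallelize in practice, is exporting table rows as newline-delimited JSON for ingestion into HDFS or a NoSQL store. The sketch below is a hedged illustration of that batching step; the function name, batch size, and sample rows are invented, and the streaming design is what keeps petabyte-scale tables from needing to fit in memory.

```python
# Minimal sketch of one common RDBMS-to-Hadoop export pattern:
# emit rows as newline-delimited JSON, one bounded batch at a time.
import json

def export_batches(rows, columns, batch_size=2):
    """Yield newline-delimited JSON strings, one batch at a time,
    so arbitrarily large tables never need to fit in memory."""
    batch = []
    for row in rows:
        batch.append(json.dumps(dict(zip(columns, row))))
        if len(batch) == batch_size:
            yield "\n".join(batch)
            batch = []
    if batch:  # flush the final partial batch
        yield "\n".join(batch)

rows = [(1, "a"), (2, "b"), (3, "c")]
chunks = list(export_batches(rows, ["id", "val"]))
for chunk in chunks:
    print(chunk)
```

A real pipeline would write each chunk to HDFS and split the source table across workers to address the time and resource constraints noted above.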
EasyBDI: integração automática de big data e consultas analíticas de alto nível
The emergence of new areas, such as the Internet of Things, which require access to the latest data for analytics and decision-making, has created constraints on the execution of analytical queries over traditional data warehouse architectures.
In addition, the increase in semi-structured and unstructured data led to the creation of new databases, namely NoSQL databases, to deal with these types of data. As a result, information is now stored in several different systems, each with characteristics suited to different use cases, which makes it difficult to access data that is spread across various systems with different models and characteristics.
In this work, a system capable of performing analytical queries in real time
on distributed and heterogeneous data sources is proposed: EasyBDI. The
system is capable of integrating data logically, without materializing data,
creating an overview of the data, thus offering an abstraction over the distribution
and heterogeneity of data sources. Queries are executed interactively
on data sources, which means that the most recent data will always be used
in queries. This system presents a user interface that helps in the configuration
of data sources, and automatically proposes a global schema that
presents a generic and simplified view of the data, which can be modified
by the user. The system allows the creation of multiple star schemas from
the global schema. Finally, analytical queries are also made through a user
interface that uses drag-and-drop elements.
EasyBDI is able to solve recent problems using recent solutions, hiding the details of several data sources while allowing users with less database knowledge to perform real-time analytical
queries over distributed and heterogeneous data sources.
Master's dissertation in Informatics Engineering (Mestrado em Engenharia Informática).
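The logical, non-materialized integration that EasyBDI performs can be sketched as a global schema that maps each global attribute to per-source field names, with queries pushed to the sources at query time so results always reflect the latest data. The source names, sample records, and mapping below are invented for illustration and are not EasyBDI's actual API.

```python
# Rough sketch of logical (non-materialized) data integration: a global
# schema maps attributes to per-source fields, and every query contacts the
# live sources, so nothing is ever materialized. All names are illustrative.
sources = {
    "sales_sql":  lambda: [{"cust": "Ada", "amt": 10}],          # relational source
    "sales_docs": lambda: [{"customer": "Alan", "amount": 5}],   # NoSQL source
}
global_schema = {  # global attribute -> field name in each source
    "customer": {"sales_sql": "cust", "sales_docs": "customer"},
    "amount":   {"sales_sql": "amt",  "sales_docs": "amount"},
}

def query(attrs):
    rows = []
    for name, fetch in sources.items():      # contacted at query time
        for rec in fetch():
            rows.append({a: rec[global_schema[a][name]] for a in attrs})
    return rows

print(query(["customer", "amount"]))
```

In EasyBDI the global schema is proposed automatically and refined by the user; here it is hard-coded to keep the sketch self-contained.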
Integration and extension of a cloud data migration support tool
Since the growth of Cloud computing, the desire to adopt this novel computing approach has increased. It is not necessary to redevelop an existing application to target the Cloud and benefit from its advantages; sometimes it is more sensible to migrate existing applications running in a static environment. Since many of those applications have strict layers, and not every layer benefits from an elastic hosting environment, it is sometimes sufficient to migrate only individual layers. To cover as many use cases as possible, a broad range of migration scenarios, not uncommonly going beyond proprietary approaches, must be offered. Because a migration process is still a fairly complex matter, a guided migration is convenient to cover all requirements and achieve the desired result.
This bachelor thesis focuses on aspects of migrating data to the Cloud using a previously developed prototype of a Cloud Data Migration Support Tool. In particular, already existing modifications and evaluations of this tool, which had been developed independently, needed to be integrated into one stable prototype. A further objective of this thesis is to gain platform independence by extending the prototype with a plug-in mechanism that allows the use of native Java DataBase Connectivity (JDBC) drivers for exporting data from existing storage sources and subsequently importing this data into a target data environment, whose type may differ from that of the source data environment. Furthermore, applying proven concepts of architecture and design is part of this work as well.
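A driver plug-in mechanism of the kind described above typically boils down to a registry keyed by database type, so that source and target adapters can be mixed freely. The sketch below illustrates the idea in Python rather than the thesis's Java/JDBC setting; the names (`DRIVERS`, `register`, `migrate`) and the stub adapters are invented for illustration.

```python
# Hypothetical plug-in registry for database adapters: each database type
# registers a driver class, so source and target types may differ freely.
DRIVERS = {}

def register(db_type):
    """Class decorator that records an adapter under its database type."""
    def wrap(cls):
        DRIVERS[db_type] = cls
        return cls
    return wrap

@register("mysql")
class MySQLAdapter:
    def read(self):
        return [{"id": 1, "name": "Ada"}]   # stand-in for a JDBC result set

@register("couchdb")
class CouchDBAdapter:
    def write(self, records):
        return len(records)                  # stand-in for a bulk insert

def migrate(source_type, target_type):
    source, target = DRIVERS[source_type](), DRIVERS[target_type]()
    return target.write(source.read())

print(migrate("mysql", "couchdb"))
```

In the actual tool the registry would load native JDBC drivers at runtime instead of hard-coded stubs, which is what makes the prototype platform-independent.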
An introduction to Graph Data Management
A graph database is a database in which the data structures for the schema and/or instances are modeled as a (labeled) (directed) graph, or generalizations of it, and in which querying is expressed by graph-oriented operations and type constructors. In this article we present the basic notions of graph databases, give a historical overview of their main developments, and survey the main current systems that implement them.
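The labeled, directed graph model described above can be made concrete with a small example: edges are (source, label, target) triples, and a graph-oriented query traverses them rather than joining tables. The data and function name below are invented for illustration.

```python
# Toy labeled, directed graph and a graph-oriented query (neighbors by
# edge label); the triples are invented sample data.
edges = [
    ("alice", "knows", "bob"),
    ("alice", "likes", "pizza"),
    ("bob", "knows", "carol"),
]

def out_neighbors(node, label):
    """Return the targets of edges leaving `node` that carry `label`."""
    return [t for (s, l, t) in edges if s == node and l == label]

print(out_neighbors("alice", "knows"))  # ['bob']
```

Real graph databases index these adjacencies so multi-hop traversals avoid the full scan this sketch performs.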
Distributed query execution on a replicated and partitioned database
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010. Cataloged from the PDF version of the thesis. Includes bibliographical references (p. 63-64).
Web application developers partition and replicate their data amongst a set of SQL databases to achieve higher throughput. Given multiple copies of tables partitioned in different ways, developers must manually select different replicas in their application code. This work presents Dixie, a query planner and executor that automatically executes queries over replicas of partitioned data stored in a set of relational databases and optimizes for high throughput. The challenge in choosing a good query plan lies in predicting query cost, which Dixie does by balancing row-retrieval costs with the overhead of contacting many servers to execute a query. For web workloads, per-query overhead in the servers is a large part of the overall cost of execution. Dixie's cost calculation tends to minimize the number of servers used to satisfy a query, which is essential for minimizing this overhead and obtaining high throughput; this is in direct contrast to optimizers over large data sets that try to maximize parallelism by parallelizing the execution of a query over all the servers. Dixie automatically takes advantage of the addition or removal of replicas without requiring changes to the application code. We show that Dixie sometimes chooses plans that existing parallel database query optimizers might not consider. For certain queries, Dixie chooses a plan that gives a 2.3x improvement in overall system throughput over a plan that does not take per-server query overhead costs into account. Using table replicas, Dixie provides a throughput improvement of 35% over a naive execution without replicas on an artificial workload generated by Pinax, an open-source social web site.
by Neha Narula. S.M.
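The cost trade-off Dixie exploits can be illustrated with a toy model: total plan cost is a fixed per-server overhead times the number of servers contacted, plus a per-row retrieval cost. The constants and plan shapes below are invented for illustration and are not Dixie's actual cost formula.

```python
# Illustrative cost model: fewer servers means less fixed overhead, which is
# why a single-server plan can beat a parallel one for web workloads.
PER_SERVER_OVERHEAD = 50.0   # assumed fixed cost of contacting one server
PER_ROW_COST = 1.0           # assumed cost of retrieving one row

def plan_cost(servers_contacted, rows_fetched):
    return servers_contacted * PER_SERVER_OVERHEAD + rows_fetched * PER_ROW_COST

# One server scanning 200 rows vs. four servers fetching 50 rows each:
single = plan_cost(1, 200)    # same rows, one overhead charge
parallel = plan_cost(4, 200)  # same rows, four overhead charges
best = min([("single", single), ("parallel", parallel)], key=lambda p: p[1])
print(best[0])
```

With these constants the single-server plan wins; a large analytical scan, where row cost dwarfs overhead, would tip the balance toward the parallel plan, which is exactly the contrast with large-data optimizers drawn above.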
Deterministic, Mutable, and Distributed Record-Replay for Operating Systems and Database Systems
Application record and replay is the ability to record an application's execution and replay it at a later time. Record-replay has many use cases, including diagnosing and debugging applications by capturing and reproducing hard-to-find bugs, providing transparent application fault tolerance by maintaining a live replica of a running program, and offline instrumentation that would be too costly to run in a production environment. Different record-replay systems may offer different levels of replay faithfulness, the strongest being deterministic replay, which guarantees an identical reenactment of the original execution. Such a guarantee requires capturing all sources of nondeterminism during the recording phase. In the general case, such record-replay systems can dramatically hinder application performance, rendering them impractical in certain application domains. Furthermore, various use cases are incompatible with strictly replaying the original execution. For example, in a primary-secondary database scenario, the secondary database would be unable to serve additional traffic while being replicated. No record-replay system fits all use cases.
This dissertation shows how to make deterministic record-replay fast and efficient, how broadening replay semantics can enable powerful new use cases, and how choosing the right level of abstraction for record-replay can support distributed and heterogeneous database replication with little effort.
We explore four record-replay systems with different semantics enabling different use cases. We first present Scribe, an OS-level deterministic record-replay mechanism that supports multi-process applications on multi-core systems. One of the main challenges is to record the interaction of threads running on different CPU cores in an efficient manner. Scribe introduces two new lightweight OS mechanisms, rendezvous points and sync points, to efficiently record nondeterministic interactions such as related system calls, signals, and shared memory accesses. Scribe allows the capture and replication of hard-to-find bugs to facilitate debugging and serves as a solid foundation for the two systems that follow.
We then present RacePro, a process race detection system to improve software correctness. Process races occur when multiple processes access shared operating system resources, such as files, without proper synchronization. Detecting process races is difficult due to the elusive nature of these bugs, and the heterogeneity of frameworks involved in such bugs. RacePro is the first tool to detect such process races. RacePro records application executions in deployed systems, allowing offline race detection by analyzing the previously recorded log. RacePro then replays the application execution and forces the manifestation of detected races to check their effect on the application. Upon failure, RacePro reports potentially harmful races to developers.
Third, we present Dora, a mutable record-replay system which allows a recorded execution of an application to be replayed with a modified version of the application. Mutable record-replay provides a number of benefits for reproducing, diagnosing, and fixing software bugs. Given a recording and a modified application, finding a mutable replay is challenging, and undecidable in the general case. Despite the difficulty of the problem, we show a very simple but effective algorithm to search for suitable replays.
Lastly, we present Synapse, a heterogeneous database replication system designed for Web applications. Web applications are increasingly built using a service-oriented architecture that integrates services powered by a variety of databases. Often, the same data, needed by multiple services, must be replicated across different databases and kept in sync. Unfortunately, these databases use vendor-specific data replication engines that are not compatible with each other. To solve this challenge, Synapse operates at the application level, accessing a unified data representation through object-relational mappers. Additionally, Synapse leverages application semantics to replicate data with good consistency semantics, using mechanisms similar to Scribe.