51 research outputs found


    Currently two major database management systems are in use for dealing with data: the Relational Database Management System (RDBMS), also known as standard SQL databases, and the NoSQL databases. The RDBMS databases deal with structured data and the NoSQL databases with unstructured or semi-structured data. The RDBMS databases have been popular for many years but the NoSQL type is gaining popularity with the introduction of the internet and social media. Data flow from SQL to NoSQL or vice versa is very much possible in the near future due to the growing popularity of the NoSQL databases. The goal of this thesis is to analyze the data structures of the RDBMS and the NoSQL databases and to suggest a Graphical User Interface (GUI) tool that migrates the data from SQL to NoSQL databases. The relational databases have been in use and have dominated the industry for many years. In contrast, the NoSQL databases were introduced with the increased usage of the internet, social media, and cloud computing. The traditional relational databases guarantee data integrity, whereas high availability and scalability are the main advantages of the NoSQL databases. This thesis presents a comparison of these two technologies. It compares the data structure and data storing techniques of the two technologies. The SQL databases store data differently from the NoSQL databases due to their specific demands. The data stored in the relational databases is highly structured and normalized in most environments, whereas the data in the NoSQL databases is mostly unstructured. This difference in data structure helps in meeting the specific demands of these two systems. The NoSQL databases are scalable with high availability due to the simpler data model but do not guarantee data consistency at all times. On the other hand, the RDBMS systems are not easily scalable and available at the same time due to the complex data model but guarantee data consistency.
This thesis uses CouchDB and MySQL to represent the NoSQL and standard SQL databases respectively. The aim of the research in this document is to suggest a methodology for data migration from the RDBMS databases to the document-based NoSQL databases. Data migration between the RDBMS and the NoSQL systems is anticipated because both systems are currently in use by many industry leaders. This thesis presents a Graphical User Interface as a starting point that enables the data migration from the RDBMS to the NoSQL databases. MySQL and CouchDB are used as the test databases for the relational and NoSQL systems respectively. This thesis presents an architecture and methodology to achieve this objective.

    Review of performance of various Big Databases

    Relational databases have been the main model for data storage, retrieval and administration. A relational database is a table-based data system where there is no scalability, insignificant data duplication, computationally costly table joins and difficulty in managing complex data. The main motivation behind NoSQL is adaptability. NoSQL data stores are widely used to store and retrieve potentially large amounts of data. In this paper, we assess four of the most popular NoSQL databases: Cassandra, MongoDB, and CouchDB

    Migrating From SQL to NoSQL Database: Practices and Analysis

    Most enterprises that deal with big data are moving towards using NoSQL data structures to represent data. Converting existing SQL structures to a NoSQL structure is a very important task in which we should guarantee both better performance and accurate data. The main objective of this thesis is to identify the most suitable NoSQL structure to migrate to from a relational database in terms of high read performance. Different combinations of NoSQL structures have been tested and compared with the SQL structure in order to determine the best design to use. For the SQL structure, we used MySQL data stored in five tables with different types of relationships among them. For NoSQL, we implemented three different MongoDB structures. We considered combinations of different levels of embedded documents and reference relationships between documents. Our experiments showed that using a mix of a one-level embedded document with a reference relationship to another document is the best structure to choose. We used a database that contains five tables with a variety of relationships, many-to-one and many-to-many. The amount of data stored in all the structures is also large, about 2 million records/documents. The research clearly compares the performance of retrieving data from different MongoDB representations of the data, and the results show that in some cases using more than one collection to represent huge data with complex relationships is better than keeping all the data in one document.
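The mixed structure favoured in this abstract (one level of embedding plus a reference to another document) can be illustrated with a minimal sketch. The table and field names used here (orders, items, customers, `customer_ref`) are hypothetical and do not come from the thesis, which does not name its five tables; the rows are plain dictionaries rather than real MySQL or MongoDB records.

```python
from collections import defaultdict


def build_documents(orders, items, customers):
    """Turn relational rows into documents (hypothetical schema).

    Each order embeds its items one level deep, while the customer is
    kept as a reference (an id pointing at a separate collection).
    """
    # Group child rows by their foreign key so they can be embedded.
    items_by_order = defaultdict(list)
    for item in items:
        items_by_order[item["order_id"]].append(
            {"sku": item["sku"], "qty": item["qty"]}
        )

    known_customers = {c["id"] for c in customers}

    docs = []
    for order in orders:
        if order["customer_id"] not in known_customers:
            raise ValueError(f"order {order['id']} has no matching customer")
        docs.append(
            {
                "_id": order["id"],
                # Reference: only the id is stored, not the customer row.
                "customer_ref": order["customer_id"],
                # Embedding: child rows are copied into the parent document.
                "items": items_by_order.get(order["id"], []),
            }
        )
    return docs


orders = [{"id": 1, "customer_id": 10}, {"id": 2, "customer_id": 10}]
items = [
    {"order_id": 1, "sku": "A", "qty": 2},
    {"order_id": 1, "sku": "B", "qty": 1},
]
customers = [{"id": 10, "name": "Ada"}]
docs = build_documents(orders, items, customers)
```

Embedding the items avoids a second lookup when an order is read, while referencing the customer avoids duplicating that row in every order; which relationships to embed is a design choice that the thesis evaluates experimentally.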

    Data Migration from RDBMS to Hadoop

    Oracle, IBM, Microsoft and Teradata own a large portion of the world's data. If you run a query anywhere in the world, it is likely that you are reading the data from a database owned by one of them. Moving large volumes of data from Oracle to DB2 or another system is a challenging task for a business. The advent of Hadoop and NoSQL technology represented a seismic shift that shook the RDBMS market and offered an alternative to organizations. The database vendors moved quickly to position themselves in Big Data, and vice versa. Indeed, each now has its own big data technology, such as Oracle NoSQL and MongoDB. There is a huge market for a high-performance data migration tool that can copy the data stored in RDBMS databases to Hadoop or NoSQL databases. Current data resides in RDBMS databases such as Oracle, SQL Server, MySQL and Teradata. We plan to migrate RDBMS data to a big data platform that supports NoSQL databases and contains a variety of data from the existing system; migrating petabytes of data takes huge resources and time. Time and resources may be constraints for the current migration process.

    EasyBDI: integração automática de big data e consultas analíticas de alto nível

    The emergence of new areas, such as the internet of things, which require access to the latest data for data analytics and decision-making environments, created constraints for the execution of analytical queries on traditional data warehouse architectures. In addition, the increase of semi-structured and unstructured data led to the creation of new databases to deal with these types of data, namely, NoSQL databases. This led to the information being stored in several different systems, each with more suitable characteristics for different use cases, which created difficulties in accessing data that are now spread across various systems with different models and characteristics. In this work, a system capable of performing analytical queries in real time on distributed and heterogeneous data sources is proposed: EasyBDI. The system is capable of integrating data logically, without materializing data, creating an overview of the data, thus offering an abstraction over the distribution and heterogeneity of data sources. Queries are executed interactively on data sources, which means that the most recent data will always be used in queries. This system presents a user interface that helps in the configuration of data sources, and automatically proposes a global schema that presents a generic and simplified view of the data, which can be modified by the user. The system allows the creation of multiple star schemas from the global schema. Finally, analytical queries are also made through a user interface that uses drag-and-drop elements.
EasyBDI is able to solve recent problems by using recent solutions, hiding the details of several data sources, while also allowing users with less knowledge of databases to perform real-time analytical queries over distributed and heterogeneous data sources.
Master's in Informatics Engineering

    Integration and extension of a cloud data migration support tool

    Since the growth of Cloud computing, the desire to use this novel computing approach has increased. It is not necessary to redevelop an existing application to target the Cloud and benefit from its advantages. Sometimes, it is even more sensible to migrate existing applications running in a static environment. Since many of those applications have strict layers, where not every layer might benefit from an elastic hosting environment, it is sometimes sufficient to migrate only individual layers. To cover as many use cases as possible, broad conversion scenarios, not uncommonly going beyond proprietary approaches, must be offered. Because a migration process is still a fairly complex matter, it is convenient to have a guided conversion to cover all requirements and achieve the desired result. This bachelor thesis focuses on aspects of migrating data to the Cloud by using a previously developed prototype of a Cloud Data Migration Support Tool. In particular, the integration of already existing modifications and evaluations of this tool, which were developed independently, into one stable prototype was required. A further objective of this thesis is to gain platform independence by extending the prototype with a plug-in mechanism to allow the use of native Java DataBase Connectivity (JDBC) drivers for exporting data from existing storage sources and subsequently importing this data into a target data environment, whose types may differ from the types of the source data environment. Furthermore, applying proven concepts of architecture and design is part of this work as well.

    Distributed query execution on a replicated and partitioned database

    Thesis (S.M.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010. Cataloged from PDF version of thesis. Includes bibliographical references (p. 63-64). Web application developers partition and replicate their data amongst a set of SQL databases to achieve higher throughput. Given multiple copies of tables partitioned in different ways, developers must manually select different replicas in their application code. This work presents Dixie, a query planner and executor which automatically executes queries over replicas of partitioned data stored in a set of relational databases, and optimizes for high throughput. The challenge in choosing a good query plan lies in predicting query cost, which Dixie does by balancing row retrieval costs with the overhead of contacting many servers to execute a query. For web workloads, per-query overhead in the servers is a large part of the overall cost of execution. Dixie's cost calculation tends to minimize the number of servers used to satisfy a query, which is essential for minimizing this query overhead and obtaining high throughput; this is in direct contrast to optimizers over large data sets that try to maximize parallelism by parallelizing the execution of a query over all the servers. Dixie automatically takes advantage of the addition or removal of replicas without requiring changes in the application code. We show that Dixie sometimes chooses plans that existing parallel database query optimizers might not consider. For certain queries, Dixie chooses a plan that gives a 2.3x improvement in overall system throughput over a plan which does not take into account per-server query overhead costs. Using table replicas, Dixie provides a throughput improvement of 35% over a naive execution without replicas on an artificial workload generated by Pinax, an open source social web site. By Neha Narula. S.M.
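The trade-off described in this abstract, row retrieval cost weighed against the overhead of contacting each server, can be illustrated with a toy cost model. This is only a sketch under stated assumptions: the linear formula, the `Plan` fields, and the default constants are hypothetical and are not taken from the thesis or from Dixie's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """A candidate query plan (hypothetical fields for illustration)."""
    name: str
    servers_contacted: int
    rows_retrieved: int


def plan_cost(plan, row_cost=1.0, server_overhead=50.0):
    """Assumed linear cost: per-row work plus a fixed overhead per server."""
    return plan.rows_retrieved * row_cost + plan.servers_contacted * server_overhead


def choose_plan(plans, row_cost=1.0, server_overhead=50.0):
    """Pick the candidate with the lowest estimated cost."""
    return min(plans, key=lambda p: plan_cost(p, row_cost, server_overhead))


plans = [
    # Fans out to every partition: few rows overall, many servers.
    Plan("all-servers", servers_contacted=8, rows_retrieved=100),
    # Uses a replica partitioned on the query key: more rows, one server.
    Plan("one-replica", servers_contacted=1, rows_retrieved=300),
]

best_with_overhead = choose_plan(plans)
best_ignoring_overhead = choose_plan(plans, server_overhead=0.0)
```

With the assumed constants, the single-server plan is preferred once per-server overhead is counted, while an optimizer that ignores that overhead would pick the plan that touches all servers.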