5 research outputs found

    The Development of a Benchmark Tool for NoSQL Databases

    This article describes a proposed benchmark methodology and software application for measuring the performance of both SQL and NoSQL databases. It presents results obtained during PhD research; the tool is part of a larger application intended for NoSQL database management. The subject was chosen because of the almost complete lack of benchmarking tools for NoSQL databases, the exceptions being YCSB [1] and a benchmark tool made specifically to compare Redis with RavenDB. While there are several well-known benchmarking systems for classical relational databases (starting with the canonical TPC-C, TPC-E and TPC-H), on the NoSQL side such tools are mostly missing and seriously needed.

    COMPARATIVE ANALYSIS OF QUERY EXECUTION ON MYSQL AND MONGODB DATABASE SERVERS

    This work compares query execution speed in a relational and a non-relational database management system (DBMS). The object of the research is the query execution speed of MySQL and MongoDB; the subjects are the relational DBMS MySQL 5.7.19 and the non-relational DBMS MongoDB 4.0.6. The purpose is to compare the two in terms of the execution time of analogous queries against databases with the same data and structure on the same hardware. The analysis used the sample relational database «Employees», which is open source and available for download from a GitHub repository. To test the non-relational DBMS, the «Employees» database was imported into the document-oriented DBMS MongoDB: relational tables were converted to collections and rows to documents, and the database structure itself was not changed. No indexes were used in either the relational or the non-relational database. Queries were executed in the corresponding console tools: Command Line Client for MySQL and Mongo Shell for MongoDB. Data retrieval queries were used as the data processing tool. The test queries were analytical, and some of them contained aggregate functions. Execution time was measured both at DBMS server start (the first query to the server) and on a server that had already been queried (normal conditions). Execution times were obtained with standard built-in DBMS functions: MySQL reported the time automatically and through profiling, MongoDB through the Explain function. Under normal conditions, query execution time was lower in MySQL than in MongoDB. Some queries executed faster in MongoDB when they were the first queries sent to a started server. In MySQL, the execution time of several parametric queries varied with the query parameter. From a practical perspective, MySQL is preferable to MongoDB for data retrieval queries. These conclusions apply to the «Employees» database and to databases of similar structure and data volume.
    Keywords: database, database management system, relational approach, non-relational approach, NoSQL, MySQL, MongoDB, query execution time, comparison, query
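The distinction between the first query to a freshly started server and queries under normal conditions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the study read execution times from built-in DBMS functions (MySQL profiling, MongoDB Explain), whereas this sketch measures client-side wall-clock time, and `time_cold_and_warm` and its `run_query` callable are hypothetical names, not part of the study.

```python
import time
from statistics import mean
from typing import Callable, Dict


def time_cold_and_warm(run_query: Callable[[], object], warm_runs: int = 5) -> Dict[str, float]:
    """Time the first call separately from the calls that follow it.

    `run_query` is a hypothetical zero-argument callable that sends one
    query to the database and waits for the result.
    """
    if warm_runs < 1:
        raise ValueError("warm_runs must be at least 1")

    # First call: stands in for the first query sent to a freshly started server.
    start = time.perf_counter()
    run_query()
    cold = time.perf_counter() - start

    # Later calls: stand in for "normal conditions" on an already queried server.
    warm = []
    for _ in range(warm_runs):
        start = time.perf_counter()
        run_query()
        warm.append(time.perf_counter() - start)

    return {"cold_s": cold, "warm_mean_s": mean(warm), "warm_min_s": min(warm)}
```

For the cold figure to mean anything, the server would have to be restarted before each measurement series; the helper itself cannot enforce that.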

    PERFORMANCE ANALYSIS OF TWO BIG DATA TECHNOLOGIES ON A CLOUD DISTRIBUTED ARCHITECTURE. RESULTS FOR NON-AGGREGATE QUERIES ON MEDIUM-SIZED DATA

    Big Data systems manage and process huge volumes of data constantly generated by various technologies in a myriad of formats. Big Data advocates (and preachers) have claimed that Big Data technologies such as NoSQL, Hadoop and in-memory data stores perform better than classical relational/SQL Database Management Systems. This paper compares the data processing performance of two systems, one from the SQL camp (PostgreSQL/Postgres XL) and one from the Big Data camp (Hadoop/Hive), on a distributed five-node cluster deployed in the cloud. Instead of the benchmarks in common use (YCSB, TPC), a series of R modules were devised for generating random non-aggregate queries on different subschemas (with increasing data size) of the TPC-H database. The overall performance of the two systems was compared. Subsequently, a number of models were developed relating performance to the system and to various query parameters such as the number of attributes in the SELECT and WHERE clauses, the number of joins, the number of rows processed, etc. JEL Codes: M1
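To make the idea of generated random non-aggregate queries concrete, here is a minimal sketch. It is not the authors' generator: the paper used R modules, while this sketch uses Python, covers a single table without joins, and the function name `random_query` and its parameters are hypothetical. The column names follow the TPC-H `lineitem` table; the value ranges are illustrative.

```python
import random

# Columns of the TPC-H lineitem table that the sketch may project.
SELECTABLE_COLUMNS = [
    "l_orderkey", "l_partkey", "l_suppkey",
    "l_quantity", "l_extendedprice", "l_discount", "l_tax",
]
# Numeric columns usable in WHERE, with illustrative (assumed) value ranges.
NUMERIC_COLUMNS = {
    "l_quantity": (1, 50),
    "l_discount": (0.0, 0.10),
    "l_tax": (0.0, 0.08),
}
OPERATORS = ["<", "<=", ">", ">="]


def random_query(rng: random.Random, n_select: int, n_where: int) -> str:
    """Build one random non-aggregate SELECT over lineitem."""
    if not 1 <= n_select <= len(SELECTABLE_COLUMNS):
        raise ValueError("n_select out of range")
    if not 0 <= n_where <= len(NUMERIC_COLUMNS):
        raise ValueError("n_where out of range")

    select_cols = rng.sample(SELECTABLE_COLUMNS, n_select)

    predicates = []
    for col in rng.sample(sorted(NUMERIC_COLUMNS), n_where):
        low, high = NUMERIC_COLUMNS[col]
        if isinstance(low, int):
            value = rng.randint(low, high)
        else:
            value = round(rng.uniform(low, high), 2)
        predicates.append(f"{col} {rng.choice(OPERATORS)} {value}")

    sql = f"SELECT {', '.join(select_cols)} FROM lineitem"
    if predicates:
        sql += " WHERE " + " AND ".join(predicates)
    return sql
```

Passing a seeded `random.Random` keeps a generated workload reproducible, so the same queries could be replayed against both systems, and `n_select` and `n_where` correspond to two of the query parameters the abstract lists as model inputs.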

    An evaluation of the performance of a NoSQL document database in a simulation of a large scale Electronic Health Record (EHR) system

    Electronic Healthcare Record (EHR) systems can provide significant benefits by improving the effectiveness of healthcare systems. Research and industry projects focusing on storing healthcare information in NoSQL databases have been triggered by practical experience demonstrating that a relational database approach to managing healthcare records has become a bottleneck. Previous studies show that NoSQL databases based on the consistency, availability and partition tolerance (CAP) theorem have significant advantages over relational databases, such as easy and automatic scaling, better performance and high availability. However, there is limited empirical research that has evaluated the suitability of NoSQL databases for managing EHRs. This research addressed this gap in the literature by investigating the following general research question: How can a simulation of a large EHR system be developed so that the performance of NoSQL document databases can be evaluated against that of relational databases? Using a Design Science approach informed by a pragmatic worldview, a number of IT artefacts were developed to enable an evaluation of the performance of a NoSQL document-oriented database against a relational database in a simulation of a large scale EHR system. These were healthcare data models (NoSQL document database, relational database) for the Australian Healthcare context, a random healthcare data generator and a prototype EHR system. The performance of a NoSQL document database (Couchbase) was evaluated against a relational database (MySQL) in terms of database operations (insert, update, delete of EHRs), scalability, EHR sharing and data analysis (complex querying) capabilities in a simulation of a large scale EHR system, constructed in the cloud environment of Amazon Web Services (AWS).
Test scenarios consisted of a number of different configurations with 1, 2, 4, 8 and 16 nodes for 1 million, 10 million, 100 million and 500 million records to simulate database operations in a large scale and distributed EHR system environment. The Couchbase NoSQL document database was found to perform significantly better than the MySQL relational database in most of the test cases in terms of database operations (insert, update, delete of EHRs), scalability and EHR sharing. However, the MySQL relational database was found to perform significantly better than the Couchbase NoSQL document database for the complex query test that demonstrates basic analysis capabilities. Furthermore, the Couchbase NoSQL document database used significantly more disk space than the MySQL relational database to store the same number of EHRs. This research made a number of important contributions to knowledge, theory and practice. The main theoretical contribution to design theory was the design and evaluation of a prototype EHR system for simulating database management operations in a large scale EHR system environment. The prototype EHR system was underpinned by the development of two data models, with data structures designed for a NoSQL document database and a relational database, and a random healthcare data generator, all based on Australian Healthcare data characteristics and statistics. The design of a data model for EHRs for a NoSQL document database using an aggregated document modelling approach provided an important contribution to data modelling theory for NoSQL document databases using de-normalisation and document aggregation. The design of a random healthcare data generator was another important contribution to design theory and was based on a data distribution algorithm (multinomial distribution and probability theory) informed by the National Health Data Dictionary and published Australian Healthcare statistics.
The prototype EHR system allowed this study to demonstrate through a simulated performance evaluation that a NoSQL document database has significant performance advantages over relational databases in most of the database management test cases. Hence this study demonstrated the utility and efficacy of a NoSQL document database in the simulation of a large scale EHR system. This research has made a number of important contributions to practice. Foremost is that the IT artefacts developed and evaluated in this research (namely, a data model for storing EHRs in a NoSQL document database, a random healthcare data generator and a prototype EHR system) can be readily adopted by practitioners. Another important practical contribution is that the research is based on open source NoSQL and relational database alternatives. Hence, this research can provide a sound basis for lower-income countries as well as higher-income countries to establish their own cost-effective national EHR systems without the restrictions, limitations, complexity or complications of similar proprietary relational database systems.
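The abstract mentions a random healthcare data generator based on a multinomial distribution and an aggregated document model. A minimal sketch of that general idea follows; it is not the thesis's generator. Every field name, category and probability below is a hypothetical placeholder and not a real Australian healthcare statistic. Each categorical field is drawn from a fixed probability vector, so category counts across many generated records follow a multinomial distribution, and encounters are embedded inside the patient document rather than stored in a separate table.

```python
import random
from typing import Dict

# Hypothetical category probabilities (placeholders, NOT published statistics).
AGE_BAND_PROBS = {"0-17": 0.2, "18-64": 0.6, "65+": 0.2}
ENCOUNTER_TYPE_PROBS = {"gp_visit": 0.6, "pathology_test": 0.3, "hospital_admission": 0.1}


def sample_category(rng: random.Random, probs: Dict[str, float]) -> str:
    """Draw one category according to its probability weight."""
    categories = list(probs)
    weights = [probs[c] for c in categories]
    return rng.choices(categories, weights=weights, k=1)[0]


def generate_patient_document(rng: random.Random, patient_id: int, n_encounters: int) -> dict:
    """Build one aggregated (de-normalised) patient document."""
    return {
        "patient_id": patient_id,
        "age_band": sample_category(rng, AGE_BAND_PROBS),
        # Encounters are nested in the patient document instead of a joined table.
        "encounters": [
            {"seq": i + 1, "type": sample_category(rng, ENCOUNTER_TYPE_PROBS)}
            for i in range(n_encounters)
        ],
    }
```

In a relational model the `encounters` list would typically become a separate table keyed by `patient_id`, which is one way to see the trade-off the abstract reports between operation performance and disk space.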