24 research outputs found

    Managing polyglot systems metadata with hypergraphs

    Get PDF
    A single type of data store can hardly fulfill every end-user requirements in the NoSQL world. Therefore, polyglot systems use different types of NoSQL datastores in combination. However, the heterogeneity of the data storage models makes managing the metadata a complex task in such systems, with only a handful of research carried out to address this. In this paper, we propose a hypergraph-based approach for representing the catalog of metadata in a polyglot system. Taking an existing common programming interface to NoSQL systems, we extend and formalize it as hypergraphs for managing metadata. Then, we define design constraints and query transformation rules for three representative data store types. Furthermore, we propose a simple query rewriting algorithm using the catalog itself for these data store types and provide a prototype implementation. Finally, we show the feasibility of our approach on a use case of an existing polyglot system.Peer ReviewedPostprint (author's final draft

    New Perspectives for NoSQL Database Design: A Systematic Review

    Get PDF
    The use of NoSQL databases has increasingly become a trend in software development, mainly due to the expansion of Web 2.0 systems. However, there is not yet a standard to be used for the design of this type of database even with the growing number of studies related to this subject. This paper presents a systematic review looking for new trends regarding strategies used in this context. The result of this process demonstrates that there are still few methodologies for the NoSQL database design and there are no design methodologies capable of working with polyglot persistence

    From NOSQL databases to decision support systems: developing a business intelligence solution

    Get PDF
    We are living a time in which the data generated by humans and machines has reached levels never seen before the so-called era of Big Data. Everyday, vast amounts of data, coming from different sources and with different formats, are created and made available to organizations. First, with the rise of the social networks and, more recently, with the advent of the Internet of Things (IoT), data with enormous potential for organizations is being continuously generated. In order to be more competitive, organizations need to explore all the richness that is present in those data. Indeed, data is only as valuable as the insights organizations gather from it to make better decisions, which is the main goal of Business Intelligence (BI). In this paper, we describe the development of a decision support system in which data obtained from a NoSQL database is used to feed a BI solution.This work has been supported by FCT - Fundação para a Ciência e Tecnologia, within the Project Scope: UID/CEC/00319/2019

    Are NoSQL Data Stores Useful for Bioinformatics Researchers?

    Get PDF
    The big data challenge in bioinformatics is approaching. Data storage and processing, instead of experimental technologies, are becoming the slower and more costly part of research. Biological data typically have large size and a variety of structures. The ability to efficiently store and retrieve the data is important in bioinformatics research. Traditionally, large datasets are either stored as disk-based flat-files or in relational databases. These systems become more complicated to plan, maintain and adjust to big data applications as they follow rigid table schema and often lack scalability, e.g. for data aggregation. Meanwhile, non-relational databases (NoSQL) emerge to provide alternative, flexible and more scalable data stores. In this study, we aim to quantitatively compare the latencies of different data stores on storing and querying proteomics datasets. We show benchmarks for typical relational and non-relational systems for both, in-memory and disk-based configurations and compare them to a simple flat-file based approach. We will focus on the latencies of storing and querying proteomics mass spectrometry datasets and the actual space consumption inside the data stores. Experiments are carried out on a local desktop with medium-sized data, which is the typical experimental settings of individual bioinformatics researchers. Results show that there are significant latency differences among the considered data stores (up to 30 folds). In certain use cases, flat file system can achieve comparable performance with the data stores. DOI: 10.17762/ijritcc2321-8169.150317

    Column-based databases: estudo exploratório no âmbito das bases de dados NoSQL

    Get PDF
    O aumento da quantidade de dados gerados que se tem verificado nos últimos anos e a que se tem vindo a dar o nome de Big Data levou a que a tecnologia relacional começasse a demonstrar algumas fragilidades no seu armazenamento e manuseamento o que levou ao aparecimento das bases de dados NoSQL. Estas estão divididas por quatro tipos distintos nomeadamente chave/valor, documentos, grafos e famílias de colunas. Este artigo é focado nas bases de dados do tipo column-based e nele serão analisados os dois sistemas deste tipo considerados mais relevantes: Cassandra e HBase

    PDDM: A Database Design Method for Polyglot Persistence

    Get PDF
    Databases by Web 2.0 has revealed the limitations of the relational model related to scalability. This led to the emergence of NoSQL databases, with data storage models other than relational ones. These databases propose solutions to such limitations through horizontal scalability and partially compromise data consistency. The combination of multiple data models, called polyglot persistence, extends these solutions by providing resources for the implementation of complex systems that have components with distinct requirements that would not be possible by the use of only one data model in a satisfactory way. However, there are no consolidated methods for the NoSQL database design and neither methods for design systems that apply the polyglot persistence. This work proposes a database design method applied to systems that use polyglot persistence, combining different data models. This method can be applied to the relational model and aggregate-oriented NoSQL data models. The method defines a set of sub-steps based on the existing concepts of database design. The goal is to define a formal process to assist in defining the data models to be used and to transform the conceptual design into a logical design. The method application is demonstrated in some test cases, in order to show its results and applicability for later execution of the physical design of these databases

    Метод создания коллекций со вложенными документами для баз данных типа ключ-документ с учетом выполняемых запросов

    Get PDF
    In the recent decades, NoSQL databases have become more popular day by day. And increasingly, developers and database administrators, for whatever reason, have to solve the problems of database migration from a relational model in the model NoSQL databases like the document-oriented database MongoDB database. This article discusses the approach to this migration data based on set theory. A new formal method of determining the optimal runtime searches aggregate collections with the attached documents NoSQL databases such as the key document. The attributes of the database objects are included in optimizing the number of collections and their structures in search queries. The initial data are object properties (attributes, relationships between attributes) on which information is stored in the database, and query the properties that are most often performed, or the speed of which should be maximal. This article discusses the basic types of connections (1-1, 1-M, M-M), typical of the relational model. The proposed method is the following step of the method of creating a collection without embedded documents. The article also provides a method for determining what methods should be used in the reasonable cases to make work with databases more effectively. At the end, this article shows the results of testing of the proposed method on databases with different initial schemes. Experimental results show that the proposed method helps reduce the execution time of queries can also significantly as well as reduce the amount of memory required to store the data in a new database.В последние десятилетия все большую популярность набирают NoSQL базы данных, и все чаще разработчикам и администраторам таких баз по той или иной причине приходится решать задачу миграции баз данных из реляционной модели в модель NoSQL, например документно-ориентированную базу данных MongoDB. Описывается подход к такой миграции данных на основе теории множеств. Предлагаются правила для определения совокупности коллекций со вложенными документами NoSQL базы данных типа ключ-документ, оптимальной по времени выполнения поисковых запросов. Оптимизация числа коллекций и их структуры проводится с учетом атрибутов объектов базы данных, участвующих в поисковых запросах. Исходными данными являются свойства объектов (атрибуты, связи между атрибутами), информация о которых хранится в базе данных, и свойства запросов, которые наиболее часто выполняются или скорость их выполнения максимальна. В правилах учитываются основные типы связей (1-1, 1-М, М-М), свойственные реляционной модели. Рассматриваемая совокупность правил является дополнением к методу создания коллекций без вложенных документов. Также приводится методика для определения, в каких случаях какие методы надо использовать, чтобы сделать работу с базами данных более эффективной. В заключении приведены результаты тестирования предлагаемого метода на базах данных с различными начальными схемами. Результаты экспериментов показывают, что предлагаемый метод помимо сокращения времени выполнения запросов позволяет также значительно сократить объем памяти, необходимый для хранения данных в новой базе данных

    Метод создания коллекций со вложенными документами для баз данных типа ключ-документ с учетом выполняемых запросов

    Get PDF
    В последние десятилетия все большую популярность набирают NoSQL базы данных, и все чаще разработчикам и администраторам таких баз по той или иной причине приходится решать задачу миграции баз данных из реляционной модели в модель NoSQL, например документно-ориентированную базу данных MongoDB. Описывается подход к такой миграции данных на основе теории множеств. Предлагаются правила для определения совокупности коллекций со вложенными документами NoSQL базы данных типа ключ-документ, оптимальной по времени выполнения поисковых запросов. Оптимизация числа коллекций и их структуры проводится с учетом атрибутов объектов базы данных, участвующих в поисковых запросах. Исходными данными являются свойства объектов (атрибуты, связи между атрибутами), информация о которых хранится в базе данных, и свойства запросов, которые наиболее часто выполняются или скорость их выполнения максимальна. В правилах учитываются основные типы связей (1-1, 1-М, М-М), свойственные реляционной модели. Рассматриваемая совокупность правил является дополнением к методу создания коллекций без вложенных документов. Также приводится методика для определения, в каких случаях какие методы надо использовать, чтобы сделать работу с базами данных более эффективной. В заключении приведены результаты тестирования предлагаемого метода на базах данных с различными начальными схемами. Результаты экспериментов показывают, что предлагаемый метод помимо сокращения времени выполнения запросов позволяет также значительно сократить объем памяти, необходимый для хранения данных в новой базе данных

    Geospatial Anarchy: Managing datasets the Open Source way

    Get PDF
    OpenStreetMap (OSM) is the largest and best-known example of geospatial data creation using Volunteered Geographic Information (VGI). A large group of non-specialists joins their efforts online to create an open, worldwide map of the world. The project differs from traditional management of geospatial data on several accounts: both the underlying technology (Open Source components) and the mindset (schema-less structures using tags and changesets). We review how traditional organizations are currently using the OSM technology to meet their needs and how the mindset of OSM could be employed to traditional management of spatial datasets as well

    Toward Scalable Hybrid Stores

    Get PDF
    National audienceData centric applications often use heterogeneous datasets: some very large while others of moderate size, some highly structured (e.g., relations) while others complex structured (e.g., graphs) or little structured (e.g., log data). Facing them is a variety of storage systems but none of which is the best for all, at all times.We present Estocada, an architecture we are currently developing to efficiently handle highly heterogeneous datasets based on a dynamic set of potentially very different data stores. Estocada provides to the ap- plication layer access to each dataset in its native format, while hosting them internally in a set of potentially overlapping fragments, possibly distributed across heterogeneous stores. At the core of Estocada lie powerful view-based rewriting and view selection algorithms to marry correctness with high performance
    corecore