24 research outputs found

Managing polyglot systems metadata with hypergraphs

Author: F. Saltor
Jennie Duggan
Jérôme Euzenat
Lanjun Wang
Manos Karpathiotakis
P. Shvaiko
Paolo Atzeni
Ying-Ti Liao
Ágnes Vathy-Fogarassy
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

A single type of data store can hardly fulfill every end-user requirement in the NoSQL world. Therefore, polyglot systems use different types of NoSQL data stores in combination. However, the heterogeneity of the data storage models makes managing the metadata a complex task in such systems, and only a handful of studies have addressed it. In this paper, we propose a hypergraph-based approach for representing the catalog of metadata in a polyglot system. Taking an existing common programming interface to NoSQL systems, we extend and formalize it as hypergraphs for managing metadata. Then, we define design constraints and query transformation rules for three representative data store types. Furthermore, we propose a simple query rewriting algorithm using the catalog itself for these data store types and provide a prototype implementation. Finally, we show the feasibility of our approach on a use case of an existing polyglot system. Peer reviewed. Postprint (author's final draft).

Crossref

UPCommons. Portal del coneixement obert de la UPC

DI-fusion

New Perspectives for NoSQL Database Design: A Systematic Review

Author: Bini Tarcizio Alexandre
Matos Simone Nasser
Zdepski Cristofer
Publication venue: American Academic Scientific Research Journal for Engineering, Technology, and Sciences
Publication date: 15/05/2020

The use of NoSQL databases has increasingly become a trend in software development, mainly due to the expansion of Web 2.0 systems. However, there is not yet a standard for the design of this type of database, despite the growing number of studies on the subject. This paper presents a systematic review looking for new trends in the strategies used in this context. The results show that there are still few methodologies for NoSQL database design and none capable of working with polyglot persistence.

American Scientific Research Journal for Engineering, Technology, and Sciences (ASRJETS)

From NOSQL databases to decision support systems: developing a business intelligence solution

Author: Costa Marco
Pereira José Luís
Publication venue: 'Faculty of Engineering, University of Kragujevac'
Publication date: 01/01/2019

We are living in a time in which the data generated by humans and machines has reached levels never seen before: the so-called era of Big Data. Every day, vast amounts of data, coming from different sources and in different formats, are created and made available to organizations. First with the rise of social networks and, more recently, with the advent of the Internet of Things (IoT), data with enormous potential for organizations is being continuously generated. In order to be more competitive, organizations need to explore all the richness that is present in those data. Indeed, data is only as valuable as the insights organizations gather from it to make better decisions, which is the main goal of Business Intelligence (BI). In this paper, we describe the development of a decision support system in which data obtained from a NoSQL database is used to feed a BI solution. This work has been supported by FCT - Fundação para a Ciência e Tecnologia, within the Project Scope: UID/CEC/00319/2019.

Universidade do Minho: RepositoriUM

Crossref

Are NoSQL Data Stores Useful for Bioinformatics Researchers?

Author: Borong Shao, Tim OF Conrad
Publication venue: 'Auricle Technologies, Pvt., Ltd.'
Publication date: 31/03/2015

The big data challenge in bioinformatics is approaching. Data storage and processing, rather than experimental technologies, are becoming the slower and more costly part of research. Biological data are typically large and varied in structure. The ability to efficiently store and retrieve the data is important in bioinformatics research. Traditionally, large datasets are stored either as disk-based flat files or in relational databases. These systems become more complicated to plan, maintain and adjust to big data applications, as they follow a rigid table schema and often lack scalability, e.g. for data aggregation. Meanwhile, non-relational (NoSQL) databases have emerged to provide alternative, flexible and more scalable data stores. In this study, we aim to quantitatively compare the latencies of different data stores when storing and querying proteomics datasets. We show benchmarks for typical relational and non-relational systems in both in-memory and disk-based configurations and compare them to a simple flat-file-based approach. We focus on the latencies of storing and querying proteomics mass spectrometry datasets and on the actual space consumption inside the data stores. Experiments are carried out on a local desktop with medium-sized data, which is the typical experimental setting of individual bioinformatics researchers. Results show that there are significant latency differences among the considered data stores (up to 30-fold). In certain use cases, a flat-file system can achieve performance comparable to that of the data stores. DOI: 10.17762/ijritcc2321-8169.150317

International Journal on Recent and Innovation Trends in Computing and Communication

Column-based databases: estudo exploratório no âmbito das bases de dados NoSQL

Author: Cunha José Pedro
Pereira José Luís Mota
Publication date: 02/10/2015

The growth in the volume of generated data seen in recent years, which has come to be known as Big Data, has exposed weaknesses in relational technology when it comes to storing and handling such data, and this led to the emergence of NoSQL databases. These fall into four distinct types: key-value, document, graph, and column-family. This article focuses on column-based databases and analyzes the two systems of this type considered most relevant: Cassandra and HBase.

Universidade do Minho: RepositoriUM

PDDM: A Database Design Method for Polyglot Persistence

Author: Bini Tarcizio Alexandre
Matos Simone Nasser
Zdepski Cristofer
Publication venue: American Academic Scientific Research Journal for Engineering, Technology, and Sciences
Publication date: 06/08/2020

The use of databases by Web 2.0 applications has revealed the limitations of the relational model with respect to scalability. This led to the emergence of NoSQL databases, with data storage models other than the relational one. These databases address such limitations through horizontal scalability, at the cost of partially compromising data consistency. The combination of multiple data models, called polyglot persistence, extends these solutions by providing resources for implementing complex systems whose components have distinct requirements that a single data model could not meet satisfactorily. However, there are no consolidated methods for NoSQL database design, nor methods for designing systems that apply polyglot persistence. This work proposes a database design method for systems that use polyglot persistence, combining different data models. The method can be applied to the relational model and to aggregate-oriented NoSQL data models. It defines a set of sub-steps based on existing database design concepts. The goal is to define a formal process that assists in choosing the data models to be used and in transforming the conceptual design into a logical design. The application of the method is demonstrated in some test cases, in order to show its results and its applicability to the later physical design of these databases.

American Scientific Research Journal for Engineering, Technology, and Sciences (ASRJETS)

Метод создания коллекций со вложенными документами для баз данных типа ключ-документ с учетом выполняемых запросов

Author: Ха Ван Муон
Шичкина Юлия Александровна
Publication venue: СПб ФИЦ РАН
Publication date: 01/08/2020

In recent decades, NoSQL databases have grown steadily in popularity, and developers and database administrators increasingly have to, for one reason or another, migrate databases from the relational model to a NoSQL model, for example the document-oriented database MongoDB. This article discusses an approach to such data migration based on set theory. It proposes rules for determining a set of collections with embedded documents, for a key-document NoSQL database, that is optimal with respect to the execution time of search queries. The number of collections and their structure are optimized by taking into account the attributes of the database objects that take part in search queries. The input data are the properties of the objects (attributes and relationships between attributes) about which information is stored in the database, and the properties of the queries that are executed most often or whose execution speed should be maximal. The rules cover the basic types of relationships (1-1, 1-M, M-M) typical of the relational model. The proposed set of rules complements the method for creating collections without embedded documents. The article also provides a technique for determining which methods should be used in which cases to make work with databases more efficient. Finally, the article presents the results of testing the proposed method on databases with different initial schemas. Experimental results show that the proposed method not only reduces query execution time but also significantly reduces the amount of memory required to store the data in the new database.

Информатика и автоматизация

Метод создания коллекций со вложенными документами для баз данных типа ключ-документ с учетом выполняемых запросов

Author: Van Muon Ha
Yulia Aleksandrovna Shichkina
Publication venue: Russian Academy of Sciences, St. Petersburg Federal Research Center
Publication date: 01/08/2020

In recent decades, NoSQL databases have grown steadily in popularity, and the developers and administrators of such databases increasingly have to, for one reason or another, solve the problem of migrating databases from the relational model to a NoSQL model, for example the document-oriented database MongoDB. An approach to such data migration based on set theory is described. Rules are proposed for determining a set of collections with embedded documents, for a key-document NoSQL database, that is optimal with respect to the execution time of search queries. The number of collections and their structure are optimized by taking into account the attributes of the database objects that take part in search queries. The input data are the properties of the objects (attributes and relationships between attributes) about which information is stored in the database, and the properties of the queries that are executed most often or whose execution speed should be maximal. The rules take into account the main types of relationships (1-1, 1-M, M-M) characteristic of the relational model. The set of rules considered complements the method for creating collections without embedded documents. A technique is also given for determining which methods should be used in which cases to make work with databases more efficient. Finally, the results of testing the proposed method on databases with different initial schemas are presented. The experimental results show that the proposed method, besides reducing query execution time, also significantly reduces the amount of memory required to store the data in the new database.

Directory of Open Access Journals

Geospatial Anarchy: Managing datasets the Open Source way

Author: Sveen Atle Frenvik
Publication venue: Geoforum Danmark
Publication date: 01/01/2018

OpenStreetMap (OSM) is the largest and best-known example of geospatial data creation using Volunteered Geographic Information (VGI). A large group of non-specialists join their efforts online to create an open map of the world. The project differs from traditional management of geospatial data in several respects, including both the underlying technology (Open Source components) and the mindset (schema-less structures using tags and changesets). We review how traditional organizations are currently using the OSM technology to meet their needs and how the OSM mindset could also be applied to the traditional management of spatial datasets.

Directory of Open Access Journals

Open Access Journals at Aalborg University

Toward Scalable Hybrid Stores

Author: Bugiotti Francesca
Bursztyn Damian
Deutsch Alin
Ileana Ioana
Manolescu Ioana
Publication venue: HAL CCSD
Publication date: 14/06/2015

National audience. Data-centric applications often use heterogeneous datasets: some very large while others of moderate size, some highly structured (e.g., relations) while others have complex structure (e.g., graphs) or little structure (e.g., log data). A variety of storage systems is available to handle them, but none is the best for all datasets at all times. We present Estocada, an architecture we are currently developing to efficiently handle highly heterogeneous datasets based on a dynamic set of potentially very different data stores. Estocada gives the application layer access to each dataset in its native format, while hosting the datasets internally in a set of potentially overlapping fragments, possibly distributed across heterogeneous stores. At the core of Estocada lie powerful view-based rewriting and view selection algorithms to marry correctness with high performance.

INRIA a CCSD electronic archive server