637 research outputs found

Optimization of Columnar NoSQL Data Warehouse Model with Clarans Clustering Algorithm

Author: Soussi Nassima
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 31/08/2023

To better meet the needs of business leaders, decision-makers have integrated external sources (such as Linked Open Data) into their decision-making systems, enriching existing data warehouses with new concepts that add value to their organizations, enhance productivity and help retain customers. However, the traditional data warehouse environment is not suited to supporting external Big Data. To address this challenge, several studies focus on the direct conversion of a classical relational data warehouse into a columnar NoSQL data warehouse, whereas the existing advanced works based on clustering algorithms are very limited and have several shortcomings. In this context, our paper proposes a new solution that builds an optimized columnar data warehouse based on the CLARANS clustering algorithm, which has proven effective in generating optimal column families. Experimental results support the validity of our system through a detailed comparative study between the existing advanced approaches and our proposed optimized method.

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

NoSQL Schema Design for Time-Dependent Workloads

Author: Mior Michael
Onizuka Makoto
Sasaki Yuya
Wakuta Yusuke
Zenmyo Teruyoshi
Publication date: 29/03/2023

In this paper, we propose a schema optimization method for time-dependent workloads for NoSQL databases. In our proposed method, we migrate the schema according to changing workloads, and the estimated costs of execution and migration are formulated and minimized as a single integer linear programming problem. Furthermore, we propose a method to reduce the number of optimization candidates by iterating over the time dimension abstraction and optimizing the workload while updating constraints.

arXiv.org e-Print Archive

Graphulo Implementation of Server-Side Sparse Matrix Multiply in the Accumulo Database

Author: Fuchs Adam
Gadepally Vijay
Hutchison Dylan
Kepner Jeremy
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 30/08/2015

The Apache Accumulo database excels at distributed storage and indexing and is ideally suited for storing graph data. Many big data analytics compute on graph data and persist their results back to the database. These graph calculations are often best performed inside the database server. The GraphBLAS standard provides a compact and efficient basis for a wide range of graph applications through a small number of sparse matrix operations. In this article, we implement GraphBLAS sparse matrix multiplication server-side by leveraging Accumulo's native, high-performance iterators. We compare the mathematics and performance of inner and outer product implementations, and show how an outer product implementation achieves optimal performance near Accumulo's peak write rate. We offer our work as a core component to the Graphulo library that will deliver matrix math primitives for graph analytics within Accumulo.
Comment: To be presented at IEEE HPEC 2015: http://www.ieee-hpec.org

arXiv.org e-Print Archive

Crossref

Performance Optimizations of NoSQL Databases in Distributed Systems

Author: Maalouf Tristyn
Publication venue: Scholars Crossing
Publication date: 20/04/2020

Databases store information about a system and provide a mechanism for data to be accessed and manipulated. While advancements in the 1970s provided a relational database model that has persisted to this day, web-scale era mass data needs surfacing in the 1990s and the early 2000s revealed limitations in the scalability of the relational model. As systems grew and transitioned into distributed architectures to support mass data storage and parallel processing, a complete overhaul of distributed computing technologies evolved that fundamentally departed from the relational data model in favor of the NoSQL data model. The course of this research details the scaling problems encountered by relational databases and the NoSQL solutions that made web-scale systems possible

Liberty University Digital Commons

Cassandra Data Modeling

Author: Prof. Minal P. Patil
Publication venue: Auricle Global Society of Education and Research
Publication date: 26/02/2018

Big data is characterized by the four V's (velocity, variety, volume and veracity), and working with data at this scale calls for solutions that deliver high performance. NoSQL databases help in this scenario. With Cassandra, we can efficiently manage large amounts of structured data. It supports dynamic control over the data. In this paper, we present relational and Cassandra data modeling.

International Journal on Future Revolution in Computer Science & Communication Engineering

NoSQL Database Technologies

Author: Barnhill Mark
Godin Joy
Madison Michael
Napier Cassie
Publication venue: CSUSB ScholarWorks
Publication date: 01/01/2015

As cloud computing continues to evolve, organizations are finding new ways to store the massive amounts of big data that are collected. Big data storage often requires greater flexibility and scalability, which can be provided by incorporating NoSQL technologies. NoSQL (Not Only SQL) is quickly becoming a popular approach to storing large and unstructured data. This paper looks at the various classifications of NoSQL technologies as well as many of the notable characteristics of the technologies. The authors also describe some deficiencies of using NoSQL and give some explanation as to why companies are adopting the technology. The paper concludes with suggestions for future research on NoSQL technologies; a content analysis of current articles in database management is provided in the appendix.

CSUSB ScholarWorks

Benchmarking of RDBMS and NoSQL performance on unstructured data

Author: Karimi Mwai
Publication date: 31/01/2019

New requirements are arising in the database field. Big data has been soaring. The amount of data is ever increasing and becoming more and more varied. Traditional relational database management systems have been a dominant force in the database field, but due to the massive growth of unstructured and multiform data, firms are now turning to architectures that have scale-out capabilities using open source software, commodity servers, cloud computing and services like Database as a Service. Due to this, relational databases ought to adopt and meet these new data requirements with easier and faster data processing capabilities and also provide multiple analytical tools that have the possibility of displaying analytics instantly. This study aims to benchmark the performance of relational systems and NoSQL systems on unstructured data.

Sapientia

Which NoSQL Database? A Performance Overview

Author: Abramova Veronika
Bernardino Jorge
Furtado Pedro
Publication venue: RonPub
Publication date: 01/01/2014

NoSQL data stores are widely used to store and retrieve possibly large amounts of data, typically in a key-value format. There are many NoSQL types with different performance characteristics, and thus it is important to compare them in terms of performance and verify how the performance is related to the database type. In this paper, we evaluate five of the most popular NoSQL databases: Cassandra, HBase, MongoDB, OrientDB and Redis. We compare those databases in terms of query performance, based on reads and updates, taking into consideration the typical workloads, as represented by the Yahoo! Cloud Serving Benchmark. This comparison allows users to choose the most appropriate database according to the specific mechanisms and application needs.

RonPub -- Research Online Publishing

Estudo Geral

Transformation of Matlab analysis programs into database-supported evaluations of SQL and NoSQL database systems

Author: Voigt Jakob
Publication date: 11/02/2020

Universität Rostock, Lehrstuhl Datenbank- und Informationssysteme: Dbis Repository