
    Optimization of Columnar NoSQL Data Warehouse Model with Clarans Clustering Algorithm

    To fully meet the needs of business leaders, decision-makers have turned to integrating external sources (such as Linked Open Data) into the decision-making system, enriching their existing data warehouses with new concepts that add value to their organizations, enhance productivity, and help retain customers. However, the traditional data warehouse environment is not suited to supporting external Big Data. To address this challenge, several studies focus on directly converting a classical relational data warehouse into a columnar NoSQL data warehouse, whereas existing advanced work based on clustering algorithms is very limited and has several shortcomings. In this context, our paper proposes a new solution that designs an optimized columnar data warehouse based on the CLARANS clustering algorithm, which has proven effective in generating optimal column families. Experimental results confirm the validity of our system through a detailed comparative study between the existing advanced approaches and our proposed optimized method.
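
    The abstract above names CLARANS without describing it. As a rough illustration only, the following is a minimal sketch of the general CLARANS idea (randomized search over k-medoid solutions, where a neighbor differs in exactly one medoid). The function names, the parameters `num_local` and `max_neighbor`, and the generic `dist` callback are assumptions made for this sketch; how the paper measures distance between attributes or maps clusters to column families is not stated here.

```python
import random


def total_cost(points, medoids, dist):
    """Sum of each point's distance to its nearest medoid (medoids are indices)."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)


def clarans(points, k, dist, num_local=4, max_neighbor=50, seed=0):
    """Illustrative CLARANS-style search; requires len(points) > k."""
    rng = random.Random(seed)
    n = len(points)
    best_medoids, best_cost = None, float("inf")
    for _ in range(num_local):  # independent restarts
        current = rng.sample(range(n), k)
        current_cost = total_cost(points, current, dist)
        tried = 0
        while tried < max_neighbor:
            # A neighbor replaces one randomly chosen medoid with a non-medoid.
            slot = rng.randrange(k)
            candidate = rng.choice([i for i in range(n) if i not in current])
            neighbor = current[:slot] + [candidate] + current[slot + 1:]
            cost = total_cost(points, neighbor, dist)
            if cost < current_cost:
                # Move to the better neighbor and reset the failure counter.
                current, current_cost, tried = neighbor, cost, 0
            else:
                tried += 1
        if current_cost < best_cost:
            best_medoids, best_cost = sorted(current), current_cost
    return best_medoids, best_cost


if __name__ == "__main__":
    # Hypothetical 1-D data with two well-separated groups.
    data = [0, 1, 2, 10, 11, 12]
    print(clarans(data, k=2, dist=lambda a, b: abs(a - b)))
```

    Because the search is randomized, results depend on the seed and on how many neighbors are examined before a local search gives up.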

    NoSQL Schema Design for Time-Dependent Workloads

    In this paper, we propose a schema optimization method for time-dependent workloads on NoSQL databases. In our proposed method, we migrate the schema as the workload changes, and the estimated costs of execution and migration are formulated and minimized together as a single integer linear programming problem. Furthermore, we propose a method to reduce the number of optimization candidates by iterating over an abstraction of the time dimension and optimizing the workload while updating constraints.
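
    To make the trade-off between execution cost and migration cost concrete, here is a small sketch that picks one schema per time step so that the combined cost is minimal. It uses plain dynamic programming rather than the integer linear programming formulation the abstract describes, and the function name `plan_schemas`, the cost tables, and all numbers are hypothetical.

```python
def plan_schemas(exec_cost, migrate_cost):
    """Choose one schema per time step, minimizing execution plus migration cost.

    exec_cost[t][s]:    cost of running time step t's workload on schema s.
    migrate_cost[p][s]: cost of migrating from schema p to schema s (0 if p == s).
    Returns (total_cost, list of chosen schema indices, one per time step).
    """
    n_steps = len(exec_cost)
    n_schemas = len(exec_cost[0])
    best = list(exec_cost[0])  # best[s]: cheapest cost of ending step 0 on schema s
    back = []                  # back[t-1][s]: schema used at step t-1 on the best path
    for t in range(1, n_steps):
        new_best, choices = [], []
        for s in range(n_schemas):
            p = min(range(n_schemas), key=lambda q: best[q] + migrate_cost[q][s])
            new_best.append(best[p] + migrate_cost[p][s] + exec_cost[t][s])
            choices.append(p)
        best = new_best
        back.append(choices)
    # Walk the back-pointers from the cheapest final schema.
    s = min(range(n_schemas), key=lambda q: best[q])
    total = best[s]
    plan = [s]
    for choices in reversed(back):
        s = choices[s]
        plan.append(s)
    plan.reverse()
    return total, plan


if __name__ == "__main__":
    # Hypothetical costs: schema 0 suits the early workload, schema 1 the later one.
    exec_cost = [[1, 5], [1, 5], [6, 1], [6, 1]]
    migrate_cost = [[0, 3], [3, 0]]
    print(plan_schemas(exec_cost, migrate_cost))
```

    With a cheap migration the plan switches schemas when the workload shifts; raising the migration cost makes staying on a single schema the cheaper choice.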

    Graphulo Implementation of Server-Side Sparse Matrix Multiply in the Accumulo Database

    The Apache Accumulo database excels at distributed storage and indexing and is ideally suited for storing graph data. Many big data analytics compute on graph data and persist their results back to the database. These graph calculations are often best performed inside the database server. The GraphBLAS standard provides a compact and efficient basis for a wide range of graph applications through a small number of sparse matrix operations. In this article, we implement GraphBLAS sparse matrix multiplication server-side by leveraging Accumulo's native, high-performance iterators. We compare the mathematics and performance of inner and outer product implementations, and show how an outer product implementation achieves optimal performance near Accumulo's peak write rate. We offer our work as a core component to the Graphulo library that will deliver matrix math primitives for graph analytics within Accumulo.
    Comment: To be presented at IEEE HPEC 2015: http://www.ieee-hpec.org
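
    The inner and outer product formulations compared in the abstract above can be sketched in memory as follows. This is only an illustration of the mathematics: the dict-of-dicts representation and the function names are assumptions for this sketch, and nothing here uses Accumulo iterators or the Graphulo API.

```python
from collections import defaultdict

# Sparse matrices are stored as {row: {col: value}}, keeping nonzero entries only.


def transpose(m):
    t = defaultdict(dict)
    for i, row in m.items():
        for j, v in row.items():
            t[j][i] = v
    return dict(t)


def multiply_inner(a, b):
    """Inner product form: C[i][j] = (row i of A) dot (column j of B)."""
    b_cols = transpose(b)  # columns of B, indexed like rows
    c = {}
    for i, a_row in a.items():
        for j, b_col in b_cols.items():
            total = sum(v * b_col[k] for k, v in a_row.items() if k in b_col)
            if total != 0:
                c.setdefault(i, {})[j] = total
    return c


def multiply_outer(a, b):
    """Outer product form: C = sum over k of (column k of A) x (row k of B)."""
    a_cols = transpose(a)
    c = defaultdict(lambda: defaultdict(int))
    for k, a_col in a_cols.items():
        b_row = b.get(k, {})
        for i, av in a_col.items():
            for j, bv in b_row.items():
                c[i][j] += av * bv  # partial products accumulate entry by entry
    # Drop entries (and rows) that cancelled to zero.
    result = {}
    for i, row in c.items():
        kept = {j: v for j, v in row.items() if v != 0}
        if kept:
            result[i] = kept
    return result


if __name__ == "__main__":
    A = {0: {0: 1, 1: 2}, 1: {1: 3}}
    B = {0: {1: 4}, 1: {0: 5, 1: 6}}
    print(multiply_inner(A, B))
    print(multiply_outer(A, B))
```

    Both functions compute the same product. The outer product form emits each partial product independently and sums them on accumulation, while the inner product form must pair every row of A with every column of B.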

    Performance Optimizations of NoSQL Databases in Distributed Systems

    Databases store information about a system and provide a mechanism for data to be accessed and manipulated. While advancements in the 1970s provided a relational database model that has persisted to this day, the mass data needs of the web-scale era, which surfaced in the 1990s and early 2000s, revealed limitations in the scalability of the relational model. As systems grew and transitioned into distributed architectures to support mass data storage and parallel processing, distributed computing technologies were completely overhauled, departing fundamentally from the relational data model in favor of the NoSQL data model. This research details the scaling problems encountered by relational databases and the NoSQL solutions that made web-scale systems possible.

    Cassandra Data Modeling

    Big data is characterized by the four V's: velocity, variety, volume, and veracity. Working with data at this scale calls for solutions that deliver high performance, and NoSQL databases help in this scenario. With Cassandra, we can efficiently manage large amounts of structured data. It supports dynamic control over the data. In this paper, we present relational and Cassandra data modeling.

    Benchmarking of RDBMS and NoSQL performance on unstructured data

    New requirements are arising in the database field. Big data has been soaring: the amount of data is ever increasing and becoming more and more varied. Traditional relational database management systems have been a dominant force in the database field, but due to the massive growth of unstructured and multiform data, firms are now turning to architectures with scale-out capabilities built on open source software, commodity servers, cloud computing, and services such as Database as a Service. As a result, relational databases ought to adopt and meet these new data requirements with easier and faster data processing capabilities, and also provide multiple analytical tools capable of displaying analytics instantly. This study aims to benchmark the performance of relational systems and NoSQL systems on unstructured data.

    Which NoSQL Database? A Performance Overview

    NoSQL data stores are widely used to store and retrieve possibly large amounts of data, typically in a key-value format. There are many types of NoSQL database with differing performance, so it is important to compare them and to verify how performance relates to the database type. In this paper, we evaluate five of the most popular NoSQL databases: Cassandra, HBase, MongoDB, OrientDB and Redis. We compare these databases in terms of query performance, based on reads and updates, taking into consideration the typical workloads represented by the Yahoo! Cloud Serving Benchmark. This comparison allows users to choose the most appropriate database according to the specific mechanisms and application needs.