1,119 research outputs found

    Scalable Persistent Storage for Erlang

    Get PDF
    The many core revolution makes scalability a key property. The RELEASE project aims to improve the scalability of Erlang on emergent commodity architectures with 100,000 cores. Such architectures require scalable and available persistent storage on up to 100 hosts. We enumerate the requirements for scalable and available persistent storage, and evaluate four popular Erlang DBMSs against these requirements. This analysis shows that Mnesia and CouchDB are not suitable persistent storage at our target scale, but Dynamo-like NoSQL DataBase Management Systems (DBMSs) such as Cassandra and Riak potentially are. We investigate the current scalability limits of the Riak 1.1.1 NoSQL DBMS in practice on a 100-node cluster. We establish for the first time scientifically the scalability limit of Riak as 60 nodes on the Kalkyl cluster, thereby confirming developer folklore. We show that resources like memory, disk, and network do not limit the scalability of Riak. By instrumenting Erlang/OTP and Riak libraries we identify a specific Riak functionality that limits scalability. We outline how later releases of Riak are refactored to eliminate the scalability bottlenecks. We conclude that Dynamo-style NoSQL DBMSs provide scalable and available persistent storage for Erlang in general, and for our RELEASE target architecture in particular

    NOSQL design for analytical workloads: Variability matters

    Get PDF
    Big Data has recently gained popularity and has strongly questioned relational databases as universal storage systems, especially in the presence of analytical workloads. As result, co-relational alternatives, commonly known as NOSQL (Not Only SQL) databases, are extensively used for Big Data. As the primary focus of NOSQL is on performance, NOSQL databases are directly designed at the physical level, and consequently the resulting schema is tailored to the dataset and access patterns of the problem in hand. However, we believe that NOSQL design can also benefit from traditional design approaches. In this paper we present a method to design databases for analytical workloads. Starting from the conceptual model and adopting the classical 3-phase design used for relational databases, we propose a novel design method considering the new features brought by NOSQL and encompassing relational and co-relational design altogether.Peer ReviewedPostprint (author's final draft

    Data management in cloud environments: NoSQL and NewSQL data stores

    Get PDF
    : Advances in Web technology and the proliferation of mobile devices and sensors connected to the Internet have resulted in immense processing and storage requirements. Cloud computing has emerged as a paradigm that promises to meet these requirements. This work focuses on the storage aspect of cloud computing, specifically on data management in cloud environments. Traditional relational databases were designed in a different hardware and software era and are facing challenges in meeting the performance and scale requirements of Big Data. NoSQL and NewSQL data stores present themselves as alternatives that can handle huge volume of data. Because of the large number and diversity of existing NoSQL and NewSQL solutions, it is difficult to comprehend the domain and even more challenging to choose an appropriate solution for a specific task. Therefore, this paper reviews NoSQL and NewSQL solutions with the objective of: (1) providing a perspective in the field, (2) providing guidance to practitioners and researchers to choose the appropriate data store, and (3) identifying challenges and opportunities in the field. Specifically, the most prominent solutions are compared focusing on data models, querying, scaling, and security related capabilities. Features driving the ability to scale read requests and write requests, or scaling data storage are investigated, in particular partitioning, replication, consistency, and concurrency control. Furthermore, use cases and scenarios in which NoSQL and NewSQL data stores have been used are discussed and the suitability of various solutions for different sets of applications is examined. Consequently, this study has identified challenges in the field, including the immense diversity and inconsistency of terminologies, limited documentation, sparse comparison and benchmarking criteria, and nonexistence of standardized query languages

    NewSQL Monitoring System

    Get PDF
    NewSQL is the new breed of databases that combines the best of RDBMS and NoSQL databases. They provide full ACID compliance like RDBMS and are highly scalable and fault-tolerant similar to NoSQL databases. Thus, NewSQL databases are ideal candidates for supporting big data and applications, particularly financial transaction and fraud detection systems, requiring ACID guarantees. Since NewSQL databases can scale to thousands of nodes, it becomes tedious to monitor the entire cluster and each node. Hence, we are building a NewSQL monitoring system using open-source tools. We will consider VoltDB, a popular open-source NewSQL database, as the database to be monitored. Although a monitoring dashboard exists for VoltDB, it only provides the bird’s eye view of the cluster and the nodes and focuses on CPU usages and security aspects. Therefore, several components of a monitoring system have to be considered and have to be open source to be readily available and congruent with the scalability and fault tolerance of VoltDB. Databases like Cassandra (NoSQL), YugabyteDB (NewSQL), and InfluxDB (Time Series) will be used based on their read/write performances and scalability, fault tolerance to store the monitoring data. We will also consider the role of Amazon Kinesis, a popular queueing, messaging, and streaming engine, since it provides fault-tolerant streaming and batching data pipelines between application and system. This project is implemented using Python and Java

    Cassandra Data Modeling

    Get PDF
    To work with large amount of data consisting of 4 v�s velocity, variety, volume and veracity that is nothing but big data, so the need arises to find out the solution to work on such large scale data with high performance. NoSQL databases helps in this scenario. With Cassandra we can efficiently manage the large amount of structured data. It supports dynamic control over the data. In this paper, we present relational and Cassandra data modeling

    Evaluating the Cassandra NoSQL Database Approach for Genomic Data Persistency

    Get PDF
    Rapid advances in high-throughput sequencing techniques have created interesting computational challenges in bioinformatics. One of them refers to management of massive amounts of data generated by automatic sequencers. We need to deal with the persistency of genomic data, particularly storing and analyzing these large-scale processed data. To find an alternative to the frequently considered relational database model becomes a compelling task. Other data models may be more effective when dealing with a very large amount of nonconventional data, especially for writing and retrieving operations. In this paper, we discuss the Cassandra NoSQL database approach for storing genomic data. We perform an analysis of persistency and I/O operations with real data, using the Cassandra database system. We also compare the results obtained with a classical relational database system and another NoSQL database approach, MongoDB
    • …
    corecore