7,134 research outputs found

    Big Data Management Challenges, Approaches, Tools and their limitations

    Big Data is the buzzword everyone talks about. Independently of the application domain, today there is a consensus about the V's characterizing Big Data: Volume, Variety, and Velocity. By focusing on data management issues and past experiences in the area of database systems, this chapter examines the main challenges involved in the three V's of Big Data. Then it reviews the main characteristics of existing solutions for addressing each of the V's (e.g., NoSQL, parallel RDBMS, stream data management systems and complex event processing systems). Finally, it provides a classification of different functions offered by NewSQL systems and discusses their benefits and limitations for processing Big Data.

    Security Implications of Adopting a New Data Storage and Access Model in Big Data and Cloud Computing

    This article examines the security implications of using cloud computing and Big Data. It employs a mixed methodology of qualitative and quantitative research and takes a critical realist epistemological approach. The objective is to identify the components of a theory for predicting and explaining [1, 4] the security implications associated with adopting the services provided by cloud computing and Big Data. The integration of various information sources and the widespread use of computing across diverse fields have resulted in a significant increase in data volume, scale, quantity, and diversity. Consequently, data management, storage, retrieval, and access have undergone significant changes. The latest developments in IT have brought forth novel technologies such as Cloud Computing and Big Data. Big Data comprises technologies that rely on NoSQL (Not only SQL) databases, which enable the growth of data volumes, numbers, and types on a large scale. The new NoSQL systems are seen as solutions for meeting the scalability requirements of large IT firms. Multiple open-source and pay-as-you-go NoSQL models are available for purchase.

    Application of HADOOP to Store and Process Big Data Gathered from an Urban Water Distribution System

    Information technology has become an integral part of municipal water distribution systems (WDS). Various types of sensors, e.g., smart water meters, usually work in real-time mode delivering a huge amount of data. Big data must be stored in appropriate databases. Along with the development of data mining tools, the analysis of big data is very important for the management of WDS. Valuation of NoSQL databases for water data is currently in its very early stages. In this paper, the Apache Hadoop platform is investigated with respect to a possible database solution based on NoSQL. We present comparative experiments evaluating the performance of the Hadoop and MySQL databases

    Evaluation Criteria for Selecting NoSQL Databases in a Single Box Environment

    In recent years, NoSQL database systems have become increasingly popular, especially for big data and commercial applications. These systems were designed to overcome the scaling and flexibility limitations plaguing traditional relational database management systems (RDBMSs). Given that NoSQL database systems have typically been implemented in large-scale distributed environments serving large numbers of simultaneous users across potentially thousands of geographically separated devices, little consideration has been given to evaluating their value within single-box environments. It is postulated that some of the inherent traits of each NoSQL database type may be useful, perhaps even preferable, regardless of scale. Thus, this paper proposes criteria conceived to evaluate the usefulness of NoSQL systems in small-scale single-box environments. Specifically, key-value, document, column-family, and graph databases are discussed with respect to the ability of each to provide CRUD transactions in a single-box environment.

    Evaluating Riak Key Value Cluster for Big Data

    NoSQL databases have become an important alternative to traditional relational databases. They are designed for managing large data sets that change continuously and variably, and they are widely used in cloud databases and distributed systems. With NoSQL databases, static schemas and many other restrictions are avoided. In the era of big data, such databases provide scalable, high-availability solutions. Their key-value feature allows fast retrieval of data and the ability to store a lot of it. There are many kinds of NoSQL databases with varying performance. Therefore, comparing those different types of databases in terms of performance and verifying the relationship between performance and database type has become very important. In this paper, we test and evaluate the Riak key-value database for big data clusters using benchmark tools, where huge amounts of data of different sizes are stored and retrieved in a distributed database environment. Execution times of the NoSQL database over different types of workloads and different sizes of data are compared. The results show that the Riak key-value database is stable in execution time for both small and large amounts of data, and that throughput increases as the number of threads increases.

    OASIS - Identifying the Core Attributes for RDBMS Alternatives

    Since their introduction in the 1970s, relational database management systems have served as the dominant data storage technology. However, the demands of big data and Web 2.0 necessitated a change in the market, sparking the beginning of the NoSQL movement in the late 2000s. NoSQL databases exchanged the relational model and the guaranteed consistency of ACID transactions for improved performance and massive scalability [1]. While the benefits NoSQL provided proved useful, the lack of sufficient SQL functionality presented a major hurdle for organizations that require it to operate properly. It was clear that new RDBMS solutions that did not compromise functionality or scalability were necessary, which has led to the rise of a new class of modern relational database management systems, NewSQL [2]. This paper seeks to identify a consistent set of requirements necessary for an ideal RDBMS substitute. These requirements include possessing the features of a modern RDBMS, namely support for the relational data model and standard ANSI SQL, ACID transactions, and ODBC/JDBC drivers. Additionally, the substitute must address the typical RDBMS's shortcomings in scalability by providing cost-effective scale-out capabilities. These requirements will then be used to filter existing NoSQL and NewSQL database systems for those that could serve as viable substitutes for a typical RDBMS.

    PERFORMANCE ANALYSIS OF TWO BIG DATA TECHNOLOGIES ON A CLOUD DISTRIBUTED ARCHITECTURE. RESULTS FOR NON-AGGREGATE QUERIES ON MEDIUM-SIZED DATA

    Big Data systems manage and process huge volumes of data constantly generated by various technologies in a myriad of formats. Big Data advocates (and preachers) have claimed that, relative to classical relational/SQL database management systems, Big Data technologies such as NoSQL, Hadoop and in-memory data stores perform better. This paper compares the data processing performance of two systems belonging to the SQL (PostgreSQL/Postgres XL) and Big Data (Hadoop/Hive) camps on a distributed five-node cluster deployed in the cloud. Unlike the benchmarks in use (YCSB, TPC), a series of R modules were devised for generating random non-aggregate queries on different subschemas (with increasing data size) of the TPC-H database. The overall performance of the two systems was compared. Subsequently, a number of models were developed relating performance to the system and to various query parameters such as the number of attributes in the SELECT and WHERE clauses, the number of joins, the number of processed rows, etc. JEL Codes: M1

    Performance Evaluation Between HarperDB, Mongo DB and PostgreSQL

    Several modern-day problems, like information overload and big data, involve dealing with large amounts of data. As such, to meet application requirements such as performance and consistency, more and more systems are adapting to these specificities. Processing massive data with existing Relational Database Management Systems (RDBMSs) has become an issue because these databases do not deal well with massive amounts of data. NoSQL is a class of database management systems that makes processing massive and/or unstructured data easier because it stores data as key-value pairs, collections, or documents instead of tables. Many companies today tend to start a project using NoSQL. HarperDB, however, aims to provide both a relational and a non-relational DBMS, allowing developers to choose between different solutions. This paper aims to show the most relevant differences between HarperDB, MongoDB and PostgreSQL and to compare their performance. Preliminary results show that PostgreSQL performs better with structured data, but HarperDB can integrate NoSQL and SQL, which can be a significant advantage for HarperDB over the other solutions.