11 research outputs found

    Access control technologies for Big Data management systems: literature review and future trends

    Get PDF
    Abstract Data security and privacy issues are magnified by the volume, the variety, and the velocity of Big Data and by the lack, up to now, of a reference data model and related data manipulation languages. In this paper, we focus on one of the key data security services, that is, access control, by highlighting the differences with traditional data management systems and describing a set of requirements that any access control solution for Big Data platforms may fulfill. We then describe the state of the art and discuss open research issues

    A Review of Infrastructures to Process Big Multimedia Data

    Get PDF
    In the last years, the volume of information is growing faster than ever before, moving from small to huge, structured to unstructured datasets like text, image, audio and video. The purpose of processing the data is aimed to extract relevant information on trends, challenges and opportunities; all these studies with large volumes of data. The increase in the power of parallel computing enabled the use of Machine Learning (ML) techniques to take advantage of the processing capabilities offered by new architectures on large volumes of data. For this reason, it is necessary to find mechanisms that allow classify and organize them to facilitate to the users the extraction of the required information. The processing of these data requires the use of classification techniques that will be reviewed. This work analyzes different studies carried out on the use of ML for processing large volumes of data (Big Multimedia Data) and proposes a classification, using as criteria, the hardware infrastructures used in works of machine learning parallel approaches applied to large volumes of data

    A Comparison of Real Time Stream Processing Frameworks

    Get PDF
    The need to process the ever-expanding volumes of information being generated daily in the modern world is driving radical changes in traditional data analysis techniques. As a result of this, a number of open source tools for handling real-time data streams has become available in recent years. Four, in particular, have gained significant traction: Apache Flink, Apache Samza, Apache Spark and Apache Storm. Despite the rising popularity of these frameworks, however, there are few studies that analyse their performance in terms of important metrics, such as throughput and latency. This study aims to correct this, by running several benchmarks against these frameworks

    Feasibility analysis of AsterixDB and Spark streaming with Cassandra for stream-based processing

    No full text
    Abstract For getting up-to-date insight into online services, extracted data has to be processed in near real time. For example, major big data companies (Facebook, LinkedIn, Twitter) analyse streaming data for development of new services. Several technologies have been developed, which could be selected for implementation of stream processing functionalities. The contribution of this paper is feasibility analysis of technologies for stream-based processing of semi-structured data. Particularly, feasibility of a Big Data management system for semi-structured data (AsterixDB) will be compared to Spark streaming, which has been integrated with Cassandra NoSQL database for persistence. The study focuses on stream processing in a simulated social media use case (tweet analysis), which has been implemented to Eucalyptus cloud computing environment on a distributed shared memory multiprocessor platform. The results indicate that AsterixDB is able to provide significantly better performance both in terms of throughput and latency, when data feed functionality of AsterixDB is used, and stream processing has been implemented with Java. AsterixDB also scaled on the same level or better, when the amount of nodes on the cloud platform was increased. However, stream processing in AsterixDB was delayed by batching of data, when tweets were streamed into the database with data feeds

    Feasibility analysis of AsterixDB and Spark streaming with Cassandra for stream-based processing

    No full text
    Abstract For getting up-to-date insight into online services, extracted data has to be processed in near real time. For example, major big data companies (Facebook, LinkedIn, Twitter) analyse streaming data for development of new services. Several technologies have been developed, which could be selected for implementation of stream processing functionalities. The contribution of this paper is feasibility analysis of technologies for stream-based processing of semi-structured data. Particularly, feasibility of a Big Data management system for semi-structured data (AsterixDB) will be compared to Spark streaming, which has been integrated with Cassandra NoSQL database for persistence. The study focuses on stream processing in a simulated social media use case (tweet analysis), which has been implemented to Eucalyptus cloud computing environment on a distributed shared memory multiprocessor platform. The results indicate that AsterixDB is able to provide significantly better performance both in terms of throughput and latency, when data feed functionality of AsterixDB is used, and stream processing has been implemented with Java. AsterixDB also scaled on the same level or better, when the amount of nodes on the cloud platform was increased. However, stream processing in AsterixDB was delayed by batching of data, when tweets were streamed into the database with data feeds
    corecore