2 research outputs found

    A Review of Infrastructures to Process Big Multimedia Data

    In recent years, the volume of information has grown faster than ever before, and datasets have shifted from small to huge and from structured to unstructured forms such as text, image, audio, and video. Processing these data aims to extract relevant information about trends, challenges, and opportunities, all over large volumes of data. The increase in the power of parallel computing has enabled Machine Learning (ML) techniques to take advantage of the processing capabilities offered by new architectures on large volumes of data. It is therefore necessary to find mechanisms that classify and organize these data, making it easier for users to extract the required information; the classification techniques involved are reviewed here. This work analyzes different studies on the use of ML for processing large volumes of data (Big Multimedia Data) and proposes a classification that uses, as its criterion, the hardware infrastructures employed in parallel machine learning approaches applied to large volumes of data.

    Static and dynamic big data partitioning on Apache Spark

    Many of today’s large datasets are organized as graphs. Due to their size, it is often infeasible to process these graphs on a single machine, so many software frameworks and tools have been proposed to process graphs on top of distributed infrastructures. This software is often bundled with generic data decomposition strategies that are not optimised for specific algorithms. In this paper we study how the choice of data partitioning strategy affects the performance of graph algorithms executing on Apache Spark. To this end, we implemented several graph algorithms and compared their performance under a naive partitioning solution and under more elaborate strategies, both static and dynamic.
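    The trade-off this abstract studies can be illustrated outside Spark. A minimal Python sketch (not Spark API code; all function names here are illustrative, and the grid strategy is merely modeled on GraphX's `EdgePartition2D`) contrasts a naive edge placement, which hashes only the source vertex, with a 2D grid placement that bounds how many partitions any single vertex can touch:

```python
from math import ceil, sqrt

def naive_partition(edge, num_parts):
    """Naive strategy: assign each edge by hashing its source vertex.
    A high-degree destination vertex may end up replicated across
    every partition."""
    src, _dst = edge
    return hash(src) % num_parts

def edge_partition_2d(edge, num_parts):
    """Grid strategy (modeled on GraphX's EdgePartition2D): view the
    partitions as a sqrt(P) x sqrt(P) grid and place edge (src, dst)
    in the cell at row hash(src), column hash(dst). Each vertex then
    appears in at most ~2 * sqrt(P) partitions."""
    src, dst = edge
    side = ceil(sqrt(num_parts))
    row = hash(src) % side
    col = hash(dst) % side
    return (row * side + col) % num_parts

def replication_factor(edges, num_parts, strategy):
    """Average number of partitions in which each vertex appears --
    a rough proxy for the communication cost of a distributed
    graph-processing step."""
    placement = {}
    for src, dst in edges:
        p = strategy((src, dst), num_parts)
        placement.setdefault(src, set()).add(p)
        placement.setdefault(dst, set()).add(p)
    return sum(len(s) for s in placement.values()) / len(placement)
```

    On a star graph (many edges pointing at one hub), the naive scheme scatters the hub across up to all P partitions, while the grid scheme confines it to one column of the grid, i.e. about sqrt(P) partitions, lowering the replication factor.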