A Review of Infrastructures to Process Big Multimedia Data
In recent years, the volume of information has grown faster than ever before, moving from small, structured datasets to huge, unstructured ones such as text, images, audio and video. Processing these data aims to extract relevant information on trends, challenges and opportunities from large volumes of data. The increase in the power of parallel computing has enabled Machine Learning (ML) techniques to take advantage of the processing capabilities offered by new architectures on large volumes of data. It is therefore necessary to find mechanisms that classify and organize the data, making it easier for users to extract the required information. Processing these data requires classification techniques, which are reviewed here. This work analyzes different studies on the use of ML for processing large volumes of data (Big Multimedia Data) and proposes a classification that uses as its criterion the hardware infrastructure employed in parallel machine-learning approaches applied to large volumes of data.
Static and dynamic big data partitioning on Apache Spark
Many of today’s large datasets are organized as a graph. Due to their size, it is often infeasible to process these graphs on a single machine. Therefore, many software frameworks and tools have been proposed to process graphs on top of distributed infrastructures. This software is often bundled with generic data decomposition strategies that are not optimised for specific algorithms. In this paper we study how a specific data partitioning strategy affects the performance of graph algorithms executing on Apache Spark. To this end, we implemented different graph algorithms and compared their performance using a naive partitioning solution against more elaborate strategies, both static and dynamic.
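The naive baseline the abstract refers to typically assigns each edge to a partition by hashing one of its endpoints, analogous to Spark's default HashPartitioner. A minimal, self-contained sketch (plain Python rather than Spark, with a made-up toy graph) of this strategy and of the edge-cut metric commonly used to judge partition quality:

```python
def hash_partition(edges, num_partitions):
    """Assign each edge to a partition by hashing its source vertex,
    mimicking the behaviour of a default hash partitioner."""
    parts = [[] for _ in range(num_partitions)]
    for src, dst in edges:
        parts[hash(src) % num_partitions].append((src, dst))
    return parts

def cut_vertices(parts):
    """Count vertices that end up in more than one partition --
    a proxy for the communication cost of the partitioning."""
    seen = {}
    for i, part in enumerate(parts):
        for edge in part:
            for v in edge:
                seen.setdefault(v, set()).add(i)
    return sum(1 for owners in seen.values() if len(owners) > 1)

# A tiny example graph as an edge list.
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (5, 1)]
parts = hash_partition(edges, 2)
print(cut_vertices(parts))
```

More elaborate static strategies try to keep densely connected vertices in the same partition, and dynamic ones repartition while the algorithm runs; both aim to reduce exactly the cut measured above, at the cost of extra partitioning work.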