4 research outputs found

    An efficient strategy for the collection and storage of large volumes of data for computation

    In recent years, an ever-increasing amount of data has been produced and stored, a phenomenon known as Big Data. Social networks, the Internet of Things, scientific experiments and commercial services all generate vast amounts of data. Three main factors characterise Big Data: Volume, Velocity and Variety. All three must be considered when designing a platform to support Big Data. The Large Hadron Collider (LHC) particle accelerator at CERN hosts a number of data-intensive experiments, which are estimated to produce about 30 PB of data annually, and these data are propagated at extremely high velocity. Traditional methods of collecting, storing and analysing data have become insufficient for managing such rapidly growing volumes, so an efficient strategy for capturing the data as they are produced is essential. In this paper, a number of models are explored to determine the best approach for collecting and storing Big Data for analytics. An evaluation of the performance of full execution cycles of these approaches for collecting, storing and analysing monitoring data from the Worldwide LHC Computing Grid (WLCG) infrastructure is presented. Moreover, the models discussed are applied to a community-driven software solution, Apache Flume, to show how they can be integrated seamlessly.
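    As a toy illustration of the trade-off such collection models navigate, the following sketch (hypothetical, not taken from the paper) contrasts storing each event the moment it arrives with buffering events and flushing them in batches; the batch size, event structure and file path are placeholder assumptions.

        import json
        import time

        BATCH_SIZE = 1000  # hypothetical flush threshold

        def write_immediately(events, path="events.log"):
            """Store each event as soon as it arrives (one flush per event)."""
            with open(path, "a") as f:
                for event in events:
                    f.write(json.dumps(event) + "\n")
                    f.flush()

        def write_batched(events, path="events.log"):
            """Buffer events in memory and flush in batches (amortised I/O)."""
            buffer = []
            with open(path, "a") as f:
                for event in events:
                    buffer.append(json.dumps(event))
                    if len(buffer) >= BATCH_SIZE:
                        f.write("\n".join(buffer) + "\n")
                        buffer.clear()
                if buffer:  # flush whatever remains
                    f.write("\n".join(buffer) + "\n")

        if __name__ == "__main__":
            sample = ({"ts": time.time(), "value": i} for i in range(10_000))
            write_batched(sample)

    Batching is also how tools such as Apache Flume amortise storage cost: events accumulate in a channel and a sink drains them in transactions rather than one by one.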

    Big Data Ecosystem on a Raspberry Pi Cluster (Ecosistema Big Data en un clúster de Raspberry Pi)

    This research presents a step-by-step guide to installing and configuring Hadoop on a cluster of Raspberry Pis, describing and explaining everything from the fundamentals of Big Data to the entire Apache ecosystem and what each technology is for. It also compiles information from some of the most relevant publications related to Big Data.
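    As a minimal companion to such a walkthrough: once HDFS is running on the cluster, data can be loaded into it from any node via the standard hdfs command-line tool. The sketch below drives it from Python; the local file name and HDFS path are placeholder assumptions.

        import subprocess

        # Sketch: push a local file into HDFS from one cluster node.
        # Assumes Hadoop is installed and its bin/ directory is on PATH.
        subprocess.run(["hdfs", "dfs", "-mkdir", "-p", "/data/raw"], check=True)
        subprocess.run(["hdfs", "dfs", "-put", "-f", "readings.csv", "/data/raw/"], check=True)

        # List the directory to confirm the upload.
        listing = subprocess.run(["hdfs", "dfs", "-ls", "/data/raw"],
                                 capture_output=True, text=True, check=True)
        print(listing.stdout)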

    A Visualization Pipeline for Location-Based Data (Sijaintipohjaisen datan visualisointiputki)

    The master's thesis focused on how to turn a plethora of location-based data into a visualization pipeline. The goal was to bring the most value to the users of the visualization pipeline, in this case a taxi company, by defining a set of audience-oriented approaches. These approaches were tested in a case study with the taxi company as the test subject, which examined both the functionality and design of the visualization pipeline and the audience-oriented approaches in practice. The success of the pipeline was assessed with a Business Intelligence assessment model, which served as a benchmark for how much value the implemented pipeline provided. We were able to define three types of audiences and, correspondingly, three approaches for these audience types: the audiences were activists, analysts and organizational decision-makers, with the corresponding "lightweight", "technical" and "tailored" approaches. The case study defined the case customer as an organizational decision-maker, making the "tailored" approach the best fit; this approach provided the most value to the case customer both in terms of technical requirements and data-analytical needs. The case study showed promise for the utility of an audience-oriented approach in visualization pipeline design.
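    To make the final stage of such a pipeline concrete, a minimal location-data visualization takes only a few lines; the coordinates below are synthetic placeholders (scattered around central Helsinki), not the thesis's taxi data, and the matplotlib rendering stands in for whichever front end an audience-oriented approach would select.

        import random
        import matplotlib.pyplot as plt

        # Sketch: scatter-plot synthetic pickup coordinates, the kind of
        # lightweight view the "activist" audience might be served.
        random.seed(42)
        lats = [60.17 + random.gauss(0, 0.02) for _ in range(500)]
        lons = [24.94 + random.gauss(0, 0.04) for _ in range(500)]

        plt.figure(figsize=(6, 6))
        plt.scatter(lons, lats, s=5, alpha=0.4)
        plt.xlabel("longitude")
        plt.ylabel("latitude")
        plt.title("Synthetic taxi pickup locations")
        plt.tight_layout()
        plt.savefig("pickups.png")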