1,093,078 research outputs found

    Big data. Big challenges

    Some fifteen years ago, Doug Laney (2001) set out to define an emerging scenario in which, thanks to the growing computing power of machines, large volumes of data could be brought together and analysed to answer our questions more effectively. Laney, without even using the term big data, identified Volume (the mass of data), Velocity (of creation and transmission) and Variety (of information sources) as the defining characteristics of these new large databases. Only recently has another been added to the now famous 3Vs: Veracity, that is, the quality of the data. This means that including heterogeneous databases in empirical analyses, however large they may be, still raises questions about the completeness and accuracy of the data collected. All the more so when the data are presented to the public as more or less spectacular visualizations and infographics. The use of these devices for communication purposes is now enormous. In politics they can serve as a fact-checking tool for the benefit of public opinion or as a form of more or less covert persuasion, to lend legitimacy to so-called evidence-based policies or to support user-profiling strategies. The resources we present in this column should be read as examples of use: manifestations of computing power on the one hand and, on the other, of the idea (probably mistaken) that dealing with data somehow increases our capacity for rational choice.

    Challenges of Big Data Analysis

    Big Data bring new opportunities to modern society and challenges to data scientists. On the one hand, Big Data hold great promise for discovering subtle population patterns and heterogeneities that cannot be detected with small-scale data. On the other hand, the massive sample size and high dimensionality of Big Data introduce unique computational and statistical challenges, including scalability and storage bottlenecks, noise accumulation, spurious correlation, incidental endogeneity, and measurement errors. These challenges are distinctive and require new computational and statistical paradigms. This article gives an overview of the salient features of Big Data and of how these features drive paradigm change in statistical and computational methods as well as in computing architectures. We also provide various new perspectives on Big Data analysis and computation. In particular, we emphasize the viability of the sparsest solution in a high-confidence set and point out that the exogeneity assumptions in most statistical methods for Big Data cannot be validated because of incidental endogeneity. This can lead to wrong statistical inferences and, consequently, wrong scientific conclusions.
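
The spurious-correlation point in the abstract above can be illustrated with a small simulation. The sketch below is not taken from the article; the sample size `n`, the dimensions `p`, the number of trials, and the helper name `max_abs_correlation` are all assumptions chosen for illustration. It draws a response and `p` predictors that are independent by construction, then reports the largest absolute sample correlation, which tends to grow with `p` even though every true correlation is zero.

```python
import numpy as np


def max_abs_correlation(n, p, rng):
    """Largest |sample correlation| between y and p predictors that are
    independent of y by construction (hypothetical helper)."""
    y = rng.standard_normal(n)
    X = rng.standard_normal((n, p))
    # Standardize with the population standard deviation (ddof=0) so that
    # the scaled inner product below equals the Pearson correlation.
    y_std = (y - y.mean()) / y.std()
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    corr = X_std.T @ y_std / n
    return float(np.abs(corr).max())


rng = np.random.default_rng(0)
n = 60          # assumed sample size
n_trials = 20   # assumed number of repetitions to average over

# Average the maximum spurious correlation over repeated draws.
results = {
    p: float(np.mean([max_abs_correlation(n, p, rng) for _ in range(n_trials)]))
    for p in (10, 1000, 10000)
}

for p, value in results.items():
    print(f"p={p:>6}: mean max |corr| = {value:.3f}")
```

With the sample size fixed, the reported maximum should rise as `p` increases, which is why a strongly correlated predictor found by screening many variables is weak evidence on its own.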

    Big Data Management in Education Sector: an Overview

    Advances in technological innovation have given rise to a new trend known today as Big Data. Given the soaring popularity of big data technology, organisations are strongly interested in using it to transform themselves and improve their businesses. Big data is enabling organisations to outpace their competitors and save costs. Similarly, Big Data management is essential for universities that have Big Data to manage, as the use of Big Data in the higher education sector is increasing day by day. Many studies have been carried out on big data and analytics, but few have addressed its management. Big Data management presents a set of challenges involving Big Data modeling, storage, retrieval, analysis, and visualization across several areas of an organization. This paper introduces and contributes to the conceptual and theoretical understanding of Big Data management within higher education and outlines its relevance to higher education institutions. It describes the opportunities this growing research area brings to higher education as well as the major challenges associated with it.

    Characterizing and Subsetting Big Data Workloads

    Big data benchmark suites must include a diversity of data and workloads to be useful in fairly evaluating big data systems and architectures. However, using truly comprehensive benchmarks poses great challenges for the architecture community. First, we need to thoroughly understand the behaviors of a variety of workloads. Second, our usual simulation-based research methods become prohibitively expensive for big data. As big data is an emerging field, more and more software stacks are being proposed to facilitate the development of big data applications, which aggravates these challenges. In this paper, we first use Principal Component Analysis (PCA) to identify the most important characteristics from 45 metrics to characterize big data workloads from BigDataBench, a comprehensive big data benchmark suite. Second, we apply a clustering technique to the principal components obtained from the PCA to investigate the similarity among big data workloads, and we verify the importance of including different software stacks for big data benchmarking. Third, we select seven representative big data workloads by removing redundant ones and release the BigDataBench simulation version, which is publicly available from http://prof.ict.ac.cn/BigDataBench/simulatorversion/.
    Comment: 11 pages, 6 figures, 2014 IEEE International Symposium on Workload Characterization
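
The three steps described in the abstract above (PCA on the metrics, clustering on the principal components, one representative per cluster) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, the abstract does not name the clustering technique (plain k-means is swapped in here), and the number of retained components, the workload count, and the helper names are hypothetical. Only the figures of 45 metrics and seven representatives come from the abstract.

```python
import numpy as np


def pca(X, n_components):
    """Standardize the columns of X and project onto the top components."""
    Z = X - X.mean(axis=0)
    std = Z.std(axis=0)
    std[std == 0] = 1.0          # guard against constant metrics
    Z = Z / std
    # Rows of Vt are the principal directions of the standardized data.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[:n_components].T
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, explained[:n_components]


def kmeans(points, k, n_iter=50, seed=0):
    """Plain k-means (Lloyd's algorithm); an assumed stand-in for the
    unspecified clustering technique."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    # Recompute labels so they match the final centers.
    dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dist.argmin(axis=1), centers


def representatives(points, labels, centers):
    """Pick, for each non-empty cluster, the member closest to its center."""
    reps = []
    for j in range(len(centers)):
        members = np.flatnonzero(labels == j)
        if members.size == 0:
            continue
        dist = np.linalg.norm(points[members] - centers[j], axis=1)
        reps.append(int(members[dist.argmin()]))
    return reps


# Synthetic stand-in for a workloads-by-metrics matrix (assumed sizes:
# 30 workloads, 45 metrics driven by 5 latent factors plus noise).
rng = np.random.default_rng(0)
latent = rng.standard_normal((30, 5))
loadings = rng.standard_normal((5, 45))
X = latent @ loadings + 0.1 * rng.standard_normal((30, 45))

scores, explained = pca(X, n_components=5)
labels, centers = kmeans(scores, k=7)
reps = representatives(scores, labels, centers)

print("variance explained by kept components:", round(float(explained.sum()), 3))
print("representative workload indices:", reps)
```

Choosing the member nearest each cluster center is one simple way to drop redundant workloads; other selection rules (for example, the member farthest from other clusters) would fit the same pipeline.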