Big data. Big challenges
About fifteen years ago, Doug Laney (2001) found himself defining an emerging scenario in which, thanks to the growing computing power of machines, large masses of data could be brought together and analyzed to answer our questions more effectively. Laney, without even using the concept of big data, identified Volume (the mass of data), Velocity (of creation and transmission), and Variety (of information sources) as the constitutive characteristics of these new large databases. Only recently has another V been added to the now-famous 3Vs: Veracity, that is, data quality. This means that including heterogeneous databases in empirical analyses, however large, still raises questions about the completeness and accuracy of the data collected, all the more so when the data are presented to the public in the form of more or less spectacular visualizations and infographics.
The use of these devices for communication purposes is now enormous. In politics they can serve as fact-checking tools for the benefit of public opinion, as more or less covert forms of persuasion, to lend legitimacy to so-called evidence-based policies, or for user-profiling strategies.
The resources presented in this column should be read as examples of use: manifestations of computing power on the one hand and, on the other, of the idea (probably mistaken) that dealing with data somehow increases our capacity for rational choice.
Challenges of Big Data Analysis
Big Data bring new opportunities to modern society and challenges to data
scientists. On the one hand, Big Data hold great promise for discovering subtle
population patterns and heterogeneities that are not possible with small-scale
data. On the other hand, the massive sample size and high dimensionality of Big
Data introduce unique computational and statistical challenges, including
scalability and storage bottlenecks, noise accumulation, spurious correlation,
incidental endogeneity, and measurement errors. These challenges are
distinctive and require new computational and statistical paradigms. This
article gives an overview of the salient features of Big Data and how these
features drive a paradigm change in statistical and computational methods as
well as computing architectures. We also provide various new perspectives on
Big Data analysis and computation. In particular, we emphasize the
viability of the sparsest solution in a high-confidence set and point out that
the exogeneity assumptions in most statistical methods for Big Data cannot be
validated due to incidental endogeneity. They can lead to wrong statistical
inferences and, consequently, wrong scientific conclusions.
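The spurious-correlation challenge mentioned in the abstract is easy to demonstrate. A minimal illustrative sketch (not from the article; the sample size, dimensionality, and random data are invented for illustration): with few observations and many mutually independent features, some features exhibit large sample correlations purely by chance.

```python
# Hypothetical illustration of spurious correlation in high dimensions:
# n = 50 samples, p = 1000 mutually independent Gaussian features.
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 1000
X = rng.standard_normal((n, p))  # every column independent of every other

# Sample correlation of column 0 with each of the other 999 columns.
corr = np.corrcoef(X, rowvar=False)[0, 1:]
max_abs = float(np.abs(corr).max())
print(max_abs)  # far from 0 despite true independence
```

Even though every pairwise population correlation is exactly zero, the maximum absolute sample correlation grows with the number of features, which is why variable selection on such data can pick up noise variables.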
Big Data Management in Education Sector: an Overview
The advancement of technological innovation has given rise to a new trend known today as Big Data. Given the soaring popularity of big data technology, organisations are strongly attracted to it as a means of transforming and improving their businesses. Big data enables organisations to outpace their competitors and save costs. Similarly, the application of Big Data management in universities is an essential aspect for institutions that have Big Data to manage, as the use of Big Data in the higher education sector is increasing day by day. Many studies have been carried out on big data and analytics, with little interest in its management. Big Data management represents a set of challenges involving Big Data modeling, storage, retrieval, analysis, and visualization for several areas of an organization. This paper introduces and contributes to the conceptual and theoretical understanding of Big Data management within higher education and outlines its relevance to higher education institutions. It describes the opportunities this growing research area brings to higher education as well as the major challenges associated with it.
Characterizing and Subsetting Big Data Workloads
Big data benchmark suites must include a diversity of data and workloads to
be useful in fairly evaluating big data systems and architectures. However,
using truly comprehensive benchmarks poses great challenges for the
architecture community. First, we need to thoroughly understand the behaviors
of a variety of workloads. Second, our usual simulation-based research methods
become prohibitively expensive for big data. As big data is an emerging field,
more and more software stacks are being proposed to facilitate the development
of big data applications, which aggravates these challenges. In this paper, we
first use Principal Component Analysis (PCA) to identify the most important
characteristics from 45 metrics to characterize big data workloads from
BigDataBench, a comprehensive big data benchmark suite. Second, we apply a
clustering technique to the principal components obtained from the PCA to
investigate the similarity among big data workloads, and we verify the
importance of including different software stacks for big data benchmarking.
Third, we select seven representative big data workloads by removing redundant
ones and release the BigDataBench simulation version, which is publicly
available from http://prof.ict.ac.cn/BigDataBench/simulatorversion/.
Comment: 11 pages, 6 figures, 2014 IEEE International Symposium on Workload Characterization
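The PCA-then-cluster pipeline the abstract describes can be sketched in a few lines. This is a hypothetical illustration, not the paper's code: the workload names, the invented random metric matrix, the choice of 3 components, and k = 4 clusters are all assumptions made for the example; the paper uses 45 real microarchitectural metrics over BigDataBench workloads.

```python
# Hypothetical sketch of the methodology: standardize per-workload metrics,
# reduce them with PCA (via SVD), cluster workloads in PC space, and keep
# one representative workload per cluster. All data here is invented.
import numpy as np

rng = np.random.default_rng(1)
workloads = [f"workload_{i}" for i in range(12)]
metrics = rng.standard_normal((12, 45))  # 12 workloads x 45 metrics (fake)

# PCA: standardize columns, then SVD; PC scores = U * S.
Z = (metrics - metrics.mean(axis=0)) / metrics.std(axis=0)
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
pcs = U[:, :3] * S[:3]  # keep the first 3 principal components

def kmeans(X, k, iters=50):
    """Plain k-means on rows of X; centers seeded from the data points."""
    centers = X[rng.choice(len(X), k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

labels = kmeans(pcs, k=4)

# Representative per cluster: the member closest to its cluster mean.
reps = []
for j in range(4):
    members = np.where(labels == j)[0]
    if members.size == 0:
        continue
    center = pcs[members].mean(axis=0)
    closest = members[np.argmin(((pcs[members] - center) ** 2).sum(-1))]
    reps.append(workloads[closest])
print(sorted(reps))
```

Picking the member nearest the cluster centroid (rather than the centroid itself) keeps the representative an actual runnable workload, which is what makes the reduced benchmark subset usable in simulation.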