    Characterizing and Subsetting Big Data Workloads

    Big data benchmark suites must include a diversity of data and workloads to be useful in fairly evaluating big data systems and architectures. However, using truly comprehensive benchmarks poses great challenges for the architecture community. First, we need to thoroughly understand the behaviors of a variety of workloads. Second, our usual simulation-based research methods become prohibitively expensive for big data. As big data is an emerging field, more and more software stacks are being proposed to facilitate the development of big data applications, which aggravates these challenges. In this paper, we first use Principal Component Analysis (PCA) to identify the most important characteristics from 45 metrics to characterize big data workloads from BigDataBench, a comprehensive big data benchmark suite. Second, we apply a clustering technique to the principal components obtained from the PCA to investigate the similarity among big data workloads, and we verify the importance of including different software stacks for big data benchmarking. Third, we select seven representative big data workloads by removing redundant ones and release the BigDataBench simulation version, which is publicly available from http://prof.ict.ac.cn/BigDataBench/simulatorversion/. (11 pages, 6 figures; 2014 IEEE International Symposium on Workload Characterization)
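    The subsetting pipeline described in this abstract — reduce 45 metrics with PCA, cluster workloads in the principal-component space, and keep one representative per cluster — can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' code; the metric matrix, the 90% variance threshold, and the choice of k-means are assumptions for the example.

    ```python
    # Hedged sketch of the PCA + clustering subsetting idea, on synthetic data.
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    # Stand-in for the real data: 20 workloads, each described by 45 metrics.
    metrics = rng.normal(size=(20, 45))

    # Standardize the metrics, then keep enough principal components to
    # explain ~90% of the variance (threshold is an assumption here).
    scaled = StandardScaler().fit_transform(metrics)
    components = PCA(n_components=0.9).fit_transform(scaled)

    # Cluster workloads in PC space; the paper reports seven representatives.
    k = 7
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(components)

    # One representative per cluster: the workload closest to the centroid.
    representatives = []
    for c in range(k):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(components[members] - km.cluster_centers_[c], axis=1)
        representatives.append(int(members[np.argmin(dists)]))

    print(sorted(representatives))
    ```

    Picking the member nearest to each centroid is one common way to remove redundant workloads while preserving cluster coverage; the paper's exact selection criterion may differ.
    
    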

    Selection of Statistical Software for Solving Big Data Problems: A Guide for Businesses, Students, and Universities

    The need for analysts with expertise in big data software is becoming more apparent in today’s society. Unfortunately, the demand for these analysts far exceeds the number available. A potential way to combat this shortage is to identify the software taught in colleges or universities. This article will examine four data analysis software packages—Excel add-ins, SPSS, SAS, and R—and we will outline the cost, training, and statistical methods/tests/uses for each. It will further explain implications for universities and future students.

    Selection of Statistical Software for Solving Big Data Problems for Teaching

    The need for analysts with expertise in big data software is becoming more apparent in today’s society. Unfortunately, the demand for these analysts far exceeds the number available. A potential way to combat this shortage is to identify the software sought by employers and to align this with the software taught by universities. This paper will examine multiple data analysis software packages – Excel add-ins, SPSS, SAS, Minitab, and R – and it will outline the cost, training, statistical methods/tests/uses, and specific uses within industry for each. It will further explain implications for universities and students.

    The last five years of Big Data Research in Economics, Econometrics and Finance: Identification and conceptual analysis

    Today, the Big Data term has a multidimensional approach where five main characteristics stand out: volume, velocity, veracity, value and variety. It has changed from being an emerging theme to a growing research area. In this respect, this study analyses the literature on Big Data in the Economics, Econometrics and Finance field. To do that, 1,034 publications from 2015 to 2019 were evaluated using SciMAT as a bibliometric and network analysis software. SciMAT offers a complete overview of the field and evaluates the most cited and productive authors, countries and subject areas related to Big Data. Lastly, a science map is constructed to understand the intellectual structure and the main research lines (themes).

    Testing in Big Data: An Architecture Pattern for a Development Environment for Innovative, Integrated and Robust Applications

    Big Data is a crucial pillar for many of today’s newly emerging business models. Areas of application range from consumer analysis over medicine to fraud detection. All of those domains require reliable software. Even though imperfect results are accepted in Big Data software, bugs and other defects can have drastic consequences. Therefore, in this paper, the software engineering subdiscipline of testing is addressed. Big Data exhibits characteristics which differentiate its processing software from software that processes traditional workloads. Consequently, an architecture pattern for testing that can be integrated into development environments for Big Data software is proposed. The paper features a detailed description of the artifact as well as a preliminary plan for evaluation.