Characterizing and Subsetting Big Data Workloads
Big data benchmark suites must include a diversity of data and workloads to
be useful in fairly evaluating big data systems and architectures. However,
using truly comprehensive benchmarks poses great challenges for the
architecture community. First, we need to thoroughly understand the behaviors
of a variety of workloads. Second, our usual simulation-based research methods
become prohibitively expensive for big data. As big data is an emerging field,
more and more software stacks are being proposed to facilitate the development
of big data applications, which aggravates these challenges. In this paper, we
first use Principal Component Analysis (PCA) to identify the most important
characteristics from 45 metrics to characterize big data workloads from
BigDataBench, a comprehensive big data benchmark suite. Second, we apply a
clustering technique to the principal components obtained from the PCA to
investigate the similarity among big data workloads, and we verify the
importance of including different software stacks for big data benchmarking.
Third, we select seven representative big data workloads by removing redundant
ones and release the BigDataBench simulation version, which is publicly
available from http://prof.ict.ac.cn/BigDataBench/simulatorversion/.
Comment: 11 pages, 6 figures, 2014 IEEE International Symposium on Workload
Characterization
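The PCA-plus-clustering workflow this abstract describes can be sketched as follows. This is an illustrative reconstruction, not the paper's actual pipeline: the 45-metric matrix here is random placeholder data (the study uses measured microarchitectural metrics from BigDataBench), and the choice of scikit-learn, a 90% variance threshold, and k-means with k=7 are assumptions for the sketch.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic stand-in: 40 workloads x 45 characterization metrics.
metrics = rng.normal(size=(40, 45))

# Standardize, then keep enough principal components to explain
# 90% of the variance (a float n_components does this in sklearn).
scaled = StandardScaler().fit_transform(metrics)
components = PCA(n_components=0.9).fit_transform(scaled)

# Cluster workloads in PC space and pick one representative per
# cluster: the workload closest to its cluster centroid.
k = 7
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(components)
representatives = []
for c in range(k):
    members = np.where(km.labels_ == c)[0]
    dists = np.linalg.norm(
        components[members] - km.cluster_centers_[c], axis=1
    )
    representatives.append(int(members[np.argmin(dists)]))

print(sorted(representatives))
```

On real metric data, the selected indices would identify the subset of workloads to keep for simulation; redundant workloads fall into clusters already covered by a representative.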
Selection of Statistical Software for Solving Big Data Problems: A Guide for Businesses, Students, and Universities
The need for analysts with expertise in big data software is becoming more apparent in today's society. Unfortunately, the demand for these analysts far exceeds the number available. A potential way to combat this shortage is to identify the software taught in colleges and universities. This article examines four data analysis software packages (Excel add-ins, SPSS, SAS, and R) and outlines the cost, training, and statistical methods/tests/uses for each. It further explains the implications for universities and future students.
Comparative Analysis of Big Data Analytics Software in Assessing Sample Data
Over the last few years, big data has emerged as an important topic of discussion in most firms owing to its ability to create, store and process content at a reasonable price. Big data consists of advanced tools and techniques to process large volumes of data in organisations. Investment in big data analytics has almost become a necessity in large-sized firms, particularly multinational companies, for its unique benefits, especially in the prediction and identification of various trends. Some of the most popular big data analytics software used today are MapReduce, Hive and Tableau, while the Hadoop framework enables easy processing of such extremely large data sets. The current research attempts to create a comparative assessment of five such applications, namely IBM SPSS, IBM Watson Analytics, R, Minitab and SAS. The case taken up for the test was that of the factors affecting housing affordability in the US. Based on statistics obtained from the American Housing Survey (AHS) database, the researcher identified different factors impacting affordability in the states. Variables were reduced through Principal Component Analysis (PCA), and a model based on partial least squares regression/polynomial regression was fitted to check the impact on affordability. The primary findings suggest that the age of the head of the household and the income earned were the two most important factors affecting pricing in the region. A comparison of the most and least effective applications is drawn at the end of the study.
Selection of Statistical Software for Solving Big Data Problems for Teaching
The need for analysts with expertise in big data software is becoming more apparent in today's society. Unfortunately, the demand for these analysts far exceeds the number available. A potential way to combat this shortage is to identify the software sought by employers and to align this with the software taught by universities. This paper will examine multiple data analysis software – Excel add-ins, SPSS, SAS, Minitab, and R – and it will outline the cost, training, statistical methods/tests/uses, and specific uses within industry for each of these software packages. It will further explain implications for universities and students.
The last five years of Big Data Research in Economics, Econometrics and Finance: Identification and conceptual analysis
Today, the term Big Data takes a multidimensional approach in which five main characteristics stand out: volume, velocity, veracity, value and variety. It has changed from being an emerging theme to a growing research area. In this respect, this study analyses the literature on Big Data in the Economics, Econometrics and Finance field. To do that, 1,034 publications from 2015 to 2019 were evaluated using SciMAT as a bibliometric and network analysis software. SciMAT offers a complete view of the field and evaluates the most cited and most productive authors, countries and subject areas related to Big Data. Lastly, a science map is constructed to understand the intellectual structure and the main research lines (themes).
Testing in Big Data: An Architecture Pattern for a Development Environment for Innovative, Integrated and Robust Applications
Big Data is a crucial pillar for many of today's newly emerging business models. Areas of application range from consumer analysis and medicine to fraud detection. All of those domains require reliable software. Even though imperfect results are accepted in Big Data software, bugs and other defects can have drastic consequences. Therefore, in this paper, the software engineering subdiscipline of testing is addressed. Big Data exhibits characteristics that differentiate its processing software from software that processes traditional workloads. Consequently, an architecture pattern for testing that can be integrated into development environments for Big Data software is proposed. The paper features a detailed description of the artifact as well as a preliminary plan for evaluation.