Characterizing and Subsetting Big Data Workloads
Big data benchmark suites must include a diversity of data and workloads to
be useful in fairly evaluating big data systems and architectures. However,
using truly comprehensive benchmarks poses great challenges for the
architecture community. First, we need to thoroughly understand the behaviors
of a variety of workloads. Second, our usual simulation-based research methods
become prohibitively expensive for big data. As big data is an emerging field,
more and more software stacks are being proposed to facilitate the development
of big data applications, which aggravates these challenges. In this paper, we
first use Principal Component Analysis (PCA) to identify the most important
characteristics from 45 metrics to characterize big data workloads from
BigDataBench, a comprehensive big data benchmark suite. Second, we apply a
clustering technique to the principal components obtained from the PCA to
investigate the similarity among big data workloads, and we verify the
importance of including different software stacks for big data benchmarking.
Third, we select seven representative big data workloads by removing redundant
ones and release the BigDataBench simulation version, which is publicly
available from http://prof.ict.ac.cn/BigDataBench/simulatorversion/.
Comment: 11 pages, 6 figures, 2014 IEEE International Symposium on Workload
Characterization
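The PCA-plus-clustering workflow this abstract describes can be sketched as follows. This is an illustrative reconstruction, not the paper's actual pipeline: the 45-metric matrix here is random placeholder data (the study uses measured microarchitectural metrics from BigDataBench), and the choice of scikit-learn, a 90% variance threshold, and k-means with k=7 are assumptions for the sketch.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic stand-in: 40 workloads x 45 characterization metrics.
metrics = rng.normal(size=(40, 45))

# Standardize, then keep enough principal components to explain
# 90% of the variance (a float n_components does this in sklearn).
scaled = StandardScaler().fit_transform(metrics)
components = PCA(n_components=0.9).fit_transform(scaled)

# Cluster workloads in PC space and pick one representative per
# cluster: the workload closest to its cluster centroid.
k = 7
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(components)
representatives = []
for c in range(k):
    members = np.where(km.labels_ == c)[0]
    dists = np.linalg.norm(
        components[members] - km.cluster_centers_[c], axis=1
    )
    representatives.append(int(members[np.argmin(dists)]))

print(sorted(representatives))
```

On real metric data, the selected indices would identify the subset of workloads to keep for simulation; redundant workloads fall into clusters already covered by a representative.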
Selection of Statistical Software for Solving Big Data Problems: A Guide for Businesses, Students, and Universities
The need for analysts with expertise in big data software is becoming more apparent in today's society. Unfortunately, the demand for these analysts far exceeds the number available. A potential way to combat this shortage is to identify the software taught in colleges and universities. This article examines four data analysis software packages (Excel add-ins, SPSS, SAS, and R) and outlines the cost, training, and statistical methods/tests/uses for each. It further explains the implications for universities and future students.
Comparative Analysis of Big Data Analytics Software in Assessing Sample Data
Over the last few years, big data has emerged as an important topic of discussion in most firms owing to its ability to create, store and process content at a reasonable price. Big data consists of advanced tools and techniques to process large volumes of data in organisations. Investment in big data analytics has almost become a necessity in large-sized firms, particularly multinational companies, for its unique benefits, especially in the prediction and identification of various trends. Some of the most popular big data analytics software used today are MapReduce, Hive and Tableau, while the Hadoop framework enables easy processing of such extremely large data sets. The current research attempts to create a comparative assessment of five such applications, namely IBM SPSS, IBM Watson Analytics, R, Minitab and SAS. The case taken up for the test was that of the factors affecting housing affordability in the US. Based on statistics obtained from the American Housing Survey (AHS) database, the researcher identified different factors impacting affordability in the states. Variables were reduced through Principal Component Analysis (PCA), and a model based on partial least squares regression/polynomial regression was fitted to check the impact on affordability. The primary findings suggest that the age of the head of the household and the income earned were the two most important factors affecting pricing in the region. A comparison of the most and least effective applications is drawn at the end of the study.
Selection of Statistical Software for Solving Big Data Problems for Teaching
The need for analysts with expertise in big data software is becoming more apparent in today's society. Unfortunately, the demand for these analysts far exceeds the number available. A potential way to combat this shortage is to identify the software sought by employers and to align this with the software taught by universities. This paper will examine multiple data analysis software – Excel add-ins, SPSS, SAS, Minitab, and R – and it will outline the cost, training, statistical methods/tests/uses, and specific uses within industry for each of these software packages. It will further explain implications for universities and students.
The last five years of Big Data Research in Economics, Econometrics and Finance: Identification and conceptual analysis
Today, the term Big Data takes a multidimensional approach in which five main characteristics stand out: volume, velocity, veracity, value and variety. It has changed from being an emerging theme to a growing research area. In this respect, this study analyses the literature on Big Data in the Economics, Econometrics and Finance field. To do that, 1,034 publications from 2015 to 2019 were evaluated using SciMAT as a bibliometric and network analysis software. SciMAT offers a complete view of the field and evaluates the most cited and most productive authors, countries and subject areas related to Big Data. Lastly, a science map is constructed to understand the intellectual structure and the main research lines (themes).
Testing in Big Data: An Architecture Pattern for a Development Environment for Innovative, Integrated and Robust Applications
Big Data is a crucial pillar for many of today's newly emerging business models. Areas of application range from consumer analysis and medicine to fraud detection. All of those domains require reliable software. Even though imperfect results are accepted in Big Data software, bugs and other defects can have drastic consequences. Therefore, in this paper, the software engineering subdiscipline of testing is addressed. Big Data exhibits characteristics that differentiate its processing software from software that processes traditional workloads. Consequently, an architecture pattern for testing that can be integrated into development environments for Big Data software is proposed. The paper features a detailed description of the artifact as well as a preliminary plan for evaluation.