BigDataBench: a Big Data Benchmark Suite from Internet Services

Author: Gao Wanling
He Yongqiang
Jia Zhen
Li Xiaona
Lu Gang
Luo Chunjie
Qiu Bizhu
Shi Yingjie
Wang Lei
Yang Qiang
Zhan Jianfeng
Zhan Kent
Zhang Shujie
Zheng Chen
Zhu Yuqing
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 22/02/2014

As the architecture, systems, and data management communities pay greater attention to innovative big data systems and architectures, the pressure to benchmark and evaluate these systems rises. Given the broad use of big data systems, big data benchmarks must include a diversity of data and workloads. Most state-of-the-art big data benchmarking efforts target specific types of applications or system software stacks, and hence do not serve these purposes. This paper presents our joint research efforts on this issue with several industrial partners. Our big data benchmark suite, BigDataBench, not only covers broad application scenarios but also includes diverse and representative data sets. BigDataBench is publicly available from http://prof.ict.ac.cn/BigDataBench . We also comprehensively characterize the 19 big data workloads included in BigDataBench with varying data inputs. On a typical state-of-practice processor, the Intel Xeon E5645, we make the following observations. First, in comparison with traditional benchmarks, including PARSEC, HPCC, and SPECCPU, big data applications have very low operation intensity. Second, the volume of data input has a non-negligible impact on micro-architecture characteristics, which may impose challenges for simulation-based big data architecture research. Last but not least, corroborating the observations in CloudSuite and DCBench (which use smaller data inputs), we find that the numbers of L1 instruction cache misses per 1000 instructions of the big data applications are higher than in the traditional benchmarks; we also find that L3 caches are effective for the big data applications, corroborating the observation in DCBench.
Comment: 12 pages, 6 figures, The 20th IEEE International Symposium On High Performance Computer Architecture (HPCA-2014), February 15-19, 2014, Orlando, Florida, US

arXiv.org e-Print Archive

Cost-effective Big Data Mining in the Cloud: A Case Study with K-means

Author: He Qiang
Li Dongwei
Shen Jun
Wang Shuliang
Yang Yun
Zhu Xiaodong
Publication venue: 'Sociological Research Online'
Publication date: 01/01/2017

Mining big data often requires tremendous computational resources. This has become a major obstacle to broad applications of big data analytics. Cloud computing allows data scientists to access computational resources on-demand for building their big data analytics solutions in the cloud.

Research Online

Muppet: MapReduce-Style Processing of Fast Data

Author: Doan AnHai
Lam Wang
Liu Lu
Prasad STS
Rajaraman Anand
Vacheri Zoheb
Publication date: 01/01/2012

MapReduce has emerged as a popular method to process big data. In the past few years, however, not just big data, but fast data has also exploded in volume and availability. Examples of such data include sensor data streams, the Twitter Firehose, and Facebook updates. Numerous applications must process fast data. Can we provide a MapReduce-style framework so that developers can quickly write such applications and execute them over a cluster of machines, to achieve low latency and high scalability? In this paper we report on our investigation of this question, as carried out at Kosmix and WalmartLabs. We describe MapUpdate, a framework like MapReduce, but specifically developed for fast data. We describe Muppet, our implementation of MapUpdate. Throughout the description we highlight the key challenges, argue why MapReduce is not well suited to address them, and briefly describe our current solutions. Finally, we describe our experience and lessons learned with Muppet, which has been used extensively at Kosmix and WalmartLabs to power a broad range of applications in social media and e-commerce.
Comment: VLDB201

arXiv.org e-Print Archive

CiteSeerX

Marketing and Data Science: Together the Future is Ours

Author: Chintagunta Pradeep
Hanssens Dominique M
Hauser John R
Publication venue: eScholarship, University of California
Publication date: 01/11/2016

The synergistic use of computer science and marketing science techniques offers the best avenue for knowledge development and improved applications. A broad area of complementarity between the typical focus in statistics and computer science and that in marketing offers great potential. The former fields tend to focus on pattern recognition, control and prediction. Many marketing analyses embrace these directions, but also contribute by modeling structure and exploring causal relationships. Marketing has successfully combined foci from management science with foci from psychology and economics. These fields complement each other because they enable a broad spectrum of scientific approaches. Combined, they provide both understanding and practical solutions to important and relevant managerial marketing problems, and marketing science is already very successful at obtaining unique insights from big data

eScholarship - University of California

Big data: Some statistical issues.

Author: Cox DR
Kartsonaki Christiana
Keogh Ruth H
Publication venue: 'Elsevier BV'
Publication date: 01/01/2018

A broad review is given of the impact of big data on various aspects of investigation. There is some but not total emphasis on issues in epidemiological research

Oxford University Research Archive

Personality Traits, Self-Employment, and Professions

Author: Alina Rusakova
Michael Fritsch

We investigate the effect of broad personality traits (the Big Five) on an individual's decision to become self-employed. In particular, we test an overall indicator of the entrepreneurial personality. Since we find that the level of self-employment varies considerably across professions, we also perform the analysis for different types of professions, namely, those classified as being in the "creative class" as compared to the noncreative class. The analysis is based on micro data for individuals of the German Socio Economic Panel (SOEP). We find a significant association between personality traits and the propensity to become self-employed. However, the strength of this link is fairly weak and differs across professions, indicating an important effect of an individual's profession on his or her decision to run his or her own business.
Keywords: Entrepreneurship, self-employment, personality traits, the Big Five, professions

Teaching Big Data Management – An Active Learning Approach for Higher Education

Author: Dinter Barbara
Jaekel Tobias
Kollwitz Christoph
Wache Hendrik
Publication venue: AIS Electronic Library (AISeL)
Publication date: 10/12/2017

Since big data analytics has become an imperative for business success in the digital economy, universities face the challenge of training data scientists and data engineers in various technological and managerial skills. In addition to traditional lectures, active learning formats ensure a practice-oriented education that enables students to handle novel big data technologies. In this paper, we present a big data management syllabus for master's students in the field of big data analytics, which includes various hands-on and action learning elements. The course encompasses seven lectures and nine tutorials and takes place at Chemnitz University of Technology. It covers a broad range of big data applications and facilitates knowledge on various cognitive levels. The paper gives an overview of the course content and assigns learning objectives to lectures and tutorials using Krathwohl’s revised taxonomy. Finally, we present the feedback we have received from the students over the years

AIS Electronic Library (AISeL)

Cloud Computing and Big Data for Oil and Gas Industry Application in China

Author: Fei Han
Qi Yuan
Xuehui Feng
Yidan Zhang
Zhen Cao
Zhifeng Yang
Publication date: 01/01/2019

The oil and gas industry is a complex, data-driven industry with compute-intensive, data-intensive, and business-intensive features. Cloud computing and big data have broad application prospects in the oil and gas industry. This research aims to highlight the cloud computing and big data issues and challenges arising from informatization in the oil and gas industry. The paper first focuses on the distributed cloud storage architecture and its applications for seismic data in the oil and gas industry. Then, cloud desktops for oil and gas industry applications are introduced in terms of efficiency, security, and usability. Finally, the big data architecture and security issues of the oil and gas industry are analyzed. Cloud computing and big data architectures have advantages in many aspects, such as system scalability, reliability, and serviceability. This paper also briefly describes the future development of cloud computing and big data in the oil and gas industry. Cloud computing and big data can provide convenient information sharing and high-quality service for the oil and gas industry

PhilPapers