BigDataBench: a Big Data Benchmark Suite from Internet Services
As architecture, systems, and data management communities pay greater
attention to innovative big data systems and architectures, the pressure of
benchmarking and evaluating these systems rises. Considering the broad use of
big data systems, big data benchmarks must include diversity of data and
workloads. Most of the state-of-the-art big data benchmarking efforts target
evaluating specific types of applications or system software stacks, and hence
they are not qualified for serving the purposes mentioned above. This paper
presents our joint research efforts on this issue with several industrial
partners. Our big data benchmark suite BigDataBench not only covers broad
application scenarios, but also includes diverse and representative data sets.
BigDataBench is publicly available from http://prof.ict.ac.cn/BigDataBench .
Also, we comprehensively characterize 19 big data workloads included in
BigDataBench with varying data inputs. On a typical state-of-practice
processor, Intel Xeon E5645, we have the following observations: First, in
comparison with the traditional benchmarks, including PARSEC, HPCC, and
SPECCPU, big data applications have very low operation intensity; Second, the
volume of data input has non-negligible impact on micro-architecture
characteristics, which may impose challenges for simulation-based big data
architecture research; Last but not least, corroborating the observations in
CloudSuite and DCBench (which use smaller data inputs), we find that the
numbers of L1 instruction cache misses per 1000 instructions of the big data
applications are higher than in the traditional benchmarks; also, we find that
L3 caches are effective for the big data applications, corroborating the
observation in DCBench.
Comment: 12 pages, 6 figures, The 20th IEEE International Symposium On High
Performance Computer Architecture (HPCA-2014), February 15-19, 2014, Orlando,
Florida, US
Cost-effective Big Data Mining in the Cloud: A Case Study with K-means
Mining big data often requires tremendous computational resources. This has become a major obstacle to broad applications of big data analytics. Cloud computing allows data scientists to access computational resources on-demand for building their big data analytics solutions in the cloud
Muppet: MapReduce-Style Processing of Fast Data
MapReduce has emerged as a popular method to process big data. In the past
few years, however, not just big data, but fast data has also exploded in
volume and availability. Examples of such data include sensor data streams, the
Twitter Firehose, and Facebook updates. Numerous applications must process fast
data. Can we provide a MapReduce-style framework so that developers can quickly
write such applications and execute them over a cluster of machines, to achieve
low latency and high scalability? In this paper we report on our investigation
of this question, as carried out at Kosmix and WalmartLabs. We describe
MapUpdate, a framework like MapReduce, but specifically developed for fast
data. We describe Muppet, our implementation of MapUpdate. Throughout the
description we highlight the key challenges, argue why MapReduce is not well
suited to address them, and briefly describe our current solutions. Finally, we
describe our experience and lessons learned with Muppet, which has been used
extensively at Kosmix and WalmartLabs to power a broad range of applications in
social media and e-commerce.
Comment: VLDB201
Marketing and Data Science: Together the Future is Ours
The synergistic use of computer science and marketing science techniques offers the best avenue for knowledge development and improved applications. A broad area of complementarity between the typical focus in statistics and computer science and that in marketing offers great potential. The former fields tend to focus on pattern recognition, control and prediction. Many marketing analyses embrace these directions, but also contribute by modeling structure and exploring causal relationships. Marketing has successfully combined foci from management science with foci from psychology and economics. These fields complement each other because they enable a broad spectrum of scientific approaches. Combined, they provide both understanding and practical solutions to important and relevant managerial marketing problems, and marketing science is already very successful at obtaining unique insights from big data
Big data: Some statistical issues.
A broad review is given of the impact of big data on various aspects of investigation. There is some but not total emphasis on issues in epidemiological research
Personality Traits, Self-Employment, and Professions
We investigate the effect of broad personality traits - the Big Five - on an individual's decision to become self-employed. In particular, we test an overall indicator of the entrepreneurial personality. Since we find that the level of self-employment varies considerably across professions, we also perform the analysis for different types of professions, namely, those classified as being in the "creative class" as compared to the noncreative class. The analysis is based on micro data for individuals of the German Socio Economic Panel (SOEP). We find a significant association between personality traits and the propensity to become self-employed. However, the strength of this link is fairly weak and differs across professions, indicating an important effect of an individual's profession on his or her decision to run his or her own business.
Keywords: Entrepreneurship, self-employment, personality traits, the Big Five, professions
Personality Traits, Self-Employment, and Professions
We investigate the effect of broad personality traits - the Big Five - on an individual's decision to become self-employed. In particular, we test an overall indicator of the entrepreneurial personality. Since we find that the level of self-employment varies considerably across professions, we also perform the analysis for different types of professions, namely, those classified as being in the "creative class" as compared to the noncreative class. The analysis is based on micro data for individuals of the German Socio Economic Panel (SOEP). We find a significant association between personality traits and the propensity to become self-employed. However, the strength of this link is fairly weak and differs across professions, indicating an important effect of an individual's profession on his or her decision to run his or her own business.
Keywords: Entrepreneurship, self-employment, personality traits, the Big Five, professions
Teaching Big Data Management – An Active Learning Approach for Higher Education
Since big data analytics has become an imperative for business success in the digital economy, universities face the challenge of training data scientists and data engineers in various technological and managerial skills. In addition to traditional lectures, active learning formats ensure a practice-oriented education that enables students to handle novel big data technologies. In this paper, we present a big data management syllabus for master's students in the field of big data analytics, which includes various hands-on and action learning elements. The course encompasses seven lectures and nine tutorials and takes place at Chemnitz University of Technology. It covers a broad range of big data applications and facilitates knowledge on various cognitive levels. The paper gives an overview of the course content and assigns learning objectives to lectures and tutorials using Krathwohl’s revised taxonomy. Finally, we present the feedback that we have received from the students over the years
Cloud Computing and Big Data for Oil and Gas Industry Application in China
The oil and gas industry is a complex data-driven industry with compute-intensive, data-intensive and business-intensive features. Cloud computing and big data have broad application prospects in the oil and gas industry. This research aims to highlight the cloud computing and big data issues and challenges arising from informatization in the oil and gas industry. The paper first focuses on the distributed cloud storage architecture and its applications for seismic data in the oil and gas industry. Then, cloud desktops for oil and gas industry applications are introduced in terms of efficiency, security and usability. Finally, big data architecture and security issues in the oil and gas industry are analyzed. Cloud computing and big data architectures have advantages in many aspects, such as system scalability, reliability, and serviceability. This paper also provides a brief description of the future development of cloud computing and big data in the oil and gas industry. Cloud computing and big data can provide convenient information sharing and high quality service for the oil and gas industry