920,709 research outputs found

    BigDataBench: a Big Data Benchmark Suite from Internet Services

    As the architecture, systems, and data management communities pay greater attention to innovative big data systems and architectures, the pressure to benchmark and evaluate these systems rises. Considering the broad use of big data systems, big data benchmarks must include a diversity of data and workloads. Most state-of-the-art big data benchmarking efforts target specific types of applications or system software stacks, and hence they are not qualified for serving the purposes mentioned above. This paper presents our joint research efforts on this issue with several industrial partners. Our big data benchmark suite, BigDataBench, not only covers broad application scenarios but also includes diverse and representative data sets. BigDataBench is publicly available from http://prof.ict.ac.cn/BigDataBench . We also comprehensively characterize 19 big data workloads included in BigDataBench with varying data inputs. On a typical state-of-practice processor, the Intel Xeon E5645, we make the following observations. First, in comparison with traditional benchmarks, including PARSEC, HPCC, and SPECCPU, big data applications have very low operation intensity. Second, the volume of data input has a non-negligible impact on micro-architecture characteristics, which may pose challenges for simulation-based big data architecture research. Last but not least, corroborating the observations in CloudSuite and DCBench (which use smaller data inputs), we find that the number of L1 instruction cache misses per 1000 instructions is higher for the big data applications than for the traditional benchmarks; we also find that L3 caches are effective for the big data applications, corroborating the observation in DCBench.
    Comment: 12 pages, 6 figures, The 20th IEEE International Symposium On High Performance Computer Architecture (HPCA-2014), February 15-19, 2014, Orlando, Florida, US

    Cost-effective Big Data Mining in the Cloud: A Case Study with K-means

    Mining big data often requires tremendous computational resources. This has become a major obstacle to broad applications of big data analytics. Cloud computing allows data scientists to access computational resources on demand for building their big data analytics solutions in the cloud

    Muppet: MapReduce-Style Processing of Fast Data

    MapReduce has emerged as a popular method to process big data. In the past few years, however, not just big data but also fast data has exploded in volume and availability. Examples of such data include sensor data streams, the Twitter Firehose, and Facebook updates. Numerous applications must process fast data. Can we provide a MapReduce-style framework so that developers can quickly write such applications and execute them over a cluster of machines, achieving low latency and high scalability? In this paper we report on our investigation of this question, as carried out at Kosmix and WalmartLabs. We describe MapUpdate, a framework like MapReduce but specifically developed for fast data. We describe Muppet, our implementation of MapUpdate. Throughout the description we highlight the key challenges, argue why MapReduce is not well suited to address them, and briefly describe our current solutions. Finally, we describe our experience and lessons learned with Muppet, which has been used extensively at Kosmix and WalmartLabs to power a broad range of applications in social media and e-commerce.
    Comment: VLDB201

    Big data: Some statistical issues.

    A broad review is given of the impact of big data on various aspects of investigation. The emphasis is partly, but not exclusively, on issues in epidemiological research.

    Personality Traits, Self-Employment, and Professions

    We investigate the effect of broad personality traits (the Big Five) on an individual's decision to become self-employed. In particular, we test an overall indicator of the entrepreneurial personality. Since we find that the level of self-employment varies considerably across professions, we also perform the analysis for different types of professions, namely those classified as being in the "creative class" as compared to the noncreative class. The analysis is based on micro data for individuals in the German Socio-Economic Panel (SOEP). We find a significant association between personality traits and the propensity to become self-employed. However, the strength of this link is fairly weak and differs across professions, indicating an important effect of an individual's profession on his or her decision to run a business of his or her own.
    Keywords: Entrepreneurship, self-employment, personality traits, the Big Five, professions

    Teaching Big Data Management – An Active Learning Approach for Higher Education

    Since big data analytics has become an imperative for business success in the digital economy, universities face the challenge of training data scientists and data engineers in various technological and managerial skills. In addition to traditional lectures, active learning formats ensure a practice-oriented education that enables students to handle novel big data technologies. In this paper, we present a big data management syllabus for master's students in the field of big data analytics, which includes various hands-on and action learning elements. The course comprises seven lectures and nine tutorials and takes place at Chemnitz University of Technology. It covers a broad range of big data applications and fosters knowledge at various cognitive levels. The paper gives an overview of the course content and assigns learning objectives to lectures and tutorials using Krathwohl’s revised taxonomy. Finally, we present the feedback we have received from students over the years.

    Cloud Computing and Big Data for Oil and Gas Industry Application in China

    The oil and gas industry is a complex data-driven industry with compute-intensive, data-intensive, and business-intensive features. Cloud computing and big data have broad application prospects in the oil and gas industry. This research aims to highlight the cloud computing and big data issues and challenges arising from informatization in the oil and gas industry. The paper first focuses on the distributed cloud storage architecture and its applications to seismic data in the oil and gas industry. Then, cloud desktops for oil and gas industry applications are introduced in terms of efficiency, security, and usability. Finally, the big data architecture and security issues of the oil and gas industry are analyzed. Cloud computing and big data architectures have advantages in many aspects, such as system scalability, reliability, and serviceability. This paper also briefly describes the future development of cloud computing and big data in the oil and gas industry. Cloud computing and big data can provide convenient information sharing and high-quality service for the oil and gas industry.