Characterizing and Subsetting Big Data Workloads
Big data benchmark suites must include a diversity of data and workloads to
be useful in fairly evaluating big data systems and architectures. However,
using truly comprehensive benchmarks poses great challenges for the
architecture community. First, we need to thoroughly understand the behaviors
of a variety of workloads. Second, our usual simulation-based research methods
become prohibitively expensive for big data. As big data is an emerging field,
more and more software stacks are being proposed to facilitate the development
of big data applications, which aggravates these challenges. In this paper, we
first use Principal Component Analysis (PCA) to identify the most important
characteristics from 45 metrics to characterize big data workloads from
BigDataBench, a comprehensive big data benchmark suite. Second, we apply a
clustering technique to the principal components obtained from the PCA to
investigate the similarity among big data workloads, and we verify the
importance of including different software stacks for big data benchmarking.
Third, we select seven representative big data workloads by removing redundant
ones and release the BigDataBench simulation version, which is publicly
available from http://prof.ict.ac.cn/BigDataBench/simulatorversion/.Comment: 11 pages, 6 figures, 2014 IEEE International Symposium on Workload
Characterizatio
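The PCA-then-clustering subsetting pipeline this abstract describes can be sketched roughly as follows. This is an illustrative sketch in NumPy only, not the paper's actual tooling: the metric matrix is random stand-in data (not the paper's 45 real metrics), and all function names are hypothetical.

```python
import numpy as np

def pca(X, n_components):
    """Project rows of X onto the top principal components.
    Metrics are standardized first so no single metric dominates."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    cov = np.cov(Xs, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:n_components]
    return Xs @ vecs[:, order]

def kmeans(X, k, iters=100, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def pick_representatives(X, k, n_components=3):
    """One representative workload per cluster: the one closest to its centroid."""
    Z = pca(X, n_components)
    labels = kmeans(Z, k)
    reps = []
    for j in range(k):
        idx = np.where(labels == j)[0]
        if len(idx) == 0:
            continue                           # skip a cluster that ended up empty
        centroid = Z[idx].mean(axis=0)
        reps.append(idx[np.argmin(np.linalg.norm(Z[idx] - centroid, axis=1))])
    return sorted(reps)

# Toy example: 12 "workloads" described by 6 hypothetical metrics each
rng = np.random.default_rng(1)
metrics = rng.normal(size=(12, 6))
reps = pick_representatives(metrics, k=4)
print(reps)  # indices of the representative workloads
```

The paper selects seven representatives from the full BigDataBench suite; the toy sizes here exist only to keep the sketch runnable.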
ShenZhen transportation system (SZTS): a novel big data benchmark suite
Data analytics is at the core of the supply chain for both products and services in modern economies and societies. Big data workloads, however, are placing unprecedented demands on computing technologies, calling for a deep understanding and characterization of these emerging workloads. In this paper, we propose ShenZhen Transportation System (SZTS), a novel big data Hadoop benchmark suite comprising real-life transportation analysis applications with real-life input data sets from Shenzhen in China. SZTS uniquely focuses on a specific and real-life application domain, whereas other existing Hadoop benchmark suites, such as HiBench and CloudRank-D, consist of generic algorithms with synthetic inputs. We perform a cross-layer workload characterization at the microarchitecture level, the operating system (OS) level, and the job level, revealing unique characteristics of SZTS compared to existing Hadoop benchmarks as well as general-purpose multi-core PARSEC benchmarks. We also study the sensitivity of workload behavior with respect to input data size, and we propose a methodology for identifying representative input data sets.
PARSEC vs. SPLASH-2: A quantitative comparison of two multithreaded benchmark suites on Chip-Multiprocessors
The PARSEC benchmark suite was recently released and has been adopted by a significant number of users within a short amount of time. This new collection of workloads is not yet fully understood by researchers. In this study we compare the SPLASH-2 and PARSEC benchmark suites with each other to gain insights into differences and similarities between the two program collections. We use standard statistical methods and machine learning to analyze the suites for redundancy and overlap on Chip-Multiprocessors (CMPs). Our analysis shows that PARSEC workloads are fundamentally different from SPLASH-2 benchmarks. The observed differences can be explained with two technology trends, the proliferation of CMPs and the accelerating growth of world data.
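A redundancy-and-overlap comparison of the kind this abstract mentions can be illustrated with a crude distance check: if the average cross-suite distance in program-characteristic space clearly exceeds the average within-suite distances, the suites behave differently. This is an illustrative sketch, not the paper's statistical method; the characteristic vectors are synthetic and the 1.5 threshold is an arbitrary choice.

```python
import numpy as np

def mean_pairwise(A, B):
    """Mean Euclidean distance between every row of A and every row of B."""
    return float(np.mean(np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)))

def suites_differ(suite1, suite2):
    """Crude overlap check: two suites are 'fundamentally different' if the
    average cross-suite distance clearly exceeds the within-suite averages."""
    within = 0.5 * (mean_pairwise(suite1, suite1) + mean_pairwise(suite2, suite2))
    across = mean_pairwise(suite1, suite2)
    return across > 1.5 * within   # 1.5 is an arbitrary illustrative threshold

# Hypothetical characteristic vectors for two suites; the second is shifted
# to mimic genuinely different behavior
rng = np.random.default_rng(0)
splash_like = rng.normal(0.0, 1.0, size=(8, 5))
parsec_like = rng.normal(5.0, 1.0, size=(10, 5))
print(suites_differ(splash_like, parsec_like))
```

Comparing a suite against itself yields `across == within`, so the check correctly reports no difference in that degenerate case.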
Measuring program similarity for efficient benchmarking and performance analysis of computer systems
Computer benchmarking involves running a set of benchmark programs to measure the performance of a computer system. Modern benchmarks are developed from real applications. Applications are becoming complex, and hence modern benchmarks run for a very long time. These benchmarks are also used for performance evaluation in the early design phase of microprocessors. Due to the size of benchmarks and the increasing complexity of microprocessor design, the effort required for performance evaluation has grown significantly. This dissertation proposes methodologies to reduce the effort of benchmarking and performance evaluation of computer systems. Identifying a set of programs that can be used in the process of benchmarking can be very challenging. A solution to this problem can start by identifying similarity between programs to capture the diversity in their behavior before they can be considered for benchmarking. The aim of this methodology is to identify redundancy in the set of benchmarks and find a subset of representative benchmarks with the least possible loss of information. This dissertation proposes the use of program characteristics that capture the performance behavior of programs and identifies representative benchmarks applicable over a wide range of system configurations. The use of benchmark subsetting has not been restricted to academic research. Recently, the SPEC CPU subcommittee used the information derived from measuring similarity based on program behavior characteristics between different benchmark candidates as one of the criteria for selecting the SPEC CPU2006 benchmarks. The information of similarity between programs can also be used to predict the performance of an application when it is difficult to port the application to different platforms. This is a common problem when a customer wants to buy the best computer system for their application.
Performance of a customer's application on a particular system can be predicted using the performance scores of the standard benchmarks on that system and the similarity information between the application and the benchmarks. Similarity between programs is quantified by the distance between them in the space of the measured characteristics, and is appropriately used to predict performance of a new application using the performance scores of its neighbors in the workload space.
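The neighbor-based prediction idea above can be sketched as follows. This is an illustrative assumption-laden sketch, not the dissertation's exact procedure: the benchmark feature vectors and scores are toy data, and the inverse-distance weighting over the k nearest benchmarks is one plausible choice of weighting.

```python
import numpy as np

def predict_score(app_features, bench_features, bench_scores, k=3):
    """Estimate a target-system score for a new application as the
    inverse-distance-weighted mean of the scores of its k nearest
    benchmarks in the standardized program-characteristic space."""
    mu, sd = bench_features.mean(axis=0), bench_features.std(axis=0)
    B = (bench_features - mu) / sd
    a = (app_features - mu) / sd
    d = np.linalg.norm(B - a, axis=1)
    nearest = np.argsort(d)[:k]
    w = 1.0 / (d[nearest] + 1e-9)   # epsilon avoids division by zero
    return float(np.sum(w * bench_scores[nearest]) / np.sum(w))

# Toy data: 5 benchmarks, 3 hypothetical characteristics each, known scores
bench = np.array([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.],
                  [1., 1., 0.], [0., 1., 1.]])
scores = np.array([10., 20., 30., 15., 25.])
print(predict_score(np.array([1., 0., 0.]), bench, scores))
```

An application whose characteristics coincide with a benchmark's gets a prediction that collapses to that benchmark's score, which is the behavior one would want from a distance-weighted scheme.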
Discovering phase behavior by clustering time-varying microarchitecture-independent characteristics
Advisor: Rodolfo Jardim de Azevedo. Master's thesis, Universidade Estadual de Campinas, Instituto de Computação. Abstract: Phase analysis has been shown to be an efficient technique to decrease the time needed to execute detailed micro-architectural simulations.
Our study aimed to overcome two limitations of current methods: (i) most approaches adopt a fine-grained strategy, which in some cases is affected by transient noise and does not account for a broader context; and (ii) interpreting the resulting program phases is often difficult, since it is hard to draw meaningful conclusions from high-dimensional phase signatures. Regarding (i), we adopted a two-level phase analysis, each level with a different granularity (level 1 -- subsequence clustering of multivariate time series; level 2 -- k-means). We found that, on average, this approach achieved phase-classification accuracy comparable to prior work. We thus matched the state of the art by an alternative route, with the added benefit of a potential solution to problem (ii): with the employed method, phases acquire a much more interpretable signature (MRF) that is closely aligned with the behavior of the program. We demonstrated the effectiveness of this interpretation by using a centrality measure to identify the most important characteristics within a program phase, thereby supporting the use of these phase signatures (MRF) in later studies.
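The coarse-grained clustering step (level 2) can be illustrated with a minimal sketch: slice a per-interval metric trace into fixed-size windows, summarize each window by its mean vector, and cluster the summaries so each window receives a phase label. The windowing, the deterministic initialization, and the synthetic two-metric trace are all illustrative assumptions, not the thesis's actual pipeline.

```python
import numpy as np

def phase_labels(series, window, k, iters=50):
    """Assign a phase label to each fixed-size window of a metric trace
    by k-means clustering of the per-window mean vectors."""
    n = (len(series) // window) * window
    feats = series[:n].reshape(-1, window, series.shape[1]).mean(axis=1)
    # Deterministic, evenly spread initialization keeps the sketch reproducible
    centers = feats[np.linspace(0, len(feats) - 1, k).astype(int)].copy()
    for _ in range(iters):
        labels = np.argmin(((feats[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = feats[labels == j].mean(axis=0)
    return labels

# Synthetic trace: two alternating behaviors over 2 metrics
low = np.tile([0.1, 0.9], (50, 1))
high = np.tile([0.9, 0.1], (50, 1))
trace = np.vstack([low, high, low, high])
labels = phase_labels(trace, window=10, k=2)
print(labels)  # labels alternate in blocks, mirroring the two behaviors
```

Windows drawn from the same underlying behavior receive the same label, which is the property phase analysis exploits to simulate only one representative interval per phase.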