1,687 research outputs found
ARM Wrestling with Big Data: A Study of Commodity ARM64 Server for Big Data Workloads
ARM processors have dominated the mobile device market in the last decade due
to their favorable computing to energy ratio. In this age of Cloud data centers
and Big Data analytics, the focus is increasingly on power efficient
processing, rather than just high throughput computing. ARM's first commodity
server-grade processor is the recent AMD A1100-series processor, based on a
64-bit ARM Cortex A57 architecture. In this paper, we study the performance and
energy efficiency of a server based on this ARM64 CPU, relative to a comparable
server running an AMD Opteron 3300-series x64 CPU, for Big Data workloads.
Specifically, we study these for Intel's HiBench suite of web, query and
machine learning benchmarks on Apache Hadoop v2.7 in a pseudo-distributed
setup, for data sizes up to files, web pages and tuples. Our
results show that the ARM64 server's runtime performance is comparable to the
x64 server for integer-based workloads like Sort and Hive queries, and only
lags behind for floating-point intensive benchmarks like PageRank, when they do
not exploit data parallelism adequately. We also see that the ARM64 server
takes the energy, and has an Energy Delay Product (EDP) that
is lower than the x64 server. These results hold promise for ARM64
data centers hosting Big Data workloads to reduce their operational costs,
while opening up opportunities for further analysis.Comment: Accepted for publication in the Proceedings of the 24th IEEE
International Conference on High Performance Computing, Data, and Analytics
(HiPC), 201
Portable and Accurate Collection of Calling-Context-Sensitive Bytecode Metrics for the Java Virtual Machine
Calling-context profiles and dynamic metrics at the bytecode level are important for profiling, workload characterization, program comprehension, and reverse engineering. Prevailing tools for collecting calling-context profiles or dynamic bytecode metrics often provide only incomplete information or suffer from limited compatibility with standard JVMs. However, completeness and accuracy of the profiles is essential for tasks such as workload characterization, and compatibility with standard JVMs is important to ensure that complex workloads can be executed. In this paper, we present the design and implementation of JP2, a new tool that profiles both the inter- and intra-procedural control flow of workloads on standard JVMs. JP2 produces calling-context profiles preserving callsite information, as well as execution statistics at the level of individual basic blocks of code. JP2 is complemented with scripts that compute various dynamic bytecode metrics from the profiles. As a case-study and tutorial on the use of JP2, we use it for cross-profiling for an embedded Java processor
Challenges in real-time virtualization and predictable cloud computing
Cloud computing and virtualization technology have revolutionized general-purpose computing applications in the past decade. The cloud paradigm offers advantages through reduction of operation costs, server consolidation, flexible system configuration and elastic resource provisioning. However, despite the success of cloud computing for general-purpose computing, existing cloud computing and virtualization technology face tremendous challenges in supporting emerging soft real-time applications such as online video streaming, cloud-based gaming, and telecommunication management. These applications demand real-time performance in open, shared and virtualized computing environments. This paper identifies the technical challenges in supporting real-time applications in the cloud, surveys recent advancement in real-time virtualization and cloud computing technology, and offers research directions to enable cloud-based real-time applications in the future
A Survey of Phase Classification Techniques for Characterizing Variable Application Behavior
Adaptable computing is an increasingly important paradigm that specializes
system resources to variable application requirements, environmental
conditions, or user requirements. Adapting computing resources to variable
application requirements (or application phases) is otherwise known as
phase-based optimization. Phase-based optimization takes advantage of
application phases, or execution intervals of an application, that behave
similarly, to enable effective and beneficial adaptability. In order for
phase-based optimization to be effective, the phases must first be classified
to determine when application phases begin and end, and ensure that system
resources are accurately specialized. In this paper, we present a survey of
phase classification techniques that have been proposed to exploit the
advantages of adaptable computing through phase-based optimization. We focus on
recent techniques and classify these techniques with respect to several factors
in order to highlight their similarities and differences. We divide the
techniques by their major defining characteristics---online/offline and
serial/parallel. In addition, we discuss other characteristics such as
prediction and detection techniques, the characteristics used for prediction,
interval type, etc. We also identify gaps in the state-of-the-art and discuss
future research directions to enable and fully exploit the benefits of
adaptable computing.Comment: To appear in IEEE Transactions on Parallel and Distributed Systems
(TPDS
A Survey on Compiler Autotuning using Machine Learning
Since the mid-1990s, researchers have been trying to use machine-learning
based approaches to solve a number of different compiler optimization problems.
These techniques primarily enhance the quality of the obtained results and,
more importantly, make it feasible to tackle two main compiler optimization
problems: optimization selection (choosing which optimizations to apply) and
phase-ordering (choosing the order of applying optimizations). The compiler
optimization space continues to grow due to the advancement of applications,
increasing number of compiler optimizations, and new target architectures.
Generic optimization passes in compilers cannot fully leverage newly introduced
optimizations and, therefore, cannot keep up with the pace of increasing
options. This survey summarizes and classifies the recent advances in using
machine learning for the compiler optimization field, particularly on the two
major problems of (1) selecting the best optimizations and (2) the
phase-ordering of optimizations. The survey highlights the approaches taken so
far, the obtained results, the fine-grain classification among different
approaches and finally, the influential papers of the field.Comment: version 5.0 (updated on September 2018)- Preprint Version For our
Accepted Journal @ ACM CSUR 2018 (42 pages) - This survey will be updated
quarterly here (Send me your new published papers to be added in the
subsequent version) History: Received November 2016; Revised August 2017;
Revised February 2018; Accepted March 2018
- …