7,800 research outputs found
ARM Wrestling with Big Data: A Study of Commodity ARM64 Server for Big Data Workloads
ARM processors have dominated the mobile device market in the last decade due
to their favorable computing to energy ratio. In this age of Cloud data centers
and Big Data analytics, the focus is increasingly on power efficient
processing, rather than just high throughput computing. ARM's first commodity
server-grade processor is the recent AMD A1100-series processor, based on a
64-bit ARM Cortex A57 architecture. In this paper, we study the performance and
energy efficiency of a server based on this ARM64 CPU, relative to a comparable
server running an AMD Opteron 3300-series x64 CPU, for Big Data workloads.
Specifically, we study these for Intel's HiBench suite of web, query and
machine learning benchmarks on Apache Hadoop v2.7 in a pseudo-distributed
setup, for data sizes up to files, web pages and tuples. Our
results show that the ARM64 server's runtime performance is comparable to the
x64 server for integer-based workloads like Sort and Hive queries, and only
lags behind for floating-point intensive benchmarks like PageRank, when they do
not exploit data parallelism adequately. We also see that the ARM64 server
takes the energy, and has an Energy Delay Product (EDP) that
is lower than the x64 server. These results hold promise for ARM64
data centers hosting Big Data workloads to reduce their operational costs,
while opening up opportunities for further analysis.Comment: Accepted for publication in the Proceedings of the 24th IEEE
International Conference on High Performance Computing, Data, and Analytics
(HiPC), 201
The Effect of Network and Infrastructural Variables on SPDY's Performance
HTTP is a successful Internet technology on top of which a lot of the web
resides. However, limitations with its current specification, i.e. HTTP/1.1,
have encouraged some to look for the next generation of HTTP. In SPDY, Google
has come up with such a proposal that has growing community acceptance,
especially after being adopted by the IETF HTTPbis-WG as the basis for
HTTP/2.0. SPDY has the potential to greatly improve web experience with little
deployment overhead. However, we still lack an understanding of its true
potential in different environments. This paper seeks to resolve these issues,
offering a comprehensive evaluation of SPDY's performance using extensive
experiments. We identify the impact of network characteristics and website
infrastructure on SPDY's potential page loading benefits, finding that these
factors are decisive for SPDY and its optimal deployment strategy. Through
this, we feed into the wider debate regarding HTTP/2.0, exploring the key
aspects that impact the performance of this future protocol
Low latency via redundancy
Low latency is critical for interactive networked applications. But while we
know how to scale systems to increase capacity, reducing latency --- especially
the tail of the latency distribution --- can be much more difficult. In this
paper, we argue that the use of redundancy is an effective way to convert extra
capacity into reduced latency. By initiating redundant operations across
diverse resources and using the first result which completes, redundancy
improves a system's latency even under exceptional conditions. We study the
tradeoff with added system utilization, characterizing the situations in which
replicating all tasks reduces mean latency. We then demonstrate empirically
that replicating all operations can result in significant mean and tail latency
reduction in real-world systems including DNS queries, database servers, and
packet forwarding within networks
Cloud-based or On-device: An Empirical Study of Mobile Deep Inference
Modern mobile applications are benefiting significantly from the advancement
in deep learning, e.g., implementing real-time image recognition and
conversational system. Given a trained deep learning model, applications
usually need to perform a series of matrix operations based on the input data,
in order to infer possible output values. Because of computational complexity
and size constraints, these trained models are often hosted in the cloud. To
utilize these cloud-based models, mobile apps will have to send input data over
the network. While cloud-based deep learning can provide reasonable response
time for mobile apps, it restricts the use case scenarios, e.g. mobile apps
need to have network access. With mobile specific deep learning optimizations,
it is now possible to employ on-device inference. However, because mobile
hardware, such as GPU and memory size, can be very limited when compared to its
desktop counterpart, it is important to understand the feasibility of this new
on-device deep learning inference architecture. In this paper, we empirically
evaluate the inference performance of three Convolutional Neural Networks
(CNNs) using a benchmark Android application we developed. Our measurement and
analysis suggest that on-device inference can cost up to two orders of
magnitude greater response time and energy when compared to cloud-based
inference, and that loading model and computing probability are two performance
bottlenecks for on-device deep inferences.Comment: Accepted at The IEEE International Conference on Cloud Engineering
(IC2E) conference 201
SRPT Scheduling for Web Servers
This note briey summarizes some results from two papers: [4] and [23]. These papers pose the following question: Is it possible to reduce the expected response time of every request at a web server, simply by changing the order in which we schedule the requests? In [4] we approach this question analytically via an M/G/1 queue. In [23] we approach the same question via implementation involving an Apache web server running on Linux
- …