23,418 research outputs found
BigDataBench: a Big Data Benchmark Suite from Internet Services
As architecture, systems, and data management communities pay greater
attention to innovative big data systems and architectures, the pressure of
benchmarking and evaluating these systems rises. Considering the broad use of
big data systems, big data benchmarks must include diversity of data and
workloads. Most of the state-of-the-art big data benchmarking efforts target
evaluating specific types of applications or system software stacks, and hence
they are not qualified for serving the purposes mentioned above. This paper
presents our joint research efforts on this issue with several industrial
partners. Our big data benchmark suite BigDataBench not only covers broad
application scenarios, but also includes diverse and representative data sets.
BigDataBench is publicly available from http://prof.ict.ac.cn/BigDataBench .
Also, we comprehensively characterize 19 big data workloads included in
BigDataBench with varying data inputs. On a typical state-of-practice
processor, Intel Xeon E5645, we have the following observations: First, in
comparison with the traditional benchmarks: including PARSEC, HPCC, and
SPECCPU, big data applications have very low operation intensity; Second, the
volume of data input has non-negligible impact on micro-architecture
characteristics, which may impose challenges for simulation-based big data
architecture research; Last but not least, corroborating the observations in
CloudSuite and DCBench (which use smaller data inputs), we find that the
numbers of L1 instruction cache misses per 1000 instructions of the big data
applications are higher than in the traditional benchmarks; also, we find that
L3 caches are effective for the big data applications, corroborating the
observation in DCBench.Comment: 12 pages, 6 figures, The 20th IEEE International Symposium On High
Performance Computer Architecture (HPCA-2014), February 15-19, 2014, Orlando,
Florida, US
A Benchmark for Image Retrieval using Distributed Systems over the Internet: BIRDS-I
The performance of CBIR algorithms is usually measured on an isolated
workstation. In a real-world environment the algorithms would only constitute a
minor component among the many interacting components. The Internet
dramati-cally changes many of the usual assumptions about measuring CBIR
performance. Any CBIR benchmark should be designed from a networked systems
standpoint. These benchmarks typically introduce communication overhead because
the real systems they model are distributed applications. We present our
implementation of a client/server benchmark called BIRDS-I to measure image
retrieval performance over the Internet. It has been designed with the trend
toward the use of small personalized wireless systems in mind. Web-based CBIR
implies the use of heteroge-neous image sets, imposing certain constraints on
how the images are organized and the type of performance metrics applicable.
BIRDS-I only requires controlled human intervention for the compilation of the
image collection and none for the generation of ground truth in the measurement
of retrieval accuracy. Benchmark image collections need to be evolved
incrementally toward the storage of millions of images and that scaleup can
only be achieved through the use of computer-aided compilation. Finally, our
scoring metric introduces a tightly optimized image-ranking window.Comment: 24 pages, To appear in the Proc. SPIE Internet Imaging Conference
200
Archiving the Relaxed Consistency Web
The historical, cultural, and intellectual importance of archiving the web
has been widely recognized. Today, all countries with high Internet penetration
rate have established high-profile archiving initiatives to crawl and archive
the fast-disappearing web content for long-term use. As web technologies
evolve, established web archiving techniques face challenges. This paper
focuses on the potential impact of the relaxed consistency web design on
crawler driven web archiving. Relaxed consistent websites may disseminate,
albeit ephemerally, inaccurate and even contradictory information. If captured
and preserved in the web archives as historical records, such information will
degrade the overall archival quality. To assess the extent of such quality
degradation, we build a simplified feed-following application and simulate its
operation with synthetic workloads. The results indicate that a non-trivial
portion of a relaxed consistency web archive may contain observable
inconsistency, and the inconsistency window may extend significantly longer
than that observed at the data store. We discuss the nature of such quality
degradation and propose a few possible remedies.Comment: 10 pages, 6 figures, CIKM 201
Addressing the Challenges in Federating Edge Resources
This book chapter considers how Edge deployments can be brought to bear in a
global context by federating them across multiple geographic regions to create
a global Edge-based fabric that decentralizes data center computation. This is
currently impractical, not only because of technical challenges, but is also
shrouded by social, legal and geopolitical issues. In this chapter, we discuss
two key challenges - networking and management in federating Edge deployments.
Additionally, we consider resource and modeling challenges that will need to be
addressed for a federated Edge.Comment: Book Chapter accepted to the Fog and Edge Computing: Principles and
Paradigms; Editors Buyya, Sriram
Early Observations on Performance of Google Compute Engine for Scientific Computing
Although Cloud computing emerged for business applications in industry,
public Cloud services have been widely accepted and encouraged for scientific
computing in academia. The recently available Google Compute Engine (GCE) is
claimed to support high-performance and computationally intensive tasks, while
little evaluation studies can be found to reveal GCE's scientific capabilities.
Considering that fundamental performance benchmarking is the strategy of
early-stage evaluation of new Cloud services, we followed the Cloud Evaluation
Experiment Methodology (CEEM) to benchmark GCE and also compare it with Amazon
EC2, to help understand the elementary capability of GCE for dealing with
scientific problems. The experimental results and analyses show both potential
advantages of, and possible threats to applying GCE to scientific computing.
For example, compared to Amazon's EC2 service, GCE may better suit applications
that require frequent disk operations, while it may not be ready yet for single
VM-based parallel computing. Following the same evaluation methodology,
different evaluators can replicate and/or supplement this fundamental
evaluation of GCE. Based on the fundamental evaluation results, suitable GCE
environments can be further established for case studies of solving real
science problems.Comment: Proceedings of the 5th International Conference on Cloud Computing
Technologies and Science (CloudCom 2013), pp. 1-8, Bristol, UK, December 2-5,
201
Performance of Network and Service Monitoring Frameworks
The efficiency and the performance of anagement systems is becoming a hot
research topic within the networks and services management community. This
concern is due to the new challenges of large scale managed systems, where the
management plane is integrated within the functional plane and where management
activities have to carry accurate and up-to-date information. We defined a set
of primary and secondary metrics to measure the performance of a management
approach. Secondary metrics are derived from the primary ones and quantifies
mainly the efficiency, the scalability and the impact of management activities.
To validate our proposals, we have designed and developed a benchmarking
platform dedicated to the measurement of the performance of a JMX manager-agent
based management system. The second part of our work deals with the collection
of measurement data sets from our JMX benchmarking platform. We mainly studied
the effect of both load and the number of agents on the scalability, the impact
of management activities on the user perceived performance of a managed server
and the delays of JMX operations when carrying variables values. Our findings
show that most of these delays follow a Weibull statistical distribution. We
used this statistical model to study the behavior of a monitoring algorithm
proposed in the literature, under heavy tail delays distribution. In this case,
the view of the managed system on the manager side becomes noisy and out of
date
- …