Characterizing and Subsetting Big Data Workloads
Big data benchmark suites must include a diversity of data and workloads to
be useful in fairly evaluating big data systems and architectures. However,
using truly comprehensive benchmarks poses great challenges for the
architecture community. First, we need to thoroughly understand the behaviors
of a variety of workloads. Second, our usual simulation-based research methods
become prohibitively expensive for big data. As big data is an emerging field,
more and more software stacks are being proposed to facilitate the development
of big data applications, which aggravates these challenges. In this paper, we
first use Principal Component Analysis (PCA) to identify the most important
characteristics from 45 metrics to characterize big data workloads from
BigDataBench, a comprehensive big data benchmark suite. Second, we apply a
clustering technique to the principal components obtained from the PCA to
investigate the similarity among big data workloads, and we verify the
importance of including different software stacks for big data benchmarking.
Third, we select seven representative big data workloads by removing redundant
ones and release the BigDataBench simulation version, which is publicly
available from http://prof.ict.ac.cn/BigDataBench/simulatorversion/.
Comment: 11 pages, 6 figures, 2014 IEEE International Symposium on Workload
Characterization
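
A minimal sketch of the subsetting pipeline this abstract describes:
standardize the per-workload metrics, project them with PCA, cluster in
principal-component space, and keep one representative per cluster. The
metric data, workload count, and names below are placeholders, not the
paper's actual measurements:

    # Hypothetical data: 20 workloads, each described by 45 metrics.
    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    metrics = rng.normal(size=(20, 45))   # placeholder workload-by-metric matrix
    names = [f"workload_{i}" for i in range(metrics.shape[0])]

    # Standardize, then keep the principal components explaining 90% of variance.
    X = StandardScaler().fit_transform(metrics)
    pcs = PCA(n_components=0.9).fit_transform(X)

    k = 7   # the paper ultimately keeps seven representative workloads
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pcs)
    for c in range(k):
        members = np.where(km.labels_ == c)[0]
        # the member closest to the cluster centroid serves as the representative
        d = np.linalg.norm(pcs[members] - km.cluster_centers_[c], axis=1)
        print(f"cluster {c}: representative = {names[members[np.argmin(d)]]}")
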
BigDataBench: a Big Data Benchmark Suite from Internet Services
As architecture, systems, and data management communities pay greater
attention to innovative big data systems and architectures, the pressure of
benchmarking and evaluating these systems rises. Considering the broad use of
big data systems, big data benchmarks must include a diversity of data and
workloads. Most state-of-the-art big data benchmarking efforts target
evaluating specific types of applications or system software stacks, and hence
they are not suited to the purposes mentioned above. This paper
presents our joint research efforts on this issue with several industrial
partners. Our big data benchmark suite BigDataBench not only covers broad
application scenarios, but also includes diverse and representative data sets.
BigDataBench is publicly available from http://prof.ict.ac.cn/BigDataBench .
Also, we comprehensively characterize 19 big data workloads included in
BigDataBench with varying data inputs. On a typical state-of-practice
processor, Intel Xeon E5645, we have the following observations: First, in
comparison with traditional benchmarks, including PARSEC, HPCC, and
SPECCPU, big data applications have very low operation intensity; Second, the
volume of data input has non-negligible impact on micro-architecture
characteristics, which may impose challenges for simulation-based big data
architecture research; Last but not least, corroborating the observations in
CloudSuite and DCBench (which use smaller data inputs), we find that the
numbers of L1 instruction cache misses per 1000 instructions are higher for
the big data applications than for the traditional benchmarks; also, we find that
L3 caches are effective for the big data applications, corroborating the
observation in DCBench.
Comment: 12 pages, 6 figures, The 20th IEEE International Symposium On High
Performance Computer Architecture (HPCA-2014), February 15-19, 2014, Orlando,
Florida, USA
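
As a small illustration of the cache metric used above, misses per 1000
instructions (MPKI) is computed directly from raw hardware-counter totals.
The counts below are hypothetical, not values from the paper; on Linux such
counters could come from, e.g., perf stat -e
instructions,L1-icache-load-misses:

    # MPKI from raw counter totals; all numbers here are made up.
    def mpki(misses: int, instructions: int) -> float:
        """Misses per 1000 retired instructions."""
        return 1000.0 * misses / instructions

    print(f"big data app L1i MPKI: {mpki(2_400_000, 60_000_000):.1f}")  # 40.0
    print(f"traditional  L1i MPKI: {mpki(300_000, 60_000_000):.1f}")    # 5.0
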
An Approach for the Empirical Validation of Software Complexity Measures
Software metrics are widely accepted tools for controlling and assuring
software quality. A large number of software metrics with a variety of content
can be found in the literature; however, most of them are not adopted in
industry because they are seen as irrelevant to practical needs and as
unsupported, and the major reason behind this is improper empirical
validation. This paper tries to identify possible root causes of the improper
empirical validation of software metrics. A practical model for the empirical
validation of software metrics is proposed along with the root causes. The
model is validated by applying it to recently proposed and well-known metrics.
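
Since the abstract does not spell out the proposed model, the fragment below
only illustrates one generic empirical-validation step such models typically
include: checking whether a candidate metric correlates with an external
quality indicator rather than accepting it on theoretical grounds alone. The
per-module numbers are hypothetical:

    # Rank-correlate a candidate complexity metric with observed defects.
    from scipy.stats import spearmanr

    metric_values = [12, 7, 30, 4, 22, 15, 9]   # candidate metric per module
    defect_counts = [5, 2, 11, 1, 8, 6, 3]      # defects per module (made up)

    rho, p = spearmanr(metric_values, defect_counts)
    print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")
    # A strong, significant correlation is evidence (not proof) of validity.
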
A Data-driven Approach Towards Human-robot Collaborative Problem Solving in a Shared Space
We are developing a system for human-robot communication that enables people
to communicate with robots in a natural way and is focused on solving problems
in a shared space. Our strategy for developing this system is fundamentally
data-driven: we use data from multiple input sources and train key components
with various machine learning techniques. We developed a web application that
collects data on how two humans communicate to accomplish a task, as well as
a mobile laboratory instrumented to collect data on how two humans
communicate to accomplish a task in a physically shared space. The data from
these systems will be used to train and fine-tune the second stage of our
system, in which the robot will be simulated through software. A physical robot
will be used in the final stage of our project. We describe these instruments,
a test suite, and performance metrics designed to evaluate and automate the
data-gathering process, and we evaluate an initial data set.
Comment: 2017 AAAI Fall Symposium on Natural Communication for Human-Robot
Collaboration
A Parsing Scheme for Finding the Design Pattern and Reducing the Development Cost of Reusable Object Oriented Software
Because of the importance of object oriented methodologies, research into
developing new measures for object oriented system development is receiving
increased focus. Most metrics need to identify the interactions between
objects and modules to derive a useful measure, and such an influential
software measure attracts software developers, designers and researchers. In
this paper, new interactions are defined for object oriented systems. Using
these interactions, a parser is developed to analyze the existing
architecture of the software. Within the design model, it is necessary for
design classes to collaborate with one another. However, collaboration should
be kept to an acceptable minimum, i.e., better design practice will introduce
low coupling. If a design model is highly coupled, the system is difficult to
implement, to test and to maintain over time. When enhancing software, we
need to introduce or remove modules, and in that case coupling is the most
important factor to consider, because unnecessary coupling may make the
system unstable and may reduce its performance. So low coupling is thought to
be a desirable goal in software construction, leading to better values for
external software qualities such as maintainability, reusability and so on.
To test this hypothesis, a good measure of class coupling is needed. In this
paper, based on the developed tool called Design Analyzer, we propose a
methodology to reuse an existing system, with the objective of enhancing an
existing object oriented system while keeping the coupling as low as
possible.
Comment: 15 pages
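
The paper's Design Analyzer is its own tool for its own target language;
purely as an illustrative analogue of the parsing idea, the sketch below uses
Python's ast module to parse source text and count which other classes each
class references, a crude class-coupling measure. The class names and code
are hypothetical:

    # Toy class-coupling counter over Python source, for illustration only.
    import ast
    from collections import defaultdict

    SRC = """
    class Engine: ...
    class Wheel: ...
    class Car:
        def __init__(self):
            self.engine = Engine()
            self.wheels = [Wheel() for _ in range(4)]
    """

    tree = ast.parse(__import__("textwrap").dedent(SRC))
    classes = {n.name for n in ast.walk(tree) if isinstance(n, ast.ClassDef)}
    coupling = defaultdict(set)

    for cls in ast.walk(tree):
        if not isinstance(cls, ast.ClassDef):
            continue
        for node in ast.walk(cls):
            # any reference to another class's name inside this class body
            if isinstance(node, ast.Name) and node.id in classes \
                    and node.id != cls.name:
                coupling[cls.name].add(node.id)

    for cls, used in coupling.items():
        print(f"{cls} is coupled to {sorted(used)}")  # Car -> Engine, Wheel
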