43,130 research outputs found

    Characterizing and Subsetting Big Data Workloads

    Full text link
    Big data benchmark suites must include a diversity of data and workloads to be useful in fairly evaluating big data systems and architectures. However, using truly comprehensive benchmarks poses great challenges for the architecture community. First, we need to thoroughly understand the behaviors of a variety of workloads. Second, our usual simulation-based research methods become prohibitively expensive for big data. As big data is an emerging field, more and more software stacks are being proposed to facilitate the development of big data applications, which aggravates hese challenges. In this paper, we first use Principle Component Analysis (PCA) to identify the most important characteristics from 45 metrics to characterize big data workloads from BigDataBench, a comprehensive big data benchmark suite. Second, we apply a clustering technique to the principle components obtained from the PCA to investigate the similarity among big data workloads, and we verify the importance of including different software stacks for big data benchmarking. Third, we select seven representative big data workloads by removing redundant ones and release the BigDataBench simulation version, which is publicly available from http://prof.ict.ac.cn/BigDataBench/simulatorversion/.Comment: 11 pages, 6 figures, 2014 IEEE International Symposium on Workload Characterizatio

    BigDataBench: a Big Data Benchmark Suite from Internet Services

    Full text link
    As architecture, systems, and data management communities pay greater attention to innovative big data systems and architectures, the pressure of benchmarking and evaluating these systems rises. Considering the broad use of big data systems, big data benchmarks must include diversity of data and workloads. Most of the state-of-the-art big data benchmarking efforts target evaluating specific types of applications or system software stacks, and hence they are not qualified for serving the purposes mentioned above. This paper presents our joint research efforts on this issue with several industrial partners. Our big data benchmark suite BigDataBench not only covers broad application scenarios, but also includes diverse and representative data sets. BigDataBench is publicly available from http://prof.ict.ac.cn/BigDataBench . Also, we comprehensively characterize 19 big data workloads included in BigDataBench with varying data inputs. On a typical state-of-practice processor, Intel Xeon E5645, we have the following observations: First, in comparison with the traditional benchmarks: including PARSEC, HPCC, and SPECCPU, big data applications have very low operation intensity; Second, the volume of data input has non-negligible impact on micro-architecture characteristics, which may impose challenges for simulation-based big data architecture research; Last but not least, corroborating the observations in CloudSuite and DCBench (which use smaller data inputs), we find that the numbers of L1 instruction cache misses per 1000 instructions of the big data applications are higher than in the traditional benchmarks; also, we find that L3 caches are effective for the big data applications, corroborating the observation in DCBench.Comment: 12 pages, 6 figures, The 20th IEEE International Symposium On High Performance Computer Architecture (HPCA-2014), February 15-19, 2014, Orlando, Florida, US

    An Approach for the Empirical Validation of Software Complexity Measures

    Get PDF
    Software metrics are widely accepted tools to control and assure software quality. A large number of software metrics with a variety of content can be found in the literature; however most of them are not adopted in industry as they are seen as irrelevant to needs, as they are unsupported, and the major reason behind this is due to improper empirical validation. This paper tries to identify possible root causes for the improper empirical validation of the software metrics. A practical model for the empirical validation of software metrics is proposed along with root causes. The model is validated by applying it to recently proposed and well known metrics

    A Data-driven Approach Towards Human-robot Collaborative Problem Solving in a Shared Space

    Full text link
    We are developing a system for human-robot communication that enables people to communicate with robots in a natural way and is focused on solving problems in a shared space. Our strategy for developing this system is fundamentally data-driven: we use data from multiple input sources and train key components with various machine learning techniques. We developed a web application that is collecting data on how two humans communicate to accomplish a task, as well as a mobile laboratory that is instrumented to collect data on how two humans communicate to accomplish a task in a physically shared space. The data from these systems will be used to train and fine-tune the second stage of our system, in which the robot will be simulated through software. A physical robot will be used in the final stage of our project. We describe these instruments, a test-suite and performance metrics designed to evaluate and automate the data gathering process as well as evaluate an initial data set.Comment: 2017 AAAI Fall Symposium on Natural Communication for Human-Robot Collaboratio

    A Parsing Scheme for Finding the Design Pattern and Reducing the Development Cost of Reusable Object Oriented Software

    Full text link
    Because of the importance of object oriented methodologies, the research in developing new measure for object oriented system development is getting increased focus. The most of the metrics need to find the interactions between the objects and modules for developing necessary metric and an influential software measure that is attracting the software developers, designers and researchers. In this paper a new interactions are defined for object oriented system. Using these interactions, a parser is developed to analyze the existing architecture of the software. Within the design model, it is necessary for design classes to collaborate with one another. However, collaboration should be kept to an acceptable minimum i.e. better designing practice will introduce low coupling. If a design model is highly coupled, the system is difficult to implement, to test and to maintain overtime. In case of enhancing software, we need to introduce or remove module and in that case coupling is the most important factor to be considered because unnecessary coupling may make the system unstable and may cause reduction in the system's performance. So coupling is thought to be a desirable goal in software construction, leading to better values for external software qualities such as maintainability, reusability and so on. To test this hypothesis, a good measure of class coupling is needed. In this paper, based on the developed tool called Design Analyzer we propose a methodology to reuse an existing system with the objective of enhancing an existing Object oriented system keeping the coupling as low as possible.Comment: 15 page
    • …
    corecore