65,628 research outputs found

    Statistical structures for internet-scale data management

    Get PDF
    Efficient query processing in traditional database management systems relies on statistics on base data. For centralized systems, there is a rich body of research results on such statistics, from simple aggregates to more elaborate synopses such as sketches and histograms. For Internet-scale distributed systems, on the other hand, statistics management still poses major challenges. With the work in this paper we aim to endow peer-to-peer data management over structured overlays with the power associated with such statistical information, with emphasis on meeting the scalability challenge. To this end, we first contribute efficient, accurate, and decentralized algorithms that can compute key aggregates such as Count, CountDistinct, Sum, and Average. We show how to construct several types of histograms, such as simple Equi-Width, Average-Shifted Equi-Width, and Equi-Depth histograms. We present a full-fledged open-source implementation of these tools for distributed statistical synopses, and report on a comprehensive experimental performance evaluation, evaluating our contributions in terms of efficiency, accuracy, and scalability

    Measuring Scalability of Resource Management Systems

    Get PDF
    Scalability refers to the extent of configuration modifications over which a system continues to be economically deployable. Until now, scalability of resource management systems (RMSs) has been examined implicitly by studying different performance measures of the RMS designs for different parameters. However, a framework is yet to be developed for quantitatively evaluating scalability to unambiguously examine the trade-offs among the different RMS designs. In this paper, we present a methodology to study scalability of RMSs based on overhead cost estimation. First, we present a performance model for a managed distributed system (e.g., Grid computing system) that separates the manager and managee. Second, based on the performance model we present a metric used to quantify the scalability of a RMS. Third, simulations are used to apply the proposed scalability metric to selected RMSs from the literature. The results show that the proposed metric is useful in quantifying the scalabilities of the RMSs

    dCAMP: Distributed Common API for Measuring Performance

    Get PDF
    Although the nearing end of Moore’s Law has been predicted numerous times in the past, it will eventually come to pass. In forethought of this, many modern computing systems have become increasingly complex, distributed, and parallel. As software is developed on and for these complex systems, a common API is necessary for gathering vital performance related metrics while remaining transparent to the user, both in terms of system impact and ease of use. Several distributed performance monitoring and testing systems have been proposed and implemented by both research and commercial institutions. However, most of these systems do not meet several fundamental criterion for a truly useful distributed performance monitoring system: 1) variable data delivery models, 2) security, 3) scalability, 4) transparency, 5) completeness, 6) validity, and 7) portability. This work presents dCAMP: Distributed Common API for Measuring Performance, a distributed performance framework built on top of Mark Gabel and Michael Haungs’ work with CAMP. This work also presents an updated and extended set of criterion for evaluating distributed performance frameworks and uses these to evaluate dCAMP and several related works

    Evaluation of Optimization Strategies for Incremental Graph Queries

    Get PDF
    The last decade brought considerable improvements in distributed storage and query technologies, known as NoSQL systems. These systems provide quick evaluation of simple retrieval operations and are able to answer certain complex queries in a scalable way, albeit not instantly. Providing scalability and quick response times at the same time for querying large data sets is still a challenging task. Evaluating complex graph queries is particularly difficult, as it requires lots of join, antijoin and filtering operations. This paper presents optimization techniques used in relational database systems and applies them on graph queries. We evaluate various query plans on multiple datasets and discuss the effect of different optimization techniques

    On a Catalogue of Metrics for Evaluating Commercial Cloud Services

    Full text link
    Given the continually increasing amount of commercial Cloud services in the market, evaluation of different services plays a significant role in cost-benefit analysis or decision making for choosing Cloud Computing. In particular, employing suitable metrics is essential in evaluation implementations. However, to the best of our knowledge, there is not any systematic discussion about metrics for evaluating Cloud services. By using the method of Systematic Literature Review (SLR), we have collected the de facto metrics adopted in the existing Cloud services evaluation work. The collected metrics were arranged following different Cloud service features to be evaluated, which essentially constructed an evaluation metrics catalogue, as shown in this paper. This metrics catalogue can be used to facilitate the future practice and research in the area of Cloud services evaluation. Moreover, considering metrics selection is a prerequisite of benchmark selection in evaluation implementations, this work also supplements the existing research in benchmarking the commercial Cloud services.Comment: 10 pages, Proceedings of the 13th ACM/IEEE International Conference on Grid Computing (Grid 2012), pp. 164-173, Beijing, China, September 20-23, 201

    Performance of Network and Service Monitoring Frameworks

    Get PDF
    The efficiency and the performance of anagement systems is becoming a hot research topic within the networks and services management community. This concern is due to the new challenges of large scale managed systems, where the management plane is integrated within the functional plane and where management activities have to carry accurate and up-to-date information. We defined a set of primary and secondary metrics to measure the performance of a management approach. Secondary metrics are derived from the primary ones and quantifies mainly the efficiency, the scalability and the impact of management activities. To validate our proposals, we have designed and developed a benchmarking platform dedicated to the measurement of the performance of a JMX manager-agent based management system. The second part of our work deals with the collection of measurement data sets from our JMX benchmarking platform. We mainly studied the effect of both load and the number of agents on the scalability, the impact of management activities on the user perceived performance of a managed server and the delays of JMX operations when carrying variables values. Our findings show that most of these delays follow a Weibull statistical distribution. We used this statistical model to study the behavior of a monitoring algorithm proposed in the literature, under heavy tail delays distribution. In this case, the view of the managed system on the manager side becomes noisy and out of date
    corecore