
    On the Intrinsic Locality Properties of Web Reference Streams

    There has been considerable work done in the study of Web reference streams: sequences of requests for Web objects. In particular, many studies have looked at the locality properties of such streams, because of the impact of locality on the design and performance of caching and prefetching systems. However, a general framework for understanding why reference streams exhibit given locality properties has not yet emerged. In this work we take a first step in this direction, based on viewing the Web as a set of reference streams that are transformed by Web components (clients, servers, and intermediaries). We propose a graph-based framework for describing this collection of streams and components. We identify three basic stream transformations that occur at nodes of the graph: aggregation, disaggregation and filtering, and we show how these transformations can be used to abstract the effects of different Web components on their associated reference streams. This view allows a structured approach to the analysis of why reference streams show given properties at different points in the Web. Applying this approach to the study of locality requires good metrics for locality. These metrics must meet three criteria: 1) they must accurately capture temporal locality; 2) they must be independent of trace artifacts such as trace length; and 3) they must not involve manual procedures or model-based assumptions. We describe two metrics meeting these criteria that each capture a different kind of temporal locality in reference streams. The popularity component of temporal locality is captured by entropy, while the correlation component is captured by interreference coefficient of variation. We argue that these metrics are more natural and more useful than previously proposed metrics for temporal locality. We use this framework to analyze a diverse set of Web reference traces. We find that this framework can shed light on how and why locality properties vary across different locations in the Web topology. For example, we find that filtering and aggregation have opposing effects on the popularity component of the temporal locality, which helps to explain why multilevel caching can be effective in the Web. Furthermore, we find that all transformations tend to diminish the correlation component of temporal locality, which has implications for the utility of different cache replacement policies at different points in the Web.
    National Science Foundation (ANI-9986397, ANI-0095988); CNPq-Brazil
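
    To make the two metrics concrete, the following is a minimal Python sketch, not the paper's exact definitions: it measures the popularity component as the entropy of the object-popularity distribution (normalized by log2 of the number of distinct objects) and the correlation component as the coefficient of variation of per-object interreference gaps. The normalization choices and the toy trace are assumptions for illustration.

        from collections import Counter, defaultdict
        from math import log2
        from statistics import mean, pstdev

        def popularity_entropy(stream):
            """Entropy of the object-popularity distribution (popularity component),
            normalized by log2(N) so streams with different object counts are comparable."""
            counts = Counter(stream)
            total = len(stream)
            n = len(counts)
            if n <= 1:
                return 0.0
            h = -sum((c / total) * log2(c / total) for c in counts.values())
            return h / log2(n)

        def interreference_cv(stream):
            """Coefficient of variation of per-object interreference gaps
            (correlation component), averaged over objects with at least two gaps."""
            last_seen = {}
            gaps = defaultdict(list)
            for t, obj in enumerate(stream):
                if obj in last_seen:
                    gaps[obj].append(t - last_seen[obj])
                last_seen[obj] = t
            cvs = [pstdev(g) / mean(g) for g in gaps.values() if len(g) >= 2]
            return mean(cvs) if cvs else 0.0

        # Toy reference stream: low entropy means skewed popularity;
        # high interreference CV means bursty, correlated references.
        trace = ["a", "a", "b", "a", "c", "c", "b", "a", "c", "b"]
        print(popularity_entropy(trace), interreference_cv(trace))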

    Database server workload characterization in an e-commerce environment

    A typical E-commerce system deployed on the Internet has multiple layers that include Web users, Web servers, application servers, and a database server. As system use and user request frequency increase, Web/application servers can be scaled up by replication. A load-balancing proxy can be used to route user requests to individual machines that perform the same functionality. To address the increasing workload while avoiding replicating the database server, various dynamic caching policies have been proposed to reduce the database workload in E-commerce systems. However, the nature of the changes seen by the database server as a result of dynamic caching remains unknown. A good understanding of this change is fundamental for tuning a database server to get better performance. In this study, the TPC-W (a transactional Web E-commerce benchmark) workloads on a database server are characterized under two different dynamic caching mechanisms, which are generalized and implemented as a query-result cache and a table cache. The characterization focuses on response time, CPU computation, buffer pool references, disk I/O references, and workload classification. This thesis combines a variety of analysis techniques: simulation, real-time measurement, and data mining. The experimental results in this thesis reveal some interesting effects that dynamic caching has on the database server workload characteristics. The main observations include: (a) dynamic caching can considerably reduce the CPU usage of the database server and the number of database page references when it is heavily loaded; (b) dynamic caching can also reduce database reference locality, but to a smaller degree than that reported for file servers. The data classification results in this thesis show that, with dynamic caching, the database server sees TPC-W profiles that look more like on-line transaction processing workloads.
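
    As an illustration of the query-result caching mechanism that the thesis generalizes, here is a hedged Python sketch: read-only queries are answered from an LRU cache keyed by the SQL text, so repeated identical requests never reach the database. The execute_on_db callable, the capacity, and the crude table-name-based invalidation are assumptions for illustration, not the thesis's implementation.

        from collections import OrderedDict

        class QueryResultCache:
            def __init__(self, execute_on_db, capacity=1024):
                self.execute_on_db = execute_on_db   # callable: sql -> result rows (assumed interface)
                self.capacity = capacity
                self.cache = OrderedDict()           # sql text -> cached result

            def query(self, sql):
                if sql in self.cache:
                    self.cache.move_to_end(sql)      # hit: refresh LRU position
                    return self.cache[sql]
                result = self.execute_on_db(sql)     # miss: the database server does the work
                self.cache[sql] = result
                if len(self.cache) > self.capacity:
                    self.cache.popitem(last=False)   # evict least recently used entry
                return result

            def invalidate(self, table):
                """Crude invalidation: drop any cached query mentioning the updated table
                (real systems track query/table dependencies more precisely)."""
                for sql in [s for s in self.cache if table.lower() in s.lower()]:
                    del self.cache[sql]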

    GreedyDual-Join: Locality-Aware Buffer Management for Approximate Join Processing Over Data Streams

    We investigate adaptive buffer management techniques for approximate evaluation of sliding-window joins over multiple data streams. In many applications, data stream processing systems have limited memory or have to deal with very high-speed data streams. In both cases, computing the exact results of joins between these streams may not be feasible, mainly because the buffers used to compute the joins hold far fewer tuples than the sliding windows themselves. In such cases, a stream buffer management policy is needed. We show that the buffer replacement policy is an important determinant of the quality of the produced results. To that end, we propose GreedyDual-Join (GDJ), an adaptive and locality-aware buffering technique for managing these buffers. GDJ exploits temporal correlations (at both long and short time scales), which we found to be prevalent in many real data streams. We note that our algorithm is readily applicable to multiple data streams and multiple joins and requires almost no additional system resources. We report results of an experimental study using both synthetic and real-world data sets. Our results demonstrate the superiority and flexibility of our approach when contrasted with other recently proposed techniques.
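
    The abstract does not spell out the eviction rule, so the following Python sketch only illustrates a generic GreedyDual-style join buffer under stated assumptions: each buffered tuple carries a priority that is refreshed whenever it produces a join match, idle tuples age out through a global offset, and eviction removes the lowest-priority tuple. The unit match benefit and the data structures are assumptions, not the authors' exact GDJ policy.

        import heapq
        import itertools

        class GreedyDualJoinBuffer:
            def __init__(self, capacity):
                self.capacity = capacity
                self.L = 0.0                     # global aging offset (rises with each eviction)
                self.heap = []                   # (priority, sequence number, key), lazily cleaned
                self.live = {}                   # key -> (current priority, tuple)
                self.seq = itertools.count()

            def _push(self, key, prio):
                heapq.heappush(self.heap, (prio, next(self.seq), key))

            def insert(self, key, tup):
                if len(self.live) >= self.capacity:
                    self.evict()
                prio = self.L + 1.0              # assumed base benefit for a new tuple
                self.live[key] = (prio, tup)
                self._push(key, prio)

            def on_match(self, key):
                """Called when buffered tuple `key` joins with a newly arriving tuple."""
                if key in self.live:
                    prio = self.L + 1.0          # refresh priority: recently useful tuples stay
                    self.live[key] = (prio, self.live[key][1])
                    self._push(key, prio)

            def evict(self):
                while self.heap:
                    prio, _, key = heapq.heappop(self.heap)
                    if key in self.live and self.live[key][0] == prio:
                        self.L = prio            # advance aging offset to the victim's priority
                        del self.live[key]
                        return key
                return None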

    Optimization in Web Caching: Cache Management, Capacity Planning, and Content Naming

    Caching is fundamental to performance in distributed information retrieval systems such as the World Wide Web. This thesis introduces novel techniques for optimizing performance and cost-effectiveness in Web cache hierarchies. When requests are served by nearby caches rather than distant servers, server loads and network traffic decrease and transactions are faster. Cache system design and management, however, face extraordinary challenges in loosely-organized environments like the Web, where the many components involved in content creation, transport, and consumption are owned and administered by different entities. Such environments call for decentralized algorithms in which stakeholders act on local information and private preferences. In this thesis I consider problems of optimally designing new Web cache hierarchies and optimizing existing ones. The methods I introduce span the Web from point of content creation to point of consumption: I quantify the impact of content-naming practices on cache performance; present techniques for variable-quality-of-service cache management; describe how a decentralized algorithm can compute economically-optimal cache sizes in a branching two-level cache hierarchy; and introduce a new protocol extension that eliminates redundant data transfers and allows “dynamic” content to be cached consistently. To evaluate several of my new methods, I conducted trace-driven simulations on an unprecedented scale. This in turn required novel workload measurement methods and efficient new characterization and simulation techniques. The performance benefits of my proposed protocol extension are evaluated using two extraordinarily large and detailed workload traces collected in a traditional corporate network environment and an unconventional thin-client system. My empirical research follows a simple but powerful paradigm: measure on a large scale an important production environment’s exogenous workload; identify performance bounds inherent in the workload, independent of the system currently serving it; identify gaps between actual and potential performance in the environment under study; and finally devise ways to close these gaps through component modifications or through improved inter-component integration. This approach may be applicable to a wide range of Web services as they mature.
    Ph.D., Computer Science and Engineering, University of Michigan
    http://deepblue.lib.umich.edu/bitstream/2027.42/90029/1/kelly-optimization_web_caching.pdf
    http://deepblue.lib.umich.edu/bitstream/2027.42/90029/2/kelly-optimization_web_caching.ps.bz
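
    For readers unfamiliar with trace-driven cache simulation, the approach the thesis relies on for evaluation, here is a minimal Python sketch under simplifying assumptions: replay a request trace through an LRU cache of a given capacity and report the hit ratio. The one-object-per-request trace format and the object-count (rather than byte-weighted) capacity are assumptions for illustration; the thesis works with far larger traces and richer policies.

        from collections import OrderedDict

        def simulate_lru(trace, capacity):
            """Replay `trace` (a sequence of object identifiers) through an
            LRU cache holding at most `capacity` objects; return the hit ratio."""
            cache = OrderedDict()
            hits = 0
            for obj in trace:
                if obj in cache:
                    hits += 1
                    cache.move_to_end(obj)         # mark as most recently used
                else:
                    cache[obj] = True
                    if len(cache) > capacity:
                        cache.popitem(last=False)  # evict least recently used
            return hits / len(trace) if trace else 0.0

        # Example: hit ratio grows with cache size until the working set fits.
        requests = ["/a", "/b", "/a", "/c", "/a", "/b", "/d", "/a"]
        for size in (1, 2, 4):
            print(size, simulate_lru(requests, size))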

    Internet performance modeling: the state of the art at the turn of the century

    Seemingly overnight, the Internet has gone from an academic experiment to a worldwide information matrix. Along the way, computer scientists have come to realize that understanding the performance of the Internet is a remarkably challenging and subtle problem. This challenge is all the more important because of the increasingly significant role the Internet has come to play in society. To take stock of the field of Internet performance modeling, the authors organized a workshop at Schloß Dagstuhl. This paper summarizes the results of discussions, both plenary and in small groups, that took place during the four-day workshop. It identifies successes, points to areas where more work is needed, and poses “Grand Challenges” for the performance evaluation community with respect to the Internet.

    SimpleSSD: Modeling Solid State Drives for Holistic System Simulation

    Existing solid state drive (SSD) simulators unfortunately lack hardware and/or software architecture models. Consequently, they are far from capturing the critical features of contemporary SSD devices. More importantly, while the performance of modern systems that adopt SSDs can vary based on their numerous internal design parameters and storage-level configurations, a full system simulation with traditional SSD models often requires unreasonably long runtimes and excessive computational resources. In this work, we propose SimpleSSD, a high-fidelity simulator that models all detailed characteristics of hardware and software, while simplifying the nondescript features of storage internals. In contrast to existing SSD simulators, SimpleSSD can easily be integrated into publicly-available full system simulators. In addition, it can accommodate a complete storage stack and evaluate the performance of SSDs along with diverse memory technologies and microarchitectures. Thus, it facilitates simulations that explore the full design space at different levels of system abstraction.
    Comment: This paper has been accepted at IEEE Computer Architecture Letters (CAL).