3,151 research outputs found

    On the Intrinsic Locality Properties of Web Reference Streams

    There has been considerable work done in the study of Web reference streams: sequences of requests for Web objects. In particular, many studies have looked at the locality properties of such streams, because of the impact of locality on the design and performance of caching and prefetching systems. However, a general framework for understanding why reference streams exhibit given locality properties has not yet emerged. In this work we take a first step in this direction, based on viewing the Web as a set of reference streams that are transformed by Web components (clients, servers, and intermediaries). We propose a graph-based framework for describing this collection of streams and components. We identify three basic stream transformations that occur at nodes of the graph: aggregation, disaggregation and filtering, and we show how these transformations can be used to abstract the effects of different Web components on their associated reference streams. This view allows a structured approach to the analysis of why reference streams show given properties at different points in the Web. Applying this approach to the study of locality requires good metrics for locality. These metrics must meet three criteria: 1) they must accurately capture temporal locality; 2) they must be independent of trace artifacts such as trace length; and 3) they must not involve manual procedures or model-based assumptions. We describe two metrics meeting these criteria that each capture a different kind of temporal locality in reference streams. The popularity component of temporal locality is captured by entropy, while the correlation component is captured by interreference coefficient of variation. We argue that these metrics are more natural and more useful than previously proposed metrics for temporal locality. We use this framework to analyze a diverse set of Web reference traces. 
We find that this framework can shed light on how and why locality properties vary across different locations in the Web topology. For example, we find that filtering and aggregation have opposing effects on the popularity component of temporal locality, which helps to explain why multilevel caching can be effective in the Web. Furthermore, we find that all transformations tend to diminish the correlation component of temporal locality, which has implications for the utility of different cache replacement policies at different points in the Web. National Science Foundation (ANI-9986397, ANI-0095988); CNPq-Brazi
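The two metrics named in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' definitions: the function names are hypothetical, the entropy is the plain (unnormalized) Shannon entropy of the empirical popularity distribution, and the coefficient of variation is pooled over all objects. The paper may normalize or aggregate differently to meet its trace-length criterion.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def popularity_entropy(stream):
    """Shannon entropy (in bits) of the empirical object popularity distribution.

    Assumption: unnormalized entropy over observed request frequencies.
    """
    counts = Counter(stream)
    total = len(stream)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def interreference_cv(stream):
    """Coefficient of variation of the gaps between successive references
    to the same object, pooled across all objects (an assumption).
    """
    last_seen = {}
    gaps = []
    for position, obj in enumerate(stream):
        if obj in last_seen:
            gaps.append(position - last_seen[obj])
        last_seen[obj] = position
    if not gaps:
        return 0.0  # no object was referenced twice
    return pstdev(gaps) / mean(gaps)
```

Under these assumptions, a stream with uniform popularity has maximal entropy for its object count, and a perfectly periodic stream has an interreference coefficient of variation of zero.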

    Scalable cooperative caching algorithm based on bloom filters

    This thesis presents the design, implementation and evaluation of a novel cooperative caching algorithm based on the Bloom filter data structure. The new algorithm uses a decentralized approach to resolve the problems that prevent existing solutions from being scalable: an overloaded manager, communication overhead among clients, and memory overhead in the global cache. The new solution reduces the manager load and the communication overhead by distributing the global cache information among cooperating clients, so the manager no longer maintains the global cache. Furthermore, the Bloom filter reduces the memory needed for the global cache information, which makes the new algorithm scalable. The research hypothesis is verified by running experiments on the caching algorithms. The experimental results demonstrate that the new caching algorithm maintains a block access time as low as that of existing algorithms. In addition, the new algorithm decreases the manager load by a factor of nine. Moreover, the communication overhead is reduced by nearly a factor of six as a result of distributing the global cache to clients. Finally, the results show a significant reduction in memory overhead, which also contributes to the scalability of the new algorithm.
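For readers unfamiliar with the data structure, a Bloom filter can be sketched as below. This is a generic textbook version, not the thesis's implementation: the class name, the default sizes, and the use of SHA-256 with a seed prefix to derive bit positions are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Compact set summary: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by hashing the key with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )
```

In a cooperative cache, one plausible use is for each client to summarize the block identifiers it holds in such a filter and share the filter with its peers, so that 1024 bits stand in for a full list of block identifiers at the cost of occasional false positives.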

    Optimal Prediction for Prefetching in the Worst Case

    This is the published version. Copyright © 1998 Society for Industrial and Applied Mathematics. Response time delays caused by I/O are a major problem in many systems and database applications. Prefetching and cache replacement methods are attracting renewed attention because of their success in avoiding costly I/Os. Prefetching can be looked upon as a type of online sequential prediction, where the predictions must be accurate as well as made in a computationally efficient way. Unlike other online problems, prefetching cannot admit a competitive analysis, since the optimal offline prefetcher incurs no cost when it knows the future page requests. Previous analytical work on prefetching [Vitter and Krishnan 1991] [J. Assoc. Comput. Mach., 143 (1996), pp. 771–793] consisted of modeling the user as a probabilistic Markov source. In this paper, we look at the much stronger form of worst-case analysis and derive a randomized algorithm for pure prefetching. We compare our algorithm for every page request sequence with the important class of finite state prefetchers, making no assumptions as to how the sequence of page requests is generated. We prove analytically that the fault rate of our online prefetching algorithm converges almost surely for every page request sequence to the fault rate of the optimal finite state prefetcher for the sequence. This analysis model can be looked upon as a generalization of the competitive framework, in that it compares an online algorithm in a worst-case manner over all sequences with a powerful yet nonclairvoyant opponent. We simultaneously achieve the computational goal of implementing our prefetcher in optimal constant expected time per prefetched page using the optimal dynamic discrete random variate generator of Matias, Vitter, and Ni [Proc. 4th Annual SIAM/ACM Symposium on Discrete Algorithms, Austin, TX, January 1993].
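To make the notion of "fault rate" for pure prefetching concrete, the sketch below measures it for a very simple predictor. It is not the paper's randomized algorithm: it is a deterministic first-order predictor that prefetches the page seen most often after the current one, and the names `FirstOrderPrefetcher` and `fault_rate` are hypothetical. A fault is counted whenever the prefetched page differs from the page actually requested.

```python
from collections import Counter, defaultdict


class FirstOrderPrefetcher:
    """Predicts the next page from counts of observed page-to-page transitions."""

    def __init__(self):
        self.transitions = defaultdict(Counter)
        self.previous = None

    def predict(self):
        # Return the most frequent successor of the last page, or None if unknown.
        followers = self.transitions.get(self.previous)
        if not followers:
            return None
        return followers.most_common(1)[0][0]

    def observe(self, page):
        # Record the transition previous -> page, then advance.
        if self.previous is not None:
            self.transitions[self.previous][page] += 1
        self.previous = page


def fault_rate(requests):
    """Fraction of requests for which the prefetched page was not the one requested."""
    prefetcher = FirstOrderPrefetcher()
    faults = 0
    for page in requests:
        if prefetcher.predict() != page:
            faults += 1
        prefetcher.observe(page)
    return faults / len(requests)
```

On the alternating sequence `a b a b a b a b`, this predictor faults on the first three requests while it learns the two transitions and then predicts correctly, giving a fault rate of 3/8.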

    Optimal Prediction for Prefetching in the Worst Case

    AMS subject classifications. 68Q25, 68T05, 68P20, 68N25, 60J20. PII. S0097539794261817. Response time delays caused by I/O are a major problem in many systems and database applications. Prefetching and cache replacement methods are attracting renewed attention because of their success in avoiding costly I/Os. Prefetching can be looked upon as a type of online sequential prediction, where the predictions must be accurate as well as made in a computationally efficient way. Unlike other online problems, prefetching cannot admit a competitive analysis, since the optimal offline prefetcher incurs no cost when it knows the future page requests. Previous analytical work on prefetching [J. Assoc. Comput. Mach., 143 (1996), pp. 771–793] consisted of modeling the user as a probabilistic Markov source. In this paper, we look at the much stronger form of worst-case analysis and derive a randomized algorithm for pure prefetching. We compare our algorithm for every page request sequence with the important class of finite state prefetchers, making no assumptions as to how the sequence of page requests is generated. We prove analytically that the fault rate of our online prefetching algorithm converges almost surely for every page request sequence to the fault rate of the optimal finite state prefetcher for the sequence. This analysis model can be looked upon as a generalization of the competitive framework, in that it compares an online algorithm in a worst-case manner over all sequences with a powerful yet nonclairvoyant opponent. We simultaneously achieve the computational goal of implementing our prefetcher in optimal constant expected time per prefetched page using the optimal dynamic discrete random variate generator of Matias, Vitter, and Ni [Proc. 4th Annual SIAM/ACM Symposium on Discrete Algorithms, Austin, TX, January 1993].

    Highly intensive data dissemination in complex networks

    This paper presents a study on data dissemination in unstructured Peer-to-Peer (P2P) network overlays. The absence of structure in such overlays eases network management, at the cost of non-optimal mechanisms for spreading messages in the network. Thus, dissemination schemes must be employed that cover a large portion of the network with high probability (e.g., gossip-based approaches). We identify principal metrics, provide a theoretical model and perform the assessment using a high-performance simulator that is based on a parallel and distributed architecture. A main point of this study is that our simulation model considers technical implementation details, such as the use of caching and Time To Live (TTL) in message dissemination, that are usually neglected in simulations because of the additional overhead they cause. Outcomes confirm that these technical details have an important influence on the performance of dissemination schemes and that the studied schemes are quite effective at spreading information in P2P overlay networks, whatever their topology. Moreover, the practical usage of such dissemination mechanisms requires fine tuning of many parameters, a choice between different network topologies and an assessment of behaviors such as free riding. All of this can be done only with efficient simulation tools that support the network design phase and, in some cases, operation at runtime.
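The interaction between TTL and caching in gossip dissemination can be sketched as below. This is a toy model under stated assumptions, not the paper's simulator: the function name `gossip`, the fixed `fanout` parameter, and the single shared `seen` set (standing in for each node's cache of already-received messages) are all hypothetical simplifications.

```python
import random
from collections import deque


def gossip(neighbors, source, ttl, fanout, rng):
    """Return the set of nodes reached by one message.

    neighbors: dict mapping each node to a list of adjacent nodes.
    ttl: maximum number of hops the message may travel.
    fanout: number of randomly chosen neighbors each node forwards to.
    """
    seen = {source}  # models per-node message caches
    frontier = deque([(source, ttl)])
    while frontier:
        node, remaining = frontier.popleft()
        if remaining == 0:
            continue  # TTL expired: the message is not forwarded further
        peers = neighbors[node]
        for peer in rng.sample(peers, min(fanout, len(peers))):
            if peer in seen:
                continue  # cache hit: the duplicate is dropped
            seen.add(peer)
            frontier.append((peer, remaining - 1))
    return seen
```

For example, on a six-node ring with `fanout=2` every node forwards to both neighbors, so the outcome is deterministic: a TTL of 2 reaches the five nodes within two hops of the source, and a TTL of 3 reaches all six.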