4,303 research outputs found

    Optimal Content Placement for En-Route Web Caching

    Get PDF
    This paper studies the optimal placement of web files for en-route web caching. It is shown that existing placement policies are all solving restricted partial problems of the file placement problem, and therefore give only sub-optimal solutions. A dynamic programming algorithm of low complexity which computes the optimal solution is presented. It is shown both analytically and experimentally that the file-placement solution output by our algorithm outperforms existing en-route caching policies. The optimal placement of web files can be implemented with a reasonable level of cache coordination and management overhead for en-route caching; and importantly, it can be achieved with or without using data prefetching

    Stochastic Query Covering for Fast Approximate Document Retrieval

    Get PDF
    We design algorithms that, given a collection of documents and a distribution over user queries, return a small subset of the document collection in such a way that we can efficiently provide high-quality answers to user queries using only the selected subset. This approach has applications when space is a constraint or when the query-processing time increases significantly with the size of the collection. We study our algorithms through the lens of stochastic analysis and prove that even though they use only a small fraction of the entire collection, they can provide answers to most user queries, achieving a performance close to the optimal. To complement our theoretical findings, we experimentally show the versatility of our approach by considering two important cases in the context of Web search. In the first case, we favor the retrieval of documents that are relevant to the query, whereas in the second case we aim for document diversification. Both the theoretical and the experimental analysis provide strong evidence of the potential value of query covering in diverse application scenarios

    I Know Why You Went to the Clinic: Risks and Realization of HTTPS Traffic Analysis

    Full text link
    Revelations of large scale electronic surveillance and data mining by governments and corporations have fueled increased adoption of HTTPS. We present a traffic analysis attack against over 6000 webpages spanning the HTTPS deployments of 10 widely used, industry-leading websites in areas such as healthcare, finance, legal services and streaming video. Our attack identifies individual pages in the same website with 89% accuracy, exposing personal details including medical conditions, financial and legal affairs and sexual orientation. We examine evaluation methodology and reveal accuracy variations as large as 18% caused by assumptions affecting caching and cookies. We present a novel defense reducing attack accuracy to 27% with a 9% traffic increase, and demonstrate significantly increased effectiveness of prior defenses in our evaluation context, inclusive of enabled caching, user-specific cookies and pages within the same website
    corecore