    CYCLOSA: Decentralizing Private Web Search Through SGX-Based Browser Extensions

    By regularly querying Web search engines, users (often unconsciously) disclose large amounts of personal data through their search queries, some of which may reveal sensitive information (e.g., health issues, or sexual, political, or religious preferences). Several solutions exist that let users query search engines with improved privacy protection. However, these solutions suffer from a number of limitations: some are subject to user re-identification attacks, while others lack scalability or are unable to provide accurate results. This paper presents CYCLOSA, a secure, scalable and accurate private Web search solution. CYCLOSA improves security by relying on trusted execution environments (TEEs) as provided by Intel SGX. Further, CYCLOSA proposes a novel adaptive privacy protection solution that reduces the risk of user re-identification. CYCLOSA sends fake queries to the search engine and dynamically adapts their count according to the sensitivity of the user query. In addition, CYCLOSA achieves scalability because it is fully decentralized, spreading the load of distributing fake queries among other nodes. Finally, CYCLOSA preserves the accuracy of Web search by handling the real query and the fake queries separately, in contrast to existing solutions that mix fake and real query results.
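
    The adaptive mechanism described above can be pictured as follows: the number of fake queries grows with the estimated sensitivity of the real query, and the real query is tagged separately so its results are never mixed with the fake ones. Below is a minimal sketch in Python, where estimate_sensitivity, fake_pool, and the proportional count rule are assumptions for illustration, not details from the paper.

```python
import random

def estimate_sensitivity(query: str) -> float:
    """Hypothetical sensitivity score in [0, 1]; the abstract does not
    specify how sensitivity is actually computed."""
    sensitive_terms = {"health", "religion", "politics", "sexual"}
    hits = sum(term in query.lower() for term in sensitive_terms)
    return min(1.0, hits / 2)

def build_query_batch(real_query: str, fake_pool: list[str],
                      max_fakes: int = 10) -> dict:
    """Pick a number of fake queries proportional to the real query's
    sensitivity, keeping the real query separate so its results are
    handled apart from the fake ones."""
    n_fakes = round(estimate_sensitivity(real_query) * max_fakes)
    n_fakes = min(n_fakes, len(fake_pool))
    return {
        "real": real_query,                          # answered for the user
        "fakes": random.sample(fake_pool, n_fakes),  # relayed via other nodes
    }
```

    In the actual system the fake queries are spread over other SGX-protected nodes rather than issued by the querying client itself, which is what lets the load scale with the number of participants.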

    Document replication strategies for geographically distributed web search engines

    Large-scale web search engines are composed of multiple data centers that are geographically distant from each other. Typically, a user query is processed in a data center that is geographically close to the origin of the query, over a replica of the entire web index. Compared to a centralized, single-center search engine, this architecture offers lower query response times because the network latencies between users and data centers are reduced. However, it does not scale well with increasing index sizes and query traffic volumes because queries are evaluated over the entire web index, which has to be replicated and maintained in all data centers. As a remedy to this scalability problem, we propose a document replication framework in which documents are selectively replicated on data centers based on regional user interests. Within this framework, we propose three different document replication strategies, each optimizing a different objective: reducing the potential search quality loss, the average query response time, or the total query workload of the search system. For all three strategies, we consider two alternative types of capacity constraints on the index sizes of data centers. Moreover, we investigate the performance impact of query forwarding and result caching. We evaluate our strategies via detailed simulations, using a large query log and a document collection obtained from the Yahoo! web search engine.
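
    As an illustration, one strategy in this family could be a greedy assignment that replicates each document to the data centers whose regional users show the most interest in it, subject to a per-center index-capacity cap. The sketch below is an assumption of how such a strategy might look; the interest scores and capacity unit stand in for the paper's actual objectives and constraints.

```python
from collections import defaultdict

def replicate_documents(doc_interest: dict[str, dict[str, float]],
                        capacity: dict[str, int]) -> dict[str, set[str]]:
    """Greedy, interest-driven replication sketch (not the paper's exact
    strategies).

    doc_interest: doc_id -> {data_center: regional interest score}
    capacity:     data_center -> maximum number of documents it may hold
    """
    placement: dict[str, set[str]] = defaultdict(set)
    load = {dc: 0 for dc in capacity}

    # Consider (document, data center) pairs by decreasing regional interest.
    pairs = sorted(
        ((score, doc, dc)
         for doc, scores in doc_interest.items()
         for dc, score in scores.items()),
        reverse=True)
    for score, doc, dc in pairs:
        if load[dc] < capacity[dc]:
            placement[dc].add(doc)
            load[dc] += 1
    return placement
```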

    ASAP Top-k Query Processing in Unstructured P2P Systems

    Top-k query processing techniques are useful in unstructured peer-to-peer (P2P) systems to avoid overwhelming users with too many results. However, existing approaches suffer from long waiting times, because top-k results are returned only when all queried peers have finished processing the query. As a result, query response time is dominated by the slowest queried peer. In this paper, we address this waiting-time problem. To do so, we revisit top-k query processing in P2P systems by introducing two novel notions in addition to response time: the stabilization time and the cumulative quality gap. Using these notions, we formally define the as-soon-as-possible (ASAP) top-k processing problem. Then, we propose a family of algorithms, called ASAP, to deal with this problem. We validated our solution through implementation and extensive experimentation. The results show that ASAP significantly outperforms baseline algorithms by returning the final top-k results to users much sooner.
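
    The core idea of returning intermediate top-k results before the slowest peer has answered can be sketched as follows; the peer-answer interface and scoring below are placeholders, not the ASAP algorithms themselves.

```python
import heapq

def asap_topk(peer_answer_batches, k: int):
    """Yield an updated top-k list each time a peer's partial answers
    arrive, instead of waiting for every queried peer to finish.

    peer_answer_batches: iterable of [(item, score), ...] batches, one per
    responding peer, in arrival order (a placeholder interface)."""
    best = {}  # item -> best score seen so far
    for batch in peer_answer_batches:
        for item, score in batch:
            if score > best.get(item, float("-inf")):
                best[item] = score
        # Current top-k estimate; it keeps improving until it stabilizes.
        yield heapq.nlargest(k, best.items(), key=lambda kv: kv[1])
```

    Informally, the stabilization time corresponds to the moment the yielded list stops changing, and the cumulative quality gap measures how far the intermediate lists were from the final top-k over time.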

    Toward Self-Organising Service Communities

    This paper discusses a framework in which catalog service communities are built, linked for interaction, and constantly monitored and adapted over time. A catalog service community (represented as a peer node in a peer-to-peer network) in our system can be viewed as a domain-specific data integration mediator representing the domain knowledge and the registry information. Query routing among communities is performed to identify a set of data sources that are relevant to answering a given query. The system monitors the interactions between communities to discover patterns that may lead to restructuring of the network (e.g., irrelevant peers removed, new relationships created).
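
    Query routing among communities, as described above, can be sketched as selecting the peer communities whose registered domain descriptions best match the query. The term-overlap relevance below is a stand-in for whatever matching the framework actually uses.

```python
def route_query(query_terms: set[str],
                communities: dict[str, set[str]],
                top_n: int = 3) -> list[str]:
    """Return the community peers most relevant to a query.

    communities maps a community (peer node) id to the set of domain terms
    it registered; relevance here is simple term overlap, a placeholder for
    the framework's actual domain-knowledge matching."""
    scored = sorted(((len(query_terms & terms), cid)
                     for cid, terms in communities.items()),
                    reverse=True)
    return [cid for score, cid in scored[:top_n] if score > 0]
```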

    A Collaborative Protocol for Private Retrieval of Location-Based Information

    Privacy and security are paramount for the proper deployment of location-based services (LBSs). We present a novel protocol based on user collaboration to privately retrieve location-based information from an LBS provider. Our approach neither assumes that users or the LBS can be completely trusted with regard to privacy, nor relies on a trusted third party. In addition, user queries, containing accurate locations, remain unchanged, and the collaborative protocol does not impose any special requirements on the query-response function of the LBS. The protocol is analyzed in terms of privacy, network traffic, and LBS processing overhead. We show that our proposal provides exponential scalability in the probability of guaranteed privacy breach, at the expense of a linear relative network cost.
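
    The abstract does not detail the protocol itself, but the general shape of collaboration-based retrieval, in which a query carrying the exact location is relayed through other users so the LBS cannot tie it to its originator, might look like the sketch below. The forwarding rule, peer interface, and lbs_lookup callable are assumptions for illustration only.

```python
import random

def submit_via_peers(query, peers, lbs_lookup, forward_prob=0.5):
    """Relay an unmodified location query through collaborating peers.

    With probability forward_prob the current holder forwards the query to
    another randomly chosen peer; otherwise the holder submits it to the
    LBS. The LBS sees only the final submitter, never the originator, and
    the query (with its accurate location) is never altered."""
    path = [random.choice(peers)]
    while random.random() < forward_prob:
        path.append(random.choice(peers))
    response = lbs_lookup(query)   # placeholder for the real LBS request
    return response, path          # response travels back along the path
```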