9 research outputs found

    Retrieval and Perfect Hashing Using Fingerprinting

    Get PDF

    09491 Abstracts Collection -- Graph Search Engineering

    Get PDF
    From the 29th November to the 4th December 2009, the Dagstuhl Seminar 09491 ``Graph Search Engineering \u27\u27 was held in Schloss Dagstuhl~--~Leibniz Center for Informatics. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar as well as abstracts of seminar results and ideas are put together in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided, if available

    A Compact Cache-Efficient Function Store with Constant Evaluation Time

    Get PDF
    A new data structure to store a set of key-value mappings for finite static key sets is presented. The data structure, which is called Cache-Efficient Function Stores (CEFS), can be built in linear expected time and supports evaluation for a key within worst-case constant time. Furthermore, (i) the building process can be parallelized to achieve massive speed-up over known methods; (ii) an evaluation needs less than two cache misses in average case for many applications, improving upon all known methods. The data structure is also compact, needing only O(n) bits extra space to be stored. The data structure is flexible in that there are many parameters that can be configured in order to fit in specific applications. The time and space properties for different parameters can be predicted to great precision in advance with formulae developed in the thesis. It is also possible to automate the selection of parameters. Experiments have shown the efficiency of the new data structure and confirmed the theoretical analysis

    External perfect hashing for very large key sets

    No full text

    External Perfect Hashing for Very Large Key Sets

    No full text
    We present a simple and efficient external perfect hashing scheme (referred to as EPH algorithm) for very large static key sets. We use a number of techniques from the literature to obtain a novel scheme that is theoretically wellunderstood and at the same time achieves an order-of-magnitude increase in the size of the problem to be solved compared to previous “practical ” methods. We demonstrate the scalability of our algorithm by constructing minimum perfect hash functions for a set of 1.024 billion URLs from the World Wide Web of average length 64 characters in approximately 62 minutes, using a commodity PC. Our scheme produces minimal perfect hash functions using approximately 3.8 bits per key. For perfect hash functions in the range {0,..., 2n − 1} the space usage drops to approximately 2.7 bits per key. The main contribution is the first algorithm that has experimentally proven practicality for sets in the order of billions of keys and has time and space usage carefully analyzed without unrealistic assumptions

    Synthèse de textures par l’exemple pour les applications interactives

    Get PDF
    Millions of individuals explore virtual worlds every day, for entertainment, training, or to plan business trips and vacations. Video games such as Eve Online, World of Warcraft, and many others popularized their existence. Sand boxes such as Minecraft and Second Life illustrated how they can serve as a media, letting people create, share and even sell their virtual productions. Navigation and exploration software such as Google Earth and Virtual Earth let us explore a virtual version of the real world, and let us enrich it with information shared between the millions of users using these services every day.Virtual environments are massive, dynamic 3D scenes, that are explored and manipulated interactively bythousands of users simultaneously. Many challenges have to be solved to achieve these goals. Among those lies the key question of content management. How can we create enough detailed graphical content so as to represent an immersive, convincing and coherent world? Even if we can produce this data, how can we then store the terra–bytes it represents, and transfer it for display to each individual users? Rich virtual environments require a massive amount of varied graphical content, so as to represent an immersive, convincing and coherent world. Creating this content is extremely time consuming for computer artists and requires a specific set of technical skills. Capturing the data from the real world can simplify this task but then requires a large quantity of storage, expensive hardware and long capture campaigns. While this is acceptable for important landmarks (e.g. the statue of Liberty in New York, the Eiffel tower in Paris) this is wasteful on generic or anonymous landscapes. In addition, in many cases capture is not an option, either because an imaginary scenery is required or because the scene to be represented no longer exists. Therefore, researchers have proposed methods to generate new content programmatically, using captured data as an example. Typically, building blocks are extracted from the example content and re–assembled to form new assets. Such approaches have been at the center of my research for the past ten years. However, algorithms for generating data programmatically only partially address the content management challenge: the algorithm generates content as a (slow) pre–process and its output has to be stored for later use. On the contrary, I have focused on proposing models and algorithms which can produce graphical content while minimizing storage. The content is either generated when it is needed for the current viewpoint, or is produced under a very compact form that can be later used for rendering. Thanks to such approaches developers gain time during content creation, but this also simplifies the distribution of the content by reducing the required data bandwidth.In addition to the core problem of content synthesis, my approaches required the development of new data-structures able to store sparse data generated during display, while enabling an efficient access. These data-structures are specialized for the massive parallelism of graphics processors. I contributed early in this domain and kept a constant focus on this area. The originality of my approach has thus been to consider simultaneously the problems of generating, storing and displaying the graphical content. As we shall see, each of these area involve different theoretical and technical backgrounds, that nicely complement each other in providing elegant solutions to content generation, management and display

    Practical Private Information Retrieval

    Get PDF
    In recent years, the subject of online privacy has been attracting much interest, especially as more Internet users than ever are beginning to care about the privacy of their online activities. Privacy concerns are even prompting legislators in some countries to demand from service providers a more privacy-friendly Internet experience for their citizens. These are welcomed developments and in stark contrast to the practice of Internet censorship and surveillance that legislators in some nations have been known to promote. The development of Internet systems that are able to protect user privacy requires private information retrieval (PIR) schemes that are practical, because no other efficient techniques exist for preserving the confidentiality of the retrieval requests and responses of a user from an Internet system holding unencrypted data. This thesis studies how PIR schemes can be made more relevant and practical for the development of systems that are protective of users' privacy. Private information retrieval schemes are cryptographic constructions for retrieving data from a database, without the database (or database administrator) being able to learn any information about the content of the query. PIR can be applied to preserve the confidentiality of queries to online data sources in many domains, such as online patents, real-time stock quotes, Internet domain names, location-based services, online behavioural profiling and advertising, search engines, and so on. In this thesis, we study private information retrieval and obtain results that seek to make PIR more relevant in practice than all previous treatments of the subject in the literature, which have been mostly theoretical. We also show that PIR is the most computationally efficient known technique for providing access privacy under realistic computation powers and network bandwidths. Our result covers all currently known varieties of PIR schemes. We provide a more detailed summary of our contributions below: Our first result addresses an existing question regarding the computational practicality of private information retrieval schemes. We show that, unlike previously argued, recent lattice-based computational PIR schemes and multi-server information-theoretic PIR schemes are much more computationally efficient than a trivial transfer of the entire PIR database from the server to the client (i.e., trivial download). Our result shows the end-to-end response times of these schemes are one to three orders of magnitude (10--1000 times) smaller than the trivial download of the database for realistic computation powers and network bandwidths. This result extends and clarifies the well-known result of Sion and Carbunar on the computational practicality of PIR. Our second result is a novel approach for preserving the privacy of sensitive constants in an SQL query, which improves substantially upon the earlier work. Specifically, we provide an expressive data access model of SQL atop of the existing rudimentary index- and keyword-based data access models of PIR. The expressive SQL-based model developed results in between 7 and 480 times improvement in query throughput than previous work. We then provide a PIR-based approach for preserving access privacy over large databases. Unlike previously published access privacy approaches, we explore new ideas about privacy-preserving constraint-based query transformations, offline data classification, and privacy-preserving queries to index structures much smaller than the databases. This work addresses an important open problem about how real systems can systematically apply existing PIR schemes for querying large databases. In terms of applications, we apply PIR to solve user privacy problem in the domains of patent database query and location-based services, user and database privacy problems in the domain of the online sales of digital goods, and a scalability problem for the Tor anonymous communication network. We develop practical tools for most of our techniques, which can be useful for adding PIR support to existing and new Internet system designs
    corecore