
    Retrieval and Perfect Hashing Using Fingerprinting


    09491 Abstracts Collection -- Graph Search Engineering

    From the 29th November to the 4th December 2009, the Dagstuhl Seminar 09491 "Graph Search Engineering" was held in Schloss Dagstuhl – Leibniz Center for Informatics. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar, as well as abstracts of seminar results and ideas, are put together in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided, if available.

    A Compact Cache-Efficient Function Store with Constant Evaluation Time

    A new data structure for storing a set of key-value mappings over a finite static key set is presented. The data structure, which is called a Cache-Efficient Function Store (CEFS), can be built in linear expected time and supports evaluation for a key in worst-case constant time. Furthermore, (i) the building process can be parallelized to achieve a massive speed-up over known methods, and (ii) an evaluation needs fewer than two cache misses in the average case for many applications, improving upon all known methods. The data structure is also compact, needing only O(n) extra bits of storage. It is flexible in that many parameters can be configured to fit specific applications; the time and space properties for different parameter settings can be predicted with great precision in advance using formulae developed in the thesis, and the selection of parameters can be automated. Experiments have shown the efficiency of the new data structure and confirmed the theoretical analysis.
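
    As background only, the sketch below illustrates the kind of interface such a function store offers: build once over a static key set, then evaluate any stored key in worst-case constant time. It uses the classic two-level (FKS-style) perfect-hashing idea and is not the CEFS construction from the thesis; the class name, the toy hash family, and the example data are invented for the illustration, and nothing here is actually cache-efficient.

        import random

        class StaticFunctionStore:
            """Toy two-level (FKS-style) perfect-hash store: built once over a
            static key set, evaluated in worst-case O(1) per lookup."""

            def __init__(self, mapping):
                items = list(mapping.items())
                self._n = max(1, len(items))
                self._seed = random.randrange(1 << 30)
                # First level: distribute the keys into n buckets.
                buckets = [[] for _ in range(self._n)]
                for k, v in items:
                    buckets[self._h(self._seed, k, self._n)].append((k, v))
                # Second level: for a bucket of size b, retry random seeds until
                # the bucket hashes collision-free into a table of size b*b
                # (expected O(1) retries per bucket).
                self._tables = []
                for bucket in buckets:
                    size = max(1, len(bucket) ** 2)
                    while True:
                        seed = random.randrange(1 << 30)
                        slots = [None] * size
                        ok = True
                        for k, v in bucket:
                            i = self._h(seed, k, size)
                            if slots[i] is not None:
                                ok = False
                                break
                            slots[i] = (k, v)
                        if ok:
                            self._tables.append((seed, slots))
                            break

            @staticmethod
            def _h(seed, key, m):
                return hash((seed, key)) % m

            def get(self, key):
                # A lookup reads one first-level bucket entry and one
                # second-level slot, independently of the number of keys.
                seed, slots = self._tables[self._h(self._seed, key, self._n)]
                entry = slots[self._h(seed, key, len(slots))]
                if entry is None or entry[0] != key:
                    raise KeyError(key)
                return entry[1]

        store = StaticFunctionStore({"alice": 3, "bob": 7, "carol": 1})
        print(store.get("bob"))  # prints 7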

    Texture synthesis by example for interactive applications

    Millions of individuals explore virtual worlds every day, for entertainment, training, or to plan business trips and vacations. Video games such as Eve Online, World of Warcraft, and many others popularized their existence. Sandboxes such as Minecraft and Second Life illustrated how they can serve as a medium, letting people create, share, and even sell their virtual productions. Navigation and exploration software such as Google Earth and Virtual Earth lets us explore a virtual version of the real world, and lets us enrich it with information shared between the millions of users relying on these services every day.

    Virtual environments are massive, dynamic 3D scenes that are explored and manipulated interactively by thousands of users simultaneously. Many challenges have to be solved to achieve these goals. Among them lies the key question of content management: how can we create enough detailed graphical content to represent an immersive, convincing, and coherent world? Even if we can produce this data, how can we then store the terabytes it represents and transfer it for display to each individual user? Rich virtual environments require a massive amount of varied graphical content. Creating this content is extremely time-consuming for computer artists and requires a specific set of technical skills. Capturing the data from the real world can simplify this task, but then requires a large quantity of storage, expensive hardware, and long capture campaigns. While this is acceptable for important landmarks (e.g. the Statue of Liberty in New York, the Eiffel Tower in Paris), it is wasteful on generic or anonymous landscapes. In addition, in many cases capture is not an option, either because an imaginary scenery is required or because the scene to be represented no longer exists. Therefore, researchers have proposed methods to generate new content programmatically, using captured data as an example. Typically, building blocks are extracted from the example content and re-assembled to form new assets. Such approaches have been at the center of my research for the past ten years. However, algorithms for generating data programmatically only partially address the content management challenge: the algorithm generates content as a (slow) pre-process, and its output has to be stored for later use. I have instead focused on proposing models and algorithms which can produce graphical content while minimizing storage. The content is either generated when it is needed for the current viewpoint, or is produced in a very compact form that can later be used for rendering. Thanks to such approaches, developers gain time during content creation, and the distribution of the content is also simplified by reducing the required data bandwidth.

    In addition to the core problem of content synthesis, my approaches required the development of new data structures able to store sparse data generated during display while enabling efficient access. These data structures are specialized for the massive parallelism of graphics processors. I contributed early in this domain and kept a constant focus on this area. The originality of my approach has thus been to consider simultaneously the problems of generating, storing, and displaying the graphical content. As we shall see, each of these areas involves different theoretical and technical backgrounds, which nicely complement each other in providing elegant solutions to content generation, management, and display.
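
    As a rough illustration of the by-example idea mentioned above (building blocks extracted from example content and re-assembled into new assets), the following sketch tiles randomly chosen patches of an exemplar image into a larger output. It is deliberately naive and is not one of the methods developed in this work; real approaches optimize patch placement, blend seams, and run on the GPU. The function name, patch size, and numpy-based demo are invented for the example.

        import numpy as np

        def synthesize_by_example(exemplar, out_h, out_w, patch=16, seed=0):
            """Toy by-example synthesis: copy randomly chosen square patches
            (the "building blocks") from the exemplar and tile them to cover
            a new, larger output image."""
            rng = np.random.default_rng(seed)
            h, w = exemplar.shape[:2]
            out = np.zeros((out_h, out_w) + exemplar.shape[2:], dtype=exemplar.dtype)
            for y in range(0, out_h, patch):
                for x in range(0, out_w, patch):
                    sy = rng.integers(0, h - patch + 1)  # random source corner
                    sx = rng.integers(0, w - patch + 1)
                    ph = min(patch, out_h - y)           # clip at output border
                    pw = min(patch, out_w - x)
                    out[y:y+ph, x:x+pw] = exemplar[sy:sy+ph, sx:sx+pw]
            return out

        # Example: grow a 64x64 noise exemplar into a 256x256 texture.
        exemplar = np.random.rand(64, 64, 3).astype(np.float32)
        texture = synthesize_by_example(exemplar, 256, 256)
        print(texture.shape)  # (256, 256, 3)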

    Practical Private Information Retrieval

    In recent years, the subject of online privacy has been attracting much interest, especially as more Internet users than ever are beginning to care about the privacy of their online activities. Privacy concerns are even prompting legislators in some countries to demand from service providers a more privacy-friendly Internet experience for their citizens. These are welcome developments, and in stark contrast to the practice of Internet censorship and surveillance that legislators in some nations have been known to promote. The development of Internet systems that are able to protect user privacy requires private information retrieval (PIR) schemes that are practical, because no other efficient techniques exist for preserving the confidentiality of the retrieval requests and responses of a user from an Internet system holding unencrypted data. This thesis studies how PIR schemes can be made more relevant and practical for the development of systems that protect users' privacy.

    Private information retrieval schemes are cryptographic constructions for retrieving data from a database without the database (or database administrator) being able to learn any information about the content of the query. PIR can be applied to preserve the confidentiality of queries to online data sources in many domains, such as online patents, real-time stock quotes, Internet domain names, location-based services, online behavioural profiling and advertising, search engines, and so on. In this thesis, we study private information retrieval and obtain results that seek to make PIR more relevant in practice than all previous treatments of the subject in the literature, which have been mostly theoretical. We also show that PIR is the most computationally efficient known technique for providing access privacy under realistic computation powers and network bandwidths. Our result covers all currently known varieties of PIR schemes. We provide a more detailed summary of our contributions below.

    Our first result addresses an existing question regarding the computational practicality of private information retrieval schemes. We show that, contrary to previous arguments, recent lattice-based computational PIR schemes and multi-server information-theoretic PIR schemes are much more computationally efficient than a trivial transfer of the entire PIR database from the server to the client (i.e., trivial download). Our result shows that the end-to-end response times of these schemes are one to three orders of magnitude (10 to 1000 times) smaller than the trivial download of the database for realistic computation powers and network bandwidths. This result extends and clarifies the well-known result of Sion and Carbunar on the computational practicality of PIR.

    Our second result is a novel approach for preserving the privacy of sensitive constants in an SQL query, which improves substantially upon earlier work. Specifically, we provide an expressive SQL data access model on top of the existing rudimentary index- and keyword-based data access models of PIR. The expressive SQL-based model yields between 7 and 480 times higher query throughput than previous work. We then provide a PIR-based approach for preserving access privacy over large databases. Unlike previously published access privacy approaches, we explore new ideas about privacy-preserving constraint-based query transformations, offline data classification, and privacy-preserving queries to index structures much smaller than the databases. This work addresses an important open problem of how real systems can systematically apply existing PIR schemes for querying large databases.

    In terms of applications, we apply PIR to solve the user privacy problem in the domains of patent database queries and location-based services, the user and database privacy problems in the domain of online sales of digital goods, and a scalability problem for the Tor anonymous communication network. We develop practical tools for most of our techniques, which can be useful for adding PIR support to existing and new Internet system designs.
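
    As background for the multi-server information-theoretic PIR schemes the abstract refers to, the sketch below shows the classic two-server XOR scheme of Chor et al., the simplest member of that family: each server sees a uniformly random selection vector and therefore learns nothing about the queried index, while the XOR of the two answers recovers the requested record. This is only the textbook example, not a protocol from the thesis, and the class and function names are chosen for the illustration.

        import secrets

        def xor_bytes(a, b):
            return bytes(x ^ y for x, y in zip(a, b))

        class PirServer:
            """Holds one replica of the database: n records of equal length."""
            def __init__(self, database):
                self.db = database

            def answer(self, selection_bits):
                # XOR together every record whose bit in the query vector is 1.
                acc = bytes(len(self.db[0]))
                for bit, record in zip(selection_bits, self.db):
                    if bit:
                        acc = xor_bytes(acc, record)
                return acc

        def pir_fetch(servers, index, n):
            """Client side of the two-server scheme. Each query vector alone is
            uniformly random, so neither server learns which record is wanted;
            XORing the two answers cancels every record except record `index`."""
            q0 = [secrets.randbelow(2) for _ in range(n)]
            q1 = list(q0)
            q1[index] ^= 1
            return xor_bytes(servers[0].answer(q0), servers[1].answer(q1))

        # Demo: a tiny four-record database replicated on two non-colluding servers.
        db = [b"rec0", b"rec1", b"rec2", b"rec3"]
        servers = [PirServer(db), PirServer(db)]
        print(pir_fetch(servers, 2, len(db)))  # prints b'rec2'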

    Large scale parallel state space search utilizing graphics processing units and solid state disks

    The evolution of science is a double-track process composed of theoretical insights on the one hand and practical inventions on the other. While in most cases new theoretical insights motivate hardware developers to build systems following the theory, in some cases the available hardware forces theoretical research to anticipate the results to expect. Progress in computer science relies on two aspects: processing information and storing it. Improving one side without touching the other will evidently impose new problems without producing a real alternative solution. While decreasing the time needed to solve a challenge may help with long-running problems, it will fail on problems that require large amounts of storage; conversely, increasing the available amount of space for information storage allows harder problems to be solved, given enough time. This work studies two recent hardware developments and utilizes them in the domain of graph search: the trend to move information storage from magnetic disks to electronic media, and the tendency to parallelize computation to speed up information processing.

    Storing information on rotating magnetic disks has been the standard for years and has reached a point where the storage capacity can be seen as practically unlimited, since new drives can be added instantly at low cost. However, while the available storage capacity increases every year, the transfer speed does not. At the beginning of this work, solid state media appeared on the market, slowly displacing hard disks in speed-demanding applications. Today, at the completion of this work, solid state drives are replacing magnetic disks in mobile computing, and computing centres use them as caching media to speed up information retrieval. The reason is their large advantage in random access, where the speed does not drop as significantly as with magnetic drives. While storing and retrieving huge amounts of information is one side of the coin, the other is processing speed. Here, the trend of increasing the clock frequency of single processors stagnated in 2006, and manufacturers started to combine multiple cores in one processor. While a CPU is a general-purpose processor, the manufacturers of graphics processing units (GPUs) face the challenge of performing the same computation for a large number of image points. Here, parallelization offers huge advantages, so modern graphics cards have evolved into highly parallel computing devices with several hundred cores. The challenge is to utilize these processors in domains other than graphics processing.

    One of the most widely used tasks in computer science is search. Not only in disciplines with an obvious search component, but also in software testing, searching a graph is the crucial aspect. Strategies that make it possible to examine larger graphs, be it by reducing the number of considered nodes or by increasing the search speed, have to be developed to meet the rising challenges. This work enhances search in multiple scientific domains such as explicit-state Model Checking, Action Planning, Game Solving, and Probabilistic Model Checking, proposing strategies to find solutions to the search problems. Providing a universal search strategy that can be used in all environments to utilize solid state media and graphics processing units is not possible due to the heterogeneous aspects of the domains. Thus, this work presents a toolkit of strategies tied together in a universal three-stage strategy: in the first stage, the edges leaving a node are determined; in the second stage, the algorithm follows these edges to generate successor nodes; and the duplicate detection in stage three compares all newly generated nodes to existing ones and avoids multiple expansions. For each stage, at least two strategies are proposed, and decision hints are given to simplify the selection of the proper strategy. After describing the strategies, the kit is evaluated in four domains, explaining the choice of strategy, evaluating its outcome, and indicating directions for future work on the topic.
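
    The three-stage strategy can be pictured with a plain in-memory breadth-first search, sketched below: stage one collects the edges leaving every node of the current layer, stage two follows those edges to generate successor nodes, and stage three discards duplicates against already-seen nodes. The thesis maps these stages onto graphics processing units and solid state disks, which this toy sketch does not attempt; the function names and the example domain are invented for the illustration.

        def three_stage_search(start, goal, edges_of, follow):
            """Layered breadth-first search organized as the three stages named
            in the abstract: (1) determine the edges leaving each node of the
            current layer, (2) follow those edges to generate successor nodes,
            (3) compare new nodes against already-seen ones to avoid multiple
            expansions. Returns the number of edges on a shortest path, or
            None if the frontier becomes empty first."""
            seen = {start}
            layer = [start]
            depth = 0
            while layer:
                if goal in layer:
                    return depth
                # Stage 1: collect the outgoing edges of the whole layer.
                pending = [(node, e) for node in layer for e in edges_of(node)]
                # Stage 2: follow the edges to generate candidate successors.
                candidates = [follow(node, e) for node, e in pending]
                # Stage 3: duplicate detection against previously seen nodes.
                layer = []
                for node in candidates:
                    if node not in seen:
                        seen.add(node)
                        layer.append(node)
                depth += 1
            return None

        # Toy domain: states are integers, the two edge labels add 1 or double.
        edges = lambda n: ["+1", "*2"]
        apply_edge = lambda n, e: n + 1 if e == "+1" else n * 2
        print(three_stage_search(1, 24, edges, apply_edge))  # prints 5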