12,210 research outputs found

    Massively Parallel Approximate Distance Sketches

    Get PDF
    Data structures that allow efficient distance estimation (distance oracles, distance sketches, etc.) have been extensively studied, and are particularly well studied in centralized models and classical distributed models such as CONGEST. We initiate their study in newer (and arguably more realistic) models of distributed computation: the Congested Clique model and the Massively Parallel Computation (MPC) model. We provide efficient constructions in both of these models, but our core results are for MPC. In MPC we give two main results: an algorithm that constructs stretch/space optimal distance sketches but takes a (small) polynomial number of rounds, and an algorithm that constructs distance sketches with worse stretch but that only takes polylogarithmic rounds. Along the way, we show that other useful combinatorial structures can also be computed in MPC. In particular, one key component we use to construct distance sketches are an MPC construction of the hopsets of [Elkin and Neiman, 2016]. This result has additional applications such as the first polylogarithmic time algorithm for constant approximate single-source shortest paths for weighted graphs in the low memory MPC setting

    Semantic Storage: Overview and Assessment

    No full text
    The Semantic Web has a great deal of momentum behind it. The promise of a ‘better web’, where information is given well defined meaning and computers are better able to work with it has captured the imagination of a significant number of people, particularly in academia. Language standards such as RDF and OWL have appeared with remarkable speed, and development continues apace. To back up this development, there is a requirement for ‘semantic databases’, where this data can be conveniently stored, operated upon, and retrieved. These already exist in the form of triple stores, but do not yet fulfil all the requirements that may be made of them, particularly in the area of performing inference using OWL. This paper analyses the current stores along with forthcoming technology, and finds that it is unlikely that a combination of speed, scalability, and complex inferencing will be practical in the immediate future. It concludes by suggesting alternative development routes

    TopSig: Topology Preserving Document Signatures

    Get PDF
    Performance comparisons between File Signatures and Inverted Files for text retrieval have previously shown several significant shortcomings of file signatures relative to inverted files. The inverted file approach underpins most state-of-the-art search engine algorithms, such as Language and Probabilistic models. It has been widely accepted that traditional file signatures are inferior alternatives to inverted files. This paper describes TopSig, a new approach to the construction of file signatures. Many advances in semantic hashing and dimensionality reduction have been made in recent times, but these were not so far linked to general purpose, signature file based, search engines. This paper introduces a different signature file approach that builds upon and extends these recent advances. We are able to demonstrate significant improvements in the performance of signature file based indexing and retrieval, performance that is comparable to that of state of the art inverted file based systems, including Language models and BM25. These findings suggest that file signatures offer a viable alternative to inverted files in suitable settings and from the theoretical perspective it positions the file signatures model in the class of Vector Space retrieval models.Comment: 12 pages, 8 figures, CIKM 201
    • …
    corecore