
    Extracting partition statistics from semistructured data

    The effective grouping, or partitioning, of semistructured data is of fundamental importance when providing support for queries. Partitions allow items within the data set that share common structural properties to be identified efficiently. This allows queries that make use of these properties, such as branching path expressions, to be accelerated. Here, we evaluate the effectiveness of several partitioning techniques by establishing the number of partitions that each scheme can identify over a given data set. In particular, we explore the use of parameterised indexes, based upon the notion of forward and backward bisimilarity, as a means of partitioning semistructured data, and demonstrate that even restricted instances of such indexes can be used to identify the majority of relevant partitions in the data.
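
As a rough illustration of partitioning by backward bisimilarity, the sketch below groups the nodes of a small labelled graph by label and then repeatedly splits groups whose members have parents in different groups. The function name, the input encoding, and the parameter `k` (the number of refinement rounds) are assumptions made for this example; the abstract does not specify the indexes at this level of detail.

```python
from collections import defaultdict


def backward_bisimulation_partition(labels, edges, k=None):
    """Partition nodes by backward k-bisimilarity (hypothetical helper).

    labels: dict mapping node -> label
    edges:  iterable of (parent, child) pairs
    k:      number of refinement rounds; None refines until stable
    """
    parents = defaultdict(set)
    for parent, child in edges:
        parents[child].add(parent)

    # Round 0: nodes are grouped by label alone.
    block = {node: labels[node] for node in labels}
    rounds = 0
    while k is None or rounds < k:
        # A node's signature is its current block plus its parents' blocks.
        signature = {
            node: (block[node], frozenset(block[p] for p in parents[node]))
            for node in labels
        }
        if len(set(signature.values())) == len(set(block.values())):
            break  # no block was split, so the partition is stable
        block = signature
        rounds += 1

    groups = defaultdict(set)
    for node, b in block.items():
        groups[b].add(node)
    return sorted(sorted(group) for group in groups.values())


labels = {1: "db", 2: "book", 3: "article", 4: "title", 5: "title", 6: "title"}
edges = [(1, 2), (1, 3), (2, 4), (3, 5), (2, 6)]
print(backward_bisimulation_partition(labels, edges, k=0))
print(backward_bisimulation_partition(labels, edges))
```

With `k=0` the three `title` nodes share one partition. Refining until stable separates the title under `article` from the two under `book`, so the partition count rises from four to five.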

    Fully Dynamic Connectivity in $O(\log n(\log\log n)^2)$ Amortized Expected Time

    Dynamic connectivity is one of the most fundamental problems in dynamic graph algorithms. We present a randomized Las Vegas dynamic connectivity data structure with $O(\log n(\log\log n)^2)$ amortized expected update time and $O(\log n/\log\log\log n)$ worst case query time, which comes very close to the cell probe lower bounds of Patrascu and Demaine (2006) and Patrascu and Thorup (2011).

    Semantic Visual Localization

    Robust visual localization under a wide range of viewing conditions is a fundamental problem in computer vision. Handling the difficult cases of this problem is not only very challenging but also of high practical relevance, e.g., in the context of life-long localization for augmented reality or autonomous robots. In this paper, we propose a novel approach based on a joint 3D geometric and semantic understanding of the world, enabling it to succeed under conditions where previous approaches failed. Our method leverages a novel generative model for descriptor learning, trained on semantic scene completion as an auxiliary task. The resulting 3D descriptors are robust to missing observations by encoding high-level 3D geometric and semantic information. Experiments on several challenging large-scale localization datasets demonstrate reliable localization under extreme viewpoint, illumination, and geometry changes.

    Content-Aware DataGuides for Indexing Large Collections of XML Documents

    XML is well-suited for modelling structured data with textual content. However, most indexing approaches perform structure and content matching independently, combining the retrieved path and keyword occurrences in a third step. This paper shows that retrieval in XML documents can be accelerated significantly by processing text and structure simultaneously during all retrieval phases. To this end, the Content-Aware DataGuide (CADG) enhances the well-known DataGuide with (1) simultaneous keyword and path matching and (2) a precomputed content/structure join. Extensive experiments prove the CADG to be 50-90% faster than the DataGuide for various sorts of query and document, including difficult cases such as poorly structured queries and recursive document paths. A new query classification scheme identifies precise query characteristics with a predominant influence on the performance of the individual indices. The experiments show that the CADG is applicable to many real-world applications, in particular large collections of heterogeneously structured XML documents.

    A Bi-Labeling Based XPath Processing System

    We present BLAS, a Bi-LAbeling based XPath processing System. BLAS uses two labeling schemes to speed up query processing: P-labeling for processing consecutive child (or parent) axis traversals, and D-labeling for processing descendant (or ancestor) axis traversals. XML data are stored in labeled form and indexed. Algorithms are presented for translating XPath queries to SQL expressions. BLAS reduces the number of joins in the SQL query translated from a given XPath query and reduces the number of disk accesses required to execute the SQL query compared with the traditional XPath processing using D-labeling alone. We also propose an approximate P-labeling scheme and the corresponding query translation algorithm to handle XML data trees that contain a large number of distinct tag names, and/or are very deep. This extension captures a spectrum of XPath-to-SQL query translation schemes, ranging from existing schemes that do not use P-labels to the one that uses exact P-labels. Experimental results demonstrate the efficiency of the BLAS system.
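
The abstract does not define D-labels, so the sketch below assumes a common interval-style scheme: each node gets a (start, end, level) triple from one depth-first pass, and a node is a descendant of another exactly when its interval is nested inside the other's. The function names and the tuple encoding of the tree are hypothetical and are not taken from BLAS itself.

```python
def d_label(tree):
    """Assign (tag, start, end, level) to each node in document order.

    tree is a nested (tag, [children]) tuple; this encoding is an
    assumption made for the example.
    """
    labels = []
    counter = 0

    def visit(node, level):
        nonlocal counter
        tag, children = node
        counter += 1
        entry = [tag, counter, None, level]  # end is filled in after the subtree
        labels.append(entry)
        for child in children:
            visit(child, level + 1)
        counter += 1
        entry[2] = counter

    visit(tree, 1)
    return [tuple(entry) for entry in labels]


def is_descendant(node, ancestor):
    # Interval containment: the node's interval lies strictly inside the ancestor's.
    return ancestor[1] < node[1] and node[2] < ancestor[2]


def is_child(node, parent):
    # A child is a descendant exactly one level deeper.
    return is_descendant(node, parent) and node[3] == parent[3] + 1


tree = ("db", [("book", [("title", []), ("author", [])]),
               ("article", [("title", [])])])
for label in d_label(tree):
    print(label)
```

Under this assumption a descendant-axis step becomes a range comparison on two integer columns, which is the kind of predicate a relational engine can evaluate with an ordinary index.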

    Fast and Tiny Structural Self-Indexes for XML

    XML document markup is highly repetitive and therefore well compressible using dictionary-based methods such as DAGs or grammars. In the context of selectivity estimation, grammar-compressed trees were used before as synopses for structural XPath queries. Here a fully-fledged index over such grammars is presented. The index allows arbitrary tree algorithms to be executed with a slow-down that is comparable to the space improvement. More interestingly, certain algorithms execute much faster over the index (because no decompression occurs). E.g., for structural XPath count queries, evaluating over the index is faster than previous XPath implementations, often by two orders of magnitude. The index also allows XML results (including texts) to be serialized faster than previous systems, by a factor of ca. 2-3. This is due to efficient copy handling of grammar repetitions, and because materialization is totally avoided. In order to compare with twig join implementations, we implemented a materializer which writes out pre-order numbers of result nodes, and show its competitiveness. Comment: 13 pages
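
To make the DAG idea concrete, the sketch below shares identical subtrees of a small tree and then answers a count query on the shared form without expanding it. This is a simplified stand-in for the grammar-based index described in the abstract, not the authors' data structure. The function names and the (tag, [children]) encoding are assumptions.

```python
def dag_compress(tree):
    """Share identical subtrees; returns (root id, table of distinct nodes)."""
    table = {}  # (tag, tuple of child ids) -> node id

    def intern(node):
        tag, children = node
        key = (tag, tuple(intern(child) for child in children))
        if key not in table:
            table[key] = len(table)
        return table[key]

    return intern(tree), table


def count_nodes(tree):
    tag, children = tree
    return 1 + sum(count_nodes(child) for child in children)


def count_tag(table, root, tag):
    """Count nodes with the given tag in the original tree, using only the DAG."""
    by_id = {node_id: key for key, node_id in table.items()}
    memo = {}

    def count(node_id):
        if node_id not in memo:
            node_tag, kids = by_id[node_id]
            # Each shared subtree is counted once and reused for every occurrence.
            memo[node_id] = (node_tag == tag) + sum(count(kid) for kid in kids)
        return memo[node_id]

    return count(root)


tree = ("lib", [("book", [("title", []), ("author", [])]),
                ("book", [("title", []), ("author", [])]),
                ("book", [("title", [])])])
root, table = dag_compress(tree)
print(count_nodes(tree), len(table), count_tag(table, root, "title"))
```

The memo table is why the count runs in time proportional to the compressed size: a repeated subtree contributes its cached count at every occurrence instead of being traversed again.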

    Data complexity measured by principal graphs

    How can one measure the complexity of a finite set of vectors embedded in a multidimensional space? This is a non-trivial question which can be approached in many different ways. Here we suggest a set of data complexity measures using universal approximators, principal cubic complexes. Principal cubic complexes generalise the notion of principal manifolds for datasets with non-trivial topologies. The type of the principal cubic complex is determined by its dimension and a grammar of elementary graph transformations. The simplest grammar produces principal trees. We introduce three natural types of data complexity: 1) geometric (deviation of the data's approximator from some "idealized" configuration, such as deviation from harmonicity); 2) structural (how many elements of a principal graph are needed to approximate the data), and 3) construction complexity (how many applications of elementary graph transformations are needed to construct the principal object starting from the simplest one). We compute these measures for several simulated and real-life data distributions and show them in the "accuracy-complexity" plots, helping to optimize the accuracy/complexity ratio. We discuss various issues connected with measuring data complexity. Software for computing data complexity measures from principal cubic complexes is provided as well. Comment: Computers and Mathematics with Applications, in press