5 research outputs found

    Efficient evaluation of generalized path pattern queries on XML data

    Finding the occurrences of structural patterns in XML data is a key operation in XML query processing. Existing algorithms for this operation focus almost exclusively on path-patterns or tree-patterns. Requirements for flexible querying of XML data have recently motivated the introduction of query languages that allow a partial specification of path-patterns in a query. In this paper, we focus on the efficient evaluation of partial path queries, a generalization of path pattern queries. Our approach explicitly deals with repeated labels (that is, multiple occurrences of the same label in a query). We show that partial path queries can be represented as rooted dags for which a topological ordering of the nodes exists. We present three algorithms for the efficient evaluation of these queries under the indexed streaming evaluation model. The first one exploits a structural summary of the data to generate a set of path-patterns that together are equivalent to a partial path query. To evaluate these path-patterns, we extend PathStack so that it can work on path-patterns with repeated labels. The second one extracts a spanning tree from the query dag, uses a stack-based algorithm to find the matches of the root-to-leaf paths in the tree, and merge-joins the matches to compute the answer. Finally, the third one exploits multiple pointers of stack entries and a topological ordering of the query dag to apply a stack-based holistic technique. An analysis of the algorithms and an extensive experimental evaluation show that the holistic algorithm outperforms the other two.
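The abstract above relies on a topological ordering of the rooted query dag. As a rough illustration only, and not the paper's own procedure, the sketch below computes one such ordering with Kahn's algorithm. The function name `topological_order` and the four-node example query are hypothetical and chosen purely for demonstration.

```python
from collections import deque


def topological_order(edges, nodes):
    """Return one topological ordering of a dag given as (parent, child) edges.

    Raises ValueError if the graph contains a cycle, in which case no
    topological ordering exists.
    """
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1

    # Start from nodes with no incoming edges (the root of a rooted dag).
    ready = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")
    return order


# Hypothetical query dag: root r precedes a and b, and both precede c.
query_nodes = ["r", "a", "b", "c"]
query_edges = [("r", "a"), ("r", "b"), ("a", "c"), ("b", "c")]
order = topological_order(query_edges, query_nodes)
print(order)
```

Any ordering in which every node appears after all of its predecessors would serve equally well; Kahn's algorithm is used here only because it also detects cycles.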

    Twig Pattern Search in XML Database

    Current search engines rank results by popularity. However, the most popular topics are not always what a user wants. Millions of people have millions of different preferences. The main challenge, therefore, is how to extract information from the enormous volume of data on the Internet according to each person's preference. In computer science, such a preference is a pattern, and we call the task "Twig Pattern Search". Unlike index methods that split a query into several sub-queries and then stitch the results together to produce the final answers, twig pattern search uses tree structures as the basic unit of a query to avoid expensive join operations. We present an efficient algorithm for the tree mapping problem in XML databases. Given a target tree T and a pattern tree Q, the algorithm can find all the embeddings of Q in T in O(|D||Q|) time, where D is the largest data stream associated with a node of Q.
    Master of Science in Applied Computer Science
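To make the tree mapping problem concrete, here is a deliberately naive sketch of matching a pattern tree Q against a target tree T. It is not the thesis algorithm and does not achieve the O(|D||Q|) bound. It assumes every pattern edge is an ancestor-descendant edge, allows two pattern nodes to map to the same target node, and reports only the target nodes at which some embedding is rooted rather than enumerating every embedding. The `Node` class and all function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)


def descendants(t):
    """Yield all proper descendants of t in preorder."""
    for child in t.children:
        yield child
        yield from descendants(child)


def embeds_at(q, t):
    """True if pattern q can be embedded with its root mapped to t.

    Assumption: each pattern edge means "is an ancestor of", so every
    child of q must match some proper descendant of t.
    """
    if q.label != t.label:
        return False
    return all(
        any(embeds_at(qc, d) for d in descendants(t)) for qc in q.children
    )


def embedding_roots(q, t):
    """Return the nodes of t (in preorder) at which q can be embedded."""
    return [n for n in [t, *descendants(t)] if embeds_at(q, n)]


# Hypothetical data: target a(b(c), a(b, c)) and pattern a(b, c).
inner = Node("a", [Node("b"), Node("c")])
target = Node("a", [Node("b", [Node("c")]), inner])
pattern = Node("a", [Node("b"), Node("c")])
roots = embedding_roots(pattern, target)
print(len(roots))
```

The repeated descendant scans make this exhaustive search far more expensive than a stream-based method, which is exactly the cost that specialised twig algorithms aim to avoid.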

    On the optimality of holistic algorithms for twig queries

    Streaming XML documents has many emerging applications. However, in this paper, we show that the restrictions imposed by data streaming are too severe for processing twig queries, the core operation in XML query processing. The previously proposed algorithm TwigStack is optimal for processing twig queries with only descendant edges over streams of nodes. The cause of TwigStack's suboptimality is the structural recursion appearing in XML documents. We show that, without relaxing the data streaming model, it is not possible to develop an optimal holistic algorithm for twig queries. Moreover, the computation of twig queries is not memory-bounded. This motivates us to study two variations of the data streaming model: (1) offline sorting is allowed and the algorithm is allowed to select the correct nodes to be streamed, and (2) multiple scans of the data streams are allowed. We show lower bounds for the two variations.