2,866 research outputs found

    XML Reconstruction View Selection in XML Databases: Complexity Analysis and Approximation Scheme

    Full text link
    Query evaluation in an XML database requires reconstructing XML subtrees rooted at nodes found by an XML query. Since XML subtree reconstruction can be expensive, one approach to improve query response time is to use reconstruction views - materialized XML subtrees of an XML document, whose nodes are frequently accessed by XML queries. For this approach to be efficient, the principal requirement is a framework for view selection. In this work, we are the first to formalize and study the problem of XML reconstruction view selection. The input is a tree TT, in which every node ii has a size cic_i and profit pip_i, and the size limitation CC. The target is to find a subset of subtrees rooted at nodes i1,⋯ ,iki_1,\cdots, i_k respectively such that ci1+⋯+cik≤Cc_{i_1}+\cdots +c_{i_k}\le C, and pi1+⋯+pikp_{i_1}+\cdots +p_{i_k} is maximal. Furthermore, there is no overlap between any two subtrees selected in the solution. We prove that this problem is NP-hard and present a fully polynomial-time approximation scheme (FPTAS) as a solution

    A Join Index for XML Data Warehouses

    Get PDF
    XML data warehouses form an interesting basis for decision-support applications that exploit complex data. However, native-XML database management systems (DBMSs) currently bear limited performances and it is necessary to research for ways to optimize them. In this paper, we propose a new join index that is specifically adapted to the multidimensional architecture of XML warehouses. It eliminates join operations while preserving the information contained in the original warehouse. A theoretical study and experimental results demonstrate the efficiency of our join index. They also show that native XML DBMSs can compete with XML-compatible, relational DBMSs when warehousing and analyzing XML data.Comment: 2008 International Conference on Information Resources Management (Conf-IRM 08), Niagra Falls : Canada (2008

    Efficient XML Keyword Search based on DAG-Compression

    Full text link
    In contrast to XML query languages as e.g. XPath which require knowledge on the query language as well as on the document structure, keyword search is open to anybody. As the size of XML sources grows rapidly, the need for efficient search indices on XML data that support keyword search increases. In this paper, we present an approach of XML keyword search which is based on the DAG of the XML data, where repeated substructures are considered only once, and therefore, have to be searched only once. As our performance evaluation shows, this DAG-based extension of the set intersection search algorithm[1], [2], can lead to search times that are on large documents more than twice as fast as the search times of the XML-based approach. Additionally, we utilize a smaller index, i.e., we consume less main memory to compute the results

    Translation of Heterogeneous Databases into RDF, and Application to the Construction of a SKOS Taxonomical Reference

    Get PDF
    International audienceWhile the data deluge accelerates, most of the data produced remains locked in deep Web databases. For the linked open data to benefit from the potential represented by this huge amount of data, it is crucial to come up with solutions to expose heterogeneous databases as linked data. The xR2RML mapping language is an endeavor towards this goal: it is designed to map various types of databases to RDF, by flexibly adapting to heterogeneous query languages and data models while remaining free from any specific language. It extends R2RML, the W3C recommendation for the mapping of relational databases to RDF, and relies on RML for the handling of various data formats. In this paper we present xR2RML, we analyse data models of several modern databases as well as the format in which query results are returned , and we show how xR2RML translates any result data element into RDF, relying on existing languages such as XPath and JSONPath when necessary. We illustrate some features of xR2RML such as the generation of RDF collections and containers, and the ability to deal with mixed data formats. We also describe a real-world use case in which we applied xR2RML to build a SKOS thesaurus aimed at supporting studies on History of Zoology, Archaeozoology and Conservation Biology

    Efficient Incremental Breadth-Depth XML Event Mining

    Full text link
    Many applications log a large amount of events continuously. Extracting interesting knowledge from logged events is an emerging active research area in data mining. In this context, we propose an approach for mining frequent events and association rules from logged events in XML format. This approach is composed of two-main phases: I) constructing a novel tree structure called Frequency XML-based Tree (FXT), which contains the frequency of events to be mined; II) querying the constructed FXT using XQuery to discover frequent itemsets and association rules. The FXT is constructed with a single-pass over logged data. We implement the proposed algorithm and study various performance issues. The performance study shows that the algorithm is efficient, for both constructing the FXT and discovering association rules
    • …
    corecore