5 research outputs found

    Efficient creation and incremental maintenance of the hopi index for complex xml document collections

    Get PDF
    The HOPI index, a connection index for XML documents based on the concept of a 2–hop cover, provides space – and time–efficient reachability tests along the ancestor, descendant, and link axes to support path expressions with wildcards in XML search engines. This paper presents enhanced algorithms for building HOPI, shows how to augment the index with distance information, and discusses incremental index maintenance. Our experiments show substantial improvements over the existing divide-and-conquer algorithm for index creation, low space overhead for including distance information in the index, and efficient updates

    Clustering-based Labelling Scheme - A Hybrid Approach for Efficient Querying and Updating XML Documents

    Get PDF
    Extensible Markup Language (XML) has become a dominant technology for transferring data through the worldwide web. The XML labelling schemes play a key role in handling XML data efficiently and robustly. Thus, many labelling schemes have been proposed. However, these labelling schemes have limitations and shortcomings. Thus, the aim of this research was to investigate the existing XML labelling schemes and their limitations in order to address the issue of efficiency of XML query performance. This thesis investigated the existing labelling schemes and classified them into three categories based on certain criteria, in order to identify the limitations and challenges of these labelling schemes. Based on the outcomes of this investigation, this thesis proposed a state-of-theart labelling scheme, called clustering-based labelling scheme, to resolve or improve the key limitations such as the efficiency of the XML query processing, labelling XML nodes, and XML updates cost. This thesis argued that using certain existing labelling schemes to label nodes, and using the clustering-based techniques can improve query and labelling nodes efficiency. Theoretically, the proposed scheme is based on dividing the nodes of an XML document into clusters. Two existing labelling schemes, which are the Dewey and LLS labelling schemes, were selected for labelling these clusters and their nodes. Subsequently, the proposed scheme was designed and implemented. In addition, the Dewey and LLS labelling scheme were implemented for the purpose of evaluating the proposed scheme. Subsequently, four experiments were designed in order to test the proposed scheme against the Dewey and LLS labelling schemes. The results of these experiments suggest that the proposed scheme achieved better results than the Dewey and LLS schemes. Consequently, the research hypothesis was accepted overall with few exceptions, and the proposed scheme showed an improvement in the performance and all the targeted features and aspects

    Labelling Dynamic XML Documents: A GroupBased Approach

    Get PDF
    Documents that comply with the XML standard are characterised by inherent ordering and their modelling usually takes the form of a tree. Nowadays, applications generate massive amounts of XML data, which requires accurate and efficient query-able XML database systems. XML querying depends on XML labelling in much the same way as relational databases rely on indexes. Document order and structural information are encoded by labelling schemes, thus facilitating their use by queries without having to access the original XML document. Dynamic XML data, data which changes, complicates the labelling scheme. As demonstrated by much research efforts, it is difficult to allocate unique labels to nodes in a dynamic XML tree so that all structural relationships between the nodes are encoded by the labels. Static XML documents are generally managed with labelling schemes that use simple labels. By contrast, dynamic labelling schemes have extra labelling costs and lower query performance to allow random updates irrespective of the document update frequency. Given that static and dynamic XML documents are often not clearly distinguished, a labelling scheme whose efficiency does not depend on updating frequency would be useful. The GroupBased labelling scheme proposed in this thesis is compatible with static as well as dynamic XML documents. In particular, this scheme has a high performance in processing dynamic XML data updates. What differentiates it from other dynamic labelling schemes is its uniform behaviour irrespective of whether the document is static or dynamic, ability to determine all structural relationships between nodes, and the improved query performance in both types of document. The advantages of the GroupBased scheme in comparison to earlier schemes are highlighted by the experiment results

    Die XXL—Suchmaschine zur ontologiebasierten Ähnlichkeitssuche in XML—Dokumenten

    Get PDF
    Die effektive und effiziente Informationssuche in großen Mengen semistrukturierter Daten im XML-Format stehen im Mittelpunkt dieser Arbeit. In dieser Arbeit wird die XXL-Suchmaschine vorgestellt. Sie wertet Anfragen aus, die in der XML-Anfragesprache XXL formuliert sind. Eine XXL-Anfrage umfasst dabei Suchbedingungen an die Struktur und an den Inhalt von XML-Dokumenten. Als Ergebnis wird eine nach ihrer Relevanz absteigend sortierte Liste von Treffern produziert, wobei ein Treffer ein relevantes XML-Dokument oder nur der relevante Teil eines XML-Dokuments sein kann. Die relevanzorientierte Auswertung von gegebenen Suchbedingungen beruht zum einen auf Verfahren aus dem Vektorraummodell und zum anderen wird semantisches Wissen einer quantifizierten Ontologie hinzugezogen. Zu diesem Zweck werden Datenbank-Technologien und Verfahren aus dem Information Retrieval kombiniert, um die Qualität der Suchergebnisse im Vergleich zur traditionellen Stichwortsuche in Textdokumenten zu verbessern. Die hier vorgestellten Konzepte wurden in einem Prototypen implementiert und umfangreich evaluiert.The effective and efficient information retrieval in large sets of semistructured data using the XML format is the main theme of this thesis. This thesis presents the XXL search engine, which executes queries formulated in the XML query language XXL. An XXL query consists of search conditions on the structure and search conditions on the content of XML documents. The result is a ranked result list in descending order of relevance, where a result can be a relevant XML document or only the relevant part of an XML document. The relevance-based query evaluation uses methods from the vector space model and semantic knowledge from a quantified ontology. For this purpose, we combine database technologies and methods from information retrieval to improve the quality of search results in comparison to traditional keyword-based text retrieval. The presented concepts have been implemented and exhaustively evaluated

    Dynamic Range Labeling for XML Trees

    No full text
    corecore