530 research outputs found

    SCOOTER: A compact and scalable dynamic labeling scheme for XML updates

    Although dynamic labeling schemes for XML have been the focus of recent research activity, significant challenges remain. In particular, although some labeling schemes ensure a compact label representation when an XML document is created, labels grow rapidly once the document is subject to repeated and arbitrary deletions and insertions, which significantly degrades query and update performance. We review the outstanding issues to date and propose SCOOTER, a new dynamic labeling scheme for XML. The new labeling scheme completely avoids relabeling existing labels. In particular, SCOOTER handles frequent skewed insertions gracefully. Theoretical analysis and experimental results confirm the scalability, compact representation, efficient growth rate and performance of SCOOTER in comparison to existing dynamic labeling schemes.

    A compact and scalable encoding for updating XML based on node labeling schemes

    The eXtensible Markup Language (XML) has been adopted as the new standard for data exchange on the World Wide Web. As the rate of adoption increases, there is an ever-pressing need to store, query and update XML in its native format, thereby eliminating the overhead of parsing and transforming XML in and out of various data formats. However, the hierarchical, ordered and semi-structured properties of the tree structure underlying the XML data model present many challenges to updating XML. In particular, many tree labeling schemes were designed to solve a particular problem or provide a particular feature, often at the expense of other important features. In this dissertation, we identify the core properties that represent the desirable characteristics of a good dynamic labeling scheme for XML. We focus on four features central to the outstanding problems in existing dynamic labeling schemes, namely a compact label encoding, scalability, reuse of deleted node labels, and a label storage scheme for binary-encoded bit-string node labels. At present, no dynamic labeling scheme integrates support for all four features. We present a novel, compact and scalable adaptive encoding method that tightly constrains the growth rate of label size under arbitrary node insertion and deletion scenarios and scales efficiently. We deploy our encoding method in two novel dynamic labeling schemes for XML that completely avoid node relabeling, process frequent skewed insertions gracefully and reuse deleted node labels.

    Research on Labeling Schemes over Dynamic XML Data

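    With the rapid development of network applications, XML (eXtensible Markup Language) data is becoming a mainstream data format, and how to build effective indexes over XML data to achieve efficient querying is a current research focus. Most XML indexing and query techniques are based on some labeling method for the XML tree. XML labeling methods preserve the structural information of the document tree, so that the whole XML document need not be traversed when a query is executed. Traditional interval labeling and prefix labeling methods support computing positional and structural relationships between XML nodes, but they cannot handle document updates efficiently: once an update occurs, the entire tree must be relabeled, at a high system cost. To solve this problem, researchers have proposed dynamic XML labeling methods, including floating-point intervals, CDBS (Compact Dynamic Binary String), QED (Dynam...
    Along with the growth of Internet-based applications, more and more information is being stored, exchanged and presented in XML format. The ability to efficiently index and query XML data sources is becoming increasingly important. Most XML indexing and querying techniques are based on labeling schemes which are designed to label the XML nodes so that both ordered and un-ordered queri...
    Degree: Master of Engineering. Department and major: School of Information Science and Technology, Department of Computer Science, Computer Software and Theory. Student ID: 2302009115271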

    FibLSS: A scalable label storage scheme for dynamic XML updates

    Dynamic labeling schemes for XML updates have been the focus of significant research activity in recent years. However, the label storage schemes underpinning the dynamic labeling schemes have not received as much attention. Label storage schemes specify how labels are physically encoded and stored on disk. The size of the labels and their logical representation directly influence the computational costs of processing the labels and can limit the functionality provided by the dynamic labeling scheme to an XML update service. This has significant practical implications when merging XML repositories such as clinical studies. In this paper, we provide an overview of the existing label storage schemes. We present a novel label storage scheme based on the Fibonacci sequence that can completely avoid relabeling existing nodes under dynamic insertions. Theoretical analysis and experimental results confirm the scalability and performance of the Fibonacci label storage scheme in comparison to existing approaches.

    Compressing Labels of Dynamic XML Data using Base-9 Scheme and Fibonacci Encoding

    The flexibility and self-describing nature of XML have made it the most common mark-up language used for data representation over the Web. XML data is naturally modelled as a tree, and the structural information of the tree can be encoded into labels by an XML labelling scheme so that queries can be answered without accessing the original XML files. As the transmission of XML data over the Internet has grown, it has also become necessary to have an XML labelling scheme that supports dynamic XML data. For a large-scale and frequently updated XML document, existing dynamic XML labelling schemes still suffer from high growth rates in label size, which can result in overflow problems and/or ambiguous data/query retrievals. This thesis considers the compression of XML labels. A novel XML labelling scheme, named “Base-9”, has been developed to generate labels that are as compact as possible and yet provide efficient support for queries over both static and dynamic XML data. A Fibonacci prefix-encoding method has been used for the first time to store Base-9’s XML labels in a compressed format, with the intention of minimising the storage space without degrading XML querying performance. The thesis also investigates the compression of XML labels using various existing prefix-encoding methods. This investigation has resulted in the proposal of a novel prefix-encoding method named “Elias-Fibonacci of order 3”, which achieved the fastest encoding time of all prefix-encoding methods studied in this thesis, whereas Fibonacci encoding was found to require the minimum storage. Unlike current XML labelling schemes, the new Base-9 labelling scheme ensures the generation of short labels even after large, frequent, skewed insertions. The advantages of the short labels generated by combining the Base-9 scheme with Fibonacci encoding, in terms of storing, updating, retrieving and querying XML data, are supported by the experimental results reported herein.
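To make the idea of a Fibonacci prefix code concrete, the sketch below implements the textbook Fibonacci (Zeckendorf) code for positive integers. It is only an illustration of the general technique: the abstract does not say how Base-9 labels are mapped to integers or how the thesis lays out the encoded bits, so the function names and the stream format here are assumptions.

```python
from typing import List


def fibonacci_encode(n: int) -> str:
    """Return the standard Fibonacci code of a positive integer as a bit string."""
    if n < 1:
        raise ValueError("Fibonacci coding is defined for positive integers only")
    # Build the Fibonacci weights 1, 2, 3, 5, 8, ... that do not exceed n.
    fibs = [1, 2]
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    fibs.pop()  # the last weight built is larger than n
    # Greedy Zeckendorf decomposition: never selects two adjacent weights.
    bits = ["0"] * len(fibs)
    remainder = n
    for i in range(len(fibs) - 1, -1, -1):
        if fibs[i] <= remainder:
            bits[i] = "1"
            remainder -= fibs[i]
    # The extra trailing 1 creates the only "11" pair, which marks the end.
    return "".join(bits) + "1"


def fibonacci_decode_stream(bits: str) -> List[int]:
    """Decode a concatenation of Fibonacci codewords back into integers."""
    values = []
    total, weight, next_weight, previous = 0, 1, 2, "0"
    for bit in bits:
        if bit == "1" and previous == "1":
            # Second 1 in a row is the terminator, not a weight.
            values.append(total)
            total, weight, next_weight, previous = 0, 1, 2, "0"
            continue
        if bit == "1":
            total += weight
        weight, next_weight = next_weight, weight + next_weight
        previous = bit
    return values


codes = [fibonacci_encode(n) for n in (1, 2, 4, 11)]
stream = "".join(codes)
decoded = fibonacci_decode_stream(stream)
```

Because every codeword ends in `11` and contains no other adjacent pair of ones, codewords can be concatenated without separators and still be decoded unambiguously, which is the property that makes such a code usable for storing variable-length labels.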

    A Labelling Technique Comparison for Indexing Large XML Database

    The flexible nature of XML documents has motivated researchers to use them for data transmission and storage in different domains. The hierarchical structure of XML documents makes it attractive to process user queries by means of labelling, where each label describes the position of a node in the tree. In this study, three categories of XML node labelling are analysed to address the open problem of each category. A number of experiments are executed to compare the execution time and storage space required for labelling an XML tree.

    Clustering-based Labelling Scheme - A Hybrid Approach for Efficient Querying and Updating XML Documents

    Extensible Markup Language (XML) has become a dominant technology for transferring data through the World Wide Web. XML labelling schemes play a key role in handling XML data efficiently and robustly, and many labelling schemes have therefore been proposed. However, these labelling schemes have limitations and shortcomings. The aim of this research was to investigate the existing XML labelling schemes and their limitations in order to address the efficiency of XML query performance. This thesis investigated the existing labelling schemes and classified them into three categories based on certain criteria, in order to identify their limitations and challenges. Based on the outcomes of this investigation, this thesis proposed a state-of-the-art labelling scheme, called the clustering-based labelling scheme, to resolve or improve key limitations such as the efficiency of XML query processing, the labelling of XML nodes, and the cost of XML updates. This thesis argued that using certain existing labelling schemes to label nodes, together with clustering-based techniques, can improve query and node-labelling efficiency. Theoretically, the proposed scheme is based on dividing the nodes of an XML document into clusters. Two existing labelling schemes, the Dewey and LLS labelling schemes, were selected for labelling these clusters and their nodes. Subsequently, the proposed scheme was designed and implemented. In addition, the Dewey and LLS labelling schemes were implemented for the purpose of evaluating the proposed scheme. Four experiments were then designed to test the proposed scheme against the Dewey and LLS labelling schemes. The results of these experiments suggest that the proposed scheme achieved better results than the Dewey and LLS schemes. Consequently, the research hypothesis was accepted overall, with few exceptions, and the proposed scheme showed an improvement in performance and in all the targeted features and aspects.
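For readers unfamiliar with the Dewey scheme mentioned above, the following minimal sketch shows plain Dewey labelling and the structural tests it enables. It does not reproduce the clustering-based scheme or LLS, whose details the abstract does not give; the sample document and function names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

Label = Tuple[int, ...]


def dewey_labels(root: ET.Element) -> Dict[Label, str]:
    """Label the root (1,) and the i-th child of a node labelled p as p + (i,)."""
    labels: Dict[Label, str] = {}

    def visit(node: ET.Element, label: Label) -> None:
        labels[label] = node.tag
        for position, child in enumerate(node, start=1):
            visit(child, label + (position,))

    visit(root, (1,))
    return labels


def is_ancestor(a: Label, b: Label) -> bool:
    """a is an ancestor of b when a is a proper prefix of b."""
    return len(a) < len(b) and b[: len(a)] == a


def is_parent(a: Label, b: Label) -> bool:
    """a is the parent of b when b is a with exactly one more component."""
    return len(b) == len(a) + 1 and b[:-1] == a


def precedes(a: Label, b: Label) -> bool:
    """Document order coincides with lexicographic order on the components."""
    return a < b


# Hypothetical sample document, used only for illustration.
doc = ET.fromstring("<book><title/><chapter><section/><section/></chapter></book>")
labels = dewey_labels(doc)
```

All three relationships are decided from the labels alone, without touching the document. The weakness under updates is also visible here: inserting a new element before `chapter` would change the labels of `chapter` and everything beneath it, which is the relabelling cost that dynamic schemes try to avoid.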

    Labelling Dynamic XML Documents: A GroupBased Approach

    Documents that comply with the XML standard are characterised by inherent ordering, and they are usually modelled as a tree. Nowadays, applications generate massive amounts of XML data, which requires accurate and efficient queryable XML database systems. XML querying depends on XML labelling in much the same way as relational databases rely on indexes. Labelling schemes encode document order and structural information, so that queries can use them without having to access the original XML document. Dynamic XML data, that is, data which changes, complicates labelling. As much research effort has demonstrated, it is difficult to allocate unique labels to nodes in a dynamic XML tree so that all structural relationships between the nodes are encoded by the labels. Static XML documents are generally managed with labelling schemes that use simple labels. By contrast, dynamic labelling schemes incur extra labelling costs and lower query performance in order to allow random updates, irrespective of how frequently the document is updated. Given that static and dynamic XML documents are often not clearly distinguished, a labelling scheme whose efficiency does not depend on update frequency would be useful. The GroupBased labelling scheme proposed in this thesis is compatible with static as well as dynamic XML documents. In particular, this scheme performs well when processing updates to dynamic XML data. What differentiates it from other dynamic labelling schemes is its uniform behaviour irrespective of whether the document is static or dynamic, its ability to determine all structural relationships between nodes, and its improved query performance in both types of document. The advantages of the GroupBased scheme in comparison to earlier schemes are highlighted by the experimental results.

    Dynamic Containment Labeling Scheme for XML

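    This paper proposes DCLS (Dynamic Containment Labeling Scheme), an interval labeling method suited to environments in which XML documents are updated. DCLS generalizes integer-based labels to vector-based labels, extending the traditional static interval labeling method and effectively avoiding relabeling when XML documents are updated. DCLS performs well whether or not the document is updated: it uses an integer-based static interval labeling method for the initial labeling, which gives high storage efficiency and query performance when documents are not updated; at the same time, it treats integers as special vectors, which not only supports document updates but also makes them efficient. In particular, under skewed insertions, DCLS avoids rapid growth in the bit length of labels. Experimental results show that DCLS performs better than existing dynamic interval labeling methods.
    A novel containment scheme called DCLS is proposed to effectively process updates in dynamic XML data. DCLS generalizes the static containment scheme from integer order to vector order and thus completely avoids relabeling when XML data is updated. Moreover, DCLS is compact and efficient regardless of whether the documents are updated or not. On the one hand, DCLS uses an integer-based static containment scheme for initial labeling, which yields compact size and excellent query efficiency for static documents. On the other hand, DCLS treats an integer as a special vector, which not only handles document updates but also achieves high query performance. Most importantly, DCLS can effectively avoid the rapid increase of label size under skewed insertions. Experimental results confirm the benefits of this approach compared to previous dynamic containment schemes.
    Funding: National Natural Science Foundation of China (50604012); Fundamental Research Funds for the Central Universities (2011121049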
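The abstract does not spell out how "vector order" is defined, so the sketch below is an assumption rather than the DCLS construction itself. It uses one common formulation: each interval endpoint is a two-component vector with positive components, vectors are compared by slope, an integer k is treated as the vector (1, k), and a new endpoint between two neighbours is obtained by adding them. All names are hypothetical.

```python
from typing import NamedTuple, Tuple


class Vec(NamedTuple):
    x: int
    y: int


Interval = Tuple[Vec, Vec]  # (start, end) endpoints of a containment label


def from_int(k: int) -> Vec:
    """Assumed embedding of an integer endpoint as a vector with slope k."""
    return Vec(1, k)


def less(a: Vec, b: Vec) -> bool:
    """Compare slopes y/x by cross-multiplication (x components are positive)."""
    return a.y * b.x < b.y * a.x


def between(a: Vec, b: Vec) -> Vec:
    """The component-wise sum has a slope strictly between those of a and b."""
    return Vec(a.x + b.x, a.y + b.y)


def contains(ancestor: Interval, descendant: Interval) -> bool:
    """Containment test: the ancestor interval strictly encloses the descendant."""
    return less(ancestor[0], descendant[0]) and less(descendant[1], ancestor[1])


# Initial integer labels: parent [1, 4] with one child [2, 3].
parent = (from_int(1), from_int(4))
child = (from_int(2), from_int(3))

# Insert a new first child between parent.start and child.start without
# touching any existing label.
new_start = between(parent[0], child[0])
new_end = between(new_start, child[0])
new_child = (new_start, new_end)
```

In this sketch the inserted endpoints are Vec(2, 3) and Vec(3, 5), with slopes 1.5 and about 1.67, so the new interval fits between 1 and 2 while the existing integer labels stay unchanged. Comparisons use only integer multiplication, which avoids the precision limits of floating-point interval schemes.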