Investigation into Indexing XML Data Techniques
The rapid development of XML technology has improved the WWW, since XML data has many advantages and has become a common technology for transferring data across the internet. Therefore, the objective of this research is to investigate and study XML indexing techniques in terms of their structures. The main goal of this investigation is to identify the main limitations of these techniques and any other open issues.
Furthermore, this research considers the most common XML indexing techniques and compares them. Subsequently, this work presents an argument to identify these limitations. To conclude, the main problem of all the XML indexing techniques is the trade-off between the size and the efficiency of the indexes. All the indexes become large in order to perform well, and none of them is suitable for all users' requirements. However, each of these techniques has some advantages in some respects.
Bridging XML and Relational Databases: An Effective Mapping Scheme based on Persistent
XML has emerged as the leading medium for data transfer over the World Wide Web. At present, relational databases are still widely used as the back-end database in most organizations. Since there is a mismatch between these two structures, an effective mapping scheme that provides seamless integration with relational databases is essential. In addition, an immutable labeling scheme is needed to identify the XML nodes uniquely and to support dynamic updates without re-labeling the existing labels when an update occurs. As such, in this paper, we propose s-XML, which adopts the Persistent Labeling scheme as the annotation scheme to ensure seamless integration with relational databases and to support updates without the need to re-construct the existing labels. We conduct experiments to show that s-XML performs better than the existing approaches in terms of mapping the XML nodes to relational databases, query retrieval and dynamic update.
DOI: http://dx.doi.org/10.11591/ijece.v2i2.21
SCOOTER: A compact and scalable dynamic labeling scheme for XML updates
Although dynamic labeling schemes for XML have been the focus of recent research activity, there are significant challenges still to be overcome. In particular, though there are labeling schemes that ensure a compact label representation when creating an XML document, when the document is subject to repeated and arbitrary deletions and insertions, the labels grow rapidly and consequently have a significant impact on query and update performance. We review the outstanding issues to date and in this paper we propose SCOOTER, a new dynamic labeling scheme for XML. The new labeling scheme can completely avoid relabeling existing labels. In particular, SCOOTER can handle frequently skewed insertions gracefully. Theoretical analysis and experimental results confirm the scalability, compact representation, efficient growth rate and performance of SCOOTER in comparison to existing dynamic labeling schemes.
XML Labels Compression using Prefix-Encodings
XML is the de facto standard for data representation and communication over the web, so there is a lot of interest in querying XML data. Most approaches require the data to be labelled to indicate structural relationships between elements. This is simple when the data does not change but complex when it does. In the day-to-day management of XML databases over the web, it is usual that more information is inserted over time than deleted. Frequent insertions can lead to large labels, which have a detrimental impact on query performance and can cause overflow problems. Many researchers have shown that prefix encoding usually gives the highest compression ratio in comparison to other encoding schemes. Nonetheless, none of the existing prefix encoding methods has been applied to XML labels. This research investigates compressing XML labels via different prefix-encoding methods in order to reduce the occurrence of any overflow problems and improve query performance. The paper also presents a comparison between the performances of several prefix-encodings in terms of encoding/decoding time and compressed code size.
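To make the idea of prefix-encoding a label concrete, here is a minimal sketch. It assumes a label is a list of positive integers (for example the components of a Dewey-style label) and uses Elias gamma coding as one well-known prefix code. The abstract does not say which encodings the paper evaluates, so the choice of Elias gamma and the function names below are illustrative assumptions only.

```python
from typing import List


def elias_gamma_encode(n: int) -> str:
    """Encode a positive integer as an Elias gamma bit string."""
    if n < 1:
        raise ValueError("Elias gamma is defined for positive integers only")
    binary = bin(n)[2:]  # binary digits without the '0b' prefix
    # (len - 1) zeros announce how many bits follow the leading 1.
    return "0" * (len(binary) - 1) + binary


def encode_label(components: List[int]) -> str:
    """Concatenate the codewords of all label components (hypothetical helper)."""
    return "".join(elias_gamma_encode(c) for c in components)


def decode_label(bits: str) -> List[int]:
    """Decode a well-formed concatenation of Elias gamma codewords."""
    values = []
    i = 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":  # count the unary length prefix
            zeros += 1
            i += 1
        values.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return values


encoded = encode_label([1, 3, 2])
print(encoded)                # 1011010
print(decode_label(encoded))  # [1, 3, 2]
```

Because no codeword is a prefix of another, the components can be concatenated without separators and still decoded unambiguously, which is the property that makes prefix codes attractive for variable-length labels.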
FibLSS: A scalable label storage scheme for dynamic XML updates
Dynamic labeling schemes for XML updates have been the focus of significant research activity in recent years. However, the label storage schemes underpinning the dynamic labeling schemes have not received as much attention. Label storage schemes specify how labels are physically encoded and stored on disk. The size of the labels and their logical representation directly influence the computational costs of processing the labels and can limit the functionality provided by the dynamic labeling scheme to an XML update service. This has significant practical implications when merging XML repositories such as clinical studies. In this paper, we provide an overview of the existing label storage schemes. We present a novel label storage scheme based on the Fibonacci sequence that can completely avoid relabeling existing nodes under dynamic insertions. Theoretical analysis and experimental results confirm the scalability and performance of the Fibonacci label storage scheme in comparison to existing approaches.
Compacting XML Structures Using a Dynamic Labeling Scheme
Due to the growing popularity of XML as a data exchange and storage format, the need to develop efficient techniques for storing and querying XML documents has emerged. A common approach to achieve this is to use labeling techniques. However, their main problem is that they either do not support updating XML data dynamically or impose huge storage requirements. On the other hand, with the verbosity and redundancy problem of XML, which can lead to increased cost for processing XML documents, compaction of XML documents has become an increasingly important research issue. In this paper, we propose an approach called CXDLS combining the strengths of both labeling and compaction techniques. Our approach exploits repetitive consecutive subtrees and tags for compacting the structure of XML documents by taking advantage of the ORDPATH labeling scheme. In addition, it stores the compacted structure and the data values separately. Using our proposed approach, it is possible to support efficient query and update processing on compacted XML documents and to reduce storage space dramatically. Results of a comprehensive performance study are provided to show the advantages of CXDLS.
Reusable Prime Number Labeling Scheme for Hierarchical Data Representation in Relational Databases
Hierarchical data structures are important for many computing and information science disciplines including data mining, terrain modeling, and image analysis. There are many specialized hierarchical data management systems, but they are not always available. Alternatively, relational databases are far more common and offer superior reliability, scalability, and performance. However, relational databases cannot natively store and manage hierarchical data. Labeling schemes resolve this issue by labeling all nodes with alphanumeric strings that can be safely stored and retrieved from a database. One such scheme uses prime numbers for its labeling purposes; however, the performance and space utilization of this method are not optimal. We propose a more efficient and compact version of this approach.
Prime Number-Based Hierarchical Data Labeling Scheme for Relational Databases
Hierarchical data structures are an important aspect of many computer science fields including data mining, terrain modeling, and image analysis. A good representation of such data accurately captures the parent-child and ancestor-descendant relationships between nodes. There exist a number of different ways to capture and manage hierarchical data while preserving such relationships. For instance, one may use a custom system designed for a specific kind of hierarchy. Object-oriented databases may also be used to model hierarchical data. Relational database systems, on the other hand, add the benefits of mature mathematical theory, reliable implementations, superior functionality and scalability. Relational databases were not originally designed with hierarchical data management in mind. As a result, hierarchical information cannot be natively stored in database relations. Database labeling schemes resolve this issue by labeling all nodes in a way that reveals their relationships. Labels usually encode the node's position in a hierarchy as a number or a string that can be stored, indexed, searched, and retrieved from a database. Many different labeling schemes have been developed in the past. All of them may be classified into three broad categories: recursive expansion, materialized path, and nested sets. Each model has its strengths and weaknesses. Each model implementation attempts to reduce the number of weaknesses inherent to the respective model. One of the most prominent implementations of the materialized path model uses the unique characteristics of prime numbers for its labeling purposes. However, the performance and space utilization of this prime number labeling scheme could be significantly improved. This research introduces a new scheme called reusable prime number labeling (rPNL) that reduces the effects of the mentioned weaknesses. The advantages of the proposed scheme are discussed in detail, proven mathematically, and experimentally confirmed.
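The following is a minimal sketch of the basic prime number labeling idea that the abstract builds on, not of rPNL itself, whose details are not given here. It assumes the commonly described variant in which each node receives its own prime and its label is that prime multiplied by the parent's label, so an ancestor test reduces to a divisibility check. The tree representation and the function names are hypothetical.

```python
from itertools import count


def primes():
    """Yield 2, 3, 5, ... by trial division (adequate for a small demo)."""
    found = []
    for n in count(2):
        if all(n % p for p in found):
            found.append(n)
            yield n


def label_tree(tree, root):
    """Label nodes in pre-order; tree maps each node to its list of children."""
    gen = primes()
    labels = {}
    stack = [(root, 1)]  # (node, label of its parent); the root has no parent
    while stack:
        node, parent_label = stack.pop()
        labels[node] = parent_label * next(gen)
        for child in reversed(tree.get(node, [])):
            stack.append((child, labels[node]))
    return labels


def is_ancestor(labels, a, d):
    """a is a proper ancestor of d iff a's label divides d's label."""
    return a != d and labels[d] % labels[a] == 0


tree = {"root": ["a", "b"], "a": ["c"]}
labels = label_tree(tree, "root")
print(labels)                          # {'root': 2, 'a': 6, 'c': 30, 'b': 14}
print(is_ancestor(labels, "a", "c"))   # True
print(is_ancestor(labels, "b", "c"))   # False
```

The labels are plain integers, so they can be stored in an ordinary relational column. The sketch also shows the space concern raised in the abstract: labels are products of primes and grow quickly with depth and node count.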
Pentagonal scheme for dynamic XML prefix labelling
In XML databases, the indexing process is based on a labelling or numbering scheme, which is generally used to label an XML document so that XML queries can be performed using the path node information. Moreover, a labelling scheme helps to capture the structural relationships during the processing of queries without the need to access the physical document. Two of the main problems for XML labelling schemes are duplicated labels and the cost efficiency of labelling time and size. This research presents a novel dynamic XML labelling scheme, called the Pentagonal labelling scheme, in which data are represented as ordered XML nodes with relationships between them. The update of these nodes in large-scale XML documents has been widely investigated and represents a challenging research problem, as it means relabelling a whole tree. Our algorithms provide an efficient dynamic XML labelling scheme that supports data updates without duplicating labels or relabelling old nodes. Our work evaluates the labelling process in terms of size and time, and evaluates the labelling scheme's ability to handle several insertions in XML documents. The findings indicate that the Pentagonal scheme shows a better initial labelling time performance than the compared schemes, particularly when using large XML datasets. Moreover, it efficiently supports random skewed updates, and its fast calculations and uncomplicated implementation allow it to handle updates efficiently. It also proved its capability in terms of query performance and in determining the relationships.
Libyan governmen
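As an illustration of how a prefix label lets structural relationships be decided without touching the document, here is a minimal sketch using plain Dewey-style labels (tuples of sibling positions along the path from the root). This is an assumption chosen for simplicity; it is not the Pentagonal scheme, whose label format is not described in the abstract, and the function names are hypothetical.

```python
from typing import List, Tuple

Label = Tuple[int, ...]  # e.g. (1, 2, 1): root, its 2nd child, that node's 1st child


def is_ancestor(a: Label, d: Label) -> bool:
    """a is a proper ancestor of d iff a is a strict prefix of d."""
    return len(a) < len(d) and d[:len(a)] == a


def is_parent(p: Label, c: Label) -> bool:
    """p is the parent of c iff dropping c's last component gives p."""
    return len(c) == len(p) + 1 and c[:-1] == p


def is_sibling(x: Label, y: Label) -> bool:
    """Distinct nodes are siblings iff they share the same parent prefix."""
    return x != y and len(x) == len(y) and x[:-1] == y[:-1]


def document_order(labels: List[Label]) -> List[Label]:
    """Lexicographic tuple order coincides with pre-order document order."""
    return sorted(labels)


print(is_ancestor((1,), (1, 2, 1)))                 # True
print(is_parent((1, 2), (1, 2, 1)))                 # True
print(is_sibling((1, 1), (1, 2)))                   # True
print(document_order([(1, 2), (1,), (1, 1, 1), (1, 1)]))
# [(1,), (1, 1), (1, 1, 1), (1, 2)]
```

Plain Dewey labels have no room between adjacent siblings, so inserting a node between (1, 1) and (1, 2) would force relabelling. Avoiding that cost is the problem that dynamic schemes such as those listed here are designed to address.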