Querying semistructured data with compression in distributed environments
As data management applications grow more complex, they are likely to need efficient distributed query processing. In distributed database systems (DDBS), complete replication maintains a full copy of the database at each site; its advantages include the highest locality of reference, the highest reliability and availability, and the best read performance. The most promising and dominant format for processing and representing data on the Internet is the semistructured form known as XML. XML data has no fixed schema; it evolves over time and is self-describing, which makes it harder to manage than, for example, relational data. It is therefore a major challenge for the database community to design query languages and storage methods that can retrieve semistructured data. In this paper, we present a scheme for storing and querying semistructured data views of relational form in distributed environments. The proposed technique stores a path dictionary, a word dictionary, an attribute dictionary, and a complete compressed replica of the semistructured data at each site of the DDBS. The presented technique improves query performance through compression of the semistructured data.
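The storage layout described in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `build_dictionaries`, `pack_replica` and `query_path`, the integer-id encoding, and the choice of JSON plus `zlib` for the compressed replica are all assumptions made for illustration.

```python
import json
import xml.etree.ElementTree as ET
import zlib


def build_dictionaries(xml_text):
    """Encode an XML document as integer ids backed by three dictionaries.

    Hypothetical layout: each element becomes a record
    (path_id, [(attr_id, word_id), ...], [word_id, ...]).
    """
    path_dict, word_dict, attr_dict = {}, {}, {}
    records = []

    def intern(table, key):
        # Assign the next free integer id to a key seen for the first time.
        if key not in table:
            table[key] = len(table)
        return table[key]

    def walk(elem, prefix):
        path = prefix + "/" + elem.tag
        path_id = intern(path_dict, path)
        attrs = [
            (intern(attr_dict, name), intern(word_dict, value))
            for name, value in elem.attrib.items()
        ]
        words = [intern(word_dict, w) for w in (elem.text or "").split()]
        records.append((path_id, attrs, words))
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return path_dict, word_dict, attr_dict, records


def pack_replica(records):
    """Compress the encoded records; every site would hold a copy of this blob."""
    return zlib.compress(json.dumps(records).encode("utf-8"))


def query_path(path, path_dict, word_dict, records):
    """Return the text of every element stored under the given root-to-node path."""
    path_id = path_dict.get(path)
    if path_id is None:
        return []
    id_to_word = {i: w for w, i in word_dict.items()}
    return [
        " ".join(id_to_word[i] for i in words)
        for pid, _attrs, words in records
        if pid == path_id
    ]
```

In this sketch a path query is resolved to a single integer through the path dictionary and then matched against the encoded records, so the full tag names never need to be compared. Whether the original scheme evaluates queries this way is not stated in the abstract.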
A Comparative Study of Data Transformations for Efficient XML and JSON Data Compression. An In-Depth Analysis of Data Transformation Techniques, including Tag and Capital Conversions, Character and Word N-Gram Transformations, and Domain-Specific Data Transforms using SMILES Data as a Case Study
XML is a widely used data exchange format. Because XML is verbose, compression is required to store and process it efficiently. Various general-purpose transforms and compression techniques exist that can be used to transform and compress XML data. Owing to the verbosity of XML, more compact alternatives, notably JSON, have been developed.
Similarly, there is a requirement to efficiently store and process SMILES data used in chemoinformatics. General-purpose transforms and compressors can compress this type of data to a certain extent; however, these techniques are not specific to SMILES data.
The primary contribution of this research is to provide developers who use XML, JSON or SMILES data with key knowledge of the best transformation techniques to use with certain types of data, and of which compression techniques provide the best compressed output size and processing times, depending on their requirements.
The main study in this thesis investigates the extent to which applying data transforms prior to compression can further improve the compression of XML and JSON data. It provides a comparative analysis of applying a variety of data transforms and their variations to a number of different types of equivalent XML and JSON datasets of various sizes, and of applying different general-purpose compression techniques to the transformed data.
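The idea of transforming data before handing it to a general-purpose compressor can be sketched as follows. This is an illustrative example only, not one of the transforms evaluated in the thesis: the functions `tag_transform`, `inverse_tag_transform` and `compressed_sizes` are hypothetical, the `t0`, `t1`, ... token scheme is an assumption, and the input is assumed to contain no comments, CDATA sections or text resembling the tokens. `zlib` stands in for whichever general-purpose compressor is being compared.

```python
import re
import zlib

# Matches the start of an opening or closing tag and captures its name.
_TAG = re.compile(r"<(/?)([A-Za-z_][\w.\-:]*)")
# Matches a token written by tag_transform (assumed token format: t<number>).
_TOKEN = re.compile(r"<(/?)(t\d+)\b")


def tag_transform(xml_text):
    """Replace each distinct tag name with a short token; return text and table."""
    table = {}

    def repl(match):
        slash, name = match.group(1), match.group(2)
        token = table.setdefault(name, "t%d" % len(table))
        return "<" + slash + token

    return _TAG.sub(repl, xml_text), table


def inverse_tag_transform(transformed, table):
    """Restore the original tag names from the token table."""
    reverse = {token: name for name, token in table.items()}
    return _TOKEN.sub(
        lambda m: "<" + m.group(1) + reverse[m.group(2)], transformed
    )


def compressed_sizes(xml_text):
    """Return (bytes without transform, bytes with transform) after zlib."""
    transformed, _table = tag_transform(xml_text)
    plain = len(zlib.compress(xml_text.encode("utf-8")))
    with_transform = len(zlib.compress(transformed.encode("utf-8")))
    return plain, with_transform
```

A fair comparison would also count the bytes needed to store the token table, and the transform is only usable if it is exactly reversible. Whether it helps at all depends on the dataset and the compressor, which is the kind of question a comparative study such as this one sets out to answer.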
A case study is also conducted to investigate whether data transforms applied prior to compression improve the compression of domain-specific data.
The files of software accompanying this thesis are unable to be presented online with the thesis.