
5,228 research outputs found

Extracting partition statistics from semistructured data

Authors: Gourlay Richard; Japp Robert; Neumüller Mathias; Wilson John N.
Publication date: 01/01/2006

The effective grouping, or partitioning, of semistructured data is of fundamental importance when providing support for queries. Partitions allow items within the data set that share common structural properties to be identified efficiently. This allows queries that make use of these properties, such as branching path expressions, to be accelerated. Here, we evaluate the effectiveness of several partitioning techniques by establishing the number of partitions that each scheme can identify over a given data set. In particular, we explore the use of parameterised indexes, based upon the notion of forward and backward bisimilarity, as a means of partitioning semistructured data, demonstrating that even restricted instances of such indexes can be used to identify the majority of relevant partitions in the data.

University of Strathclyde Institutional Repository
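
The partition counting described in the abstract above can be illustrated with a minimal sketch. It assumes tree-shaped data and uses only backward bisimilarity (refinement by parent), with the parameter `k` limiting how many ancestor levels are considered. The function name `k_backward_partitions` and the dictionary-based tree representation are hypothetical choices made for this illustration; this is not the authors' implementation.

```python
from collections import defaultdict


def k_backward_partitions(labels, parent, k):
    """Group tree nodes that are backward k-bisimilar.

    labels: dict mapping node id -> element label
    parent: dict mapping node id -> parent id (None for the root)
    k:      number of refinement rounds (ancestor levels considered)
    """
    # Round 0: two nodes are equivalent when their labels match.
    block = dict(labels)
    for _ in range(k):
        refined = {}
        for node in labels:
            p = parent[node]
            parent_block = block[p] if p is not None else None
            # New signature: own label plus the parent's current block.
            refined[node] = (labels[node], parent_block)
        block = refined
    groups = defaultdict(list)
    for node, signature in block.items():
        groups[signature].append(node)
    return list(groups.values())


# Hypothetical toy document: db -> (book -> title, title), (article -> title)
labels = {0: "db", 1: "book", 2: "article", 3: "title", 4: "title", 5: "title"}
parent = {0: None, 1: 0, 2: 0, 3: 1, 4: 2, 5: 1}

for k in (0, 1):
    print(k, len(k_backward_partitions(labels, parent, k)))
```

With `k = 0` all three `title` nodes fall into one partition. With `k = 1` the titles under `book` are separated from the title under `article`, so the partition count grows as the parameter increases.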

A model for querying semistructured data through the exploitation of regular sub-structures

Authors: Neumüller M.; Wilson J.
Publication date: 01/01/2004

Much research has been undertaken in order to speed up the processing of semistructured data in general and XML in particular. Many approaches for storage, compression, indexing and querying exist, e.g. [1, 2]. We do not present yet another such algorithm but a unifying model in which these algorithms can be understood. The key idea behind this research is the assumption that most practical queries are based on a particular pattern of data that can be deduced from the query and which can then be captured using a regular structure amenable to efficient processing techniques.

University of Strathclyde Institutional Repository

Designing a resource-efficient data structure for mobile data systems

Author: Gourlay Richard Scott
Publication date: 01/07/2006

Designing data structures for use in mobile devices requires attention to optimising data volumes, with associated benefits for data transmission, storage space and battery use. For semi-structured data, tree summarisation techniques can be used to reduce the volume of structured elements, while dictionary compression can efficiently deal with value-based predicates. This project seeks to investigate and evaluate an integration of the two approaches. The key strength of this technique is that both structural and value predicates could be resolved within one graph, while further allowing for compression of the resulting data structure. As the current trend is towards working with larger semi-structured data sets, this work would allow for the utilisation of much larger data sets whilst reducing requirements on bandwidth and minimising the memory necessary both for the storage and querying of the data.

University of Strathclyde Institutional Repository
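
To make the combination described in the abstract above more concrete, the following is a minimal sketch in which a path summary stands in for tree summarisation and text values are dictionary-encoded as small integers. The names `build_summary` and `count_matches`, and the representation of a document as `(path, value)` pairs, are assumptions made for illustration only; the project's actual data structure is not specified in the abstract.

```python
def build_summary(records):
    """Build a value dictionary and a per-path list of value codes.

    records: iterable of (label_path, text_value) pairs from a document.
    """
    dictionary = {}  # text value -> integer code (dictionary compression)
    by_path = {}     # label path -> list of value codes (structural summary)
    for path, value in records:
        # Reuse the existing code for a repeated value, else assign the next one.
        code = dictionary.setdefault(value, len(dictionary))
        by_path.setdefault(path, []).append(code)
    return dictionary, by_path


def count_matches(dictionary, by_path, path, value):
    """Count nodes on `path` whose value equals `value`."""
    code = dictionary.get(value)
    if code is None:
        # The value never occurs in the document, so nothing can match.
        return 0
    return by_path.get(path, []).count(code)


# Hypothetical sample data.
records = [
    ("/lib/book/lang", "en"),
    ("/lib/book/lang", "de"),
    ("/lib/book/lang", "en"),
    ("/lib/cd/lang", "en"),
]
dictionary, by_path = build_summary(records)
print(count_matches(dictionary, by_path, "/lib/book/lang", "en"))
```

In this sketch a structural predicate is resolved by the path lookup and a value predicate by a single dictionary lookup followed by integer comparisons, so each distinct string is stored only once.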

Compressed materialised views of semi-structured data

Authors: Gourlay Richard; Tripney Brian; Wilson John
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2007

Query performance issues over semi-structured data have led to the emergence of materialised XML views as a means of restricting the data structure processed by a query. However, preserving the conventional representation of such views remains a significant limiting factor, especially in the context of mobile devices where processing power, memory usage and bandwidth are significant factors. To explore the concept of a compressed materialised view, we extend our earlier work on structural XML compression to produce a combination of structural summarisation and data compression techniques. These techniques provide a basis for efficiently dealing with both structural queries and value-based predicates. We evaluate the effectiveness of such a scheme, presenting results and performance measures that show advantages of using such structures.

Crossref

University of Strathclyde Institutional Repository

Enlighten

Investigation into Indexing XML Data Techniques

Authors: Joan Lu; Klaib Alhadi
Publication date: 21/07/2014

The rapid development of XML technology improves the WWW, since XML data has many advantages and has become a common technology for transferring data across the internet. Therefore, the objective of this research is to investigate and study XML indexing techniques in terms of their structures. The main goal of this investigation is to identify the main limitations of these techniques and any other open issues. Furthermore, this research considers the most common XML indexing techniques and performs a comparison between them. Subsequently, this work presents an argument that identifies these limitations. To conclude, the main problem of all the XML indexing techniques is the trade-off between the size and the efficiency of the indexes. So, all the indexes become large in order to perform well, and none of them is suitable for all users’ requirements. However, each one of these techniques has some advantages.

University of Huddersfield Repository

Content-Aware DataGuides for Indexing Large Collections of XML Documents

Authors: Bry François; Meuss Holger; Schulz Klaus U.; Weigel Felix
Publication date: 01/01/2003

XML is well-suited for modelling structured data with textual content. However, most indexing approaches perform structure and content matching independently, combining the retrieved path and keyword occurrences in a third step. This paper shows that retrieval in XML documents can be accelerated significantly by processing text and structure simultaneously during all retrieval phases. To this end, the Content-Aware DataGuide (CADG) enhances the well-known DataGuide with (1) simultaneous keyword and path matching and (2) a precomputed content/structure join. Extensive experiments prove the CADG to be 50-90% faster than the DataGuide for various sorts of query and document, including difficult cases such as poorly structured queries and recursive document paths. A new query classification scheme identifies precise query characteristics with a predominant influence on the performance of the individual indices. The experiments show that the CADG is applicable to many real-world applications, in particular large collections of heterogeneously structured XML documents.

CiteSeerX

Open Access LMU
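
The idea of matching keywords and paths together, as described in the abstract above, can be sketched roughly as follows. This is a simplified illustration and not the CADG algorithm from the paper: it assumes tree-shaped documents given as `(node_id, label_path, text)` triples, and the names `build_guide` and `candidate_paths` are hypothetical.

```python
from collections import defaultdict


def build_guide(nodes):
    """Summarise a document by its distinct label paths.

    nodes: iterable of (node_id, label_path, text) triples.
    Returns the node ids reached by each path and the words seen under it.
    """
    extent = defaultdict(set)    # label path -> ids of nodes reached by it
    keywords = defaultdict(set)  # label path -> lower-cased words under it
    for node_id, path, text in nodes:
        extent[path].add(node_id)
        keywords[path].update(text.lower().split())
    return extent, keywords


def candidate_paths(keywords, path_suffix, word):
    """Paths that end with `path_suffix` and contain `word` somewhere."""
    word = word.lower()
    return sorted(
        path
        for path, words in keywords.items()
        if path.endswith(path_suffix) and word in words
    )


# Hypothetical sample data.
nodes = [
    (1, "/lib/book/title", "XML Indexing"),
    (2, "/lib/book/title", "Graph Theory"),
    (3, "/lib/cd/title", "Jazz Classics"),
]
extent, keywords = build_guide(nodes)
print(candidate_paths(keywords, "/title", "xml"))
```

Because the keyword sets are attached to the path summary, paths that cannot contain the keyword are discarded during path matching itself, rather than in a separate join step afterwards.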