Memory-Efficient Query Processing over XML Fragment Stream with Fragment Labeling

Author: Kang Hyunchul
Kim Jin
Lee Sangwook
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 26/01/2012

The portable/hand-held devices deployed in mobile computing environments are mostly limited in memory. To make it possible for them to locally process queries over a large volume of XML data, the data needs to be streamed in fragments of manageable size and the queries need to be processed over the stream with as little memory as possible. In this paper, we report a considerable improvement in memory efficiency over the state-of-the-art techniques for query processing over XML fragment streams. We use XML fragment labeling (XFL) as a method of representing XML fragmentation, and show that XFL is much more effective than the popular hole-filler (HF) model employed in the state of the art in reducing the amount of memory required for query processing. The state of the art with the HF model requires more memory as the stream size increases. With XFL, we overcome this fundamental limitation, proposing techniques that make query processing scalable in the sense that the memory requirement is not affected by the size of the stream as long as the stream is bounded. The improvement is verified through implementation and a detailed set of experiments.

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

Compressed materialised views of semi-structured data

Author: Gourlay Richard
Tripney Brian
Wilson John
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2007

Query performance issues over semi-structured data have led to the emergence of materialised XML views as a means of restricting the data structure processed by a query. However, preserving the conventional representation of such views remains a significant limiting factor, especially in the context of mobile devices where processing power, memory usage and bandwidth are significant factors. To explore the concept of a compressed materialised view, we extend our earlier work on structural XML compression to produce a combination of structural summarisation and data compression techniques. These techniques provide a basis for efficiently dealing with both structural queries and value-based predicates. We evaluate the effectiveness of such a scheme, presenting results and performance measures that show the advantages of using such structures.

Crossref

University of Strathclyde Institutional Repository

Enlighten

A Method of XML Document Fragmentation for Reducing Time of XML Fragment Stream Query Processing

Author: Kang Hyunchul
Kim Jin
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 10/08/2012

As XML has been established as the standard for data exchange not just on the Web but among heterogeneous devices, systems, and applications, effective processing of XML queries is one of the core components of ubiquitous computing. Most of the mobile/hand-held devices deployed in ubiquitous computing environments are still limited in memory and processing power. Effective query processing is required when the source XML document is of large volume. The framework of fragmenting an XML document and streaming the XML fragments for query processing at the mobile devices has received much attention. However, the main focus has been on memory efficiency, to cope with the memory constraint of the mobile devices, and query processing time might be compromised in those techniques. Since processing power is also limited in the mobile devices, time optimization deserves attention. We have found that the query processing time is significantly affected by how the source XML document is fragmented. In this paper, we propose a method of XML document fragmentation whereby query processing becomes time-efficient while the size constraint for each resulting fragment is satisfied. Through implementation and a set of detailed experiments, we show that our proposed method considerably outperforms other methods.

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

The XQueC Project: Compressing and Querying XML

Author: Arion Andrei
Bonifati Angela
Manolescu Ioana
Pugliese Andrea
Publication venue: Dagstuhl Seminar Proceedings. 08261 - Structure-Based Compression of Complex Massive Data
Publication date: 01/01/2008

Dagstuhl Research Online Publication Server

Online Integration of Semistructured Data

Author: Handoko
Publication venue: School of Computing and Information Technology
Publication date: 01/01/2017

Data integration systems play an important role in the development of distributed multi-database systems. Data integration collects data from heterogeneous and distributed sources, and provides a global view of data to the users. Systems need to process users' applications in the shortest possible time. The virtualization approach to data integration systems ensures that the answers to user requests are the most up-to-date ones. In contrast, the materialization approach reduces data transmission time at the expense of data consistency between the central and remote sites. The virtualization approach to data integration systems can be applied in either batch or online mode. Batch processing requires all data to be available at a central site before processing is started. Delays in transmission of data over a network contribute to a longer processing time. On the other hand, in an online processing mode, data integration is performed piece by piece as soon as a unit of data is available at the central site. An online processing mode presents the partial results to the users earlier. Due to the heterogeneity of data models at the remote sites, a semistructured global view of data is required. The performance of data integration systems depends on an appropriate data model and the appropriate data integration algorithms used. This thesis presents a new algorithm for immediate processing of data collected from remote and autonomous database systems. The algorithm utilizes the idle processing states while the central site waits for completion of data transmission to produce instant partial results. A decomposition strategy included in the algorithm balances the computations between the central and remote sites to force maximum resource utilization at both sites. The thesis chooses the XML data model for the representation of semistructured data, and presents a new formalization of the XML data model together with a set of algebraic operations.
The XML data model is used to provide a virtual global view of semistructured data. The algebraic operators are consistent with the operations of relational algebra, such that any existing syntax-based query optimization technique developed for the relational model of data can be directly applied. The thesis shows how to optimize online processing by generating one online integration plan for several data increments. Further, the thesis shows how each independent increment expression can be processed in parallel on a multi-core processor system. The dynamic scheduling system proposed in the thesis is able to defer or terminate a plan such that materialization updates and unnecessary computations are minimized. The thesis shows that processing data chunks of fragmented XML documents allows for data integration in a shorter period of time. Finally, the thesis provides a clear formalization of the semistructured data model, a set of algorithms with high-level descriptions, and running examples. These formal backgrounds show that the proposed algorithms are implementable.

Research Online

Cache-and-query for wide area sensor databases

Author: Amol Deshpande
Phillip B. Gibbons
Srinivasan Seshan
Suman Nath
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2003

Crossref

The XML benchmark project

Author: Carey M.J.
Florescu D.
Kersten M.L. (Martin)
Schmidt A.R.
Waas F.
Publication venue: CWI
Publication date: 01/01/2001

With standardization efforts of a query language for XML documents drawing to a close, researchers and users increasingly focus their attention on the database technology that has to deliver on the new challenges that the sheer amount of XML documents produced by applications poses to data management: validation, performance evaluation and optimization of XML query processors are the upcoming issues. Following a long tradition in database research, the XML Store Benchmark Project provides a framework to assess an XML database's abilities to cope with a broad spectrum of different queries, typically posed in real-world application scenarios. The benchmark is intended to help both implementors and users to compare XML databases independent of their own, specific application scenario. To this end, the benchmark offers a set of queries, each of which is intended to challenge a particular primitive of the query processor or storage engine. The overall workload we propose consists of a scalable document database and a concise, yet comprehensive set of queries, which covers the major aspects of query processing. The queries' challenges range from stressing the textual character of the document to data analysis queries, but also include typical ad-hoc queries. We complement our research with results obtained from running the benchmark on our XML database platform. They are intended to give a first baseline, illustrating the state of the art.

CWI's Institutional Repository

Efficient storage of XML data

Author: Kanne Carl-Christian
Moerkotte Guido
Publication venue
Publication date: 01/01/1999

We introduce NATIX, an efficient, native repository for storing, retrieving and managing tree-structured large objects, preferably XML documents. In contrast to traditional large object (LOB) managers, we do not split at arbitrary byte positions but take the semantics of the underlying tree structure of XML documents into account. Our parameterizable split algorithm dynamically maintains physical records of size smaller than a page which contain sets of connected tree nodes. This not only improves efficiency by clustering subtrees but also facilitates their compact representation. Existing approaches to store XML documents either use flat files or map every single tree node onto a separate physical record. The increased flexibility of our approach results in higher efficiency. Performance measurements validate this claim.

CiteSeerX

MAnnheim DOCument Server