
Parsing XML Using Parallel Traversal of Streaming Trees

Authors: A. Reinefeld, D. Brownell, M.R. Head, T. Takase, V.N. Rao
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2008

Abstract. XML has been widely adopted across a wide spectrum of applications. Its parsing efficiency, however, remains a concern, and can be a bottleneck. With the current trend towards multicore CPUs, parallelization to improve performance is increasingly relevant. In many applications, the XML is streamed from the network, and thus the complete XML document is never in memory at any single moment in time. Parallel parsing of such a stream can be equated to parallel depth-first traversal of a streaming tree. Existing research on parallel tree traversal has assumed the entire tree was available in-memory, and thus cannot be directly applied. In this paper we investigate parallel, SAX-style parsing of XML via a parallel, depth-first traversal of the streaming document. We show good scalability up to about 6 cores on a Linux platform.

CiteSeerX

Crossref
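The abstract equates SAX-style parsing of a stream with a depth-first traversal of a tree that is never fully in memory. The Python sketch below illustrates only that sequential correspondence, using the standard-library `xml.sax` incremental parser: each `startElement` call enters a node (preorder) and each `endElement` call finishes it (postorder), while chunks stand in for network reads. The names `DepthFirstRecorder` and `traverse` are hypothetical, and the paper's parallel traversal scheme is not reproduced here.

```python
import xml.sax


class DepthFirstRecorder(xml.sax.ContentHandler):
    """Records SAX events as a depth-first traversal of the element tree.

    startElement marks entering a node (preorder) and endElement marks
    finishing it (postorder); the tree itself is never built in memory.
    """

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.preorder = []   # (depth, tag) in the order elements are entered
        self.postorder = []  # tags in the order elements are finished

    def startElement(self, name, attrs):
        self.preorder.append((self.depth, name))
        self.depth += 1

    def endElement(self, name):
        self.depth -= 1
        self.postorder.append(name)


def traverse(chunks):
    """Feed byte chunks (e.g. successive socket reads) to a SAX parser."""
    handler = DepthFirstRecorder()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    for chunk in chunks:
        parser.feed(chunk)
    parser.close()  # signals end of input
    return handler.preorder, handler.postorder


if __name__ == "__main__":
    pre, post = traverse([b"<a><b/><c>", b"<d/></c></a>"])
    print(pre)   # [(0, 'a'), (1, 'b'), (1, 'c'), (2, 'd')]
    print(post)  # ['b', 'd', 'c', 'a']
```

Note that the element boundaries do not have to line up with chunk boundaries; the parser buffers partial input until it can emit the next event.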

TypEx: a type-based approach to XML stream querying

Authors: Connor Richard, Neumüller Mathias, Russell George
Publication venue: WebDB
Publication date: 01/01/2003

We consider the topic of query evaluation over semistructured information streams, and XML data streams in particular. Streaming evaluation methods are necessarily event-driven, which is in tension with high-level query models; in general, the more expressive the query language, the harder it is to translate queries into an event-based implementation with finite resource bounds.

CiteSeerX

University of Strathclyde Institutional Repository

An XML Query Engine for Network-Bound Data

Authors: Halevy Alon Y, Ives Zachary G, Weld Daniel S
Publication venue: ScholarlyCommons
Publication date: 01/01/2001

XML has become the lingua franca for data exchange and integration across administrative and enterprise boundaries. Nearly all data providers are adding XML import or export capabilities, and standard XML Schemas and DTDs are being promoted for all types of data sharing. The ubiquity of XML has removed one of the major obstacles to integrating data from widely disparate sources, namely the heterogeneity of data formats. However, general-purpose integration of data across the wide area also requires a query processor that can query data sources on demand, receive streamed XML data from them, and combine and restructure the data into new XML output, while providing good performance for both batch-oriented and ad hoc, interactive queries. This is the goal of the Tukwila data integration system, the first system that focuses on network-bound, dynamic XML data sources. In contrast to previous approaches, which must read, parse, and often store entire XML objects before querying them, Tukwila can return query results even as the data is streaming into the system. Tukwila is built with a new system architecture that extends adaptive query processing and relational-engine techniques into the XML realm, as facilitated by a pair of operators that incrementally evaluate a query’s input path expressions as data is read. In this paper, we describe the Tukwila architecture and its novel aspects, and we experimentally demonstrate that Tukwila provides better overall query performance and faster initial answers than existing systems, and has excellent scalability.

CiteSeerX

ScholarlyCommons@Penn
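The property the abstract highlights is returning query results while XML is still streaming in. The Python sketch below shows only that general idea, using the standard-library `xml.etree.ElementTree.XMLPullParser`: chunks (standing in for network reads) are fed as they arrive, and each matching element is yielded as soon as its end tag has been parsed. The function `stream_matches` and the sample chunks are hypothetical; this is not Tukwila's operator design, and it matches a single tag name rather than a full path expression.

```python
import xml.etree.ElementTree as ET


def stream_matches(chunks, tag):
    """Yield the text of every <tag> element as soon as its end tag is parsed.

    `chunks` is any iterable of str or bytes pieces, e.g. successive network
    reads; results can be consumed before the rest of the document arrives.
    """
    parser = ET.XMLPullParser(events=("end",))

    def drain():
        # Hand out every completed element the parser has produced so far.
        for _event, elem in parser.read_events():
            if elem.tag == tag:
                yield elem.text

    for chunk in chunks:
        parser.feed(chunk)
        yield from drain()
    parser.close()  # end of input; raises ET.ParseError if the document is incomplete
    yield from drain()


if __name__ == "__main__":
    # The second chunk ends in the middle of the closing </feed> tag.
    chunks = ["<feed><item>a</item>", "<item>b</item></fe", "ed>"]
    for text in stream_matches(chunks, "item"):
        print(text)
```

Because `stream_matches` is a generator, a consumer can act on the first result while later chunks are still being received.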

Boosting XML Filtering with a Scalable FPGA-based Architecture

Authors: Bakalov Petko, Mitra Abhishek, Najjar Walid, Tsotras Vassilis, Vieira Marcos
Publication date: 01/01/2009

The growing amount of XML-encoded data exchanged over the Internet increases the importance of XML-based publish-subscribe (pub-sub) and content-based routing systems. The input in such systems typically consists of a stream of XML documents and a set of user subscriptions expressed as XML queries. The pub-sub system then filters the published documents and passes them to the subscribers. Pub-sub systems are characterized by very high input rates, so processing time is critical. In this paper we propose a "pure hardware" solution, which uses XPath query blocks on an FPGA to solve the filtering problem. By exploiting the high throughput that an FPGA provides for parallel processing, our approach achieves drastically better throughput than existing software or mixed (hardware/software) architectures. The XPath queries (subscriptions) are translated to regular expressions, which are then mapped to FPGA devices. By introducing stacks within the FPGA, we are able to express and process a wide range of path queries very efficiently in a scalable environment. Moreover, because the parser and the filter processing run on the same FPGA chip, the expensive communication costs that a multi-core system would incur are eliminated, enabling very fast and efficient pipelining. Our experimental evaluation reveals more than an order of magnitude improvement compared to traditional pub-sub systems.
Comment: CIDR 200

arXiv.org e-Print Archive

CiteSeerX

eScholarship - University of California
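The abstract's pipeline (XPath subscriptions translated to regular expressions, with stacks tracking nesting) can be loosely mimicked in plain software. The Python sketch below is an illustration under stated assumptions, not the paper's FPGA design: it supports only linear paths built from `/`, `//`, and `*`, translates each into a regex over root-to-node tag paths such as `/news/sport/item`, and keeps a stack of open tags while `xml.sax` streams the document. `xpath_to_regex`, `XPathFilter`, and `filter_document` are hypothetical names.

```python
import re
import xml.sax


def xpath_to_regex(xpath):
    """Translate a linear XPath ('/', '//', '*') into a regex over tag paths."""
    pattern = ""
    for axis, name in re.findall(r"(//|/)([^/]+)", xpath):
        step = r"[^/]+" if name == "*" else re.escape(name)
        # A '//' step may skip any number of intermediate elements.
        pattern += (r"(?:/[^/]+)*/" if axis == "//" else "/") + step
    return re.compile(pattern)


class XPathFilter(xml.sax.ContentHandler):
    """Reports which subscriptions match a document.

    Only a stack of open tag names is kept, never the document tree.
    """

    def __init__(self, subscriptions):
        super().__init__()
        self.regexes = {q: xpath_to_regex(q) for q in subscriptions}
        self.stack = []
        self.matched = set()

    def startElement(self, name, attrs):
        self.stack.append(name)
        path = "/" + "/".join(self.stack)
        for query, rx in self.regexes.items():
            if query not in self.matched and rx.fullmatch(path):
                self.matched.add(query)

    def endElement(self, name):
        self.stack.pop()


def filter_document(xml_bytes, subscriptions):
    """Return the set of subscriptions that match at least one element."""
    handler = XPathFilter(subscriptions)
    xml.sax.parseString(xml_bytes, handler)
    return handler.matched


if __name__ == "__main__":
    subs = ["/news/sport/item", "//item", "/news/*", "/news/weather"]
    doc = b"<news><sport><item/></sport><tech/></news>"
    print(sorted(filter_document(doc, subs)))
```

In this sketch every start tag costs one regex test per still-unmatched subscription, evaluated one after another; the paper instead maps subscriptions onto FPGA query blocks to exploit hardware parallelism.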

GPU-based JSON data processing using structural indexes

Author: Vlaswinkel Koen R.
Publication date: 05/08/2021

Pure OAI Repository

STANSE: Bug-finding Framework for C Programs

Authors: Obdržálek Jan, Slabý Jiří, Trtík Marek
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2011

Regular paper accepted at the MEMICS 2011 workshop. The paper deals with static analysis. It also describes a framework and tool called Stanse.

CiteSeerX

Crossref

Univerzitní repozitář Masarykovy univerzity

Scalable structural index construction for JSON analytics

Authors: Jiang Lin, Qiu Junqiao, Zhao Zhijia
Publication venue: Digital Commons @ Michigan Tech
Publication date: 01/12/2020

JavaScript Object Notation (JSON) and its variants have gained great popularity in recent years. Unfortunately, the performance of their analytics is often dragged down by expensive JSON parsing. To address this, recent work has shown that building bitwise indices on JSON data, called structural indices, can greatly accelerate querying. Despite its promise, existing structural index construction does not scale well as records become larger and more complex, due to its (inherently) sequential construction process and costly memory copies that grow as the nesting level increases. To address these issues, this work introduces Pison, a more memory-efficient structural index constructor with support for intra-record parallelism. First, Pison redesigns the bottleneck step of the existing solution; the new design is not only simpler but also more memory-efficient. More importantly, Pison can build structural indices for a single bulky record in parallel, enabled by a group of customized parallelization techniques. Finally, Pison is also optimized for better data locality, which is especially critical when processing bulky records. Our evaluation using real-world JSON datasets shows that Pison achieves an average 9.8X speedup over the existing structural index construction solution for bulky records and an average 4.6X speedup in end-to-end performance (indexing plus querying) over a state-of-the-art SIMD-based JSON parser on a 16-core machine.

Michigan Technological University
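The abstract builds on structural indices: bitmaps recording where JSON's structural characters (such as `:` and `,`) occur outside string literals, the idea being that a query can locate fields from these positions instead of parsing every token. The Python sketch below is a simplified, sequential illustration of that general idea, not Pison's algorithm: per-character loops stand in for word-at-a-time bit operations, Python integers serve as arbitrarily long bitmaps, and there is no parallelism or per-nesting-level handling. All function names are hypothetical.

```python
def bitmap(text, ch):
    """Bit i is set iff text[i] == ch (bit 0 is the first character)."""
    bits = 0
    for i, c in enumerate(text):
        if c == ch:
            bits |= 1 << i
    return bits


def escaped_bits(text):
    """Bit i is set iff text[i] follows an odd-length run of backslashes."""
    bits, run = 0, 0
    for i, c in enumerate(text):
        if run % 2 == 1:
            bits |= 1 << i
        run = run + 1 if c == "\\" else 0
    return bits


def string_mask(quote_bits, n):
    """Prefix XOR of the quote bitmap: bit i is set iff position i is inside
    a string literal (opening quote counted, closing quote not)."""
    mask, inside = 0, 0
    for i in range(n):
        inside ^= (quote_bits >> i) & 1
        mask |= inside << i
    return mask


def structural_index(text):
    """Map each structural character to a bitmap of its positions outside strings."""
    quotes = bitmap(text, '"') & ~escaped_bits(text)  # drop escaped quotes
    in_string = string_mask(quotes, len(text))
    return {ch: bitmap(text, ch) & ~in_string for ch in ":,{}[]"}


def positions(bits):
    """List the indices of the set bits, lowest first."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]


if __name__ == "__main__":
    index = structural_index('{"a":1,"b:c":"x,y"}')
    print(positions(index[":"]))  # [4, 12]: the ':' inside "b:c" is masked out
    print(positions(index[","]))  # [6]: the ',' inside "x,y" is masked out
```

The escape handling matters because a quote preceded by an odd number of backslashes does not end a string; without it, everything after `\"` would be misclassified.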

Efficient Processing of Expressive Node-Selecting Queries on XML Data in Secondary Storage: A Tree Automata-based Approach

Author: Koch Christoph
Publication date: 14/06/2011

Infoscience - École polytechnique fédérale de Lausanne