
    Boosting XML Filtering with a Scalable FPGA-based Architecture

    The growing amount of XML-encoded data exchanged over the Internet increases the importance of XML-based publish-subscribe (pub-sub) and content-based routing systems. The input to such systems typically consists of a stream of XML documents and a set of user subscriptions expressed as XML queries. The pub-sub system then filters the published documents and passes them to the subscribers. Pub-sub systems are characterized by very high input rates, so processing time is critical. In this paper we propose a "pure hardware" solution that uses XPath query blocks on an FPGA to solve the filtering problem. By exploiting the high degree of parallelism an FPGA provides, our approach achieves drastically better throughput than existing software or mixed (hardware/software) architectures. The XPath queries (subscriptions) are translated into regular expressions, which are then mapped onto FPGA devices. By introducing stacks within the FPGA we are able to express and process a wide range of path queries very efficiently in a scalable environment. Moreover, because the parser and the filter processing run on the same FPGA chip, the expensive communication costs that a multi-core system would incur are eliminated, enabling very fast and efficient pipelining. Our experimental evaluation reveals more than one order of magnitude improvement compared to traditional pub-sub systems. Comment: CIDR 200
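
    A minimal software sketch of the stack-based path-matching idea described above, assuming simple child ("/") and descendant ("//") axis subscriptions and a SAX-like stream of open/close-tag events; the query syntax, event format, and function names are illustrative and not the paper's hardware encoding.

```python
# Illustrative sketch (not the paper's FPGA design): match a stream of
# open/close tag events against path subscriptions, using a stack to track
# the currently open element path -- the role the on-chip stacks play for
# nested XML elements in the hardware filter.

def compile_query(xpath):
    """Turn e.g. "/catalog//book/title" into [(tag, is_descendant), ...] steps."""
    steps, descendant = [], False
    for part in xpath.split("/")[1:]:     # drop the leading "/"
        if part == "":                    # an empty segment marks a "//" axis
            descendant = True
        else:
            steps.append((part, descendant))
            descendant = False
    return steps

def path_matches(path, steps):
    """True if the open-element path (root..current) satisfies every step."""
    def rec(p, s):
        if s == len(steps):
            return p == len(path)         # all steps consumed exactly at the leaf
        tag, descendant = steps[s]
        if descendant:                    # "//" may skip any number of levels
            return any(path[i] == tag and rec(i + 1, s + 1)
                       for i in range(p, len(path)))
        return p < len(path) and path[p] == tag and rec(p + 1, s + 1)
    return rec(0, 0)

def filter_stream(events, queries):
    """events: ("open"|"close", tag) pairs; queries: {query_id: xpath}."""
    compiled = {qid: compile_query(q) for qid, q in queries.items()}
    stack, matched = [], set()
    for kind, tag in events:
        if kind == "open":
            stack.append(tag)
            for qid, steps in compiled.items():
                if qid not in matched and path_matches(stack, steps):
                    matched.add(qid)
        else:
            stack.pop()
    return matched

events = [("open", "catalog"), ("open", "book"), ("open", "title"),
          ("close", "title"), ("close", "book"), ("close", "catalog")]
print(filter_stream(events, {"q1": "/catalog//book/title", "q2": "/catalog/price"}))
# -> {'q1'}
```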

    XML Document Parsing: Operational and Performance Characteristics

    Intelligent XML Tag Classification Techniques for XML Encryption Improvement

    Flexibility, friendliness, and adaptability have been key reasons for using XML to exchange information across different networks, since it provides the common syntax needed by various messaging systems. However, the heavy use of XML as a communication medium has drawn attention to the security standards used to protect exchanged messages and to achieve data confidentiality and privacy. This research presents a novel approach to securing XML messages used in various systems efficiently, providing both strong security and high performance. The system model is based on two major modules: the first classifies XML messages and defines which parts of a message should be secured, assigning an importance level to each tag present in the XML message; the second uses the XML Encryption standard proposed earlier by the W3C [3] to perform partial encryption on the parts selected in the classification stage. As a result, the study aims to improve both the performance of the XML encryption process and bulk message handling, achieving data cleansing efficiently.
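
    A rough sketch of the two-stage idea (classify tags, then encrypt only the selected parts), assuming a hand-written importance table, a fixed threshold, and a toy XOR-based stand-in cipher; the tag names, threshold, and `xor_cipher` helper are hypothetical, and the paper itself applies the W3C XML Encryption standard rather than this stand-in.

```python
# Sketch of tag classification followed by partial encryption of an XML message.
# The importance table, threshold, and XOR-based stand-in cipher are illustrative
# only; a real deployment would wrap selected elements with W3C XML Encryption.
import base64
import xml.etree.ElementTree as ET

IMPORTANCE = {"card_number": 3, "password": 3, "email": 2, "name": 1}   # hypothetical levels
THRESHOLD = 2                                                           # encrypt levels >= 2

def xor_cipher(text, key=b"demo-key"):
    """Stand-in for a real cipher: XOR with a repeating key, then base64-encode."""
    enc = bytes(b ^ key[i % len(key)] for i, b in enumerate(text.encode()))
    return base64.b64encode(enc).decode()

def partially_encrypt(xml_text):
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if IMPORTANCE.get(elem.tag, 0) >= THRESHOLD and elem.text:
            elem.text = xor_cipher(elem.text)
            elem.set("encrypted", "true")      # mark the element, mimicking <EncryptedData>
    return ET.tostring(root, encoding="unicode")

message = "<payment><name>Alice</name><card_number>4111111111111111</card_number></payment>"
print(partially_encrypt(message))              # only the <card_number> content is replaced
```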

    HUDDL: the Hydrographic Universal Data Description Language

    Since many of the attempts to introduce a universal hydrographic data format have failed or have been only partially successful, a different approach is proposed. Our solution is the Hydrographic Universal Data Description Language (HUDDL), a descriptive XML-based language that permits the creation of a standardized description of (past, present, and future) data formats and allows for applications like HUDDLER, a compiler that automatically creates drivers for data access and manipulation. HUDDL also represents a powerful solution for archiving data along with their structural description, as well as for cataloguing existing format specifications and their version control. HUDDL is intended to be an open, community-led initiative to simplify the issues involved in hydrographic data access.
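
    A toy illustration of the core idea of driving a binary-data reader from an XML description of the format rather than from hand-written parsing code; the element names, attributes, and field types below are invented for this sketch and are not the actual HUDDL schema.

```python
# Toy illustration of a description-driven binary reader: the XML describes the
# record layout, and a generic routine compiles it into an unpacker.  The schema
# used here is invented for the example and is not real HUDDL.
import struct
import xml.etree.ElementTree as ET

DESCRIPTION = """
<format name="toy_sounding_record" endian="little">
  <field name="timestamp" type="uint32"/>
  <field name="latitude"  type="double"/>
  <field name="longitude" type="double"/>
  <field name="depth"     type="float"/>
</format>
"""

TYPE_CODES = {"uint32": "I", "double": "d", "float": "f"}    # struct format codes

def build_reader(description_xml):
    """Compile the XML description into (field names, struct.Struct)."""
    root = ET.fromstring(description_xml)
    prefix = "<" if root.get("endian") == "little" else ">"
    fields = root.findall("field")
    names = [f.get("name") for f in fields]
    codes = "".join(TYPE_CODES[f.get("type")] for f in fields)
    return names, struct.Struct(prefix + codes)

names, record = build_reader(DESCRIPTION)
raw = record.pack(1700000000, 43.07, -70.71, 12.5)           # a made-up record
print(dict(zip(names, record.unpack(raw))))
```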

    Scalable structural index construction for JSON analytics

    JavaScript Object Notation (JSON) and its variants have gained great popularity in recent years. Unfortunately, the performance of analytics over JSON is often dragged down by expensive JSON parsing. To address this, recent work has shown that building bitwise indices on JSON data, called structural indices, can greatly accelerate querying. Despite its promise, existing structural index construction does not scale well as records become larger and more complex, due to its inherently sequential construction process and costly memory copies that grow with the nesting level. To address these issues, this work introduces Pison – a more memory-efficient structural index constructor with support for intra-record parallelism. First, Pison features a redesign of the bottleneck step in the existing solution; the new design is not only simpler but also more memory-efficient. More importantly, Pison is able to build structural indices for a single bulky record in parallel, enabled by a group of customized parallelization techniques. Finally, Pison is also optimized for better data locality, which is especially critical when processing bulky records. Our evaluation on real-world JSON datasets shows that Pison achieves a 9.8X average speedup over the existing structural index construction solution for bulky records, and a 4.6X average end-to-end (indexing plus querying) speedup over a state-of-the-art SIMD-based JSON parser on a 16-core machine.
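
    A simplified, sequential sketch of what a structural index records (the positions of structural characters that fall outside quoted strings) and how a query can consume it; Pison's bitwise construction, SIMD usage, and intra-record parallelism are not shown, and the helper names here are illustrative.

```python
# Simplified, sequential sketch of a JSON structural index: positions of the
# structural characters { } [ ] : , that occur outside quoted strings.  Pison
# builds equivalent bitmaps with bitwise/SIMD operations and splits a single
# bulky record across threads; none of that machinery appears here.

def build_structural_index(json_text):
    index = {c: [] for c in "{}[]:,"}
    in_string = escaped = False
    for pos, ch in enumerate(json_text):
        if escaped:
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == '"':
            in_string = not in_string
        elif not in_string and ch in index:
            index[ch].append(pos)
    return index

def value_after_key(json_text, index, key):
    """Slice out the raw value of a top-level scalar key using only the index.
    (A real engine keeps per-nesting-level bitmaps so nested commas are skipped.)"""
    key_at = json_text.find('"%s"' % key)
    colon = min(p for p in index[":"] if p > key_at)
    end = min(p for p in index[","] + index["}"] if p > colon)
    return json_text[colon + 1:end].strip()

doc = '{"name": "Alice", "tags": ["x", "y"], "age": 30}'
idx = build_structural_index(doc)
print(value_after_key(doc, idx, "age"))    # -> 30
```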

    HUDDL for description and archive of hydrographic binary data

    Many of the attempts to introduce a universal hydrographic binary data format have failed or have been only partially successful. In essence, this is because such formats either have to simplify the data to such an extent that they only support the lowest common subset of all the formats covered, or they attempt to be a superset of all formats and quickly become cumbersome. Neither choice works well in practice. This paper presents a different approach: a standardized description of (past, present, and future) data formats using the Hydrographic Universal Data Description Language (HUDDL), a descriptive language implemented using the Extensible Markup Language (XML). That is, XML is used to provide a structural and physical description of a data format, rather than the content of a particular file. Done correctly, this opens the possibility of automatically generating both multi-language data parsers and documentation for format specifications based on their HUDDL descriptions, as well as providing easy version control of them. This solution also provides a powerful approach for archiving a structural description of data along with the data, so that binary data will remain easy to access in the future. Intending to provide a relatively low-effort solution for indexing the wide range of existing formats, we suggest the creation of a catalogue of format descriptions, each of them capturing the logical and physical specifications for a given data format (with its subsequent upgrades). A C/C++ parser code generator is used as an example prototype of one of the possible advantages of adopting such a hydrographic data format catalogue.
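
    A sketch of the parser-generation idea mentioned above: emitting a small C struct and read function from a HUDDL-style description. The description elements and the generated code are invented for this example and do not reproduce HUDDLER's actual output.

```python
# Sketch of description-driven code generation: turn an XML format description
# into C source for a record struct and a reader.  The schema and the emitted
# code are illustrative (padding/endianness handling omitted); they are not the
# real HUDDL/HUDDLER output.
import xml.etree.ElementTree as ET

C_TYPES = {"uint32": "uint32_t", "double": "double", "float": "float"}

DESCRIPTION = """
<format name="toy_sounding_record" endian="little">
  <field name="timestamp" type="uint32"/>
  <field name="depth"     type="float"/>
</format>
"""

def generate_c_reader(description_xml):
    root = ET.fromstring(description_xml)
    name = root.get("name")
    fields = [(f.get("name"), C_TYPES[f.get("type")]) for f in root.findall("field")]
    lines = ["#include <stdint.h>", "#include <stdio.h>", "", "typedef struct {"]
    lines += ["    %s %s;" % (ctype, fname) for fname, ctype in fields]
    lines += ["} %s_t;" % name, "",
              "int read_%s(FILE *fp, %s_t *rec) {" % (name, name),
              "    return fread(rec, sizeof(*rec), 1, fp) == 1;",
              "}"]
    return "\n".join(lines)

print(generate_c_reader(DESCRIPTION))
```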