Search CORE

107 research outputs found

Stackless Processing of Streamed Trees

Author: Barloy Corentin
Murlak Filip
Paperman Charles
Publication venue: HAL CCSD
Publication date: 20/06/2021
Field of study

International audienceProcessing tree-structured data in the streaming model is a challenge: capturing regular properties of streamed trees by means of a stack is costly in memory, but falling back to finite-state automata drastically limits the computational power. We propose an intermediate stackless model based on register automata equipped with a single counter, used to maintain the current depth in the tree. We explore the power of this model to validate and query streamed trees. Our main result is an effective characterization of regular path queries (RPQs) that can be evaluated stacklessly-with and without registers. In particular, we confirm the conjectured characterization of tree languages defined by DTDs that are recognizable without registers, by Segoufin and Vianu (2002), in the special case of tree languages defined by means of an RPQ

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Streamability of nested word transductions

Author: Filiot Emmanuel
Gauwin Olivier
Reynier Pierre-Alain
Servais Frédéric
Publication venue
Publication date: 01/01/2019
Field of study

We consider the problem of evaluating in streaming (i.e., in a single left-to-right pass) a nested word transduction with a limited amount of memory. A transduction T is said to be height bounded memory (HBM) if it can be evaluated with a memory that depends only on the size of T and on the height of the input word. We show that it is decidable in coNPTime for a nested word transduction defined by a visibly pushdown transducer (VPT), if it is HBM. In this case, the required amount of memory may depend exponentially on the height of the word. We exhibit a sufficient, decidable condition for a VPT to be evaluated with a memory that depends quadratically on the height of the word. This condition defines a class of transductions that strictly contains all determinizable VPTs

arXiv.org e-Print Archive

HAL AMU

Episciences.org

DI-fusion

Schemas for Unordered XML on a DIME

Author: A Brüggemann-Klein
AJ Mayer
D Colazzo
D Colazzo
DC Oppen
G Miklau
Iovka Boneva
J Albert
L Cardelli
Radu Ciucanu
S Amer-Yahia
S Grijzenhout
Sławek Staworko
W Gelade
W Martens
W Martens
W Martens
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014
Field of study

We investigate schema languages for unordered XML having no relative order among siblings. First, we propose unordered regular expressions (UREs), essentially regular expressions with unordered concatenation instead of standard concatenation, that define languages of unordered words to model the allowed content of a node (i.e., collections of the labels of children). However, unrestricted UREs are computationally too expensive as we show the intractability of two fundamental decision problems for UREs: membership of an unordered word to the language of a URE and containment of two UREs. Consequently, we propose a practical and tractable restriction of UREs, disjunctive interval multiplicity expressions (DIMEs). Next, we employ DIMEs to define languages of unordered trees and propose two schema languages: disjunctive interval multiplicity schema (DIMS), and its restriction, disjunction-free interval multiplicity schema (IMS). We study the complexity of the following static analysis problems: schema satisfiability, membership of a tree to the language of a schema, schema containment, as well as twig query satisfiability, implication, and containment in the presence of schema. Finally, we study the expressive power of the proposed schema languages and compare them with yardstick languages of unordered trees (FO, MSO, and Presburger constraints) and DTDs under commutative closure. Our results show that the proposed schema languages are capable of expressing many practical languages of unordered trees and enjoy desirable computational properties.Comment: Theory of Computing System

arXiv.org e-Print Archive

HAL - Lille 3

Crossref

INRIA a CCSD electronic archive server

HAL Descartes

Edinburgh Research Explorer

Hal-Diderot

An XML Query Engine for Network-Bound Data

Author: Halevy Alon Y
Ives Zachary G
Weld Daniel S
Publication venue: ScholarlyCommons
Publication date: 01/01/2001
Field of study

XML has become the lingua franca for data exchange and integration across administrative and enterprise boundaries. Nearly all data providers are adding XML import or export capabilities, and standard XML Schemas and DTDs are being promoted for all types of data sharing. The ubiquity of XML has removed one of the major obstacles to integrating data from widely disparate sources –- namely, the heterogeneity of data formats. However, general-purpose integration of data across the wide area also requires a query processor that can query data sources on demand, receive streamed XML data from them, and combine and restructure the data into new XML output -- while providing good performance for both batch-oriented and ad-hoc, interactive queries. This is the goal of the Tukwila data integration system, the first system that focuses on network-bound, dynamic XML data sources. In contrast to previous approaches, which must read, parse, and often store entire XML objects before querying them, Tukwila can return query results even as the data is streaming into the system. Tukwila is built with a new system architecture that extends adaptive query processing and relational-engine techniques into the XML realm, as facilitated by a pair of operators that incrementally evaluate a query’s input path expressions as data is read. In this paper, we describe the Tukwila architecture and its novel aspects, and we experimentally demonstrate that Tukwila provides better overall query performance and faster initial answers than existing systems, and has excellent scalability

CiteSeerX

ScholarlyCommons@Penn

Simple Schemas for Unordered XML

Author: Boneva Iovka
Ciucanu Radu
Staworko Slawek
Publication venue
Publication date: 01/01/2013
Field of study

We consider unordered XML, where the relative order among siblings is ignored, and propose two simple yet practical schema formalisms: disjunctive multiplicity schemas (DMS), and its restriction, disjunction-free multiplicity schemas (MS). We investigate their computational properties and characterize the complexity of the following static analysis problems: schema satisfiability, membership of a tree to the language of a schema, schema containment, twig query satisfiability, implication, and containment in the presence of schema. Our research indicates that the proposed formalisms retain much of the expressiveness of DTDs without an increase in computational complexity. 1

arXiv.org e-Print Archive

CiteSeerX

HAL - Lille 3

INRIA a CCSD electronic archive server

Edinburgh Research Explorer

Efficient main memory-based XML stream processing

Author: Scherzinger Stefanie
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 01/01/2008
Field of study

Applications that process XML documents as files or streams are naturally main-memory based. This makes main memory the bottleneck for scalability. This doctoral thesis addresses this problem and presents a toolkit for effective buffer management in main memory-based XML stream processors. XML document projection is an established technique for reducing the buffer requirements of main memory-based XML processors, where only data relevant to query evaluation is loaded into main memory buffers. We present a novel implementation of this task, where we use string matching algorithms designed for efficient keyword search in flat strings to navigate in tree-structured data. We then introduce an extension of the XQuery language, called FluX, that supports event-based query processing. Purely event-based queries of this language can be executed on streaming XML data in a very direct way. We develop an algorithm to efficiently rewrite XQueries into FluX. This algorithm is capable of exploiting order constraints derived from schemata to reduce the amount of buffering in query evaluation. During streaming query evaluation, we continuously purge buffers from data that is no longer relevant. By combining static query analysis with a dynamic analysis of the buffer contents, we effectively reduce the size of memory buffers. We have confirmed the efficacy of these techniques by extensive experiments and by publication at international venues. To compare our contributions to related work in a systematic manner, we contribute an abstract framework for XML stream processing. This framework allows us to gain a greater-picture view over the factors influencing the main memory consumption.Anwendungen, die XML-Dokumente als Dateien oder Ströme verarbeiten, sind natürlicherweise hauptspeicherbasiert. Für die Skalierbarkeit wird der Hauptspeicher damit zu einem Engpass. Diese Doktorarbeit widmet sich diesem Problem, zu dessen Lösung sie Werkzeuge für eine effektive Pufferverwaltung in hauptspeicherbasierten Prozessoren für XML-Datenströme vorstellt. Die Projektion von XML-Dokumenten ist eine etablierte Methode, um den Pufferverbrauch von hauptspeicherbasierten XML-Prozessoren zu reduzieren. Dabei werden nur jene Daten in den Hauptspeicherpuffer geladen, die für die Anfrageauswertung auch relevant sind. Wir präsentieren eine neue Implementierung dieser Aufgabe, wobei wir Algorithmen zur effizienten Suche in flachen Zeichenketten einsetzen, um in baumartig strukturierten Daten zu navigieren. Danach stellen wir eine Erweiterung der XQuery-Sprache vor, genannt FluX, welche eine ereignisbasierte Anfragebearbeitung erlaubt. Anfragen, die nur ereignisbasierte Konstrukte benutzen, können direkt über XML-Datenströmen ausgewertet werden. Dazu entwickeln wir einen Algorithmus, mit dessen Hilfe sich XQuery-Anfragen effizient in FluX übersetzen lassen. Dieser benutzt Ordnungsinformationen aus Datenschemata, womit das Puffern in der Anfragebearbeitung reduziert werden kann. Während der Verarbeitung des Datenstroms bereinigen wir laufend den Hauptspeicherpuffer von solchen Daten, die nicht länger relevant sind. Eine nachhaltige Reduzierung der Größe von Hauptspeicherpuffern gelingt durch die Kombination der statischen Anfrageanalyse mit einer dynamischen Analyse der Pufferinhalte. Die Effektivität dieser Puffermanagement-Techniken erfährt ihre Bestätigung in umfangreichen Experimenten und internationalen Publikationen. Für einen systematischen Vergleich unserer Beiträge mit der aktuellen Literatur entwickeln wir ein abstraktes System zur Modellierung von Prozessoren zur XML-Stromverarbeitung. So können wir die spezifischen Faktoren herausgreifen, die den Hauptspeicherverbrauch beeinflussen

Universaar

Acronym

Earliest Query Answering for Deterministic Nested Word Automata

Author: A. Berlea
A. Neumann
D. Olteanu
G. Miklau
H. Seidl
L. Segoufin
M. Benedikt
M. Grohe
O. Gauwin
O. Gauwin
R. Alur
W. Martens
Publication venue: 'Nordic Pulp and Paper Research Journal'
Publication date: 01/01/2009
Field of study

International audienceEarliest query answering (EQA) is an objective of many recent streaming algorithms for XML query answering, that aim for close to optimal memory management. In this paper, we show that EQA is infeasible even for a small fragment of Forward XPath except if P=NP. We then present an EQA algorithm for queries and schemas defined by deterministic nested word automata (dNWAs) and distinguish a large class of dNWAs for which streaming query answering is feasible in polynomial space and time

HAL - Lille 3

Crossref

INRIA a CCSD electronic archive server

Recognizing well-parenthesized expressions in the streaming model

Author: Magniez F.
Mathieu C.
Nayak A.
Publication venue
Publication date: 17/11/2009
Field of study

Motivated by a concrete problem and with the goal of understanding the sense in which the complexity of streaming algorithms is related to the complexity of formal languages, we investigate the problem Dyck(s) of checking matching parentheses, with

s

different types of parenthesis. We present a one-pass randomized streaming algorithm for Dyck(2) with space \Order(\sqrt{n}\log n), time per letter \polylog (n), and one-sided error. We prove that this one-pass algorithm is optimal, up to a \polylog n factor, even when two-sided error is allowed. For the lower bound, we prove a direct sum result on hard instances by following the "information cost" approach, but with a few twists. Indeed, we play a subtle game between public and private coins. This mixture between public and private coins results from a balancing act between the direct sum result and a combinatorial lower bound for the base case. Surprisingly, the space requirement shrinks drastically if we have access to the input stream in reverse. We present a two-pass randomized streaming algorithm for Dyck(2) with space \Order((\log n)^2), time \polylog (n) and one-sided error, where the second pass is in the reverse direction. Both algorithms can be extended to Dyck(s) since this problem is reducible to Dyck(2) for a suitable notion of reduction in the streaming model.Comment: 20 pages, 5 figure

arXiv.org e-Print Archive

CiteSeerX

Crossref

Streaming Property Testing of Visibly Pushdown Languages

Author: de Rougemont Michel
François Nathanaël
Magniez Frédéric
Serre Olivier
Publication venue
Publication date: 03/11/2015
Field of study

In the context of language recognition, we demonstrate the superiority of streaming property testers against streaming algorithms and property testers, when they are not combined. Initiated by Feigenbaum et al., a streaming property tester is a streaming algorithm recognizing a language under the property testing approximation: it must distinguish inputs of the language from those that are

\varepsilon

-far from it, while using the smallest possible memory (rather than limiting its number of input queries). Our main result is a streaming

\varepsilon

-property tester for visibly pushdown languages (VPL) with one-sided error using memory space

\mathrm{poly}((\log n) / \varepsilon)

. This constructions relies on a (non-streaming) property tester for weighted regular languages based on a previous tester by Alon et al. We provide a simple application of this tester for streaming testing special cases of instances of VPL that are already hard for both streaming algorithms and property testers. Our main algorithm is a combination of an original simulation of visibly pushdown automata using a stack with small height but possible items of linear size. In a second step, those items are replaced by small sketches. Those sketches relies on a notion of suffix-sampling we introduce. This sampling is the key idea connecting our streaming tester algorithm to property testers.Comment: 23 pages. Major modifications in the presentatio

arXiv.org e-Print Archive

HAL Descartes

Hal-Diderot