84 research outputs found

    Efficient asymmetric inclusion of regular expressions with interleaving and counting for XML type-checking

    The inclusion of Regular Expressions (REs) is the kernel of any type-checking algorithm for XML manipulation languages. XML applications would benefit from the extension of REs with interleaving and counting, but this is not feasible in general, since inclusion is EXPSPACE-complete for such extended REs. In Colazzo et al. (2009) [1] we introduced a notion of "conflict-free REs", which are extended REs with excellent complexity behaviour, including a polynomial inclusion algorithm [1] and linear membership (Ghelli et al., 2008 [2]). Conflict-free REs have interleaving and counting, but the complexity is tamed by the "conflict-free" limitations, which have been found to be satisfied by the vast majority of the content models published on the Web. However, a type-checking algorithm needs to compare machine-generated subtypes against human-defined supertypes. The conflict-free restriction, while quite harmless for the human-defined supertype, is far too restrictive for the subtype. We show here that the PTIME inclusion algorithm can actually be extended to deal with totally unrestricted REs with counting and interleaving in the subtype position, provided that the supertype is conflict-free. This is exactly the expressive power that we need in order to use subtyping inside type-checking algorithms, and the cost of this generalized algorithm is only quadratic, which is as good as the best algorithm we have for the symmetric case (see [1]). The result is extremely surprising, since we had previously found that symmetric inclusion becomes NP-hard as soon as the candidate subtype is enriched with binary intersection, a generalization that looked much more innocent than what we achieve here.
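    The syntactic core of the "conflict-free" (single-occurrence) restriction mentioned above can be illustrated with a small sketch: no element name may appear twice in the same content model. This toy check is not the paper's inclusion algorithm, only the shape of the restriction; the symbol lists in the comments are invented examples.

```python
def is_conflict_free(names):
    """Return True iff no element name occurs twice in the content
    model's symbol list (the single-occurrence side of the
    conflict-free restriction; counting and interleaving operators
    are ignored in this toy check)."""
    seen = set()
    for name in names:
        if name in seen:
            return False
        seen.add(name)
    return True

# a[1..3] & (b | c): each element name occurs once -> conflict-free
print(is_conflict_free(["a", "b", "c"]))       # True
# (a, b) | (a, c): 'a' occurs twice -> not conflict-free
print(is_conflict_free(["a", "b", "a", "c"]))  # False
```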

    A Type System for Interactive JSON Schema Inference (Extended Abstract)

    In this paper we present the first JSON type system that supports inferring a schema at different levels of precision/succinctness for different parts of the dataset, under user control. This feature gives the data analyst detailed schemas for the parts of the data of greater interest, while a more succinct schema is provided for the other parts. The decision can be changed as many times as needed, so the schema can be explored gradually, moving the focus to different parts of the collection, without reprocessing the data: only type rewriting operations on the most precise schema are performed.
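    The "type rewriting on the most precise schema" idea can be sketched as follows: views at a chosen precision are derived by rewriting the most precise inferred schema, never by reprocessing the data. The `focus` operation and the succinct marker `"Any"` are invented for illustration; the paper's type system offers several intermediate precision levels.

```python
def focus(schema, keep):
    """Derive a schema view by rewriting (hypothetical operation):
    fields named in `keep` stay fully detailed, every other field
    collapses to the succinct placeholder type 'Any'."""
    return {k: (v if k in keep else "Any") for k, v in schema.items()}

# most precise schema, inferred once from the data
precise = {"user": {"id": "int", "name": "str"},
           "log": {"ts": "int", "msg": "str"}}

# analyst currently cares about 'user'; 'log' is shown succinctly
print(focus(precise, {"user"}))
# {'user': {'id': 'int', 'name': 'str'}, 'log': 'Any'}
```

Changing the focus later means calling `focus` again with a different `keep` set; the precise schema is never recomputed.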

    Schema Inference for Massive JSON Datasets

    In recent years JSON has affirmed itself as a very popular data format for representing massive data collections. JSON data collections are usually schemaless. While this ensures several advantages, the absence of schema information has important negative consequences: the correctness of complex queries and programs cannot be statically checked, users cannot rely on schema information to quickly figure out the structural properties that could speed up the formulation of correct queries, and many schema-based optimizations are not possible. In this paper we deal with the problem of inferring a schema from massive JSON datasets. We first identify a JSON type language which is simple and, at the same time, expressive enough to capture irregularities and to give complete structural information about input data. We then present our main contribution: the design of a schema inference algorithm, its theoretical study, and its implementation based on Spark, enabling reasonable schema inference time for massive collections. Finally, we report on an experimental analysis showing the effectiveness of our approach in terms of execution time, precision, conciseness of inferred schemas, and scalability.
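    A minimal sketch, assuming a hypothetical `fuse` operator, of the kind of type merging such an inference algorithm performs: each JSON record is mapped to a record type, and types are then merged pairwise, with fields present on only one side marked optional. Because a fusion like this is associative and commutative, it maps naturally onto a distributed reduce such as Spark's; the paper's actual type language and fusion rules are considerably richer.

```python
def infer_type(value):
    """Map a JSON value to a toy type: objects become dicts of field
    types, everything else becomes its Python type name."""
    if isinstance(value, dict):
        return {k: infer_type(v) for k, v in value.items()}
    return type(value).__name__

def fuse(t1, t2):
    """Merge two inferred types (hypothetical operator): shared record
    fields fuse recursively, one-sided fields get a trailing '?'
    (optional), and differing atomic types collapse to a union 'a|b'."""
    if isinstance(t1, dict) and isinstance(t2, dict):
        # normalize optional markers so 'name' and 'name?' fuse together
        def split(t):
            return {k.rstrip("?"): (v, k.endswith("?")) for k, v in t.items()}
        s1, s2 = split(t1), split(t2)
        merged = {}
        for k in set(s1) | set(s2):
            if k in s1 and k in s2:
                (v1, o1), (v2, o2) = s1[k], s2[k]
                merged[k + ("?" if o1 or o2 else "")] = fuse(v1, v2)
            else:
                v, _ = s1.get(k) or s2.get(k)
                merged[k + "?"] = v
        return merged
    return t1 if t1 == t2 else f"{t1}|{t2}"

records = [{"id": 1, "name": "x"}, {"id": 2, "tags": ["a"]}]
schema = infer_type(records[0])
for r in records[1:]:
    schema = fuse(schema, infer_type(r))
print(schema)  # e.g. {'id': 'int', 'name?': 'str', 'tags?': 'list'}
```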

    Efficient inclusion for a class of XML types with interleaving and counting

    SUMMARY: Inclusion between XML types is important but expensive, and is much more expensive when unordered types are considered. We prove here that inclusion for XML types with interleaving and counting can be decided in polynomial time in the presence of two important restrictions: no element appears twice in the same content model, and Kleene star is only applied to disjunctions of single elements. Our approach is based on the transformation of each such content model into a set of constraints that completely characterizes the generated language. We then reduce inclusion checking to constraint implication. We exhibit a quadratic algorithm to perform inclusion checking on a RAM machine.
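    The reduction to constraint implication can be sketched under a simplifying assumption: here each symbol is characterized only by an occurrence interval, so inclusion becomes interval containment. The paper's constraint language is richer (it also captures order and co-occurrence), so this shows only the flavour of the reduction, with invented content models.

```python
def included(sub, sup):
    """Check that every word satisfying `sub`'s constraints also
    satisfies `sup`'s. Constraints are {symbol: (min, max)} occurrence
    intervals; a symbol absent from a map is constrained to (0, 0)."""
    for sym in set(sub) | set(sup):
        lo, hi = sub.get(sym, (0, 0))
        slo, shi = sup.get(sym, (0, 0))
        # the subtype's interval must be contained in the supertype's
        if lo < slo or hi > shi:
            return False
    return True

# a[1..2] & b?  is included in  a[1..3] & b[0..2]
print(included({"a": (1, 2), "b": (0, 1)},
               {"a": (1, 3), "b": (0, 2)}))   # True
# ...but not in  a[2..3] & b?  (the subtype admits a single 'a')
print(included({"a": (1, 2), "b": (0, 1)},
               {"a": (2, 3), "b": (0, 1)}))   # False
```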

    Efficient Subtyping for Unordered XML Types

    While XML is an ordered data format, many applications outside the document processing area simply drop ordering and manipulate XML data as if they were unordered. In these contexts, hence, XML is essentially used as a way of representing unordered, unranked trees. The wide use of unordered XML data should be coupled with a careful and detailed analysis of its theoretical properties. One of the operations that is most affected by the presence of a global ordering relation is semantic subtype-checking, i.e., language inclusion. In an unordered context, inclusion has been proved to be inherently more complex than in the ordered case: in particular, subtype-checking for ordered single-type EDTDs is in PSPACE, while the same operation for single-type EDTDs with unordered types is in EXPSPACE (the same complexity result holds for unordered DTDs). Comparing two unordered XML types for inclusion, hence, is very expensive; as a consequence, it becomes very important to identify restrictions defining type classes for which inclusion is tractable or, at least, less complex. This paper identifies two large subclasses of unordered XML types for which inclusion can be computed by an EXPTIME and a PTIME algorithm, respectively. These classes are defined by restrictions on the use of element, repetition, and union types, and comprise many DTDs and XML Schemas used in practice.

    On the Correctness of Query Results in XML P2P Databases (Extended Abstract)

    (This work has been funded by the GRID.IT Project and by Microsoft Corporation.)

        for $b in input()//article, $a in $b/author, $t in $b/title
        return <author-title>...

    A framework for estimating xml query cardinality

    Abstract. In the context of XML data management systems, the estimation of query cardinality is becoming more and more important: the information provided by a query result estimator can be used as input to the query optimizer, as an early feedback to user queries, as well as input for determining an optimal storage schema, and it may be helpful in embedded query execution. Existing estimation models for XML queries focus on particular aspects of XML querying, such as the estimation of path and twig expression cardinality, and they do not deal with the problem of predicting the cardinality of general XQuery queries. This paper presents a framework for estimating XML query cardinality. The framework provides facilities for estimating the result size of FLWR queries, hence allowing the model designer to concentrate her efforts on the development of adequate, accurate, yet concise statistical summaries for XML data. The framework can also be used for extending existing models to a wider class of XML queries.
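    The kind of statistic summary such a framework would plug in can be sketched as follows. The `PathSummary` structure is hypothetical, assuming simple per-edge average fanouts; real estimators for twig and FLWR queries need much richer statistics, and the path, tag names, and numbers below are invented.

```python
class PathSummary:
    """Toy XML statistics: counts of root-level elements per tag,
    plus the average number of children with a given tag under each
    parent tag (hypothetical summary, for illustration only)."""

    def __init__(self, roots, fanout):
        self.roots = roots    # {tag: number of root elements}
        self.fanout = fanout  # {(parent_tag, child_tag): avg children}

    def estimate(self, path):
        """Estimate the cardinality of a simple path /t1/t2/.../tn by
        multiplying the root count by the average fanout of each step."""
        tags = path.strip("/").split("/")
        card = float(self.roots.get(tags[0], 0))
        for parent, child in zip(tags, tags[1:]):
            card *= self.fanout.get((parent, child), 0.0)
        return card

summary = PathSummary(
    roots={"library": 1},
    fanout={("library", "book"): 100, ("book", "author"): 1.5},
)
print(summary.estimate("/library/book/author"))  # 150.0
```

An optimizer would consult such estimates to pick join orders; the feedback use case in the abstract is the same call made before executing the user's query.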