A unified view of data-intensive flows in business intelligence systems: a survey
Data-intensive flows are central processes in today's business intelligence (BI) systems, deploying different technologies to deliver data, from a multitude of data sources, in user-preferred and analysis-ready formats. To meet the complex requirements of next-generation BI systems, we often need an effective combination of the traditionally batched extract-transform-load (ETL) processes that populate a data warehouse (DW) from integrated data sources, and more real-time, operational data flows that integrate source data at runtime. Both academia and industry thus need a clear understanding of the foundations of data-intensive flows and of the challenges of moving towards next-generation BI environments. In this paper we present a survey of today's research on data-intensive flows and the related fundamental fields of database theory. The study is based on a proposed set of dimensions describing the important challenges of data-intensive flows in the next-generation BI setting. As a result of this survey, we envision an architecture of a system for managing the lifecycle of data-intensive flows. The results further provide a comprehensive understanding of data-intensive flows, identifying the challenges that remain to be addressed and how current solutions can be applied to address them. Peer reviewed. Postprint (author's final draft).
Containment of Pattern-Based Queries over Data Trees
International audience. We study static analysis, in particular the containment problem, for analogs of conjunctive queries over XML documents. The problem has been studied for queries based on arbitrary patterns, not necessarily following the tree structure of documents. However, many applications force the syntactic shape of queries to be tree-like, as they are based on proper tree patterns. This renders previous results, which crucially rely on non-tree-like features, inapplicable. Thus, we investigate static analysis of queries based on proper tree patterns. We go beyond simple navigational conjunctive queries in two ways: we also consider unions and Boolean combinations of such queries, and, crucially, all our queries handle data stored in documents, i.e., we deal with containment over data trees. We start by giving a general $\Pi_2^p$ upper bound on the containment of conjunctive queries and Boolean combinations for patterns that involve all types of navigation through documents. We then show matching hardness for conjunctive queries with all navigation, or for their Boolean combinations with the simplest form of navigation. After that, we look at cases where containment can be witnessed by homomorphisms of analogs of tableaux. These include conjunctive queries and their unions over the child and next-sibling axes; however, we show that not all cases of containment can be witnessed by homomorphisms. We look at extending the tree patterns used in queries in three possible ways: with wildcard, with schema information, and with data value comparisons. The first is relatively harmless, the second tends to increase complexity by an exponential, and the last quickly leads to undecidability.
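To illustrate what it means for containment to be "witnessed by a homomorphism", here is a minimal sketch for the simplest fragment only: Boolean tree patterns using the child axis and node labels, with no wildcard, no data values, and the pattern root mapped to the document root. For this restricted fragment, p is contained in q exactly when there is a root-to-root homomorphism from q into p. The names `Node`, `maps_into`, and `contained_in` are hypothetical and do not come from the paper, whose setting (data trees, next-sibling, Boolean combinations) is considerably richer.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Node:
    """A node of a Boolean tree pattern using only the child axis (no wildcard)."""
    label: str
    children: list[Node] = field(default_factory=list)

def maps_into(q: Node, p: Node) -> bool:
    # Root of q goes to root of p: labels must agree, and each child of q
    # must map to some child of p (several q-children may share one target).
    if q.label != p.label:
        return False
    return all(any(maps_into(qc, pc) for pc in p.children) for qc in q.children)

def contained_in(p: Node, q: Node) -> bool:
    # p is contained in q (every tree matching p also matches q); for this
    # fragment that holds exactly when a homomorphism from q into p exists.
    return maps_into(q, p)

p = Node("a", [Node("b", [Node("c")]), Node("d")])
q = Node("a", [Node("b")])
print(contained_in(p, q))  # True: q only asks for an a-root with a b-child
print(contained_in(q, p))  # False: q has no d-child for p's d-branch to map to
```

Completeness here relies on the fact that p, read as a document, matches itself, so any q containing p must map into it. The abstract's remark that not all containments are witnessed by homomorphisms concerns the richer fragments beyond this sketch.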
Queries with Guarded Negation (full version)
A well-established and fundamental insight in database theory is that
negation (also known as complementation) tends to make queries difficult to
process and difficult to reason about. Many basic problems are decidable and
admit practical algorithms in the case of unions of conjunctive queries, but
become difficult or even undecidable when queries are allowed to contain
negation. Inspired by recent results in finite model theory, we consider a
restricted form of negation, guarded negation. We introduce a fragment of SQL,
called GN-SQL, as well as a fragment of Datalog with stratified negation,
called GN-Datalog, that allow only guarded negation, and we show that these
query languages are computationally well behaved, in terms of testing query
containment, query evaluation, open-world query answering, and boundedness.
GN-SQL and GN-Datalog subsume a number of well known query languages and
constraint languages, such as unions of conjunctive queries, monadic Datalog,
and frontier-guarded tgds. In addition, an analysis of standard benchmark
workloads shows that, in practice, most uses of negation in SQL are
guarded.
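As a rough illustration of the guardedness condition, the sketch below checks a single Datalog-style rule body in which only single atoms are negated. A negated atom counts as guarded if all of its variables occur together in one positive atom, its guard. This mirrors the SQL pattern `NOT EXISTS (SELECT * FROM S WHERE S.a = R.a AND S.b = R.b)`, where every correlated column comes from the one table R. The `Atom` representation and `is_guarded` helper are hypothetical simplifications, not the paper's definition of GN-SQL or GN-Datalog, which also allow negated subformulas beyond single atoms.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass(frozen=True)
class Atom:
    relation: str
    args: tuple[str, ...]  # assumption: every argument is a variable name

def is_guarded(positive: list[Atom], negated: list[Atom]) -> bool:
    """Hypothetical check for one rule body: each negated atom must have
    all of its variables covered by a single positive atom."""
    return all(
        any(set(n.args) <= set(g.args) for g in positive)
        for n in negated
    )

# Q(x) :- R(x, y), NOT S(x, y)           -> guarded by R(x, y)
print(is_guarded([Atom("R", ("x", "y"))], [Atom("S", ("x", "y"))]))  # True
# Q(x, z) :- R(x, y), T(z), NOT S(x, z)  -> x and z share no positive atom
print(is_guarded([Atom("R", ("x", "y")), Atom("T", ("z",))],
                 [Atom("S", ("x", "z"))]))  # False
```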
Web and Semantic Web Query Languages
A number of techniques have been developed to facilitate
powerful data retrieval on the Web and Semantic Web. Three categories
of Web query languages can be distinguished, according to the format
of the data they can retrieve: XML, RDF and Topic Maps. This article
introduces the spectrum of languages falling into these categories
and summarises their salient aspects. The languages are introduced using
common sample data and query types. Key aspects of the query
languages considered are highlighted in the conclusion.
XML data exchange under expressive mappings
Data Exchange is the problem of transforming data in one format (the source schema)
into data in another format (the target schema). Its core component is a schema mapping,
which is a high-level specification of how such a transformation should be done. Relational
data exchange has been extensively studied, but XML data exchange has received
much less attention. The goal of this thesis is to develop a theory of XML data exchange
with expressive schema mappings, extending previous work that used restricted mappings.
Our mapping language is based on tree patterns that can use horizontal navigation and
data comparison in addition to downward navigation.
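For contrast with the XML setting, the following minimal sketch shows the relational idea of a schema mapping being used to build a target instance. One hypothetical source-to-target dependency is applied to each source tuple (a naive chase), and a fresh labelled null is invented for each value that the mapping requires to exist but does not determine. The relation names `Emp`, `Works`, and `Dept` and the null naming scheme `_N1`, `_N2`, ... are illustrative assumptions and are not taken from the thesis.

```python
from itertools import count

def canonical_solution(emp):
    """Chase the hypothetical source-to-target dependency
         Emp(n, dname) -> EXISTS d . Works(n, d) AND Dept(d, dname)
    firing once per source tuple and inventing a fresh labelled null each time."""
    fresh = count(1)
    works, dept = set(), set()
    for name, dname in emp:
        d = f"_N{next(fresh)}"  # labelled null for the unknown department id
        works.add((name, d))
        dept.add((d, dname))
    return works, dept

works, dept = canonical_solution([("ann", "sales"), ("bob", "it")])
print(sorted(works))  # [('ann', '_N1'), ('bob', '_N2')]
print(sorted(dept))   # [('_N1', 'sales'), ('_N2', 'it')]
```

The nulls keep the target instance as general as possible. The XML case is harder because the target must also be a tree conforming to the target schema, not just a set of tuples.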
First, we look at static analysis problems concerning a single mapping. More specifically,
we consider several flavours of the consistency problem. One such problem, for
instance, asks whether any tree has a solution under the given mapping. Then we turn to analysing
the complexity of the mappings themselves, i.e., recognising pairs of trees such that one
is mapped to the other. For both problems, we provide classifications based on the sets of
features used in the mappings.
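To make the second problem, recognising whether a pair of trees belongs to a mapping, more concrete, here is a minimal sketch for a heavily restricted case. Patterns use only the child axis with root-to-root matching, a pattern node may bind the data value of the node it matches to a variable, and there are no schemas, horizontal navigation, or data comparisons. A pair (source, target) is accepted if, for every rule, each match of the source-side pattern extends to a match of the target-side pattern that keeps the shared variables' values. The classes `Tree` and `Pat` and the functions `matches` and `satisfies` are hypothetical illustrations, not the thesis's formalism.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Iterator, Optional

@dataclass
class Tree:
    label: str
    value: Optional[str] = None          # the node's data value
    children: list[Tree] = field(default_factory=list)

@dataclass
class Pat:
    label: str
    var: Optional[str] = None            # variable bound to the matched node's data value
    children: list[Pat] = field(default_factory=list)

def matches(p: Pat, t: Tree, env: dict) -> Iterator[dict]:
    """Yield every extension of env under which p matches at node t."""
    if p.label != t.label:
        return
    if p.var is not None:
        if p.var in env and env[p.var] != t.value:
            return                       # shared variable must keep its value
        env = {**env, p.var: t.value}
    yield from _match_children(p.children, t.children, env)

def _match_children(pats: list[Pat], kids: list[Tree], env: dict) -> Iterator[dict]:
    # Each child pattern must match some child of t (targets may coincide).
    if not pats:
        yield env
        return
    for k in kids:
        for env2 in matches(pats[0], k, env):
            yield from _match_children(pats[1:], kids, env2)

def satisfies(source: Tree, target: Tree, rules: list[tuple[Pat, Pat]]) -> bool:
    """Every match of a rule's source pattern must extend to a target match."""
    return all(
        any(True for _ in matches(psi, target, env))
        for phi, psi in rules
        for env in matches(phi, source, {})
    )

src = Tree("db", children=[Tree("book", "T1"), Tree("book", "T2")])
rule = (Pat("db", children=[Pat("book", "x")]), Pat("lib", children=[Pat("item", "x")]))
print(satisfies(src, Tree("lib", children=[Tree("item", "T1"), Tree("item", "T2")]), [rule]))  # True
print(satisfies(src, Tree("lib", children=[Tree("item", "T1")]), [rule]))  # False: T2 missing
```

This brute-force enumeration is only meant to show what is being decided. The complexity classifications in the thesis concern how such checks behave once richer features are allowed.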
Second, we investigate the composition of XML schema mappings. Unlike in relational settings,
it is generally hard, or rather simply impossible, to achieve closure under composition in XML
settings. Nevertheless, we identify a class of XML schema mappings that is
closed under composition.
Lastly, we consider the problem of query answering. Data should be exchanged so that queries
can be answered feasibly, yet query answering often turns out to be intractable. We identify the
dividing line between tractable and intractable cases: answering queries with extended
features is always intractable, whereas tractability of answering simple queries can be retained
even under extended mappings.