
    Composition and Inversion of Schema Mappings

    In recent years, considerable attention has been paid to developing solid foundations for the composition and inversion of schema mappings. In this paper, we review the proposals for the semantics of these crucial operators. For each of these proposals, we concentrate on the following three problems: the definition of the semantics of the operator, the language needed to express the operator, and the algorithmic issues associated with computing the operator. It should be pointed out that we primarily consider the formalization of schema mappings introduced in the work on data exchange. In particular, when studying the problem of computing the composition and inverse of a schema mapping, we will be mostly interested in computing these operators for mappings specified by source-to-target tuple-generating dependencies.

    Grundlagen der Anfrageverarbeitung beim relationalen Datenaustausch

    Relational data exchange deals with translating relational data according to a given specification. This problem is one of the many tasks that arise in data integration, for example, in data restructuring, in ETL (Extract-Transform-Load) processes used for updating data warehouses, or in data exchange between different, possibly independently created, applications. Systems for relational data exchange have existed for several decades. Motivated by their experiences with one of those systems, Fagin, Kolaitis, Miller, and Popa (2003) studied fundamental and algorithmic issues arising in relational data exchange. One of these issues is how to answer queries that are posed against the target schema (i.e., against the result of the data exchange) so that the answers are consistent with the source data. For monotonic queries, the certain answers semantics proposed by Fagin, Kolaitis, Miller, and Popa (2003) is appropriate. For many non-monotonic queries, however, the certain answers semantics was shown to yield counter-intuitive results. This thesis deals, on the one hand, with computing the certain answers for monotonic queries and, on the other hand, with the question of which semantics are appropriate for answering non-monotonic queries and how hard it is to evaluate non-monotonic queries under these semantics. As shown by Fagin, Kolaitis, Miller, and Popa (2003), computing the certain answers for unions of conjunctive queries - a subclass of the monotonic queries - basically reduces to computing universal solutions, provided the data transformation is specified by a set of tgds (tuple-generating dependencies) and egds (equality-generating dependencies). If M is such a specification and S is a source database, then T is called a solution for S under M if T is a possible result of translating S according to M. Intuitively, universal solutions are most general solutions.
Since the above-mentioned work by Fagin, Kolaitis, Miller, and Popa, it had been unknown whether it is decidable if a source database has a universal solution under a given data exchange specification. In this thesis, we show that this problem is undecidable. More precisely, we construct a specification M that consists of tgds only so that it is undecidable whether a given source database has a universal solution under M. From the proof it also follows that it is undecidable whether the chase procedure - by which universal solutions can be obtained - terminates on a given source database and the set of tgds in M. The above results in particular strengthen results of Deutsch, Nash, and Remmel (2008). Concerning the issue of which semantics are appropriate for answering non-monotonic queries, we study several semantics for answering such queries. All of these semantics are based on the closed world assumption (CWA). First, the CWA-semantics of Libkin (2006) are extended so that they can be applied to specifications consisting of tgds and egds. The key is to extend the concept of CWA-solution, on which the CWA-semantics are based. CWA-solutions are characterized as universal solutions that are derivable from the source database using a suitably controlled version of the chase procedure. In particular, if CWA-solutions exist, then there is a minimal CWA-solution that is unique up to isomorphism: the core of the universal solutions introduced by Fagin, Kolaitis, and Popa (2003). We show that evaluation of a query under some of the CWA-semantics reduces to computing the certain answers to the query on the minimal CWA-solution. The CWA-semantics resolve some of the known problems with answering non-monotonic queries. There are, however, two natural properties that are not possessed by the CWA-semantics. On the one hand, queries may be answered differently with respect to data exchange specifications that are logically equivalent.
On the other hand, there are queries whose answer under the CWA-semantics intuitively contradicts the information derivable from the source database and the data exchange specification. To find an alternative semantics, we first test several CWA-based semantics from the area of deductive databases for their suitability regarding non-monotonic query answering in relational data exchange. More precisely, we focus on the CWA-semantics by Reiter (1978), the GCWA-semantics (Minker 1982), the EGCWA-semantics (Yahya, Henschen 1985) and the PWS-semantics (Chan 1993). It turns out that these semantics are either too weak or too strong, or do not possess the desired properties. Finally, based on the GCWA-semantics we develop the GCWA*-semantics which intuitively possesses the desired properties. For monotonic queries, some of the CWA-semantics as well as the GCWA*-semantics coincide with the certain answers semantics, that is, results obtained for the certain answers semantics carry over to those semantics. When studying the complexity of evaluating non-monotonic queries under the above-mentioned semantics, we focus on the data complexity, that is, the complexity when the data exchange specification and the query are fixed. We show that in many cases, evaluating non-monotonic queries is hard: co-NP- or NP-complete, or even undecidable. For example, evaluating conjunctive queries with at least one negative literal under simple specifications may be co-NP-hard. Notice, however, that this result only says that there is such a query and such a specification for which the problem is hard, but not that the problem is hard for all such queries and specifications. On the other hand, we identify a broad class of queries - the class of universal queries - which can be evaluated in polynomial time under the GCWA*-semantics, provided the data exchange specification is suitably restricted. 
More precisely, we show that universal queries can be evaluated in polynomial time on the core of the universal solutions, independent of the source database and the (suitably restricted) specification; a variety of other queries can be evaluated on the core as well.
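
The reduction of certain answers to universal solutions mentioned in the abstract above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the single st-tgd `Emp(e) -> EXISTS m. Mgr(e, m)`, the relation names, and the helper names `Null`, `chase_st_tgd`, and `certain_answers`. The sketch chases a source instance into a universal solution with labeled nulls, then answers conjunctive queries by evaluating them on that solution and discarding tuples that contain nulls.

```python
from itertools import count


class Null:
    """A labeled null: a placeholder for a value the source does not determine."""
    _ids = count(1)

    def __init__(self):
        self.label = f"N{next(Null._ids)}"

    def __repr__(self):
        return self.label


def chase_st_tgd(source_emp):
    """Chase the hypothetical st-tgd  Emp(e) -> EXISTS m. Mgr(e, m).

    Each source tuple fires the dependency once and invents a fresh null
    for the existentially quantified manager.
    """
    return {(e, Null()) for (e,) in source_emp}


def certain_answers(rows):
    """Keep only the result tuples that contain no labeled nulls."""
    return {r for r in rows if not any(isinstance(v, Null) for v in r)}


source = {("alice",), ("bob",)}
target = chase_st_tgd(source)  # a universal solution for this toy mapping

# q1(e) :- Mgr(e, m)   asks who has some manager
q1 = {(e,) for (e, m) in target}
# q2(e, m) :- Mgr(e, m)   asks for the manager itself
q2 = set(target)

print(certain_answers(q1))
print(certain_answers(q2))
```

In this toy setting every employee is certain to have a manager, so `q1` returns both employees, while `q2` has no certain answers because the manager is only known as a null. The null-discarding step is sound for unions of conjunctive queries only; it is not meant for the non-monotonic queries that the thesis studies.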

    Composition with Target Constraints

    It is known that the composition of schema mappings, each specified by source-to-target tgds (st-tgds), can be specified by a second-order tgd (SO tgd). We consider the question of what happens when target constraints are allowed. Specifically, we consider the question of specifying the composition of standard schema mappings (those specified by st-tgds, target egds, and a weakly acyclic set of target tgds). We show that SO tgds, even with the assistance of arbitrary source constraints and target constraints, cannot in general specify the composition of two standard schema mappings. Therefore, we introduce source-to-target second-order dependencies (st-SO dependencies), which are similar to SO tgds, but allow equations in the conclusion. We show that st-SO dependencies (along with target egds and target tgds) are sufficient to express the composition of every finite sequence of standard schema mappings, and further, every st-SO dependency specifies such a composition. In addition to this expressive power, we show that st-SO dependencies enjoy other desirable properties. In particular, they have a polynomial-time chase that generates a universal solution. This universal solution can be used to find the certain answers to unions of conjunctive queries in polynomial time. It is easy to show that the composition of an arbitrary number of standard schema mappings is equivalent to the composition of only two standard schema mappings. We show that, surprisingly, the analogous result also holds for schema mappings specified by just st-tgds (no target constraints). This is proven by showing that every SO tgd is equivalent to an unnested SO tgd (one where there is no nesting of function symbols). Similarly, we prove unnesting results for st-SO dependencies, with the same types of consequences. Comment: This paper is an extended version of: M. Arenas, R. Fagin, and A. Nash. Composition with Target Constraints.
In 13th International Conference on Database Theory (ICDT), pages 129-142, 201
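
As a hedged illustration of why second-order tgds arise when composing st-tgd mappings, here is a standard textbook-style example. It is not taken from this abstract, and the relation names Emp, Mgr_1, Mgr, and SelfMgr are chosen only for illustration. The first mapping invents a manager for each employee. The second mapping copies the manager relation and records self-managers. Their composition needs a function symbol f and an equality in the premise:

```latex
\begin{align*}
\mathcal{M}_{12} &:\ \forall e\,\bigl(\mathrm{Emp}(e) \rightarrow \exists m\, \mathrm{Mgr}_1(e,m)\bigr)\\
\mathcal{M}_{23} &:\ \forall e\,\forall m\,\bigl(\mathrm{Mgr}_1(e,m) \rightarrow \mathrm{Mgr}(e,m)\bigr),\qquad
                    \forall e\,\bigl(\mathrm{Mgr}_1(e,e) \rightarrow \mathrm{SelfMgr}(e)\bigr)\\
\mathcal{M}_{12}\circ\mathcal{M}_{23} &:\ \exists f\,\Bigl(\forall e\,\bigl(\mathrm{Emp}(e) \rightarrow \mathrm{Mgr}(e,f(e))\bigr)
   \ \wedge\ \forall e\,\bigl(\mathrm{Emp}(e) \wedge e = f(e) \rightarrow \mathrm{SelfMgr}(e)\bigr)\Bigr)
\end{align*}
```

Here f(e) stands for the manager chosen for employee e, and the second conjunct fires only when that choice is e itself.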

    A semantic and agent-based approach to support information retrieval, interoperability and multi-lateral viewpoints for heterogeneous environmental databases

    Data stored in individual autonomous databases often needs to be combined and interrelated. For example, in the Inland Water (IW) environment monitoring domain, the spatial and temporal variation of measurements of different water quality indicators stored in different databases are of interest. Data from multiple data sources is more complex to combine when there is a lack of metadata in a computational form and when the syntax and semantics of the stored data models are heterogeneous. The main types of information retrieval (IR) requirements are query transparency and data harmonisation for data interoperability and support for multiple user views. A combined Semantic Web based and Agent based distributed system framework has been developed to support the above IR requirements. It has been implemented using the Jena ontology and JADE agent toolkits. The semantic part supports the interoperability of autonomous data sources by merging their intensional data, using a Global-As-View or GAV approach, into a global semantic model, represented in DAML+OIL and in OWL. This is used to mediate between different local database views. The agent part provides the semantic services to import, align and parse semantic metadata instances, to support data mediation and to reason about data mappings during alignment. The framework has been applied to support information retrieval, interoperability and multi-lateral viewpoints for four European environmental agency databases. An extended GAV approach has been developed and applied to handle queries that can be reformulated over multiple user views of the stored data. This allows users to retrieve data in a conceptualisation that is better suited to them, rather than having to understand the entire detailed global view conceptualisation. User viewpoints are derived from the global ontology or existing viewpoints of it.
This has the advantage that it reduces the number of potential conceptualisations and their associated mappings to be more computationally manageable. Whereas an ad hoc framework based upon a conventional distributed programming language and a rule framework could be used to support user views and adaptation to user views, a more formal framework has the benefit that it can support reasoning about consistency, equivalence, containment and conflict resolution when traversing data models. A preliminary formulation of the formal model has been undertaken and is based upon extending a Datalog type algebra with hierarchical, attribute and instance value operators. These operators can be applied to support compositional mapping and consistency checking of data views. The multiple viewpoint system was implemented as a Java-based application consisting of two sub-systems, one for viewpoint adaptation and management, the other for query processing and query result adjustment.

    XML Matchers: approaches and challenges

    Schema Matching, i.e. the process of discovering semantic correspondences between concepts adopted in different data source schemas, has been a key topic in Database and Artificial Intelligence research areas for many years. In the past, it was investigated mainly for classical database models (e.g., E/R schemas, relational databases, etc.). However, in recent years, the widespread adoption of XML in the most disparate application fields has pushed a growing number of researchers to design XML-specific Schema Matching approaches, called XML Matchers, aiming at finding semantic matchings between concepts defined in DTDs and XSDs. XML Matchers do not just take well-known techniques originally designed for other data models and apply them to DTDs/XSDs, but they exploit specific XML features (e.g., the hierarchical structure of a DTD/XSD) to improve the performance of the Schema Matching process. The design of XML Matchers is currently a well-established research area. The main goal of this paper is to provide a detailed description and classification of XML Matchers. We first describe to what extent the specificities of DTDs/XSDs impact the Schema Matching task. Then we introduce a template, called XML Matcher Template, that describes the main components of an XML Matcher, their role and behavior. We illustrate how each of these components has been implemented in some popular XML Matchers. We consider our XML Matcher Template as the baseline for objectively comparing approaches that, at first glance, might appear unrelated. The introduction of this template can be useful in the design of future XML Matchers. Finally, we analyze commercial tools implementing XML Matchers and introduce two challenging issues strictly related to this topic, namely XML source clustering and uncertainty management in XML Matchers. Comment: 34 pages, 8 tables, 7 figures