    Query Rewriting and Optimization for Ontological Databases

    Ontological queries are evaluated against a knowledge base consisting of an extensional database and an ontology (i.e., a set of logical assertions and constraints that derive new intensional knowledge from the extensional database), rather than directly on the extensional database. The evaluation and optimization of such queries is an intriguing new problem for database research. In this paper, we discuss two important aspects of this problem: query rewriting and query optimization. Query rewriting consists of compiling an ontological query into an equivalent first-order query against the underlying extensional database. We present a novel query rewriting algorithm for rather general types of ontological constraints that is well suited for practical implementations. In particular, we show how a conjunctive query against a knowledge base expressed using linear and sticky existential rules, that is, members of the recently introduced Datalog+/- family of ontology languages, can be compiled into a union of conjunctive queries (UCQ) against the underlying database. Ontological query optimization, in this context, attempts to improve this rewriting process so as to produce UCQ rewritings for an input query that are as small and cost-effective as possible. Comment: arXiv admin note: text overlap with arXiv:1312.5914 by other author
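
    To make the idea of compiling a conjunctive query into a UCQ concrete, the following is a minimal sketch for a deliberately restricted case, not the algorithm of the paper. It assumes every rule has the form body(X1..Xn) -> head(X1..Xn), with the same arity and argument order and no existential variables, which is only a small fragment of linear rules. The rule set, the predicate names, and the function `rewrite` are all hypothetical.

```python
from collections import deque

# Assumption: each rule (body, head) stands for body(X1..Xn) -> head(X1..Xn),
# i.e. a plain predicate inclusion. This toy ontology is hypothetical.
RULES = [
    ("student", "person"),
    ("professor", "person"),
    ("phd_student", "student"),
]

# q(X) :- person(X), enrolled(X, Y); an atom is (predicate, argument tuple).
QUERY = [("person", ("X",)), ("enrolled", ("X", "Y"))]


def rewrite(query, rules):
    """Return a set of conjunctive queries (frozensets of atoms) whose union
    rewrites `query` under the predicate-inclusion rules in `rules`."""
    start = frozenset(query)
    seen = {start}
    queue = deque([start])
    while queue:
        cq = queue.popleft()
        for atom in cq:
            pred, args = atom
            for body, head in rules:
                if head == pred:
                    # Backward step: an atom over `head` can be satisfied
                    # by a database fact over `body`.
                    new_cq = (cq - {atom}) | {(body, args)}
                    if new_cq not in seen:
                        seen.add(new_cq)
                        queue.append(new_cq)
    return seen


if __name__ == "__main__":
    for cq in rewrite(QUERY, RULES):
        print(sorted(cq))
```

    Each resulting conjunctive query can be evaluated directly on the extensional database; the loop terminates because only finitely many predicate substitutions exist. Handling existential variables, as linear and sticky rules require, needs unification and additional applicability checks that this sketch omits.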

    A conceptual method for data integration in business analytics

    Today many organizations operate in dynamic, rapidly changing environments and highly competitive markets. Consequently, fast and accurate fact-based decisions can be an important success factor. The basis for such decisions is usually information produced by business intelligence and business analytics. One of the challenges in creating high-quality information for business decisions is consolidating data that is spread across multiple heterogeneous systems throughout the organization, in one or many locations. Typically, ETL (extraction, transformation, and loading) processes are used to merge heterogeneous data from one or more data sources into a target system to form data repositories, data marts, or data warehouses. Because there are no common methods or approaches for managing such ETL processes systematically, and because integrating data from multiple sources into one common, unified view is highly complex, it is difficult for both professionals and less experienced users to consolidate data successfully. Currently, the analysis process is often performed without any predefined framework and rests on informal knowledge rather than a scientific methodology. Commercial tools support the data integration process, including visualization of the integration, reuse of analysis sequences, and automatic translation of the visual description into executable code; their major problem is that the metadata used for data integration generally represents only syntactic knowledge. Semantic information about the data structure is typically available only in rudimentary form, although it plays a significant role in defining the analysis model and evaluating the results. Against this background, Grossmann developed a “Conceptual Approach for Data Integration for Business Analytics”. It aims to support professionals by reducing the complexity of analytical processes and thereby to make business analytics more accessible to less experienced users in different domains. The idea is to incorporate detailed knowledge about the data in business analytics, especially information about semantics. The approach focuses on a more structured description of the transformation process in business analytics, one that includes information about dependencies and side effects of the algorithms. Furthermore, the approach incorporates the concept of meta-modelling: it presents a framework comprising the modelling concepts for data integration for business analytics. The aim of this master's thesis is to develop a meta-model prototype that supports data integration for business analytics based on Grossmann’s approach. The thesis focuses on the intellectual process of transforming the theoretical method into a conceptual model that can be applied to a framework of modelling methods and that fits the specific concepts of the meta-model platform used. The result is a prototype based on a generic conceptual method that is independent of the execution platform; there are no predefined granularity levels, and the model objects are reusable across the different phases of the data integration process. The prototype is deployed on the Open Model Platform, an initiative started at the University of Vienna that aims to extend the usage of modelling methods and models and to make them more accessible to users by offering a framework that covers all kinds of modelling activities useful for business applications.

    A framework for information integration using ontological foundations

    With the increasing amount of data, the ability to integrate information has always been a competitive advantage in information management. Semantic heterogeneity reconciliation is an important challenge in many information interoperability applications such as data exchange and data integration. In spite of a large amount of research in this area, the lack of theoretical foundations behind semantic heterogeneity reconciliation techniques has resulted in many ad hoc approaches. In this thesis, I address this issue by providing ontological foundations for semantic heterogeneity reconciliation in information integration. In particular, I investigate fundamental semantic relations between properties from an ontological point of view and show how one of the basic and natural relations between properties – inferring implicit properties from existing properties – can be used to enhance information integration. These ontological foundations have been exploited in four aspects of information integration. First, I propose novel algorithms for semantic enrichment of schema mappings. Second, using correspondences between similar properties at different levels of abstraction, I propose a configurable data integration system in which query rewriting techniques allow a tradeoff between accuracy and completeness in query answering. Third, to preserve semantics in data exchange, I propose an entity-preserving data exchange approach that reflects source entities in the target independently of how the entities are classified. Finally, to improve the efficiency of the data exchange approach proposed in this thesis, I propose an extension of the column-store model called sliced column store. Working prototypes of the proposed techniques are implemented to show that they are feasible to realize. Experiments performed on various datasets show that the techniques proposed in this thesis outperform many existing techniques in their ability to handle semantic heterogeneities and in the performance of information exchange.

    A binding approach to scientific data and metadata management

    Engineering truly automated data integration and translation systems

    This thesis presents an automated, data-driven integration process for relational databases. Whereas previous integration methods assumed a large amount of user involvement as well as the availability of database meta-data, we make no use of meta-data and require little end-user input. This is done using a novel join and translation finding algorithm that searches for the proper key/foreign-key relationships while inferring the instance transformations from one database to another. Because we rely only on the relations that bind the attributes together, we make no use of the database schema information. A novel searching method allows us to search the database for relevant objects without requiring server-side indexes or cooperative databases.
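
    As a rough illustration of what finding key/foreign-key relationships from instance data alone can look like, the sketch below uses a naive value-inclusion test. It is not the algorithm of the thesis: it assumes tables fit in memory as column-to-values dictionaries, treats any column with distinct values as a candidate key, and ignores instance transformations entirely. The table data and the function names `candidate_keys` and `candidate_foreign_keys` are hypothetical.

```python
def candidate_keys(table):
    """Columns whose values are all distinct in this instance."""
    return [col for col, vals in table.items() if len(set(vals)) == len(vals)]


def candidate_foreign_keys(tables):
    """Return (src_table, src_col, dst_table, dst_col) tuples where every
    value of the source column also occurs in a candidate key column of a
    different table. This uses instance values only, no schema metadata."""
    found = []
    for dst_name, dst in tables.items():
        for key in candidate_keys(dst):
            key_vals = set(dst[key])
            for src_name, src in tables.items():
                if src_name == dst_name:
                    continue
                for col, vals in src.items():
                    # Skip empty columns: they are trivially contained.
                    if vals and set(vals) <= key_vals:
                        found.append((src_name, col, dst_name, key))
    return found


# Hypothetical toy instance: each table maps column name -> list of values.
TABLES = {
    "dept": {"dept_id": [1, 2, 3], "name": ["a", "b", "c"]},
    "emp": {"emp_id": [10, 11, 12, 13], "dept_ref": [1, 1, 3, 2]},
}

if __name__ == "__main__":
    print(candidate_foreign_keys(TABLES))
```

    A value-inclusion test like this can report spurious matches (for example, two unrelated columns of small integers), so its output is best read as a list of join candidates to be ranked or confirmed rather than as established relationships.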