    Introducing Dynamic Behavior in Amalgamated Knowledge Bases

    The problem of integrating knowledge from multiple and heterogeneous sources is a fundamental issue in current information systems. To cope with this problem, the concept of a mediator has been introduced: a software component that provides intermediate services, links data resources and application programs, and makes the heterogeneity of the underlying systems transparent. In designing a mediator architecture, we believe an important aspect is the definition of a formal framework by which one can model integration in a declarative style. For this purpose, a logical approach seems very promising. Another important aspect is the ability to model both static integration aspects, concerning query execution, and dynamic ones, concerning data updates and their propagation among the various data sources. Unfortunately, as far as we know, no formal proposals for logically modeling mediator architectures from both a static and a dynamic point of view have been developed so far. In this paper, we extend the framework for amalgamated knowledge bases, presented by Subrahmanian, to deal with dynamic aspects. The language we propose is based on the Active U-Datalog language and extends it with annotated logic and amalgamation concepts. We model the sources of information and the mediator (also called the supervisor) as Active U-Datalog deductive databases, thus modeling queries, transactions, and active rules, interpreted according to the PARK semantics. By using active rules, the system can efficiently perform update propagation among different databases. The result is a logical environment, integrating active and deductive rules, to perform queries and update propagation in a heterogeneous mediated framework.
    Comment: Other keywords: Deductive databases; Heterogeneous databases; Active rules; Update
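
    The entry's key mechanism is the use of active (event-condition-action) rules to propagate updates from the information sources to the mediator. The paper's Active U-Datalog syntax and PARK semantics are not shown in the abstract, so the Python sketch below only illustrates the general event-condition-action pattern with invented relation and rule names; it is not the authors' language.

```python
# Minimal event-condition-action (ECA) sketch of update propagation
# between two in-memory "databases". Relation and rule names are
# hypothetical; this does not reproduce Active U-Datalog or PARK semantics.

source_db = {"employee": set()}      # a source of information
mediator_db = {"emp_view": set()}    # the mediator ("supervisor") view

def on_insert_employee(tuple_):
    """Active rule: when an employee tuple is inserted at the source,
    propagate a projected copy into the mediator's view."""
    name, dept = tuple_
    if dept != "obsolete":                     # condition
        mediator_db["emp_view"].add((name,))   # action: propagate the update

def insert(db, relation, tuple_, triggers=()):
    """Insert a tuple and fire any active rules attached to the event."""
    db[relation].add(tuple_)
    for rule in triggers:
        rule(tuple_)

insert(source_db, "employee", ("ada", "research"),
       triggers=[on_insert_employee])
print(mediator_db["emp_view"])   # {('ada',)}
```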

    Open issues in semantic query optimization in relational DBMS

    After two decades of research into Semantic Query Optimization (SQO), there is clear agreement as to its efficacy. However, although there are some experimental implementations, there are still no commercial ones. We first present a thorough analysis of research into SQO. We identify three problems which inhibit the effective use of SQO in Relational Database Management Systems (RDBMS). We then propose solutions to these problems and describe first steps towards the implementation of an effective semantic query optimizer for relational databases.
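
    Semantic query optimization rewrites a query into a cheaper but equivalent form by exploiting integrity constraints. The abstract does not include a concrete rewrite, so the sketch below illustrates just one classic SQO case, redundant-predicate elimination, over an invented constraint and schema.

```python
# Sketch of one classic SQO rewrite: dropping a predicate that is implied
# by an integrity constraint. The constraint, table and column names are invented.

# Known constraint: every row with role = 'manager' has salary > 50000.
CONSTRAINTS = [("role = 'manager'", "salary > 50000")]

def eliminate_redundant_predicates(predicates):
    """Remove predicates implied by a constraint whose premise is also
    among the query's predicates (a purely syntactic check here)."""
    result = list(predicates)
    for premise, implied in CONSTRAINTS:
        if premise in result and implied in result:
            result.remove(implied)      # the implied predicate is redundant
    return result

query = ["role = 'manager'", "salary > 50000", "dept = 'sales'"]
print(eliminate_redundant_predicates(query))
# ["role = 'manager'", "dept = 'sales'"]
```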

    Reconciling Equational Heterogeneity within a Data Federation

    Mappings in most federated databases are conceptualized and implemented as black-box transformations between source schemas and a federated schema. This approach does not allow specific mappings to be declared once and reused in other situations. We present an alternative approach, in which data-level mappings are represented independently of source and federated schemas as a network between “contexts”. This compendious representation expedites the data federation process via mapping reuse and the automated composition of complex mappings from simpler ones. We illustrate the benefits of mapping reuse and composition with an example that incorporates equational mappings and the application of symbolic equation-solving techniques.
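
    The entry's central idea is that data-level mappings expressed as equations between contexts can be composed automatically by symbolic equation solving. The following is only a minimal sketch of that idea, using SymPy and invented unit-conversion contexts rather than the paper's own mapping representation.

```python
# Composing two equational mappings symbolically with SymPy.
# The "contexts" (celsius, fahrenheit, kelvin) are invented examples,
# not the paper's federation schemas.
from sympy import symbols, Eq, solve

c, f, k = symbols("c f k")

# Two declared mappings between contexts:
m1 = Eq(f, c * 9 / 5 + 32)   # celsius -> fahrenheit
m2 = Eq(k, c + 273.15)       # celsius -> kelvin

# Compose them to obtain a fahrenheit -> kelvin mapping that was never
# declared explicitly: solve m1 for c, then substitute into m2.
c_in_terms_of_f = solve(m1, c)[0]
composed = m2.subs(c, c_in_terms_of_f)
print(composed)              # approximately Eq(k, 5*f/9 + 255.37)
```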

    Rule-based knowledge aggregation for large-scale protein sequence analysis of influenza A viruses

    Background: The explosive growth of biological data provides opportunities for new statistical and comparative analyses of large information sets, such as alignments comprising tens of thousands of sequences. In such studies, sequence annotations frequently play an essential role, and reliable results depend on metadata quality. However, the semantic heterogeneity and annotation inconsistencies in biological databases greatly increase the complexity of aggregating and cleaning metadata. Manual curation of datasets, traditionally favoured by life scientists, is impractical for studies involving thousands of records. In this study, we investigate quality issues that affect major public databases and quantify the effectiveness of an automated metadata extraction approach that combines structural and semantic rules. We applied this approach to more than 90,000 influenza A records to annotate sequences with protein name, virus subtype, isolate, host, geographic origin, and year of isolation.
Results: Over 40,000 annotated influenza A protein sequences were collected by combining information from more than 90,000 documents from NCBI public databases. Metadata values were automatically extracted, aggregated, and reconciled from several document fields by applying user-defined structural rules. For each property, values were recovered from ≥88.8% of records, with accuracy exceeding 96% in most cases. Because of semantic heterogeneity, up to six different structural rules had to be combined for a single property. Significant quality differences between databases were found: GenBank documents yield values more reliably than documents extracted from GenPept. Using a simple set of semantic rules and a reasoner, we reconstructed relationships between sequences from the same isolate, thus identifying 7,640 isolates. Validation of isolate metadata against a simple ontology highlighted more than 400 inconsistencies, leading to over 3,000 property value corrections.
Conclusion: To overcome the quality issues inherent in public databases, automated knowledge aggregation with embedded intelligence is needed for large-scale analyses. Our results show that user-controlled, intuitive approaches based on a combination of simple rules can reliably automate various curation tasks, reducing the need for manual corrections to approximately 5% of the records. Emerging semantic technologies possess desirable features to support today's knowledge aggregation tasks, with the potential to bring immediate benefits to this field.
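
    The extraction step combines simple structural rules applied to record fields. The actual rule set is not given in the abstract, so the following sketch is only indicative: a few regular-expression rules applied to an invented GenBank-style definition line, with hypothetical field names.

```python
# Indicative sketch of rule-based metadata extraction from a
# GenBank-style definition line. The rules and the example string are
# invented; the study's actual rule set is far more extensive.
import re

RULES = {
    "subtype": [re.compile(r"\((H\d+N\d+)\)")],
    "isolate": [re.compile(r"\((A/.+?\(H\d+N\d+\))\)")],
    "year":    [re.compile(r"/((?:19|20)\d{2})\b")],
}

def extract(definition_line):
    """Apply each rule in order and keep the first value that matches."""
    record = {}
    for field, patterns in RULES.items():
        for pattern in patterns:
            match = pattern.search(definition_line)
            if match:
                record[field] = match.group(1)
                break
    return record

line = ("Influenza A virus (A/chicken/Viet Nam/27/2003(H5N1)) "
        "hemagglutinin (HA) gene, complete cds")
print(extract(line))
# {'subtype': 'H5N1', 'isolate': 'A/chicken/Viet Nam/27/2003(H5N1)', 'year': '2003'}
```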

    A conceptual method for data integration in business analytics

    Many organizations today operate in dynamic, rapidly changing environments and highly competitive markets. Consequently, fast and accurate fact-based decisions can be an important success factor. The basis for such decisions is usually business information produced by business intelligence and business analytics. One of the challenges in creating high-quality information for business decisions is consolidating data that is spread across multiple heterogeneous systems throughout the organization, in one or many different locations. Typically, ETL processes (Extraction, Transformation and Loading) are used to merge heterogeneous data from one or more data sources into a target system to form data repositories, data marts, or data warehouses. Due to the lack of common methods for systematically managing such ETL processes, and the high complexity of integrating data from multiple sources into one common, unified view, it is difficult for both professionals and less experienced users to consolidate data successfully. Currently, the analysis process is often performed without any predefined framework and rests on informal knowledge rather than a scientific methodology. For commercial tools that support the data integration process, including visualization of the integration, reuse of analysis sequences, and automatic translation of the visual description into executable code, the major problem is that the metadata used for data integration generally represents only syntactic knowledge. Semantic information about the data structure is typically available only in rudimentary form, even though it plays a significant role in defining the analysis model and evaluating the results.
Against this background, Grossmann formulated the “Conceptual Approach for Data Integration for Business Analytics”. It aims to reduce the complexity of analytical processes and to support professionals in their work, thereby making business analytics accessible to less experienced users in different domains. The idea is to incorporate detailed knowledge about the data in business analytics, especially information about semantics. The approach focuses on a more structured description of the transformation process in business analytics, in which information about dependencies and side effects of the algorithms is included. Furthermore, it incorporates the concept of meta-modelling: it presents a framework comprising the modelling concepts for data integration for business analytics.
Based on Grossmann's approach, the goal of this thesis is to develop a meta-model prototype that supports data integration for business analytics. The thesis focuses on the intellectual process of transforming the theoretical method into a conceptual model that can be applied to a framework of modelling methods and that fits the specific concepts of the meta-model platform used. The result is a prototype based on a generic conceptual method that is independent of the execution platform; there are no predefined granularity levels, and the model objects are reusable across the different phases of the data integration process. The prototype is deployed on the Open Model Platform, an initiative started at the University of Vienna that aims to extend the usage of modelling methods and models and to make them more accessible to users by offering a framework covering all kinds of modelling activities useful for business applications.
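
    The thesis describes ETL transformation steps modelled as reusable objects that carry semantic metadata, but Grossmann's meta-model itself is not reproduced in the abstract. As a loose illustration of that idea only, the Python sketch below declares a transformation step as a reusable object with attached semantic annotations; all class, attribute, and step names (TransformationStep, semantics, normalise_currency) are invented.

```python
# Illustrative sketch only: declaring ETL steps as reusable model objects
# that carry semantic metadata. Class and attribute names are invented
# and do not reproduce Grossmann's meta-model.
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class TransformationStep:
    name: str
    apply: Callable[[List[Dict]], List[Dict]]
    semantics: Dict[str, str] = field(default_factory=dict)  # e.g. units, meaning

def _normalise(rows):
    # Convert each amount into the target currency using the row's rate.
    return [{"amount": r["amount"] * r["rate"]} for r in rows]

# A step declared once and reusable in several integration pipelines.
to_euro = TransformationStep(
    name="normalise_currency",
    apply=_normalise,
    semantics={"amount": "monetary value in EUR"},
)

def run_pipeline(rows, steps):
    """Apply each declared step in order; metadata travels with the step."""
    for step in steps:
        rows = step.apply(rows)
    return rows

source_rows = [{"amount": 10.0, "rate": 2.0}, {"amount": 5.0, "rate": 0.5}]
print(run_pipeline(source_rows, [to_euro]))   # [{'amount': 20.0}, {'amount': 2.5}]
```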

    Examining the Application of Modular and Contextualised Ontology in Query Expansions for Information Retrieval

    This research considers the ongoing challenge of semantics-based search from the perspective of how to exploit Semantic Web languages for search in the current Web environment. The purpose of the PhD was to use ontology-based query expansion (OQE) to improve search effectiveness by increasing search precision, i.e. retrieving relevant documents in the topmost ranked positions of a returned document list. The query experiments required a novel search tool that combines Semantic Web technologies with an otherwise traditional IR process over a Web document collection.
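
    Ontology-based query expansion enriches the original query with related terms (synonyms, sub-concepts) drawn from an ontology before retrieval, aiming to lift relevant documents into the topmost ranks. The thesis' search tool and ontologies are not described in the abstract, so the sketch below uses a tiny hand-made ontology fragment with invented concepts.

```python
# Hypothetical sketch of ontology-based query expansion (OQE): the tiny
# "ontology" below is hand-made for illustration and is not the thesis'
# actual ontology or search tool.
ONTOLOGY = {
    "car": {"synonyms": ["automobile"], "subclasses": ["hatchback", "sedan"]},
    "engine": {"synonyms": ["motor"], "subclasses": []},
}

def expand_query(terms, use_subclasses=True):
    """Return the original terms plus related terms found in the ontology."""
    expanded = list(terms)
    for term in terms:
        concept = ONTOLOGY.get(term, {})
        expanded += concept.get("synonyms", [])
        if use_subclasses:
            expanded += concept.get("subclasses", [])
    return expanded

print(expand_query(["car", "engine"]))
# ['car', 'engine', 'automobile', 'hatchback', 'sedan', 'motor']
```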