
    Integration of Heterogeneous Databases: Discovery of Meta-Information and Maintenance of Schema-Restructuring Views

    In today's networked world, information is widely distributed across many independent databases in heterogeneous formats. Integrating such information is a difficult task and has been addressed by several projects. However, previous integration solutions, such as the EVE-Project, have several shortcomings. Database contents and structure change frequently, and users often have incomplete information about the content and structure of the databases they use. When information from several such insufficiently described sources is to be extracted and integrated, two problems must be solved: How can we discover the structure, contents, and interrelationships of unknown databases, and how can we provide durable integration views over several such databases? In this dissertation, we develop solutions for these key problems in information integration. The first part of the dissertation addresses the fact that knowledge about the interrelationships between databases is essential for any attempt at solving the information integration problem. We present an algorithm called FIND2, based on the clique-finding problem in graphs and k-uniform hypergraphs, to discover redundancy relationships between two relations. Furthermore, the algorithm is enhanced by heuristics that significantly reduce the search space when necessary. Extensive experimental studies of the algorithm, both with and without heuristics, illustrate its effectiveness on a variety of real-world data sets. The second part of the dissertation addresses the durable view problem and presents the first algorithm for incremental view maintenance in schema-restructuring views. Such views are essential for the integration of heterogeneous databases. They are typically defined in schema-restructuring query languages such as SchemaSQL, which can transform schema into data and vice versa, making traditional view maintenance based on differential queries impossible. Based on an existing algebra for SchemaSQL, we present an update propagation algorithm that propagates updates along the query algebra tree and prove its correctness. We also propose optimizations of our algorithm and present experimental results showing its benefits over view recomputation.
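
    To make the clique-finding idea concrete, the following is a minimal Python sketch, not the FIND2 algorithm from the dissertation: validated single-attribute correspondences between two relations become graph nodes, pairs that also hold together as two-attribute inclusion dependencies become edges, and maximal cliques are reported as candidate multi-attribute redundancy relationships. The relation representation and helper names are illustrative assumptions.

        from itertools import combinations

        def satisfies_ind(r_rows, s_rows, r_attrs, s_attrs):
            # Check the inclusion dependency R[r_attrs] contained in S[s_attrs]
            # by comparing projections; rows are dicts of attribute -> value.
            proj_r = {tuple(row[a] for a in r_attrs) for row in r_rows}
            proj_s = {tuple(row[a] for a in s_attrs) for row in s_rows}
            return proj_r <= proj_s

        def maximal_cliques(adj, r=frozenset(), p=None, x=frozenset()):
            # Bron-Kerbosch enumeration of maximal cliques; adj maps each node
            # to its set of neighbours.
            if p is None:
                p = frozenset(adj)
            if not p and not x:
                yield r
                return
            for v in list(p):
                yield from maximal_cliques(adj, r | {v}, p & adj[v], x & adj[v])
                p, x = p - {v}, x | {v}

        def candidate_redundancies(r_rows, s_rows, unary_pairs):
            # unary_pairs: validated correspondences [(r_attr, s_attr), ...].
            # Pairs that also hold together as a binary inclusion dependency
            # are connected by an edge; maximal cliques suggest larger
            # redundant attribute sets worth validating.
            adj = {pair: set() for pair in unary_pairs}
            for (a1, b1), (a2, b2) in combinations(unary_pairs, 2):
                if a1 != a2 and b1 != b2 and satisfies_ind(
                        r_rows, s_rows, (a1, a2), (b1, b2)):
                    adj[(a1, b1)].add((a2, b2))
                    adj[(a2, b2)].add((a1, b1))
            return [c for c in maximal_cliques(adj) if len(c) > 1]

        # Toy usage with two single-tuple relations:
        # candidate_redundancies(
        #     [{"name": "Ann", "city": "Oslo"}],
        #     [{"person": "Ann", "town": "Oslo"}],
        #     [("name", "person"), ("city", "town")])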

    Htab2RDF: Mapping HTML Tables to RDF Triples

    The Web has become an enormous data source hidden behind linked documents. A significant number of Web documents include HTML tables generated dynamically from relational databases, and often there is no direct public access to the databases themselves. RDF (Resource Description Framework), on the other hand, provides an efficient mechanism for representing data directly on the Web, based on a Web-scalable architecture for the identification and interpretation of terms. This leads to the concept of Linked Data on the Web. To allow direct access to data on the Web as Linked Data, we propose in this paper an approach to transform HTML tables into RDF triples. It consists of three main phases: refining, pre-treatment and mapping. The whole process is assisted by a domain ontology and the WordNet lexical database. A tool called Htab2RDF has been implemented. Experiments have been carried out to evaluate the proposed approach and demonstrate its efficiency.
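
    As an illustration of the final mapping phase only (skipping the refining and pre-treatment phases and the ontology/WordNet assistance described above), the sketch below turns an already-extracted table into RDF triples with the rdflib library; the namespace, class name, and sample data are hypothetical.

        from rdflib import RDF, Graph, Literal, Namespace

        EX = Namespace("http://example.org/resource/")

        def table_to_rdf(headers, rows, row_class="Record"):
            # headers: list of column names; rows: list of cell-value lists.
            # Each data row becomes a subject, each column header a predicate,
            # and each cell value an RDF literal.
            g = Graph()
            g.bind("ex", EX)
            for i, row in enumerate(rows):
                subject = EX[f"{row_class.lower()}{i}"]
                g.add((subject, RDF.type, EX[row_class]))
                for header, cell in zip(headers, row):
                    predicate = EX[header.strip().replace(" ", "_")]
                    g.add((subject, predicate, Literal(cell)))
            return g

        # Toy usage with a two-column table extracted from an HTML page.
        graph = table_to_rdf(["Country", "Capital"],
                             [["France", "Paris"], ["Japan", "Tokyo"]])
        print(graph.serialize(format="turtle"))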

    An Expressive Language and Efficient Execution System for Software Agents

    Software agents can be used to automate many of the tedious, time-consuming information processing tasks that humans currently have to complete manually. However, to do so, agent plans must be capable of representing the myriad actions and control flows required to perform those tasks. In addition, since these tasks can require integrating multiple sources of remote information (typically a slow, I/O-bound process), it is desirable to make execution as efficient as possible. To address both of these needs, we present a flexible software agent plan language and a highly parallel execution system that enable the efficient execution of expressive agent plans. The plan language allows complex tasks to be expressed more easily by providing a variety of operators for flexibly processing data, as well as supporting subplans (for modularity) and recursion (for indeterminate looping). The executor is based on a streaming dataflow model of execution to maximize the amount of operator and data parallelism possible at runtime. We have implemented both the language and the executor in a system called THESEUS. Our results from testing THESEUS show that streaming dataflow execution can yield significant speedups over both traditional serial (von Neumann) execution and the non-streaming dataflow-style execution that existing software and robot agent execution systems currently support. In addition, we show how plans written in our language can represent certain types of subtasks that cannot be accomplished using the languages supported by network query engines. Finally, we demonstrate that the increased expressivity of our plan language does not hamper performance; specifically, we show that data can be integrated from multiple remote sources just as efficiently with our architecture as with a state-of-the-art streaming-dataflow network query engine.
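
    The benefit of streaming dataflow over serial execution can be illustrated with a small generator pipeline. This is only a sketch of the pipelining idea, not the THESEUS executor, which also exploits operator and data parallelism at runtime; the source name and simulated delay are made up.

        import time

        def fetch(source, delay=0.1):
            # Simulate a slow, I/O-bound remote source yielding records one
            # at a time.
            for i in range(5):
                time.sleep(delay)          # stand-in for network latency
                yield {"source": source, "id": i}

        def select(records, predicate):
            # Relational selection over a stream of records.
            for record in records:
                if predicate(record):
                    yield record

        def project(records, keys):
            # Relational projection over a stream of records.
            for record in records:
                yield {k: record[k] for k in keys}

        # Because every operator is a generator, downstream operators emit
        # results as soon as the first tuple arrives instead of waiting for
        # fetch() to produce its complete output.
        pipeline = project(select(fetch("siteA"), lambda r: r["id"] % 2 == 0),
                           ["id"])
        for row in pipeline:
            print(row)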

    Semantic optimisation in datalog programs

    Datalog is the fusion of Prolog and database technologies, aimed at producing an efficient, logic-based, declarative language for databases. This fusion takes the best of logic programming for the syntax of Datalog, and the best of database systems for its operational part. As is the case with all declarative languages, optimisation is necessary to improve the efficiency of programs. Semantic optimisation uses meta-knowledge describing the data in the database to optimise queries and rules, aiming to reduce the resources required to answer queries. In this thesis, I analyse prior work on semantic optimisation and then propose an optimisation system for Datalog that includes optimisation of recursive programs and a semantic knowledge management module. A language, DatalogiC, which extends Datalog to allow semantic knowledge to be expressed, has also been devised as an implementation vehicle. Finally, empirical results concerning the benefits of semantic optimisation are reported.
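
    To give a flavour of what a single semantic-optimisation step can do, the sketch below uses an integrity constraint known to hold over the database either to drop a query condition the constraint already implies or to recognise the query as unsatisfiable before evaluation. It is an illustration of the general idea only, not the DatalogiC system; the condition format and attribute names are assumptions.

        def optimise(query_conditions, constraints):
            # query_conditions / constraints: triples like ("salary", ">", 50000).
            # A constraint attr > c implies a query test attr > v whenever
            # v <= c, and contradicts a query test attr < v whenever v <= c.
            simplified = []
            for attr, op, value in query_conditions:
                implied = False
                for c_attr, c_op, c_value in constraints:
                    if attr != c_attr or c_op != ">":
                        continue
                    if op == ">" and value <= c_value:
                        implied = True      # already guaranteed by the data
                    if op == "<" and value <= c_value:
                        return [], False    # query is provably unsatisfiable
                if not implied:
                    simplified.append((attr, op, value))
            return simplified, True

        # Constraint known to hold: every stored salary exceeds 50000.
        # The salary test below is dropped as redundant; only the dept test
        # remains, and the query stays satisfiable.
        print(optimise([("salary", ">", 20000), ("dept", "=", "R&D")],
                       [("salary", ">", 50000)]))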