701 research outputs found

    Four Lessons in Versatility or How Query Languages Adapt to the Web

    Get PDF
    Exposing not only human-centered information, but machine-processable data on the Web is one of the commonalities of recent Web trends. It has enabled a new kind of applications and businesses where the data is used in ways not foreseen by the data providers. Yet this exposition has fractured the Web into islands of data, each in different Web formats: Some providers choose XML, others RDF, again others JSON or OWL, for their data, even in similar domains. This fracturing stifles innovation as application builders have to cope not only with one Web stack (e.g., XML technology) but with several ones, each of considerable complexity. With Xcerpt we have developed a rule- and pattern based query language that aims to give shield application builders from much of this complexity: In a single query language XML and RDF data can be accessed, processed, combined, and re-published. Though the need for combined access to XML and RDF data has been recognized in previous work (including the W3C’s GRDDL), our approach differs in four main aspects: (1) We provide a single language (rather than two separate or embedded languages), thus minimizing the conceptual overhead of dealing with disparate data formats. (2) Both the declarative (logic-based) and the operational semantics are unified in that they apply for querying XML and RDF in the same way. (3) We show that the resulting query language can be implemented reusing traditional database technology, if desirable. Nevertheless, we also give a unified evaluation approach based on interval labelings of graphs that is at least as fast as existing approaches for tree-shaped XML data, yet provides linear time and space querying also for many RDF graphs. We believe that Web query languages are the right tool for declarative data access in Web applications and that Xcerpt is a significant step towards a more convenient, yet highly efficient data access in a “Web of Data”

    Implementation of a knowledge discovery and enhancement module from structured information gained from unstructured sources of information

    Get PDF
    Tese de mestrado integrado. Engenharia Informática e Computação. Faculdade de Engenharia. Universidade do Porto. 201

    Mapping-equivalence and oid-equivalence of single-function object-creating conjunctive queries

    Full text link
    Conjunctive database queries have been extended with a mechanism for object creation to capture important applications such as data exchange, data integration, and ontology-based data access. Object creation generates new object identifiers in the result, that do not belong to the set of constants in the source database. The new object identifiers can be also seen as Skolem terms. Hence, object-creating conjunctive queries can also be regarded as restricted second-order tuple-generating dependencies (SO tgds), considered in the data exchange literature. In this paper, we focus on the class of single-function object-creating conjunctive queries, or sifo CQs for short. We give a new characterization for oid-equivalence of sifo CQs that is simpler than the one given by Hull and Yoshikawa and places the problem in the complexity class NP. Our characterization is based on Cohen's equivalence notions for conjunctive queries with multiplicities. We also solve the logical entailment problem for sifo CQs, showing that also this problem belongs to NP. Results by Pichler et al. have shown that logical equivalence for more general classes of SO tgds is either undecidable or decidable with as yet unknown complexity upper bounds.Comment: This revised version has been accepted on 11 January 2016 for publication in The VLDB Journa

    Efficient Learning and Evaluation of Complex Concepts in Inductive Logic Programming

    No full text
    Inductive Logic Programming (ILP) is a subfield of Machine Learning with foundations in logic programming. In ILP, logic programming, a subset of first-order logic, is used as a uniform representation language for the problem specification and induced theories. ILP has been successfully applied to many real-world problems, especially in the biological domain (e.g. drug design, protein structure prediction), where relational information is of particular importance. The expressiveness of logic programs grants flexibility in specifying the learning task and understandability to the induced theories. However, this flexibility comes at a high computational cost, constraining the applicability of ILP systems. Constructing and evaluating complex concepts remain two of the main issues that prevent ILP systems from tackling many learning problems. These learning problems are interesting both from a research perspective, as they raise the standards for ILP systems, and from an application perspective, where these target concepts naturally occur in many real-world applications. Such complex concepts cannot be constructed or evaluated by parallelizing existing top-down ILP systems or improving the underlying Prolog engine. Novel search strategies and cover algorithms are needed. The main focus of this thesis is on how to efficiently construct and evaluate complex hypotheses in an ILP setting. In order to construct such hypotheses we investigate two approaches. The first, the Top Directed Hypothesis Derivation framework, implemented in the ILP system TopLog, involves the use of a top theory to constrain the hypothesis space. In the second approach we revisit the bottom-up search strategy of Golem, lifting its restriction on determinate clauses which had rendered Golem inapplicable to many key areas. These developments led to the bottom-up ILP system ProGolem. A challenge that arises with a bottom-up approach is the coverage computation of long, non-determinate, clauses. Prolog’s SLD-resolution is no longer adequate. We developed a new, Prolog-based, theta-subsumption engine which is significantly more efficient than SLD-resolution in computing the coverage of such complex clauses. We provide evidence that ProGolem achieves the goal of learning complex concepts by presenting a protein-hexose binding prediction application. The theory ProGolem induced has a statistically significant better predictive accuracy than that of other learners. More importantly, the biological insights ProGolem’s theory provided were judged by domain experts to be relevant and, in some cases, novel

    Medical WordNet: A new methodology for the construction and validation of information resources for consumer health

    Get PDF
    A consumer health information system must be able to comprehend both expert and non-expert medical vocabulary and to map between the two. We describe an ongoing project to create a new lexical database called Medical WordNet (MWN), consisting of medically relevant terms used by and intelligible to non-expert subjects and supplemented by a corpus of natural-language sentences that is designed to provide medically validated contexts for MWN terms. The corpus derives primarily from online health information sources targeted to consumers, and involves two sub-corpora, called Medical FactNet (MFN) and Medical BeliefNet (MBN), respectively. The former consists of statements accredited as true on the basis of a rigorous process of validation, the latter of statements which non-experts believe to be true. We summarize the MWN / MFN / MBN project, and describe some of its applications

    Decentralized Control and Adaptation in Distributed Applications via Web and Semantic Web Technologies

    Get PDF
    The presented work provides an approach and an implementation for enabling decentralized control in distributed applications composed of heterogeneous components by benefiting from the interoperability provided by the Web stack and relying on semantic technologies for enabling data integration. In particular, the concept of Smart Components enables adaptability at runtime through an adaptation layer and is complemented by a reference architecture as well as a prototypical implementation

    Web ontology reasoning with logic databases [online]

    Get PDF

    Knowledge Representation Concepts for Automated SLA Management

    Full text link
    Outsourcing of complex IT infrastructure to IT service providers has increased substantially during the past years. IT service providers must be able to fulfil their service-quality commitments based upon predefined Service Level Agreements (SLAs) with the service customer. They need to manage, execute and maintain thousands of SLAs for different customers and different types of services, which needs new levels of flexibility and automation not available with the current technology. The complexity of contractual logic in SLAs requires new forms of knowledge representation to automatically draw inferences and execute contractual agreements. A logic-based approach provides several advantages including automated rule chaining allowing for compact knowledge representation as well as flexibility to adapt to rapidly changing business requirements. We suggest adequate logical formalisms for representation and enforcement of SLA rules and describe a proof-of-concept implementation. The article describes selected formalisms of the ContractLog KR and their adequacy for automated SLA management and presents results of experiments to demonstrate flexibility and scalability of the approach.Comment: Paschke, A. and Bichler, M.: Knowledge Representation Concepts for Automated SLA Management, Int. Journal of Decision Support Systems (DSS), submitted 19th March 200
    corecore