1,738 research outputs found
The use of data-mining for the automatic formation of tactics
This paper discusses the use of data-mining for the automatic formation of tactics. It was presented at the Workshop on Computer-Supported Mathematical Theory Development held at IJCAR in 2004. The aim of this project is to evaluate the applicability of data-mining techniques to the automatic formation of tactics from large corpuses of proofs. We data-mine information from large proof corpuses to find commonly occurring patterns. These patterns are then evolved into tactics using genetic programming techniques.
Graph Pattern Matching in GQL and SQL/PGQ
As graph databases become widespread, JTC1 -- the committee in joint charge
of information technology standards for the International Organization for
Standardization (ISO), and International Electrotechnical Commission (IEC) --
has approved a project to create GQL, a standard property graph query language.
This complements a project to extend SQL with a new part, SQL/PGQ, which
specifies how to define graph views over an SQL tabular schema, and to run
read-only queries against them.
Both projects have been assigned to the ISO/IEC JTC1 SC32 working group for
Database Languages, WG3, which continues to maintain and enhance SQL as a
whole. This common responsibility helps enforce a policy that the identical
core of both PGQ and GQL is a graph pattern matching sub-language, here termed
GPML.
The WG3 design process is also analyzed by an academic working group, part of
the Linked Data Benchmark Council (LDBC), whose task is to produce a formal
semantics of these graph data languages, which complements their standard
specifications.
This paper, written by members of WG3 and LDBC, presents the key elements of
the GPML of SQL/PGQ and GQL in advance of the publication of these new
standards.
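As a rough illustration of what matching a graph pattern against a property graph involves, the sketch below finds all node-edge-node bindings for a single fixed-length pattern. The data model, the `match_edge_pattern` helper, and the sample graph are assumptions made for this example; this is neither GQL nor SQL/PGQ syntax, and it does not reproduce the semantics defined by the standards.

```python
from typing import NamedTuple


class Node(NamedTuple):
    id: str
    label: str
    props: dict


class Edge(NamedTuple):
    src: str
    dst: str
    label: str


def match_edge_pattern(nodes, edges, src_label, edge_label, dst_label):
    """Return (source id, target id) for every edge that matches the
    pattern (src_label) -[edge_label]-> (dst_label)."""
    by_id = {n.id: n for n in nodes}
    return [
        (e.src, e.dst)
        for e in edges
        if e.label == edge_label
        and by_id[e.src].label == src_label
        and by_id[e.dst].label == dst_label
    ]


# Hypothetical sample graph: two accounts and one city.
nodes = [
    Node("a1", "Account", {"owner": "Ann"}),
    Node("a2", "Account", {"owner": "Bob"}),
    Node("c1", "City", {"name": "Oslo"}),
]
edges = [
    Edge("a1", "a2", "Transfer"),
    Edge("a2", "a1", "Transfer"),
    Edge("a1", "c1", "LocatedIn"),
]

matches = match_edge_pattern(nodes, edges, "Account", "Transfer", "Account")
print(matches)
```

A full pattern language would also need variable-length paths and property predicates, which this sketch leaves out.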
Provenance in Collaborative Data Sharing
This dissertation focuses on recording, maintaining and exploiting provenance information in Collaborative Data Sharing Systems (CDSS). These are systems that support data sharing across loosely-coupled, heterogeneous collections of relational databases related by declarative schema mappings. A fundamental challenge in a CDSS is to support the capability of update exchange (which publishes a participant's updates and then translates others' updates to the participant's local schema and imports them) while tolerating disagreement between them and recording the provenance of exchanged data, i.e., information about the sources and mappings involved in their propagation. This provenance information can be useful during update exchange, e.g., to evaluate provenance-based trust policies. It can also be exploited after update exchange, to answer a variety of user queries, about the quality, uncertainty or authority of the data, for applications such as trust assessment, ranking for keyword search over databases, or query answering in probabilistic databases.
To address these challenges, in this dissertation we develop a novel model of provenance graphs that is informative enough to satisfy the needs of CDSS users and captures the semantics of query answering on various forms of annotated relations. We extend techniques from data integration, data exchange, incremental view maintenance and view update to define the formal semantics of unidirectional and bidirectional update exchange. We develop algorithms to perform update exchange incrementally while maintaining provenance information. We present strategies for implementing our techniques over an RDBMS and experimentally demonstrate their viability in the Orchestra prototype system. We define ProQL, a query language for provenance graphs that can be used by CDSS users to combine data querying with provenance testing as well as to compute annotations for their data, based on their provenance, that are useful for a variety of applications. Finally, we develop a prototype implementation of ProQL over an RDBMS, together with indexing techniques to speed up provenance querying, and we experimentally evaluate the performance of provenance querying and the benefits of our indexing techniques.
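To give a feel for how a provenance-based trust policy might be evaluated, here is a minimal sketch that treats the provenance of each tuple as a list of alternative derivations, each a set of source tuples used together. This flat representation, the `is_trusted` helper, and all identifiers are assumptions for illustration; the provenance graphs described above also record the mappings involved and are richer than this.

```python
def is_trusted(derivations, trusted_sources):
    """A tuple is trusted if at least one of its derivations
    uses only trusted source tuples."""
    return any(
        all(src in trusted_sources for src in derivation)
        for derivation in derivations
    )


# Hypothetical provenance: tuple id -> alternative derivations.
provenance = {
    "t1": [{"A.r1", "B.r7"}, {"C.r3"}],
    "t2": [{"B.r7"}],
}
trusted = {"A.r1", "C.r3"}

result = {t: is_trusted(d, trusted) for t, d in provenance.items()}
print(result)
```

Here `t1` is accepted through its second derivation, while `t2` depends only on an untrusted source.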
Query Answering in Probabilistic Data and Knowledge Bases
Probabilistic data and knowledge bases are becoming increasingly important in academia and industry. They are continuously extended with new data, powered by modern information extraction tools that associate probabilities with knowledge base facts. The state of the art to store and process such data is founded on probabilistic database systems, which are widely and successfully employed. Beyond all the success stories, however, such systems still lack the fundamental machinery to convey some of the valuable knowledge hidden in them to the end user, which limits their potential applications in practice. In particular, in their classical form, such systems are typically based on strong, unrealistic limitations, such as the closed-world assumption, the closed-domain assumption, the tuple-independence assumption, and the lack of commonsense knowledge. These limitations not only lead to unwanted consequences, but also put such systems on weak footing in important tasks, query answering being a very central one. In this thesis, we enhance probabilistic data and knowledge bases with more realistic data models, thereby allowing for better means for querying them. Building on the long endeavor of unifying logic and probability, we develop different rigorous semantics for probabilistic data and knowledge bases, analyze their computational properties, identify sources of (in)tractability, and design practical scalable query answering algorithms whenever possible. To achieve this, the current work brings together some recent paradigms from logics, probabilistic inference, and database theory.
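To make the tuple-independence assumption concrete, the following minimal sketch computes the probability of a Boolean query by enumerating possible worlds, with each fact present independently with its own probability. The facts, probabilities, and the `query_probability` helper are invented for illustration. Enumeration is exponential in the number of facts and is meant only to show the semantics; it is not an algorithm from the thesis.

```python
from itertools import product


def query_probability(facts, query):
    """facts: dict mapping a fact to its marginal probability.
    query: function taking a set of facts (one world) and returning a bool.
    Facts absent from the dict are treated as false (closed world)."""
    items = list(facts.items())
    total = 0.0
    for choices in product([True, False], repeat=len(items)):
        world = set()
        weight = 1.0
        for (fact, p), present in zip(items, choices):
            weight *= p if present else 1.0 - p
            if present:
                world.add(fact)
        if query(world):
            total += weight
    return total


# Hypothetical extracted facts with confidence values.
facts = {
    ("Author", "alice", "paper1"): 0.9,
    ("Author", "bob", "paper1"): 0.5,
}


def some_author_of_paper1(world):
    return any(f[0] == "Author" and f[2] == "paper1" for f in world)


prob = query_probability(facts, some_author_of_paper1)
print(round(prob, 4))
```

Under independence the same value follows in closed form as 1 - (1 - 0.9)(1 - 0.5) = 0.95, which is a useful check on the enumeration.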
Chord Sequence patterns in OWL
This thesis addresses the representation of and reasoning on musical knowledge in the Semantic Web. The Semantic Web is an evolving extension of the World Wide Web that aims at describing information that is distributed on the web in a machine-processable form. Existing approaches to modelling musical knowledge in the context of the Semantic Web have focused on metadata. The description of musical content and reasoning, as well as the integration of content descriptions and metadata, are still open challenges. This thesis discusses the possibilities of representing musical knowledge in the Web Ontology Language (OWL), focusing on chord sequence representation, and presents and evaluates a newly developed solution.
The solution consists of two main components. Ontological modelling patterns for musical entities such as notes and chords are introduced in the MEO ontology. A sequence pattern language and ontology (SEQ) has been developed that can express patterns in a form resembling regular expressions. As MEO and SEQ patterns both rewrite to OWL, they can be combined freely. Reasoning tasks such as instance classification, retrieval and pattern subsumption are then executable by standard Semantic Web reasoners. The expressiveness of SEQ has been studied, in particular in relation to grammars.
The complexity of reasoning on SEQ patterns has been studied theoretically and empirically, and optimisation methods have been developed. There is still great potential for improvement if specific reasoning algorithms were developed to exploit the sequential structure, but the development of such algorithms is outside the scope of this thesis.
MEO and SEQ have also been evaluated in several musicological scenarios. It is shown how patterns that are characteristic of musical styles can be expressed and chord sequence data can be classified, demonstrating the use of the language in web retrieval and as an integration layer for different chord patterns and corpora. Furthermore, possibilities of using SEQ patterns for harmonic analysis are explored using grammars for harmony; both a hybrid system and a translation of limited context-free grammars into SEQ patterns have been developed. Finally, a distributed scenario is evaluated where SEQ and MEO are used in connection with DBpedia, following the Linked Data approach. The results show that applications are already possible and will benefit in the future from improved quality and compatibility of data sources as the Semantic Web evolves.
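The idea of sequence patterns that resemble regular expressions can be illustrated, very loosely, with ordinary regular expressions over chord symbols. The sketch below is plain Python using the standard `re` module, not SEQ or OWL; the space-separated Roman-numeral encoding, the pattern, and the `contains_ii_v_i` helper are assumptions chosen for the example.

```python
import re

# Hypothetical encoding: chords as space-separated Roman numerals.
# Pattern: one or more ii-V pairs followed by a final I.
# The lookarounds keep the match aligned to whole chord symbols,
# so that "iii" or "IV" are not mistaken for "ii" or "I".
II_V_I = re.compile(r"(?<!\S)(?:ii V )+I(?!\S)")


def contains_ii_v_i(chords):
    """Return True if the chord list contains a ii-V-I cadence."""
    sequence = " ".join(chords)
    return II_V_I.search(sequence) is not None


print(contains_ii_v_i(["I", "vi", "ii", "V", "I"]))
print(contains_ii_v_i(["iii", "V", "I"]))
```

Unlike this string-based check, patterns that rewrite to OWL can be compared with each other by a reasoner, for example to test pattern subsumption.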
Conjunctive Queries for Logic-Based Information Extraction
This thesis offers two logic-based approaches to conjunctive queries in the
context of information extraction. The first and main approach is the
introduction of conjunctive query fragments of the logics FC and FC[REG],
denoted as FC-CQ and FC[REG]-CQ respectively. FC is a first-order logic based
on word equations, where the semantics are defined by limiting the universe to
the factors of some finite input word. FC[REG] is FC extended with regular
constraints. The second approach is to consider the dynamic complexity of FC.
Comment: Based on the author's PhD thesis and contains work from two
conference publications (arXiv:2104.04758, arXiv:1909.10869) which are joint
work with Dominik D. Freydenberge
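The restriction of the universe to the factors of an input word, mentioned in the abstract above, can be pictured with a brute-force sketch: variables range over the factors (contiguous substrings) of the word, and a conjunctive query is a conjunction of word equations. The equation format, the `satisfying_assignments` helper, and the example query are simplifying assumptions; terminal symbols and regular constraints are omitted, and this is not the thesis's formal definition.

```python
from itertools import product


def factors(word):
    """All contiguous substrings of word, including the empty word."""
    n = len(word)
    return {word[i:j] for i in range(n + 1) for j in range(i, n + 1)}


def satisfying_assignments(word, variables, equations):
    """Try every assignment of factors to variables and keep those that
    satisfy all equations. Each equation is (lhs, [v1, v2, ...]) and is
    read as lhs = v1 v2 ... (concatenation)."""
    universe = sorted(factors(word))
    results = []
    for values in product(universe, repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(env[lhs] == "".join(env[v] for v in rhs) for lhs, rhs in equations):
            results.append(env)
    return results


# Example query: x = y·y with y nonempty, i.e. "the word contains a square".
word = "abcbc"
squares = [
    env
    for env in satisfying_assignments(word, ["x", "y"], [("x", ["y", "y"])])
    if env["y"]
]
print(squares)
```

Because every variable is confined to the finitely many factors of the input, exhaustive search always terminates, although it is far too slow for anything but tiny words.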