Niffler: A Reference Architecture and System Implementation for View Discovery over Pathless Table Collections by Example
Identifying a project-join view (PJ-view) over collections of tables is the
first step of many data management projects, e.g., assembling a dataset to feed
into a business intelligence tool, creating a training dataset to fit a machine
learning model, and more. When the table collections are large and lack join
information--such as when combining databases, or on data lakes--query by
example (QBE) systems can help identify relevant data, but they are designed
under the assumption that join information is available in the schema, and do
not perform well on pathless table collections that do not have join path
information.
We present a reference architecture that explicitly divides the end-to-end
problem of discovering PJ-views over pathless table collections into a human
and a technical problem. We then present Niffler, a system built to address the
technical problem. We introduce algorithms for the main components of Niffler,
including a signal generation component that helps reduce the size of the
candidate views that may be large due to errors and ambiguity in both the data
and input queries. We evaluate Niffler on real datasets to demonstrate the
effectiveness of the new engine in discovering PJ-views over pathless table
collections.
Graph-Query Suggestions for Knowledge Graph Exploration
We consider the task of exploratory search through graph queries on knowledge graphs. We propose to assist the user by expanding the query with intuitive suggestions to provide a more informative (full) query that can retrieve more detailed and relevant answers. To achieve this result, we propose a model that can bridge graph search paradigms with well-established techniques for information retrieval. Our approach does not require any additional knowledge from the user and builds on principled language modelling approaches. We empirically show the effectiveness and efficiency of our approach on a large knowledge graph and how our suggestions are able to help build more complete and informative queries.
Exploiting a perdurantist foundational ontology and graph database for semantic data integration
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University London. The view of reality that is inherent to perdurantist philosophical ontologies, often termed four dimensional (4D) ontologies, has not been widely adopted within the mainstream of information system design practice. However, as the closed world of enterprise systems is opened to Internet-scale Semantic Web and Open Data information sources, there is a need to better understand the semantics of both internal and external data and how they can be integrated. Philosophical foundational ontologies can help establish this understanding and there is, therefore, an emerging need to research how they can be applied to the problem of semantic data integration. Therefore, a prime objective of this research was to develop a framework through which to apply a 4D foundational ontology and a graph database to the problem of semantic data integration, and to assess the effectiveness of the approach. The research employed design science, a methodology which is applicable to undertaking research within information systems as it encompasses methods through which the research can be undertaken and the resultant artefacts evaluated. This methodology has a number of discrete stages: problem awareness; a core design-build-evaluate iterative cycle through which the research is conducted; and a conclusion stage. The design science research was conducted through the development of a number of artefacts, the prime being the 4D-Semantic Extract Load (4D-SETL) framework. The effectiveness of the framework was assessed by applying it to semantically interpret and integrate a number of large-scale datasets and to instantiate a prototype graph database warehouse to persist the resultant ontology.
A series of technical experiments confirmed that directly reflecting the model patterns of 4D ontology within a prototype data warehouse proved an effective means of both structuring and semantically integrating complex datasets and that the artefacts produced by 4D-SETL could function at scale. Through an illustrative scenario, the effectiveness of the approach is described in relation to the ability of the framework to address a number of weaknesses in current approaches. Furthermore, the major advantages of 4D-SETL are elaborated, which include the ability of the framework to combine foundational, domain and instance level ontological models in a single coherent system that dispenses with much of the translation normally undertaken between conceptual, logical and physical data models. Additionally, adopting a perdurantist realist foundational ontology provided a clear means of establishing and maintaining the identity of physical objects as their constituent temporal and spatial parts unfold over the course of time.
Proof-Pattern Recognition and Lemma Discovery in ACL2
We present a novel technique for combining statistical machine learning for
proof-pattern recognition with symbolic methods for lemma discovery. The
resulting tool, ACL2(ml), gathers proof statistics and uses statistical
pattern-recognition to pre-process data from libraries, and then suggests
auxiliary lemmas in new proofs by analogy with already seen examples. This
paper presents the implementation of ACL2(ml) alongside theoretical
descriptions of the proof-pattern recognition and lemma discovery methods
involved in it.
EvoAlloy: An Evolutionary Approach For Analyzing Alloy Specifications
Using mathematical notations and logical reasoning, formal methods precisely define a program’s specifications, from which we can instantiate valid instances of a system. With these techniques, we can perform a variety of analysis tasks to verify system dependability and rigorously prove the correctness of system properties. While there exist well-designed automated verification tools, including ones considered lightweight, they still lack strong adoption in practice. The essence of the problem is that when applied to large real-world applications, they are neither scalable nor applicable due to the expense of a thorough verification process. In this thesis, I present a new approach and demonstrate how to relax the completeness guarantee without much loss, since soundness is maintained. I have extended a widely applied lightweight analysis, Alloy, with a genetic algorithm. Our new tool, EvoAlloy, works at the level of finite relations generated by Kodkod and evolves the chromosomes based on feedback that includes failed constraints. Through a feasibility study, I prove that my approach can successfully find solutions to a set of specifications beyond the scope where the traditional Alloy Analyzer fails. While EvoAlloy takes longer to solve small problems, the scalability provided by its genetic extension shows its potential to handle larger specifications. My future vision is that when specifications are small I can maintain both soundness and completeness, but when this fails, EvoAlloy can switch to its genetic algorithm.
Adviser: Hamid Bagher
ReverseORC: Reverse engineering of resizable user interface layouts with OR-constraints
Reverse engineering (RE) of user interfaces (UIs) plays an important role in
software evolution. However, the large diversity of UI technologies and the
need for UIs to be resizable make this challenging. We propose ReverseORC, a
novel RE approach able to discover diverse layout types and their dynamic
resizing behaviours independently of their implementation, and to specify them
by using OR constraints. Unlike previous RE approaches, ReverseORC infers
flexible layout constraint specifications by sampling UIs at different sizes
and analyzing the differences between them. It can create specifications that
replicate even some non-standard layout managers with complex dynamic layout
behaviours. We demonstrate that ReverseORC works across different platforms
with very different layout approaches, e.g., for GUIs as well as for the Web.
Furthermore, it can be used to detect and fix problems in legacy UIs, extend
UIs with enhanced layout behaviours, and support the creation of flexible UI
layouts.
Comment: CHI2021 Full Paper