Inference enables an agent to create new knowledge from old or discover implicit
relationships between concepts in a knowledge base (KB), provided that appropriate
techniques are employed to deal with ambiguous, incomplete and sometimes erroneous
data.
The ever-increasing volumes of KBs on the web, available for use by automated
systems, present an opportunity to leverage the available knowledge in order to improve
the inference process in automated query answering systems. This thesis focuses
on the FRANK (Functional Reasoning for Acquiring Novel Knowledge) framework,
which uses a variety of inference operations to respond to queries whose answers
are not readily contained in any available data source.
Most question answering and information retrieval systems assume that answers
to queries are stored in some form in the KB, thereby limiting the range of answers
they can find. We take an approach motivated by rich forms of inference, using techniques
such as regression for prediction. For instance, FRANK can answer “what
country in Europe will have the largest population in 2021?” by decomposing Europe
geo-spatially, using regression on country population for past years and selecting the
country with the largest predicted value. Our technique, which we refer to as Rich
Inference, combines heuristics, logic and statistical methods to infer novel answers
to queries. It also determines what facts are needed for inference, searches for them,
and then integrates the diverse facts and their formalisms into a local query-specific
inference tree.
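
The population example above can be illustrated with a minimal sketch. It is not FRANK's implementation: the names `TOY_KB`, `fit_line`, `predict` and `largest_predicted` are hypothetical, the figures are synthetic rather than real population data, and a simple least-squares line stands in for whatever regression FRANK applies. A dictionary of countries plays the role of the geo-spatial decomposition of Europe.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(history, year):
    """Extrapolate a {year: value} series to the requested year."""
    years = sorted(history)
    slope, intercept = fit_line(years, [history[y] for y in years])
    return slope * year + intercept


# Synthetic figures in millions, invented for illustration only.
# Each key stands for one child produced by decomposing the region.
TOY_KB = {
    "Country A": {2015: 80.0, 2016: 80.5, 2017: 81.0, 2018: 81.5},
    "Country B": {2015: 65.0, 2016: 65.5, 2017: 66.0, 2018: 66.5},
    "Country C": {2015: 78.0, 2016: 79.0, 2017: 80.0, 2018: 81.0},
}


def largest_predicted(kb, year):
    """Predict each child's value for `year`, then select the maximum."""
    predictions = {country: predict(series, year) for country, series in kb.items()}
    best = max(predictions, key=predictions.get)
    return best, predictions[best]


best, value = largest_predicted(TOY_KB, 2021)
print(best, round(value, 2))
```

In this toy data, Country C has the steepest trend, so it overtakes Country A by 2021 even though Country A is larger in every observed year. A lookup over stored facts alone could not return that answer.
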
Our primary contribution in this thesis is the inference algorithm on which FRANK
works. This includes (1) the process of recursively decomposing queries in a way that
allows variables in the query to be instantiated by facts in KBs; (2) the use of aggregate
functions to perform arithmetic and statistical operations (e.g. prediction) to infer new
values from child nodes; and (3) the estimation and propagation of uncertainty values
into the returned answer based on errors introduced by noise in the KBs or errors
introduced by aggregate functions.
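
The three parts of the algorithm can be sketched together, under stated assumptions. The `Node` class, the `evaluate` function and the operation names are hypothetical and do not reflect FRANK's alist representation. Uncertainty is modelled here as a variance that adds under summation of independent values and is carried over from the selected child under maximum. This is a simplification chosen for illustration, not the propagation scheme defined in the thesis.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    op: str                          # "value" (leaf), "sum" or "max"
    value: Optional[float] = None    # filled from a KB for leaves
    variance: float = 0.0            # assumed uncertainty attached to the value
    children: List["Node"] = field(default_factory=list)


def evaluate(node):
    """Resolve children recursively, then aggregate upward.

    Returns a (value, variance) pair for the node.
    """
    if node.op == "value":
        return node.value, node.variance

    results = [evaluate(child) for child in node.children]
    if node.op == "sum":
        # Assumes independent children, so variances add.
        value = sum(v for v, _ in results)
        variance = sum(var for _, var in results)
    elif node.op == "max":
        # Simplification: keep the variance of the selected child.
        value, variance = max(results, key=lambda r: r[0])
    else:
        raise ValueError(f"unknown operation: {node.op}")

    node.value, node.variance = value, variance
    return value, variance


# A query decomposed into a small tree; the leaf values are invented.
tree = Node("max", children=[
    Node("sum", children=[Node("value", 3.0, 0.1), Node("value", 4.0, 0.2)]),
    Node("value", 6.0, 0.05),
])

value, variance = evaluate(tree)
print(value, round(variance, 3))
```

The answer returned at the root carries an uncertainty estimate derived from its leaves, which mirrors contribution (3) in spirit even though the arithmetic here is deliberately simple.
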
We also discuss many of the core concepts and modules that constitute FRANK.
We explain the internal “alist” representation of FRANK that gives it the required
flexibility to tackle different kinds of problems with minimal changes to its internal
representation. We discuss the grammar for a simple query language that allows users
to express queries in a formal way, such that we avoid the complexities of natural
language queries, a problem that falls outside the scope of this thesis. We evaluate the
framework with datasets from open sources