9,749 research outputs found
Neural-Symbolic Entangled Framework for Complex Query Answering
Answering complex queries over knowledge graphs (KG) is an important yet
challenging task because of the KG incompleteness issue and cascading errors
during reasoning. Recent query embedding (QE) approaches to embed the entities
and relations in a KG and the first-order logic (FOL) queries into a low
dimensional space, answering queries by dense similarity search. However,
previous works mainly concentrate on the target answers, ignoring intermediate
entities' usefulness, which is essential for relieving the cascading error
problem in logical query answering. In addition, these methods are usually
designed with their own geometric or distributional embeddings to handle
logical operators like union, intersection, and negation, with the sacrifice of
the accuracy of the basic operator - projection, and they could not absorb
other embedding methods to their models. In this work, we propose a Neural and
Symbolic Entangled framework (ENeSy) for complex query answering, which enables
the neural and symbolic reasoning to enhance each other to alleviate the
cascading error and KG incompleteness. The projection operator in ENeSy could
be any embedding method with the capability of link prediction, and the other
FOL operators are handled without parameters. With both neural and symbolic
reasoning results contained, ENeSy answers queries in ensembles. ENeSy achieves
the SOTA performance on several benchmarks, especially in the setting of the
training model only with the link prediction task.Comment: Paper accepted by NeurIPS202
Differentially Private Publication of Sparse Data
The problem of privately releasing data is to provide a version of a dataset
without revealing sensitive information about the individuals who contribute to
the data. The model of differential privacy allows such private release while
providing strong guarantees on the output. A basic mechanism achieves
differential privacy by adding noise to the frequency counts in the contingency
tables (or, a subset of the count data cube) derived from the dataset. However,
when the dataset is sparse in its underlying space, as is the case for most
multi-attribute relations, then the effect of adding noise is to vastly
increase the size of the published data: it implicitly creates a huge number of
dummy data points to mask the true data, making it almost impossible to work
with.
We present techniques to overcome this roadblock and allow efficient private
release of sparse data, while maintaining the guarantees of differential
privacy. Our approach is to release a compact summary of the noisy data.
Generating the noisy data and then summarizing it would still be very costly,
so we show how to shortcut this step, and instead directly generate the summary
from the input data, without materializing the vast intermediate noisy data. We
instantiate this outline for a variety of sampling and filtering methods, and
show how to use the resulting summary for approximate, private, query
answering. Our experimental study shows that this is an effective, practical
solution, with comparable and occasionally improved utility over the costly
materialization approach
Constellation Queries over Big Data
A geometrical pattern is a set of points with all pairwise distances (or,
more generally, relative distances) specified. Finding matches to such patterns
has applications to spatial data in seismic, astronomical, and transportation
contexts. For example, a particularly interesting geometric pattern in
astronomy is the Einstein cross, which is an astronomical phenomenon in which a
single quasar is observed as four distinct sky objects (due to gravitational
lensing) when captured by earth telescopes. Finding such crosses, as well as
other geometric patterns, is a challenging problem as the potential number of
sets of elements that compose shapes is exponentially large in the size of the
dataset and the pattern. In this paper, we denote geometric patterns as
constellation queries and propose algorithms to find them in large data
applications. Our methods combine quadtrees, matrix multiplication, and
unindexed join processing to discover sets of points that match a geometric
pattern within some additive factor on the pairwise distances. Our distributed
experiments show that the choice of composition algorithm (matrix
multiplication or nested loops) depends on the freedom introduced in the query
geometry through the distance additive factor. Three clearly identified blocks
of threshold values guide the choice of the best composition algorithm.
Finally, solving the problem for relative distances requires a novel
continuous-to-discrete transformation. To the best of our knowledge this paper
is the first to investigate constellation queries at scale
- …