
### Oblivious Bounds on the Probability of Boolean Functions

This paper develops upper and lower bounds for the probability of Boolean
functions by treating multiple occurrences of variables as independent and
assigning them new individual probabilities. We call this approach dissociation
and give an exact characterization of optimal oblivious bounds, i.e. when the
new probabilities are chosen independent of the probabilities of all other
variables. Our motivation comes from the weighted model counting problem (or,
equivalently, the problem of computing the probability of a Boolean function),
which is #P-hard in general. By performing several dissociations, one can
transform a Boolean formula whose probability is difficult to compute, into one
whose probability is easy to compute, and which is guaranteed to provide an
upper or lower bound on the probability of the original formula by choosing
appropriate probabilities for the dissociated variables. Our new bounds shed
light on the connection between previous relaxation-based and model-based
approximations and unify them as concrete choices in a larger design space. We
also show how our theory allows a standard relational database management
system (DBMS) to both upper and lower bound hard probabilistic queries in
guaranteed polynomial time.

Comment: 34 pages, 14 figures, supersedes: http://arxiv.org/abs/1105.281
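
The dissociation idea described above can be illustrated on a tiny formula. The following is a minimal sketch, not the paper's algorithm: the formula `(x AND y) OR (x AND z)`, the variable names, and the probabilities are hypothetical, and the two probability choices for the dissociated copies (`p` for the upper bound, `1 - sqrt(1 - p)` for the lower bound) are assumptions that the code checks by brute-force enumeration for this one example only.

```python
from itertools import product
from math import sqrt


def probability(formula, probs):
    """Exact probability of a Boolean function by enumerating all assignments."""
    names = sorted(probs)
    total = 0.0
    for values in product([False, True], repeat=len(names)):
        assignment = dict(zip(names, values))
        if formula(assignment):
            weight = 1.0
            for name in names:
                weight *= probs[name] if assignment[name] else 1.0 - probs[name]
            total += weight
    return total


def original(a):
    # Hypothetical example formula in which x occurs twice.
    return (a["x"] and a["y"]) or (a["x"] and a["z"])


def dissociated(a):
    # The two occurrences of x are replaced by independent copies x1 and x2.
    return (a["x1"] and a["y"]) or (a["x2"] and a["z"])


p, q, r = 0.5, 0.6, 0.7  # illustrative probabilities for x, y, z

exact = probability(original, {"x": p, "y": q, "z": r})

# Assumed upper-bound choice: every copy keeps the original probability p.
upper = probability(dissociated, {"x1": p, "x2": p, "y": q, "z": r})

# Assumed lower-bound choice: copies satisfy (1 - p1) * (1 - p2) = 1 - p.
p_low = 1.0 - sqrt(1.0 - p)
lower = probability(dissociated, {"x1": p_low, "x2": p_low, "y": q, "z": r})

print(f"lower={lower:.4f}  exact={exact:.4f}  upper={upper:.4f}")
```

Enumeration is exponential in the number of variables and is used here only to check the bounds; the point of dissociation is that the dissociated formula can be evaluated efficiently when each variable occurs once.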

### Rules of Thumb for Information Acquisition from Large and Redundant Data

We develop an abstract model of information acquisition from redundant data.
We assume a random sampling process from data which provide information with
bias and are interested in the fraction of information we expect to learn as a
function of (i) the sampled fraction (recall) and (ii) varying bias of
information (redundancy distributions). We develop two rules of thumb with
varying robustness. We first show that, when information bias follows a Zipf
distribution, the 80-20 rule or Pareto principle surprisingly does not hold,
and we instead expect to learn less than 40% of the information when randomly
sampling 20% of the overall data. We then analytically prove that for large
data sets, randomized sampling from power-law distributions leads to "truncated
distributions" with the same power-law exponent. This second rule is very
robust and also holds for distributions that deviate substantially from a
strict power law. We further give one particular family of power-law functions
that remain completely invariant under sampling. Finally, we validate our model
with two large Web data sets: link distributions to domains and tag
distributions on delicious.com.

Comment: 40 pages, 17 figures; for details see the project page:
http://uniquerecall.co
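
The relationship between sampled fraction and learned information can be sketched with a toy calculation. This is an illustration under stated assumptions, not the paper's model: each piece of information with rank `i` is assumed to appear `n // i` times (a hypothetical Zipf-like redundancy assignment), each copy is assumed to be sampled independently with probability equal to the sampled fraction, and the function name `expected_unique_recall` is invented for this example.

```python
def expected_unique_recall(redundancies, sampled_fraction):
    """Expected fraction of distinct pieces of information seen at least once.

    Assumes every copy is sampled independently with probability
    `sampled_fraction`, so a piece with k copies is missed with
    probability (1 - sampled_fraction) ** k.
    """
    miss = 1.0 - sampled_fraction
    seen = sum(1.0 - miss ** k for k in redundancies)
    return seen / len(redundancies)


n = 10_000
# Hypothetical Zipf-like redundancy: the piece with rank i has n // i copies.
zipf_redundancies = [n // rank for rank in range(1, n + 1)]

recall = expected_unique_recall(zipf_redundancies, sampled_fraction=0.2)
print(f"expected unique recall at 20% sampling: {recall:.3f}")
```

Under these assumptions the result lands far below the 80% that an 80-20 intuition would suggest, because most pieces of information have only one or two copies. The exact value depends on the assumed redundancy assignment and should not be read as reproducing the paper's figure.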

### A Tutorial on Visual Representations of Relational Queries

Query formulation is increasingly performed by systems that need to guess a
user's intent (e.g. via spoken word interfaces). But how can a user know that
the computational agent is returning answers to the "right" query? More
generally, given that relational queries can become pretty complicated, how can
we help users understand existing relational queries, whether human-generated
or automatically generated? Now seems the right moment to revisit a topic that
predates the birth of the relational model: developing visual metaphors that
help users understand relational queries.
This lecture-style tutorial surveys the key visual metaphors developed for
visual representations of relational expressions. We will survey the history
and state of the art of relationally complete diagrammatic representations of
relational queries, discuss the key visual metaphors developed in over a
century of investigating diagrammatic languages, and organize the landscape by
mapping the visual alphabets they use to the syntax and semantics of Relational
Algebra (RA) and Relational Calculus (RC).

Comment: 4 page tutorial paper at VLDB 2023, tutorial web page with slides to
be posted in time:
https://northeastern-datalab.github.io/visual-query-representation-tutorial/.
arXiv admin note: text overlap with arXiv:2208.0161

### A General Framework for Anytime Approximation in Probabilistic Databases

Anytime approximation algorithms that compute the probabilities of queries
over probabilistic databases can be of great use to statistical learning tasks.
Those approaches have been based so far on either (i) sampling or (ii)
branch-and-bound with model-based bounds. We present here a more general
branch-and-bound framework that extends the possible bounds by using
'dissociation', which yields tighter bounds.

Comment: 3 pages, 2 figures, submitted to StarAI 2018 Worksho

### Believe It or Not: Adding Belief Annotations to Databases

We propose a database model that allows users to annotate data with belief
statements. Our motivation comes from scientific database applications where a
community of users is working together to assemble, revise, and curate a shared
data repository. As the community accumulates knowledge and the database
content evolves over time, it may contain conflicting information and members
can disagree on the information it should store. For example, Alice may believe
that a tuple should be in the database, whereas Bob disagrees. He may also
insert the reason why he thinks Alice believes the tuple should be in the
database, and explain what he thinks the correct tuple should be instead.
We propose a formal model for Belief Databases that interprets users'
annotations as belief statements. These annotations can refer both to the base
data and to other annotations. We give a formal semantics based on a fragment
of multi-agent epistemic logic and define a query language over belief
databases. We then prove a key technical result, stating that every belief
database can be encoded as a canonical Kripke structure. We use this structure
to describe a relational representation of belief databases, and give an
algorithm for translating queries over the belief database into standard
relational queries. Finally, we report early experimental results with our
prototype implementation on synthetic data.

Comment: 17 pages, 10 figure
