On the Complexity and Approximation of Binary Evidence in Lifted Inference
Lifted inference algorithms exploit symmetries in probabilistic models to
speed up inference. They show impressive performance when calculating
unconditional probabilities in relational models, but often resort to
non-lifted inference when computing conditional probabilities. The reason is
that conditioning on evidence breaks many of the model's symmetries, which can
preempt standard lifting techniques. Recent theoretical results show, for
example, that conditioning on evidence which corresponds to binary relations is
#P-hard, suggesting that no lifting is to be expected in the worst case. In
this paper, we balance this negative result by identifying the Boolean rank of
the evidence as a key parameter for characterizing the complexity of
conditioning in lifted inference. In particular, we show that conditioning on
binary evidence with bounded Boolean rank is efficient. This opens up the
possibility of approximating evidence by a low-rank Boolean matrix
factorization, which we investigate both theoretically and empirically.
Comment: To appear in Advances in Neural Information Processing Systems 26
(NIPS), Lake Tahoe, USA, December 201
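To make the notion of Boolean rank in the abstract above concrete, here is a minimal Python sketch. It is not the paper's algorithm. It only illustrates that a binary matrix has Boolean rank at most k when it equals the Boolean product (OR of ANDs) of an n-by-k and a k-by-m binary matrix. The function name `boolean_product` and the example matrices `E`, `B`, and `C` are assumptions made for illustration.

```python
def boolean_product(B, C):
    """Boolean matrix product: entry (i, j) is the OR over k of (B[i][k] AND C[k][j])."""
    inner = len(C)
    cols = len(C[0])
    return [
        [int(any(row[k] and C[k][j] for k in range(inner))) for j in range(cols)]
        for row in B
    ]


# Hypothetical 4x4 binary evidence matrix (illustrative data, not from the paper).
E = [
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]

# A candidate factorization with inner dimension k = 2.
B = [[1, 0], [1, 1], [0, 1], [0, 0]]  # 4x2
C = [[1, 1, 0, 0], [0, 0, 1, 1]]      # 2x4

# If the Boolean product reproduces E exactly, E has Boolean rank at most 2.
exact = boolean_product(B, C) == E
print("exact rank-2 Boolean factorization:", exact)
```

When no exact low-rank factorization exists, an approximate one would reproduce the evidence matrix only up to some flipped entries, which is the trade-off the abstract refers to.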
Negative Statements Considered Useful
Knowledge bases (KBs), pragmatic collections of knowledge about notable entities, are an important asset in applications such as search, question answering and dialogue. Rooted in a long tradition in knowledge representation, all popular KBs only store positive information, while they abstain from taking any stance towards statements not contained in them. In this paper, we make the case for explicitly stating interesting statements which are not true. Negative statements would be important to overcome current limitations of question answering, yet due to their potential abundance, any effort towards compiling them needs a tight coupling with ranking. We introduce two approaches towards compiling negative statements. (i) In peer-based statistical inferences, we compare entities with highly related entities in order to derive potential negative statements, which we then rank using supervised and unsupervised features. (ii) In query-log-based text extraction, we use a pattern-based approach for harvesting search engine query logs. Experimental results show that both approaches hold promising and complementary potential. Along with this paper, we publish the first datasets on interesting negative information, containing over 1.1M statements for 100K popular Wikidata entities.
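The peer-based idea in approach (i) can be sketched roughly as follows. This is an assumption-laden illustration, not the authors' method: statements are modelled as (property, value) pairs, peers are given as a plain list, and candidates are ranked by the fraction of peers that hold them, which stands in for the supervised and unsupervised ranking features mentioned in the abstract. The function `candidate_negatives` and all entity names are hypothetical.

```python
from collections import Counter


def candidate_negatives(entity, peers, statements):
    """Return statements held by peers but absent for `entity`,
    ranked by the fraction of peers holding them (an assumed, simplistic score)."""
    own = statements.get(entity, set())
    counts = Counter()
    for peer in peers:
        # Only statements the target entity does not have are candidates.
        for stmt in statements.get(peer, set()) - own:
            counts[stmt] += 1
    n = len(peers)
    # Sort by descending peer frequency, then by statement for a stable order.
    return sorted(
        ((stmt, count / n) for stmt, count in counts.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )


# Hypothetical toy knowledge base (illustrative data only).
statements = {
    "physicist_a": {("award", "Nobel Prize"), ("occupation", "physicist")},
    "physicist_b": {
        ("award", "Nobel Prize"),
        ("occupation", "physicist"),
        ("member_of", "Royal Society"),
    },
    "physicist_c": {("occupation", "physicist")},
}

ranked = candidate_negatives("physicist_c", ["physicist_a", "physicist_b"], statements)
for stmt, score in ranked:
    print(stmt, score)
```

A candidate produced this way is only a potential negative statement, since the knowledge base may simply be incomplete for the target entity.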
Synthetic sequence generator for recommender systems - memory biased random walk on sequence multilayer network
Personalized recommender systems rely on each user's personal usage data in
the system, in order to assist in decision making. However, privacy policies
protecting users' rights prevent these highly personal data from being publicly
available to a wider research audience. In this work, we propose a memory
biased random walk model on a multilayer sequence network as a generator of
synthetic sequential data for recommender systems. We demonstrate the
applicability of the synthetic data in training recommender system models for
cases when privacy policies restrict clickstream publishing.
Comment: The new updated version of the pape
People on Drugs: Credibility of User Statements in Health Communities
Online health communities are a valuable source of information for patients
and physicians. However, such user-generated resources are often plagued by
inaccuracies and misinformation. In this work we propose a method for
automatically establishing the credibility of user-generated medical statements
and the trustworthiness of their authors by exploiting linguistic cues and
distant supervision from expert sources. To this end we introduce a
probabilistic graphical model that jointly learns user trustworthiness,
statement credibility, and language objectivity. We apply this methodology to
the task of extracting rare or unknown side-effects of medical drugs, this
being one of the problems where large scale non-expert data has the potential
to complement expert medical knowledge. We show that our method can reliably
extract side-effects and filter out false statements, while identifying
trustworthy users that are likely to contribute valuable medical information.