Bayesian Locality Sensitive Hashing for Fast Similarity Search
Given a collection of objects and an associated similarity measure, the
all-pairs similarity search problem asks us to find all pairs of objects with
similarity greater than a certain user-specified threshold. Locality-sensitive
hashing (LSH) based methods have become a very popular approach for this
problem. However, most such methods only use LSH for the first phase of
similarity search - i.e. efficient indexing for candidate generation. In this
paper, we present BayesLSH, a principled Bayesian algorithm for the subsequent
phase of similarity search - performing candidate pruning and similarity
estimation using LSH. A simpler variant, BayesLSH-Lite, which calculates
similarities exactly, is also presented. BayesLSH is able to quickly prune away
a large majority of the false positive candidate pairs, leading to significant
speedups over baseline approaches. For BayesLSH, we also provide probabilistic
guarantees on the quality of the output, both in terms of accuracy and recall.
Finally, the quality of BayesLSH's output can be easily tuned and does not
require any manual setting of the number of hashes to use for similarity
estimation, unlike standard approaches. For two state-of-the-art candidate
generation algorithms, AllPairs and LSH, BayesLSH enables significant speedups,
typically in the range 2x-20x for a wide variety of datasets.
Comment: 13 pages, 5 tables, 21 figures. Added acknowledgments in v3. A
slightly shorter version of this paper without the appendix has been
published in the PVLDB journal, 5(5):430-441, 2012.
http://vldb.org/pvldb/vol5/p430_venusatuluri_vldb2012.pd
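The problem setting in the abstract above can be illustrated with a minimal sketch. This is not BayesLSH itself: it shows only a brute-force all-pairs search under Jaccard similarity, plus a standard min-hash estimate that uses a fixed, manually chosen number of hashes (the setting the abstract says BayesLSH avoids). The function names, the choice of Jaccard similarity, and the use of Python's built-in hash on (seed, item) tuples as a stand-in for random permutations are all assumptions made for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Exact Jaccard similarity of two sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def all_pairs_exact(objects, threshold):
    """Brute force: every pair (i, j, sim) whose similarity exceeds threshold."""
    result = []
    for (i, a), (j, b) in combinations(enumerate(objects), 2):
        sim = jaccard(a, b)
        if sim > threshold:
            result.append((i, j, sim))
    return result


def minhash_signature(items, seeds):
    """One min-hash value per seed (assumes a non-empty set of integers).

    hash((seed, x)) is an illustrative stand-in for a random permutation.
    """
    return [min(hash((seed, x)) for x in items) for seed in seeds]


def estimated_similarity(sig_a, sig_b):
    """Fraction of agreeing hash positions, an estimate of Jaccard similarity."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


if __name__ == "__main__":
    objects = [{1, 2, 3, 4}, {2, 3, 4, 5}, {10, 11}]
    print(all_pairs_exact(objects, threshold=0.5))

    seeds = range(64)  # number of hashes fixed by hand in this sketch
    sigs = [minhash_signature(o, seeds) for o in objects]
    print(estimated_similarity(sigs[0], sigs[1]))
```

The brute-force search compares every pair, which is what candidate generation and pruning are meant to avoid on large collections. The estimate becomes more reliable as the number of hashes grows, at the cost of more work per pair.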
QDEE: Question Difficulty and Expertise Estimation in Community Question Answering Sites
In this paper, we present a framework for Question Difficulty and Expertise
Estimation (QDEE) in Community Question Answering sites (CQAs) such as Yahoo!
Answers and Stack Overflow, which tackles a fundamental challenge in
crowdsourcing: how to appropriately route and assign questions to users with
the suitable expertise. This problem domain has been the subject of much
research and includes both language-agnostic and language-conscious
solutions. We bring to bear a key language-agnostic insight: that users gain
expertise and therefore tend to ask as well as answer more difficult questions
over time. We use this insight within the popular competition (directed) graph
model to estimate question difficulty and user expertise by identifying key
hierarchical structure within said model. An important and novel contribution
here is the application of "social agony" to this problem domain. Difficulty
levels of newly posted questions (the cold-start problem) are estimated by
using our QDEE framework and additional textual features. We also propose a
model to route newly posted questions to appropriate users based on the
difficulty level of the question and the expertise of the user. Extensive
experiments on data from real-world CQAs such as Yahoo! Answers and Stack Overflow
demonstrate the improved efficacy of our approach over contemporary
state-of-the-art models. The QDEE framework also allows us to characterize user
expertise in novel ways by identifying interesting patterns and roles played by
different users in such CQAs.
Comment: Accepted in the Proceedings of the 12th International AAAI Conference
on Web and Social Media (ICWSM 2018). June 2018. Stanford, CA, US
Hierarchical Change Point Detection on Dynamic Networks
This paper studies change point detection on networks with community
structures. It proposes a framework that can detect both local and global
changes in networks efficiently. Importantly, it can clearly distinguish the
two types of changes. The framework design is generic and as such several
state-of-the-art change point detection algorithms can fit in this design.
Experiments on both synthetic and real-world networks show that this framework
can accurately detect changes while achieving up to 800X speedup.
Comment: 9 pages, ACM WebSci'1
On the relation between the WRT invariant and the Hennings invariant
The purpose of this note is to provide a simple relation between the
Witten-Reshetikhin-Turaev SO(3) invariant and the Hennings invariant of
3-manifolds associated to quantum sl_2.
Comment: 14 pages, 1 figure
Semi-supervised Embedding in Attributed Networks with Outliers
In this paper, we propose a novel framework, called Semi-supervised Embedding
in Attributed Networks with Outliers (SEANO), to learn a low-dimensional vector
representation that systematically captures the topological proximity,
attribute affinity and label similarity of vertices in a partially labeled
attributed network (PLAN). Our method is designed to work in both transductive
and inductive settings while explicitly alleviating noise effects from
outliers. Experimental results on various datasets drawn from the web, text and
image domains demonstrate the advantages of SEANO over state-of-the-art methods
in semi-supervised classification under transductive as well as inductive
settings. We also show that a subset of parameters in SEANO is interpretable as
an outlier score and can significantly outperform baseline methods when applied
for detecting network outliers. Finally, we present the use of SEANO in a
challenging real-world setting, flood mapping of satellite images, and show
that it is able to outperform modern remote sensing algorithms for this task.
Comment: in Proceedings of SIAM International Conference on Data Mining
(SDM'18)
