8,806 research outputs found
Leveraging Node Attributes for Incomplete Relational Data
Relational data are usually highly incomplete in practice, which inspires us
to leverage side information to improve the performance of community detection
and link prediction. This paper presents a Bayesian probabilistic approach that
incorporates various kinds of node attributes encoded in binary form in
relational models with Poisson likelihood. Our method works flexibly with both
directed and undirected relational networks. The inference can be done by
efficient Gibbs sampling which leverages sparsity of both networks and node
attributes. Extensive experiments show that our models achieve the
state-of-the-art link prediction results, especially with highly incomplete
relational data.Comment: Appearing in ICML 201
A Continuously Growing Dataset of Sentential Paraphrases
A major challenge in paraphrase research is the lack of parallel corpora. In
this paper, we present a new method to collect large-scale sentential
paraphrases from Twitter by linking tweets through shared URLs. The main
advantage of our method is its simplicity, as it gets rid of the classifier or
human in the loop needed to select data before annotation and subsequent
application of paraphrase identification algorithms in the previous work. We
present the largest human-labeled paraphrase corpus to date of 51,524 sentence
pairs and the first cross-domain benchmarking for automatic paraphrase
identification. In addition, we show that more than 30,000 new sentential
paraphrases can be easily and continuously captured every month at ~70%
precision, and demonstrate their utility for downstream NLP tasks through
phrasal paraphrase extraction. We make our code and data freely available.Comment: 11 pages, accepted to EMNLP 201
Dirichlet belief networks for topic structure learning
Recently, considerable research effort has been devoted to developing deep
architectures for topic models to learn topic structures. Although several deep
models have been proposed to learn better topic proportions of documents, how
to leverage the benefits of deep structures for learning word distributions of
topics has not yet been rigorously studied. Here we propose a new multi-layer
generative process on word distributions of topics, where each layer consists
of a set of topics and each topic is drawn from a mixture of the topics of the
layer above. As the topics in all layers can be directly interpreted by words,
the proposed model is able to discover interpretable topic hierarchies. As a
self-contained module, our model can be flexibly adapted to different kinds of
topic models to improve their modelling accuracy and interpretability.
Extensive experiments on text corpora demonstrate the advantages of the
proposed model.Comment: accepted in NIPS 201
MetaLDA: a Topic Model that Efficiently Incorporates Meta information
Besides the text content, documents and their associated words usually come
with rich sets of meta informa- tion, such as categories of documents and
semantic/syntactic features of words, like those encoded in word embeddings.
Incorporating such meta information directly into the generative process of
topic models can improve modelling accuracy and topic quality, especially in
the case where the word-occurrence information in the training data is
insufficient. In this paper, we present a topic model, called MetaLDA, which is
able to leverage either document or word meta information, or both of them
jointly. With two data argumentation techniques, we can derive an efficient
Gibbs sampling algorithm, which benefits from the fully local conjugacy of the
model. Moreover, the algorithm is favoured by the sparsity of the meta
information. Extensive experiments on several real world datasets demonstrate
that our model achieves comparable or improved performance in terms of both
perplexity and topic quality, particularly in handling sparse texts. In
addition, compared with other models using meta information, our model runs
significantly faster.Comment: To appear in ICDM 201
Scatterer induced mode splitting in poly(dimethylsiloxane) coated microresonators
We investigate scatterer induced mode splitting in a composite microtoroidal
resonator (Q ~ 10^6) fabricated by coating a silica microtoroid (Q ~ 10^7) with
a thin poly(dimethylsiloxane) layer. We show that the two split modes in both
coated and uncoated silica microtoroids respond in the same way to the changes
in the environmental temperature. This provides a self-referencing scheme which
is robust to temperature perturbations. Together with the versatile
functionalities of polymer materials, mode splitting in polymer and polymer
coated microresonators offers an attractive sensing platform that is robust to
thermal noise.Comment: 9 pages, 3 figures, 15 reference
- …