A Factor-Adjusted Multiple Testing Procedure with Application to Mutual Fund Selection
In this article, we propose a factor-adjusted multiple testing (FAT)
procedure based on factor-adjusted p-values in a linear factor model involving
some observable and unobservable factors, for selecting skilled funds in
empirical finance. The factor-adjusted p-values are obtained after extracting
the latent common factors by the principal component method. Under
some mild conditions, the false discovery proportion can be consistently
estimated even if the idiosyncratic errors are allowed to be weakly correlated
across units. Furthermore, by appropriately setting a sequence of threshold
values approaching zero, the proposed FAT procedure enjoys model selection
consistency. Extensive simulation studies and a real data analysis for
selecting skilled funds in the U.S. financial market are presented to
illustrate the practical utility of the proposed method. Supplementary
materials for this article are available online.
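The pipeline is straightforward to prototype. Below is a minimal Python sketch of the FAT idea, assuming a panel of fund returns and observable factors as inputs: OLS on the observed factors, principal-component extraction of latent factors from the residuals, and a Benjamini-Hochberg threshold as a stand-in for the paper's FDP-controlled threshold sequence. All names are illustrative, not from the paper.

```python
import numpy as np
from scipy import stats

def fat_select(returns, factors, n_latent=3, fdr_level=0.05):
    """Simplified FAT-style fund selection (illustrative sketch).

    returns : (T, N) panel of fund excess returns.
    factors : (T, K) observable factors.
    Returns indices of selected funds and factor-adjusted p-values.
    """
    T, N = returns.shape
    X = np.column_stack([np.ones(T), factors])

    # Step 1: OLS on observable factors; residuals still carry the
    # latent common factors plus idiosyncratic noise.
    beta, *_ = np.linalg.lstsq(X, returns, rcond=None)
    resid = returns - X @ beta

    # Step 2: extract latent common factors from the residual panel
    # by principal components (SVD of the centered residuals).
    U, s, _ = np.linalg.svd(resid - resid.mean(0), full_matrices=False)
    F_hat = U[:, :n_latent] * s[:n_latent]

    # Step 3: re-estimate with latent factors appended and compute
    # factor-adjusted t-statistics and p-values for the alphas.
    Xa = np.column_stack([X, F_hat])
    beta_a, *_ = np.linalg.lstsq(Xa, returns, rcond=None)
    dof = T - Xa.shape[1]
    sigma2 = ((returns - Xa @ beta_a) ** 2).sum(0) / dof
    se_alpha = np.sqrt(sigma2 * np.linalg.inv(Xa.T @ Xa)[0, 0])
    pvals = 2 * stats.t.sf(np.abs(beta_a[0] / se_alpha), dof)

    # Step 4: Benjamini-Hochberg threshold, standing in for the
    # paper's sequence of thresholds approaching zero.
    order = np.argsort(pvals)
    passed = pvals[order] <= fdr_level * np.arange(1, N + 1) / N
    k = passed.nonzero()[0].max() + 1 if passed.any() else 0
    return order[:k], pvals
```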
Leveraging Node Attributes for Incomplete Relational Data
Relational data are usually highly incomplete in practice, which inspires us
to leverage side information to improve the performance of community detection
and link prediction. This paper presents a Bayesian probabilistic approach that
incorporates various kinds of node attributes encoded in binary form in
relational models with Poisson likelihood. Our method works flexibly with both
directed and undirected relational networks. Inference can be done by an
efficient Gibbs sampler that leverages the sparsity of both the networks and
the node attributes. Extensive experiments show that our models achieve
state-of-the-art link prediction results, especially with highly incomplete
relational data.
Comment: Appearing in ICML 201
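As a rough illustration of how binary node attributes can enter a Poisson relational model, the Python sketch below draws a network whose edge rates depend on attribute-informed community affiliations, linked to binary edges through the standard Bernoulli-Poisson construction; the paper's exact generative process and priors differ, and all names and hyperparameters here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_relational_data(attrs, n_communities=5):
    """Generative sketch: binary attributes modulate the gamma shape of
    each node's community affiliations; edges follow a Bernoulli-Poisson
    link. Priors and names are illustrative, not the paper's."""
    N, F = attrs.shape
    # attribute-to-community weights: active attributes raise affiliation
    W = rng.gamma(0.3, 0.3, size=(F, n_communities))
    phi = rng.gamma(0.1 + attrs @ W, 1.0)           # (N, K) affiliations
    Lam = rng.gamma(0.3, 0.3, size=(n_communities, n_communities))
    rate = phi @ Lam @ phi.T                        # (N, N) edge rates
    counts = rng.poisson(rate)                      # latent Poisson counts
    A = (counts > 0).astype(int)                    # observed binary edges
    np.fill_diagonal(A, 0)
    return A, phi

# toy usage: 100 nodes, 8 binary attributes
attrs = rng.integers(0, 2, size=(100, 8))
A, phi = sample_relational_data(attrs)
```

A convenient property of the Bernoulli-Poisson link is that only the nonzero edges require latent count augmentation during inference, which is one way a Gibbs sampler can exploit network sparsity.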
Dirichlet belief networks for topic structure learning
Recently, considerable research effort has been devoted to developing deep
architectures for topic models to learn topic structures. Although several deep
models have been proposed to learn better topic proportions of documents, how
to leverage the benefits of deep structures for learning word distributions of
topics has not yet been rigorously studied. Here we propose a new multi-layer
generative process on word distributions of topics, where each layer consists
of a set of topics and each topic is drawn from a mixture of the topics of the
layer above. As the topics in all layers can be directly interpreted by words,
the proposed model is able to discover interpretable topic hierarchies. As a
self-contained module, our model can be flexibly adapted to different kinds of
topic models to improve their modelling accuracy and interpretability.
Extensive experiments on text corpora demonstrate the advantages of the
proposed model.
Comment: Accepted in NIPS 201
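The layered construction can be illustrated in a few lines: each topic below the top layer is a Dirichlet draw whose concentration vector is a weighted mixture of the word distributions one layer above. The sketch below follows that description; the mixing weights, scaling, and hyperparameters are illustrative, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_topic_hierarchy(n_words, layer_sizes, scale=50.0):
    """Each topic's word distribution is a Dirichlet draw whose
    concentration is a mixture of the topics one layer above, so every
    layer stays directly interpretable in word space."""
    # top layer: topics from a symmetric Dirichlet
    topics = rng.dirichlet(np.full(n_words, 0.1), size=layer_sizes[0])
    layers = [topics]
    for K in layer_sizes[1:]:
        above = layers[-1]                           # (K_above, V)
        # nonnegative weights from each upper topic to each new topic,
        # normalized so each new topic mixes to a distribution over words
        weights = rng.gamma(1.0, 1.0, size=(above.shape[0], K))
        conc = scale * (weights / weights.sum(0)).T @ above   # (K, V)
        topics = np.vstack([rng.dirichlet(c) for c in conc])
        layers.append(topics)
    return layers  # list of (K_l, V) word distributions, top to bottom

layers = sample_topic_hierarchy(n_words=1000, layer_sizes=[5, 20, 80])
```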
MetaLDA: a Topic Model that Efficiently Incorporates Meta information
Besides the text content, documents and their associated words usually come
with rich sets of meta information, such as categories of documents and
semantic/syntactic features of words, like those encoded in word embeddings.
Incorporating such meta information directly into the generative process of
topic models can improve modelling accuracy and topic quality, especially in
the case where the word-occurrence information in the training data is
insufficient. In this paper, we present a topic model, called MetaLDA, which is
able to leverage either document or word meta information, or both of them
jointly. With two data augmentation techniques, we can derive an efficient
Gibbs sampling algorithm, which benefits from the fully local conjugacy of the
model. Moreover, the algorithm is favoured by the sparsity of the meta
information. Extensive experiments on several real-world datasets demonstrate
that our model achieves comparable or improved performance in terms of both
perplexity and topic quality, particularly in handling sparse texts. In
addition, compared with other models using meta information, our model runs
significantly faster.
Comment: To appear in ICDM 201
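To illustrate the flavor of this construction, the sketch below builds asymmetric Dirichlet priors from binary meta information in a log-linear way, with each active feature contributing a multiplicative factor, which is in the spirit of MetaLDA's prior; the full model and its sampler involve more, and all names and hyperparameters here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def meta_priors(doc_feats, word_feats, n_topics):
    """Turn binary meta information into asymmetric Dirichlet priors.

    doc_feats  : (D, Fd) binary document features.
    word_feats : (V, Fw) binary word features.
    Returns (D, K) doc-topic and (K, V) topic-word prior parameters.
    """
    # per-feature multiplicative effects (gamma draws keep them positive)
    lam_doc = rng.gamma(1.0, 1.0, size=(doc_feats.shape[1], n_topics))
    lam_word = rng.gamma(1.0, 1.0, size=(word_feats.shape[1], n_topics))
    # log-linear combination: alpha[d, k] = prod_f lam[f, k] ** x[d, f]
    alpha = np.exp(doc_feats @ np.log(lam_doc))      # (D, K) prior
    beta = np.exp(word_feats @ np.log(lam_word)).T   # (K, V) prior
    return alpha, beta

# toy usage: 500 docs with 10 binary labels, 2,000 words with 20 features
alpha, beta = meta_priors(rng.integers(0, 2, (500, 10)),
                          rng.integers(0, 2, (2000, 20)), n_topics=50)
```

Priors of this form can replace the symmetric hyperparameters in a standard collapsed Gibbs sampler for LDA, which is how meta information can sharpen topics when word co-occurrence evidence is sparse.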