66,905 research outputs found
Leveraging Node Attributes for Incomplete Relational Data
Relational data are usually highly incomplete in practice, which inspires us
to leverage side information to improve the performance of community detection
and link prediction. This paper presents a Bayesian probabilistic approach that
incorporates various kinds of node attributes encoded in binary form in
relational models with Poisson likelihood. Our method works flexibly with both
directed and undirected relational networks. The inference can be done by
efficient Gibbs sampling which leverages sparsity of both networks and node
attributes. Extensive experiments show that our models achieve the
state-of-the-art link prediction results, especially with highly incomplete
relational data.Comment: Appearing in ICML 201
Automated data pre-processing via meta-learning
The final publication is available at link.springer.comA data mining algorithm may perform differently on datasets with different characteristics, e.g., it might perform better on a dataset with continuous attributes rather than with categorical attributes, or the other way around.
As a matter of fact, a dataset usually needs to be pre-processed. Taking into account all the possible pre-processing operators, there exists a staggeringly large number of alternatives and nonexperienced users become overwhelmed.
We show that this problem can be addressed by an automated approach, leveraging ideas from metalearning.
Specifically, we consider a wide range of data pre-processing techniques and a set of data mining algorithms. For each data mining algorithm and selected dataset, we are able to predict the transformations that improve the result
of the algorithm on the respective dataset. Our approach will help non-expert users to more effectively identify the transformations appropriate to their applications, and hence to achieve improved results.Peer ReviewedPostprint (published version
- …