Search CORE

1,256 research outputs found

Efficient Regularized Least-Squares Algorithms for Conditional Ranking on Relational Data

Author: Airola Antti
De Baets Bernard
Pahikkala Tapio
Stock Michiel
Waegeman Willem
Publication venue
Publication date: 01/01/2013
Field of study

In domains like bioinformatics, information retrieval and social network analysis, one can find learning tasks where the goal consists of inferring a ranking of objects, conditioned on a particular target object. We present a general kernel framework for learning conditional rankings from various types of relational data, where rankings can be conditioned on unseen data objects. We propose efficient algorithms for conditional ranking by optimizing squared regression and ranking loss functions. We show theoretically, that learning with the ranking loss is likely to generalize better than with the regression loss. Further, we prove that symmetry or reciprocity properties of relations can be efficiently enforced in the learned models. Experiments on synthetic and real-world data illustrate that the proposed methods deliver state-of-the-art performance in terms of predictive power and computational efficiency. Moreover, we also show empirically that incorporating symmetry or reciprocity properties can improve the generalization performance

arXiv.org e-Print Archive

CiteSeerX

edge2vec: Representation learning using edge semantics for biomedical knowledge discovery

Author: Ding Ying
Foote Brian
Fu Gang
Gao Zheng
Gessner Christopher
Liu Xiaozhong
Ouyang Chunping
Tsutsui Satoshi
Wild David
Yang Jeremy
Yu Qi
Publication venue
Publication date: 27/05/2019
Field of study

Representation learning provides new and powerful graph analytical approaches and tools for the highly valued data science challenge of mining knowledge graphs. Since previous graph analytical methods have mostly focused on homogeneous graphs, an important current challenge is extending this methodology for richly heterogeneous graphs and knowledge domains. The biomedical sciences are such a domain, reflecting the complexity of biology, with entities such as genes, proteins, drugs, diseases, and phenotypes, and relationships such as gene co-expression, biochemical regulation, and biomolecular inhibition or activation. Therefore, the semantics of edges and nodes are critical for representation learning and knowledge discovery in real world biomedical problems. In this paper, we propose the edge2vec model, which represents graphs considering edge semantics. An edge-type transition matrix is trained by an Expectation-Maximization approach, and a stochastic gradient descent model is employed to learn node embedding on a heterogeneous graph via the trained transition matrix. edge2vec is validated on three biomedical domain tasks: biomedical entity classification, compound-gene bioactivity prediction, and biomedical information retrieval. Results show that by considering edge-types into node embedding learning in heterogeneous graphs, \textbf{edge2vec}\ significantly outperforms state-of-the-art models on all three tasks. We propose this method for its added value relative to existing graph analytical methodology, and in the real world context of biomedical knowledge discovery applicability.Comment: 10 page

arXiv.org e-Print Archive

IUScholarWorks Open

Large-scale Machine Learning in High-dimensional Datasets

Author: Hansen Toke Jansen
Publication venue: Technical University of Denmark
Publication date: 01/01/2013
Field of study