Dropout Training of Matrix Factorization and Autoencoder for Link Prediction in Sparse Graphs
Matrix factorization (MF) and Autoencoder (AE) are among the most successful
approaches to unsupervised learning. While MF-based models have been
extensively exploited in the graph modeling and link prediction literature, the
AE family has not gained much attention. In this paper we investigate the
application of both MF and AE to the link prediction problem in sparse graphs.
We show the connection between AE and MF from the perspective of multiview
learning, and further propose MF+AE: a model that trains MF and AE jointly with
shared parameters. We apply dropout when training both the MF and AE parts, and
show that it substantially mitigates overfitting by acting as an adaptive
regularizer. We conduct experiments on six real-world sparse graph datasets,
and show that MF+AE consistently outperforms the competing methods, especially
on datasets that exhibit strong non-cohesive structure.
Comment: Published in SDM 201
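The dropout-as-adaptive-regularization idea can be illustrated with a minimal sketch. This is not the authors' MF+AE model: it is a plain symmetric matrix factorization of a toy sparse adjacency matrix, with inverted dropout applied to the latent factors at every gradient step; all sizes and hyperparameters here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 20, 5        # nodes, latent dimension (toy sizes, not from the paper)
keep = 0.8          # dropout keep-probability (hypothetical setting)
lr = 0.05

# Toy sparse symmetric adjacency matrix.
A = (rng.random((n, n)) < 0.1).astype(float)
A = np.triu(A, 1)
A = A + A.T

U = 0.1 * rng.standard_normal((n, k))        # node factors of the MF model
init_loss = np.sum((U @ U.T - A) ** 2)

for step in range(500):
    M = (rng.random(U.shape) < keep) / keep  # inverted-dropout mask on factors
    Ud = U * M
    E = Ud @ Ud.T - A                        # reconstruction error under mask
    U -= lr / n * (4 * E @ Ud) * M           # grad of ||Ud Ud^T - A||_F^2 in U

final_loss = np.sum((U @ U.T - A) ** 2)
print(final_loss < init_loss)
```

Because each step sees a randomly thinned set of factors, no single latent dimension can dominate the reconstruction, which is the regularizing effect the abstract refers to.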
Predicting multicellular function through multi-layer tissue networks
Motivation: Understanding functions of proteins in specific human tissues is
essential for insights into disease diagnostics and therapeutics, yet
prediction of tissue-specific cellular function remains a critical challenge
for biomedicine.
Results: Here we present OhmNet, a hierarchy-aware unsupervised node feature
learning approach for multi-layer networks. We build a multi-layer network,
where each layer represents molecular interactions in a different human tissue.
OhmNet then automatically learns a mapping of proteins, represented as nodes,
to a low-dimensional feature space based on neural embeddings. OhmNet
encourages sharing of similar features among proteins with similar network
neighborhoods and among proteins activated in similar tissues. The algorithm
generalizes prior work, which generally ignores relationships between tissues,
by modeling tissue organization with a rich multiscale tissue hierarchy. We use
OhmNet to study multicellular function in a multi-layer protein interaction
network of 107 human tissues. In 48 tissues with known tissue-specific cellular
functions, OhmNet provides more accurate predictions of cellular function than
alternative approaches, and also generates more accurate hypotheses about
tissue-specific protein actions. We show that taking into account the tissue
hierarchy leads to improved predictive power. Remarkably, we also demonstrate
that it is possible to leverage the tissue hierarchy in order to effectively
transfer cellular functions to a functionally uncharacterized tissue. Overall,
OhmNet moves from flat networks to multiscale models able to predict a range of
phenotypes spanning cellular subsystems.
Comment: In Proceedings of the 25th International Conference on Intelligent
Systems for Molecular Biology (ISMB), 201
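The hierarchy-aware feature sharing described above can be sketched as a regularizer that pulls each tissue layer's embeddings toward those of its parent in the tissue hierarchy. This is a simplification: OhmNet's full objective also includes per-layer neighborhood-likelihood terms (omitted here), and the tissue names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 3   # proteins per layer, embedding dimension (toy sizes)

# Hypothetical two-level hierarchy: two leaf tissues under one internal root.
emb = {t: rng.standard_normal((n, d)) for t in ("brain", "liver", "root")}
parent = {"brain": "root", "liver": "root"}

def hierarchy_penalty(emb, parent, lam=0.1):
    """Sum of squared distances between each tissue's protein embeddings and
    its parent's, so tissues close in the hierarchy share similar features."""
    return lam * sum(np.sum((emb[t] - emb[parent[t]]) ** 2) for t in parent)

print(hierarchy_penalty(emb, parent))
```

The penalty is zero exactly when a tissue's embeddings coincide with its parent's, which is how the multiscale hierarchy couples otherwise independent per-layer embeddings.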
node2vec: Scalable Feature Learning for Networks
Prediction tasks over nodes and edges in networks require careful effort in
engineering features used by learning algorithms. Recent research in the
broader field of representation learning has led to significant progress in
automating prediction by learning the features themselves. However, present
feature learning approaches are not expressive enough to capture the diversity
of connectivity patterns observed in networks. Here we propose node2vec, an
algorithmic framework for learning continuous feature representations for nodes
in networks. In node2vec, we learn a mapping of nodes to a low-dimensional
space of features that maximizes the likelihood of preserving network
neighborhoods of nodes. We define a flexible notion of a node's network
neighborhood and design a biased random walk procedure, which efficiently
explores diverse neighborhoods. Our algorithm generalizes prior work, which is
based on rigid notions of network neighborhoods, and we argue that the added
flexibility in exploring neighborhoods is the key to learning richer
representations. We demonstrate the efficacy of node2vec over existing
state-of-the-art techniques on multi-label classification and link prediction
in several real-world networks from diverse domains. Taken together, our work
represents a new way to efficiently learn state-of-the-art, task-independent
representations in complex networks.
Comment: In Proceedings of the 22nd ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, 201
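The biased random walk at the core of node2vec is a second-order Markov chain controlled by a return parameter p and an in-out parameter q. A minimal sketch follows; the function name and toy graph are illustrative, and the reference implementation additionally precomputes alias tables to make sampling O(1).

```python
import random

def node2vec_walk(adj, start, length, p=1.0, q=1.0, seed=0):
    """One node2vec-style biased walk over an undirected graph.

    adj: dict mapping node -> set of neighbors.
    Unnormalized transition weights from the current node, given the
    previous node: 1/p to return to the previous node, 1 to a neighbor
    of the previous node, 1/q otherwise (moving outward).
    """
    rng = random.Random(seed)
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(adj[cur])
        if not nbrs:
            break                      # dead end: truncate the walk
        if len(walk) == 1:
            walk.append(rng.choice(nbrs))   # first step is unbiased
            continue
        prev = walk[-2]
        weights = [1.0 / p if x == prev
                   else 1.0 if x in adj[prev]
                   else 1.0 / q
                   for x in nbrs]
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk

# Toy usage: small p favors backtracking (BFS-like), large p with small q
# pushes the walk outward (DFS-like).
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1}}
print(node2vec_walk(adj, 0, 10, p=0.5, q=2.0))
```

The walks are then fed to a skip-gram-style objective to produce the node embeddings; that training step is not shown here.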
On the Implicit Bias of Dropout
Algorithmic approaches endow deep learning systems with an implicit bias that
helps them generalize even in over-parametrized settings. In this paper, we
focus on understanding the bias induced by learning with dropout, a popular
technique for avoiding overfitting in deep learning. For single
hidden-layer linear neural networks, we show that dropout tends to make the
norm of incoming/outgoing weight vectors of all the hidden nodes equal. In
addition, we provide a complete characterization of the optimization landscape
induced by dropout.
Comment: 17 pages, 3 figures. In Proceedings of the Thirty-fifth International
Conference on Machine Learning (ICML), 201
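The norm-equalizing tendency comes from the explicit regularizer that dropout induces in expectation. For a single-hidden-layer linear network this can be checked numerically by enumerating all dropout masks; the sketch below uses toy sizes and a keep-probability that are arbitrary assumptions, not the paper's notation.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
k, d = 3, 4            # hidden width, input/output dimension (toy sizes)
theta = 0.5            # dropout keep-probability
W1 = rng.standard_normal((k, d))
W2 = rng.standard_normal((d, k))
x = rng.standard_normal(d)
t = rng.standard_normal(d)
h = W1 @ x             # hidden activations of the linear network

# Exact expected squared error under dropout: enumerate all 2^k masks,
# with inverted-dropout scaling m / theta on the kept units.
exp_loss = 0.0
for bits in itertools.product([0, 1], repeat=k):
    m = np.array(bits, dtype=float)
    prob = np.prod(np.where(m == 1, theta, 1 - theta))
    exp_loss += prob * np.sum((W2 @ (m / theta * h) - t) ** 2)

# Closed form: the plain loss plus a penalty coupling each hidden unit's
# activation with the squared norm of its outgoing weight vector -- the
# per-unit products whose minimization, per the paper, equalizes norms.
plain = np.sum((W2 @ h - t) ** 2)
penalty = (1 - theta) / theta * np.sum(h ** 2 * np.sum(W2 ** 2, axis=0))
print(np.isclose(exp_loss, plain + penalty))  # prints True
```

Since rescaling a hidden unit's incoming and outgoing weights in opposite directions leaves the plain loss unchanged but changes the penalty, minimizing the penalty balances the per-unit norms, consistent with the equalization result stated in the abstract.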