
    edge2vec: Representation learning using edge semantics for biomedical knowledge discovery

    Representation learning provides new and powerful graph analytical approaches and tools for the highly valued data science challenge of mining knowledge graphs. Since previous graph analytical methods have mostly focused on homogeneous graphs, an important current challenge is extending this methodology to richly heterogeneous graphs and knowledge domains. The biomedical sciences are such a domain, reflecting the complexity of biology, with entities such as genes, proteins, drugs, diseases, and phenotypes, and relationships such as gene co-expression, biochemical regulation, and biomolecular inhibition or activation. The semantics of edges and nodes are therefore critical for representation learning and knowledge discovery in real-world biomedical problems. In this paper, we propose the edge2vec model, which represents graphs while accounting for edge semantics. An edge-type transition matrix is trained by an Expectation-Maximization approach, and a stochastic gradient descent model is employed to learn node embeddings on a heterogeneous graph via the trained transition matrix. edge2vec is validated on three biomedical domain tasks: biomedical entity classification, compound-gene bioactivity prediction, and biomedical information retrieval. The results show that by incorporating edge types into node embedding learning on heterogeneous graphs, edge2vec significantly outperforms state-of-the-art models on all three tasks. We propose this method for its added value relative to existing graph analytical methodology and for its applicability to real-world biomedical knowledge discovery.
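
    As a rough illustration of the approach described in this abstract, the sketch below shows an edge2vec-style random walk whose transitions are biased by an edge-type transition matrix. The toy graph, the edge types, and the uniform initialization of the matrix M are illustrative assumptions only; in the paper the matrix is trained with an Expectation-Maximization procedure, and the resulting walks are fed to a skip-gram model trained with stochastic gradient descent to obtain node embeddings.

    # Illustrative sketch only: toy heterogeneous graph with typed edges.
    import random
    from collections import defaultdict

    graph = defaultdict(list)  # node -> list of (neighbor, edge_type)
    edges = [
        ("geneA", "proteinA", "encodes"),
        ("proteinA", "drugX", "inhibited_by"),
        ("drugX", "diseaseY", "treats"),
        ("geneA", "geneB", "co_expressed"),
    ]
    for u, v, etype in edges:
        graph[u].append((v, etype))
        graph[v].append((u, etype))

    edge_types = ["encodes", "inhibited_by", "treats", "co_expressed"]

    # Edge-type transition matrix M[prev][next]; edge2vec learns it via EM,
    # here it is simply initialized uniformly as a stand-in.
    M = {p: {n: 1.0 / len(edge_types) for n in edge_types} for p in edge_types}

    def biased_walk(start, length=6):
        """Random walk where the next step is weighted by M given the previous edge type."""
        walk, prev_type = [start], None
        for _ in range(length - 1):
            nbrs = graph[walk[-1]]
            if not nbrs:
                break
            if prev_type is None:
                nxt, prev_type = random.choice(nbrs)
            else:
                weights = [M[prev_type][t] for _, t in nbrs]
                nxt, prev_type = random.choices(nbrs, weights=weights, k=1)[0]
            walk.append(nxt)
        return walk

    # Generate walks; these would then be passed to a skip-gram model
    # (e.g. gensim's Word2Vec) to produce the node embeddings.
    walks = [biased_walk(node) for node in list(graph) for _ in range(10)]

    The point mirrored here is that the type of the previously traversed edge, not just the graph topology, steers the walk, so different relation semantics shape the sampled neighborhoods that the embedding model later sees.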

    Learning Heterogeneous Network Embedding From Text and Links

    Finding methods to represent multiple types of nodes in heterogeneous networks is both challenging and rewarding, as there is far less work in this area than on homogeneous networks. In this paper, we propose a novel approach to learning node embeddings for heterogeneous networks through a framework that jointly learns from network links and the text associated with nodes. A novel attention mechanism is used to exploit text reached through links, giving access to a much larger network context. Link embeddings are first learned through a random-walk-based method that handles multiple types of links. Text embeddings are learned separately at both the sentence and document level to capture salient semantic information more comprehensively. Both types of embeddings are then fed into a hierarchical neural network model that learns node representations through mutual enhancement. The attention mechanism follows linked edges to gather context from adjacent nodes, extending the context available for node representation. Evaluation on a link prediction task in a heterogeneous network dataset shows that our method outperforms the current state-of-the-art method by 2.5%-5.0% in AUC, with a p-value below 10⁻⁹, indicating a highly significant improvement.
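
    The fragment below is a minimal numpy sketch of the general idea of fusing link embeddings and text embeddings with attention over neighbors, as described above. The node names, dimensions, random vectors, and dot-product scoring are placeholder assumptions, not the paper's actual hierarchical architecture.

    # Illustrative sketch: fuse link and text embeddings with simple neighbor attention.
    import numpy as np

    rng = np.random.default_rng(0)
    dim = 8
    nodes = ["paper1", "paper2", "author1", "venue1"]
    neighbors = {"paper1": ["author1", "venue1", "paper2"]}

    # Stand-ins for embeddings learned elsewhere (random-walk link embeddings,
    # sentence/document-level text embeddings); here they are random vectors.
    link_emb = {n: rng.normal(size=dim) for n in nodes}
    text_emb = {n: rng.normal(size=dim) for n in nodes}

    def softmax(x):
        x = x - x.max()
        e = np.exp(x)
        return e / e.sum()

    def node_representation(node):
        """Concatenate a node's own embeddings with attention-weighted neighbor text."""
        own = np.concatenate([link_emb[node], text_emb[node]])
        nbrs = neighbors.get(node, [])
        if not nbrs:
            return own
        # Attention scores: dot product between the node's text and each neighbor's text.
        scores = np.array([text_emb[node] @ text_emb[v] for v in nbrs])
        weights = softmax(scores)
        context = sum(w * text_emb[v] for w, v in zip(weights, nbrs))
        return np.concatenate([own, context])

    rep = node_representation("paper1")
    print(rep.shape)  # (24,) = link (8) + own text (8) + attended neighbor text (8)

    The attention step is what lets text reached through links enlarge the context of a node's representation; the joint training and mutual enhancement described in the abstract are omitted from this sketch.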

    Representation Learning for Natural Language Processing

    This open access book provides an overview of recent advances in representation learning theory, algorithms, and applications for natural language processing (NLP). It is divided into three parts. Part I presents representation learning techniques for multiple language entries, including words, phrases, sentences, and documents. Part II introduces representation techniques for objects closely related to NLP, including entity-based world knowledge, sememe-based linguistic knowledge, networks, and cross-modal entries. Lastly, Part III provides open resource tools for representation learning techniques and discusses the remaining challenges and future research directions. The theories and algorithms of representation learning presented here can also benefit other related domains such as machine learning, social network analysis, the Semantic Web, information retrieval, data mining, and computational biology. This book is intended for advanced undergraduate and graduate students, post-doctoral fellows, researchers, lecturers, and industrial engineers, as well as anyone interested in representation learning and natural language processing.