Structural Deep Embedding for Hyper-Networks
Network embedding has recently attracted considerable attention in data mining.
Existing network embedding methods mainly focus on networks with pairwise
relationships. In the real world, however, relationships among data points
can go beyond pairwise: three or more objects may be involved in a single
relationship, represented by a hyperedge, thus forming hyper-networks. These
hyper-networks pose great challenges to existing network embedding methods when
the hyperedges are indecomposable, that is, when no subset of the nodes in a
hyperedge can form another hyperedge. Such indecomposable hyperedges are
especially common in heterogeneous networks. In this paper, we propose a novel
Deep Hyper-Network Embedding (DHNE) model to embed hyper-networks with
indecomposable hyperedges. More specifically, we theoretically prove that any
linear similarity metric in the embedding space, as commonly used in existing
methods, cannot maintain the indecomposability property of hyper-networks, and
thus propose a new deep model that realizes a non-linear tuplewise similarity
function while preserving both local and global proximities in the learned
embedding space. We conduct extensive experiments on four different types of
hyper-networks: a GPS network, an online social network, a drug network and a
semantic network. The empirical results demonstrate that our method
significantly and consistently outperforms state-of-the-art algorithms.
Comment: Accepted by AAAI 1
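To illustrate what a non-linear tuplewise similarity means, here is a minimal sketch, not the DHNE implementation: the node embeddings of a candidate hyperedge are concatenated, passed through one tanh layer and squashed by a sigmoid into a score in (0, 1). The layer sizes, the function names (`dense_tanh`, `init_params`, `tuplewise_similarity`) and the example vectors are hypothetical, and the weights are random and untrained; a real model would learn them (together with the embeddings) from observed hyperedges and sampled negative tuples.

```python
import math
import random


def dense_tanh(x, weights, bias):
    # Fully connected layer with tanh: h_j = tanh(sum_i W[j][i] * x[i] + b[j])
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def init_params(tuple_size, dim, hidden_dim, seed=0):
    # Random, untrained weights (assumption for illustration only).
    rng = random.Random(seed)
    in_dim = tuple_size * dim
    return {
        "W": [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)] for _ in range(hidden_dim)],
        "b": [0.0] * hidden_dim,
        "w_out": [rng.uniform(-1.0, 1.0) for _ in range(hidden_dim)],
        "b_out": 0.0,
    }


def tuplewise_similarity(embeddings, params):
    """Score a candidate hyperedge, given one embedding per node, in (0, 1).

    Concatenating the embeddings and applying a non-linear layer makes the
    score depend jointly on all tuple members, rather than being a linear
    combination of pairwise inner products.
    """
    x = [v for emb in embeddings for v in emb]  # concatenate the tuple
    hidden = dense_tanh(x, params["W"], params["b"])
    logit = sum(w * h for w, h in zip(params["w_out"], hidden)) + params["b_out"]
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid


params = init_params(tuple_size=3, dim=4, hidden_dim=8)
user = [0.1, 0.2, -0.3, 0.5]          # hypothetical embeddings
location = [0.4, -0.1, 0.0, 0.2]
activity = [-0.2, 0.3, 0.1, 0.0]
score = tuplewise_similarity([user, location, activity], params)
print(f"similarity of (user, location, activity): {score:.4f}")
```

Because the hidden layer sees the whole concatenated tuple, the score is not constrained to be a sum of pairwise terms, which is the property the abstract argues linear metrics lack.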
Advanced Academic Team Worker Recommendation Models
Collaborator recommendation is an important task in the academic domain. Most
existing approaches assume that the recommendation system only needs to
recommend an individual researcher for the task. However, academic success can
often be attributed to the productive collaboration of a whole academic team. In
this work, we propose a new task, academic team worker recommendation: given a
status (student, assistant professor or prime professor), research interests
and a specific task, we recommend an academic team of the form (prime
professor, assistant professor, student). For this task, we propose a model,
CQBG-R (Citation-Query Blended Graph-Ranking). The key idea is to combine the
context of the query and the papers with the graph topology to form a new
graph (CQBG), which targets both the research interests and the specific
research task at hand. The experimental results show the effectiveness of
the proposed method.
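The abstract does not specify how the blended graph is ranked, so the following is only one plausible reading, sketched under explicit assumptions: a hypothetical `QUERY` node is linked to every paper sharing a term with the query, papers are linked to their authors and to the papers they cite, and a personalized PageRank restarting at `QUERY` (our substitution, not confirmed by the source) scores every node. The team is then the top-scoring author for each role. All function names, node ids and the toy data are illustrative.

```python
def build_blended_graph(papers, citations, query_terms):
    """Undirected graph over a query node, papers and authors (hypothetical schema).

    papers: {paper_id: {"terms": set of str, "authors": list of author ids}}
    citations: list of (citing_id, cited_id) pairs
    """
    adj = {"QUERY": set()}

    def link(a, b):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    for pid, info in papers.items():
        adj.setdefault(pid, set())
        for author in info["authors"]:
            link(pid, author)
        if info["terms"] & query_terms:  # query context: shared terms
            link("QUERY", pid)
    for citing, cited in citations:      # citation topology
        link(citing, cited)
    return {node: sorted(nbrs) for node, nbrs in adj.items()}


def personalized_pagerank(adj, source, alpha=0.85, iters=50):
    # Power iteration; the walk restarts at `source` with probability 1 - alpha.
    rank = {n: (1.0 if n == source else 0.0) for n in adj}
    for _ in range(iters):
        new = {n: 0.0 for n in adj}
        for n, nbrs in adj.items():
            if nbrs:
                share = alpha * rank[n] / len(nbrs)
                for m in nbrs:
                    new[m] += share
            else:  # dangling node: return its mass to the source
                new[source] += alpha * rank[n]
        new[source] += 1.0 - alpha
        rank = new
    return rank


def recommend_team(rank, roles):
    # Keep the highest-ranked author for each role.
    best = {}
    for author, role in roles.items():
        if role not in best or rank.get(author, 0.0) > rank.get(best[role], 0.0):
            best[role] = author
    return (best.get("prime professor"), best.get("assistant professor"), best.get("student"))


papers = {
    "p1": {"terms": {"graph", "ranking"}, "authors": ["prof_a", "asst_b", "stu_c"]},
    "p2": {"terms": {"vision"}, "authors": ["prof_d", "stu_e"]},
    "p3": {"terms": {"graph", "embedding"}, "authors": ["asst_b", "stu_f"]},
}
citations = [("p3", "p1")]
roles = {"prof_a": "prime professor", "prof_d": "prime professor",
         "asst_b": "assistant professor", "stu_c": "student",
         "stu_e": "student", "stu_f": "student"}

graph = build_blended_graph(papers, citations, query_terms={"graph"})
rank = personalized_pagerank(graph, source="QUERY")
team = recommend_team(rank, roles)
print(team)
```

In this toy data, `p2` shares no term with the query and has no citation links, so its authors receive no score and cannot be recommended; this is the effect of tying the ranking to the query context.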
Development of a text mining approach to disease network discovery
Scientific literature is one of the major sources of knowledge for systems biology, in the form of papers, patents and other types of written reports. Text mining methods aim at automatically extracting relevant information from the literature. The hypothesis of this thesis was that biological systems could be elucidated by the development of text mining solutions that can automatically extract relevant information from documents. The first objective consisted in developing software components to recognize biomedical entities in text, the first step toward generating a network of a biological system. To this end, a machine learning solution was developed, which can be trained for specific biological entities using an annotated dataset, obtaining high-quality results. Additionally, a rule-based solution was developed, which can be easily adapted to various types of entities.
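As a rough illustration of why a rule-based recognizer is easy to adapt, here is a minimal sketch assuming the rules are regular expressions keyed by entity type; the thesis does not state its rule formalism, and the patterns below are hypothetical examples rather than the ones it uses. Supporting a new entity type only requires adding an entry to `RULES`.

```python
import re

# Hypothetical rule set: entity type -> list of regular expressions.
# Adapting the recognizer to a new entity type means adding an entry here.
RULES = {
    "miRNA": [r"\b(?:hsa-)?miR-\d+[a-z]?(?:-[35]p)?\b"],
    "cytokine": [r"\bIL-\d+\b", r"\bTNF\b"],
}


def recognize(text, rules=RULES):
    """Return (start, end, entity_type, surface_text) tuples sorted by position."""
    spans = []
    for entity_type, patterns in rules.items():
        for pattern in patterns:
            for match in re.finditer(pattern, text):
                spans.append((match.start(), match.end(), entity_type, match.group()))
    return sorted(spans)


sentence = "Expression of miR-145 and hsa-miR-155-5p was linked to IL-10 and TNF levels."
spans = recognize(sentence)
for start, end, entity_type, surface in spans:
    print(start, end, entity_type, surface)
```

Unlike the trained machine learning recognizer, this approach needs no annotated data, but its recall is limited to the surface forms the rules anticipate.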
The second objective consisted in developing an automatic approach to link the recognized entities to a reference knowledge base. A solution based on the PageRank algorithm was developed in order to match the entities to the concepts that most contribute to the overall coherence.
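The coherence idea behind PageRank-based linking can be sketched as follows, under stated assumptions: every candidate concept of every mention becomes a node, two candidates of different mentions are connected when a hypothetical knowledge base lists them as related, and standard PageRank scores the nodes. Each mention is then linked to its highest-scoring candidate. This is a simplified illustration, not the thesis's exact method; the concept ids and relations are toy data.

```python
from itertools import combinations


def pagerank(adj, alpha=0.85, iters=100):
    # Standard PageRank by power iteration with uniform teleportation.
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - alpha) / n for v in nodes}
        for v in nodes:
            if adj[v]:
                share = alpha * rank[v] / len(adj[v])
                for u in adj[v]:
                    new[u] += share
            else:  # dangling node: spread its mass uniformly
                for u in nodes:
                    new[u] += alpha * rank[v] / n
        rank = new
    return rank


def link_entities(candidates, related):
    """Pick one concept per mention by PageRank over a coherence graph.

    candidates: {mention: [candidate concept ids]}
    related: set of frozenset({concept_a, concept_b}) pairs from a knowledge base
    Only candidates of different mentions are connected, so a concept gains
    score by agreeing with the other entities in the document.
    """
    adj = {c: set() for cands in candidates.values() for c in cands}
    for m1, m2 in combinations(candidates, 2):
        for c1 in candidates[m1]:
            for c2 in candidates[m2]:
                if frozenset((c1, c2)) in related:
                    adj[c1].add(c2)
                    adj[c2].add(c1)
    rank = pagerank(adj)
    return {m: max(cands, key=lambda c: rank[c]) for m, cands in candidates.items()}


candidates = {
    "CF": ["cystic_fibrosis", "cardiac_failure"],  # ambiguous mention
    "CFTR": ["CFTR_gene"],
    "airway": ["airway_epithelium"],
}
related = {
    frozenset(("cystic_fibrosis", "CFTR_gene")),
    frozenset(("cystic_fibrosis", "airway_epithelium")),
    frozenset(("CFTR_gene", "airway_epithelium")),
}
links = link_entities(candidates, related)
print(links)
```

Here `cardiac_failure` has no relation to the other entities in the toy knowledge base, so it receives only teleportation mass and the ambiguous mention resolves to the more coherent candidate.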
The third objective consisted in automatically extracting relations between entities, to generate knowledge graphs about biological systems. Due to the lack of annotated datasets available for this task, distant supervision was employed to train a relation classifier on a corpus of documents and a knowledge base. The applicability of this approach was demonstrated in two case studies: microRNA-gene relations for cystic fibrosis, obtaining a network of 27 relations using the abstracts of 51 recently published papers; and cell-cytokine relations for tolerogenic cell therapies, obtaining a network of 647 relations from 3264 abstracts.
A manual evaluation determined that the information contained in these networks was relevant. Additionally, a solution combining deep learning techniques with ontology information was developed to take advantage of the domain knowledge provided by ontologies.
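The distant-supervision step of the third objective can be sketched as a labeling function: every pair of entities co-occurring in a sentence becomes a training example, labelled positive when the pair appears in the knowledge base. This is a minimal sketch assuming sentence-level co-occurrence and an unordered pair lookup; the classifier that the thesis trains on these examples is omitted, and all identifiers are hypothetical placeholders.

```python
from itertools import combinations


def distant_labels(sentences, knowledge_base):
    """Turn unannotated sentences into labelled relation examples.

    sentences: list of (text, entity_ids) where entity_ids were found by NER
    knowledge_base: set of (entity_a, entity_b) pairs known to be related
    Each co-occurring entity pair becomes one example, labelled 1 if the pair
    (in either order) is in the knowledge base and 0 otherwise.
    """
    examples = []
    for text, entities in sentences:
        for a, b in combinations(entities, 2):
            label = int((a, b) in knowledge_base or (b, a) in knowledge_base)
            examples.append((text, a, b, label))
    return examples


# Toy data; all identifiers are hypothetical placeholders.
kb = {("mirA", "geneB")}
sentences = [
    ("mirA directly represses geneB in airway cells.", ["mirA", "geneB"]),
    ("mirA and geneC were measured in the same samples.", ["mirA", "geneC"]),
    ("geneB, mirA and geneC were all upregulated.", ["geneB", "mirA", "geneC"]),
]
examples = distant_labels(sentences, kb)
for text, a, b, label in examples:
    print(label, a, b, "|", text)
```

The third sentence shows the known weakness of this heuristic: the (geneB, mirA) pair is labelled positive even though the sentence does not state the relation, so the resulting training labels are noisy.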
This thesis contributed several solutions that demonstrate the usefulness of text mining methods for systems biology by extracting domain-specific information from the literature. These solutions make it easier to integrate various areas of research, leading to a better understanding of biological systems.