Supervised Typing of Big Graphs using Semantic Embeddings
We propose a supervised algorithm for generating type embeddings in the same
semantic vector space as a given set of entity embeddings. The algorithm is
agnostic to the derivation of the underlying entity embeddings. It does not
require any manual feature engineering, generalizes well to hundreds of types
and achieves near-linear scaling on Big Graphs containing many millions of
triples and instances by virtue of an incremental execution. We demonstrate the
utility of the embeddings on a type recommendation task, outperforming a
non-parametric feature-agnostic baseline while achieving 15x speedup and
near-constant memory usage on a full partition of DBpedia. Using
state-of-the-art visualization, we illustrate the agreement of our
extensionally derived DBpedia type embeddings with the manually curated domain
ontology. Finally, we use the embeddings to probabilistically cluster about 4
million DBpedia instances into 415 types in the DBpedia ontology.
Comment: 6 pages, to be published in the Semantic Big Data Workshop at ACM SIGMOD 2017; extended version in preparation for the Open Journal of Semantic Web (OJSW)
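The abstract leaves the aggregation step implicit; a plausible reading of the extensional, incremental derivation described above is a single-pass mean over the embeddings of each type's instances. The sketch below is illustrative only (the function name and the final unit-normalization are assumptions, not the paper's exact algorithm):

```python
import numpy as np

def build_type_embeddings(entity_vecs, type_assertions, dim):
    """Aggregate entity vectors per type in a single incremental pass.

    entity_vecs: dict of entity id -> np.ndarray of shape (dim,)
    type_assertions: iterable of (entity_id, type_id) pairs
    Returns: dict of type_id -> unit-normalized mean vector in the
    same space as the entity embeddings.
    """
    sums, counts = {}, {}
    # One pass over the assertions gives near-linear scaling in the
    # number of triples, with memory proportional to the number of types.
    for ent, typ in type_assertions:
        vec = entity_vecs.get(ent)
        if vec is None:
            continue
        if typ not in sums:
            sums[typ] = np.zeros(dim)
            counts[typ] = 0
        sums[typ] += vec
        counts[typ] += 1
    type_vecs = {}
    for typ in sums:
        mean = sums[typ] / counts[typ]
        norm = np.linalg.norm(mean)
        type_vecs[typ] = mean / norm if norm > 0 else mean
    return type_vecs
```

Type recommendation then reduces to ranking types by cosine similarity between an entity vector and the type vectors, since both live in the same space.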
Network Embedding Learning in Knowledge Graph
University of Technology Sydney. Faculty of Engineering and Information Technology.
A Knowledge Graph stores a large number of human knowledge facts in the form of a multi-relational network structure and is widely used as a core technique in real-world applications, including search engines, question answering systems, and recommender systems. A Knowledge Graph provides the extra info box for user queries in the Google search engine, the WolframAlpha site provides a question answering service relying on a Knowledge Graph, and eBay uses a Knowledge Graph as a semantic enhancement for its recommendation service.
Motivated by several characteristics of Knowledge Graphs, including incompleteness, structural inferability, and semantic application enhancement, a number of efforts have been put into the Knowledge Graph analysis area. Some works contribute to Knowledge Graph construction and maintenance through crowdsourcing. Some previous network embedding learning models show good performance on homogeneous network analysis, but their performance when applied directly to a Knowledge Graph is limited because the multi-relational information of the Knowledge Graph is ignored. This led to the concept of Knowledge Graph embedding learning: by learning representations for Knowledge Graph components, including entities and relations, latent semantic information is extracted into the embedding representation. Embedding techniques are also utilized in collaborative learning between Knowledge Graphs and external application scenarios, where the goal is to use the Knowledge Graph as a semantic enhancement to improve the performance of external applications.
However, some problems remain in Knowledge Graph completion, reasoning, and external application. First, a proper model is required for Knowledge Graph self-completion, and a proper integration solution is also required to add extra conceptual taxonomy information into the process of Knowledge Graph completion. Second, a framework is needed for using sub-structure information of the Knowledge Graph network in knowledge reasoning. Third, a collaborative learning framework for Knowledge Graph completion and downstream machine learning tasks needs to be designed. In this thesis, we take recommender systems as an example of downstream machine learning tasks.
To address the aforementioned research problems, the following approaches are proposed in this thesis:
• A bipartite graph embedding based Knowledge Graph completion approach for Knowledge Graph self-completion, in which each knowledge fact is represented as a bipartite graph structure for more reasonable triple inference.
• An embedding based cross-completion approach for completing the factual Knowledge Graph with additional conceptual taxonomy information, in which the components of the factual Knowledge Graph and the conceptual taxonomy (entities, relations, and types) are jointly represented as embeddings.
• Two sub-structure based Knowledge Graph transitive relation embedding approaches for knowledge reasoning based on Knowledge Graph sub-structure, in which the transitive structural information contained in the Knowledge Graph network sub-structure is learned into the relation embeddings.
• Two hierarchical collaborative embedding approaches for collaborative learning on a Knowledge Graph and a recommender system, in which Knowledge Graph entities are linked with recommender items, and entities, relations, items, and users are then represented as embeddings in a collaborative space.
The main contribution of this thesis is a set of approaches applicable across multiple Knowledge Graph related domains: Knowledge Graph completion, reasoning, and application. Two approaches achieve more accurate Knowledge Graph completion, two others model knowledge reasoning based on network sub-structure analysis, and the remaining approaches apply Knowledge Graphs to a recommender system application.
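The thesis builds throughout on translational Knowledge Graph embedding models. As a minimal reference point (this is the classic TransE scoring function, not one of the thesis's own proposed approaches), a relation is modeled as a translation between entity vectors:

```python
import numpy as np

def transe_score(h, r, t):
    """TransE-style plausibility of a triple (h, r, t).

    h, r, t: np.ndarray embeddings of head entity, relation, tail entity.
    The closer h + r lands to t, the higher (less negative) the score.
    """
    return -float(np.linalg.norm(h + r - t))
```

A true triple such as (Paris, capital_of, France) should score higher than a corrupted one such as (Paris, capital_of, Germany); training pushes the embeddings toward that ordering.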
Crosslingual Document Embedding as Reduced-Rank Ridge Regression
There has recently been much interest in extending vector-based word
representations to multiple languages, such that words can be compared across
languages. In this paper, we shift the focus from words to documents and
introduce a method for embedding documents written in any language into a
single, language-independent vector space. For training, our approach leverages
a multilingual corpus where the same concept is covered in multiple languages
(but not necessarily via exact translations), such as Wikipedia. Our method,
Cr5 (Crosslingual reduced-rank ridge regression), starts by training a
ridge-regression-based classifier that uses language-specific bag-of-word
features in order to predict the concept that a given document is about. We
show that, when constraining the learned weight matrix to be of low rank, it
can be factored to obtain the desired mappings from language-specific
bags-of-words to language-independent embeddings. As opposed to most prior
methods, which use pretrained monolingual word vectors, postprocess them to
make them crosslingual, and finally average word vectors to obtain document
vectors, Cr5 is trained end-to-end and is thus natively crosslingual as well as
document-level. Moreover, since our algorithm uses the singular value
decomposition as its core operation, it is highly scalable. Experiments show
that our method achieves state-of-the-art performance on a crosslingual
document retrieval task. Finally, although not trained for embedding sentences
and words, it also achieves competitive performance on crosslingual sentence
and word retrieval tasks.
Comment: In The Twelfth ACM International Conference on Web Search and Data Mining (WSDM '19)
GRAPH REPRESENTATION LEARNING WITH BOX EMBEDDINGS
Graphs are ubiquitous data structures, present in many machine-learning tasks such as link prediction of products and node classification of scientific papers. As gradient descent drives the training of most modern machine learning architectures, the ability to encode graph-structured data using a differentiable representation is essential to make use of this data. Most approaches encode graph structure in Euclidean space; however, it is non-trivial to model directed edges there. The naive solution is to represent each node using separate source and target vectors, but this can decouple the representation, making it harder for the model to capture information within longer paths in the graph.
In this dissertation, we propose to model graphs by representing each node as a box (a Cartesian product of intervals), where directed edges are captured by the relative containment of one box in another. A theoretical proof shows that our proposed box embeddings have the expressiveness to represent any directed acyclic graph. We also perform rigorous empirical evaluations of vector, hyperbolic, and region-based geometric representations on several families of synthetic and real-world directed graphs. Extensive experimental results suggest that box containment allows transitive relationships to be modeled easily. We further propose t-Box, a variant of box embeddings that learns the temperature jointly during training. t-Box uses a learned smoothing parameter to achieve better representational capacity than vector models in low dimensions, while also avoiding the performance saturation common to other geometric models in high dimensions.
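A minimal sketch of the hard-containment scoring described above, without the learned temperature/smoothing of t-Box; the function names are illustrative:

```python
import numpy as np

def box_volume(lo, hi):
    """Volume of an axis-aligned box given its lower/upper corners; zero if empty."""
    return float(np.prod(np.maximum(hi - lo, 0.0)))

def containment_score(lo_u, hi_u, lo_v, hi_v):
    """Score a directed edge u -> v as Vol(Box_u ∩ Box_v) / Vol(Box_v).

    The score equals 1.0 exactly when Box_v is contained in Box_u,
    and decreases as Box_v sticks out of Box_u.
    """
    inter_lo = np.maximum(lo_u, lo_v)  # intersection lower corner
    inter_hi = np.minimum(hi_u, hi_v)  # intersection upper corner
    return box_volume(inter_lo, inter_hi) / box_volume(lo_v, hi_v)
```

Because containment composes (if Box_w ⊆ Box_v and Box_v ⊆ Box_u, then Box_w ⊆ Box_u), this score naturally models transitive edges; the same rigidity is why hard containment alone cannot represent cycles, which motivates the binary code variants below.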
Though promising, modeling directed graphs that contain both cycles and some element of transitivity, two properties common in real-world settings, is challenging. Box embeddings, which can be thought of as representing the graph as an intersection over some learned super-graphs, have a natural inductive bias toward modeling transitivity, but (as we prove) cannot model cycles. To address this issue, we propose binary code box embeddings, where a learned binary code selects a subset of graphs for intersection. We explore several variants, including global binary codes (amounting to a union over intersections) and per-vertex binary codes (allowing greater flexibility), as well as methods of regularization. Theoretical and empirical results show that the proposed models not only preserve the useful inductive bias of transitivity but also have sufficient representational capacity to model arbitrary graphs, including graphs with cycles.
Lastly, we discuss the use case where box embeddings are not free parameters but are produced by functions. In particular, we explore whether neural networks can map node features into the box space. This is critical in many real-world scenarios. On the one hand, graphs are sparse, and the majority of vertices have only a few connections or are completely isolated. On the other hand, there may exist rich node features, such as attributes and descriptions, that could be useful for prediction tasks. The experimental analysis points out both the effectiveness and the insufficiency of multi-layer perceptron-based encoders under different circumstances.
Evaluating the Effectiveness of Margin Parameter when Learning Knowledge Embedding Representation for Domain-specific Multi-relational Categorized Data
Learning knowledge representations is an increasingly important technology that supports a variety of machine learning related applications. However, the choice of hyperparameters is seldom justified and usually relies on exhaustive search. Understanding the effect of hyperparameter combinations on embedding quality is crucial to avoiding this inefficient process and enhancing the practicality of vector representation methods. We evaluate the effects of distinct values of the margin parameter, focusing on translational embedding representation models for multi-relational categorized data. We assess the influence of the margin on the quality of embedding models by contrasting traditional link prediction task accuracy against a classification task. The findings provide evidence that lower values of the margin are not rigorous enough to help with the learning process, whereas larger values produce much noise, pushing the entities beyond the surface of the hyperspace and thus requiring constant regularization. Finally, the correlation between link prediction and classification accuracy shows that the traditional validation protocol for embedding models is a weak metric for representing the quality of the embedding representation.
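The margin parameter under study enters through the standard margin-based ranking loss used by translational models; a minimal sketch (names and batch shapes are illustrative):

```python
import numpy as np

def margin_ranking_loss(pos_dists, neg_dists, margin):
    """Hinge loss for translational embeddings, averaged over a batch.

    pos_dists: array of distances ||h + r - t|| for observed triples
    neg_dists: array of the same distance for corrupted triples
    A pair contributes zero loss once its negative triple is at least
    `margin` farther from satisfying the translation than the positive.
    """
    pos = np.asarray(pos_dists)
    neg = np.asarray(neg_dists)
    return float(np.mean(np.maximum(0.0, margin + pos - neg)))
```

With a small margin the hinge term is satisfied almost immediately and provides little training signal; with a large margin the loss keeps pushing positive and negative scores apart, inflating entity norms unless regularization is applied constantly, which is consistent with the findings reported above.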
Novel Perspectives and Applications of Knowledge Graph Embeddings: From Link Prediction to Risk Assessment and Explainability
Knowledge graph representation is an important embedding technology that supports a variety of machine learning related applications. By learning distributed representations of multi-relational data, knowledge embedding models are expected to deal efficiently with the semantic relatedness of their constituents. However, failing at the fundamental task of creating an appropriate form to represent knowledge harms any attempt at designing subsequent machine learning tasks. Several knowledge embedding methods have been proposed in the last decade. Although there is a consensus that enhanced approaches are more efficient, more complex projections in the hyperspace that indeed favor link prediction (or knowledge graph completion) can result in a loss of semantic similarity. We propose a new evaluation task that performs risk assessment on domain-specific categorized multi-relational datasets, designed as a classification problem based on the resulting embeddings. We assess the quality of embedding representations based on the synergy of the resulting clusters of target subjects. We show that more sophisticated embedding approaches do not necessarily favor embedding quality, and that the traditional link prediction validation protocol is a weak metric for measuring the quality of an embedding representation. Finally, we present insights on using synergy analysis to provide risk assessment explainability based on the probability distribution of feature-value pairs within embedded clusters.
Aspects of the topological dynamics of sparse graph automorphism groups
We examine sparse graph automorphism groups from the perspective
of the Kechris-Pestov-Todorčević (KPT) correspondence. The sparse
graphs that we discuss are Hrushovski constructions: we consider the
'ab initio' Hrushovski construction M_0, the Fraïssé limit of the class of 2-sparse graphs with self-sufficient closure; M_1, a simplified version
of M_0; and the ω-categorical Hrushovski construction M_F. We prove
a series of results that show that the automorphism groups of these
Hrushovski constructions demonstrate behaviour very different from that of classes previously studied in the KPT context. Extending results of Evans,
Hubička and Nešetřil, we show that Aut(M_0) has no coprecompact
amenable subgroup. We investigate the fixed points on type spaces
property, a weakening of extreme amenability, and show that for a
particular choice of control function F, Aut(M_F) does not have any
closed oligomorphic subgroup with this property. Next we consider the
Aut(M_1)-flow of linear orders on M_1, and show that minimal subflows
of this have all Aut(M_1)-orbits meagre. We give partial analogous results for the Aut(M_0)-flow of linear orders on M_0, and find the universal
minimal flow of the automorphism group of the “dimension 0” part of
M_0.
Investigating the temporal dynamics of inter-organizational exchange: patient transfers among Italian hospitals
Previous research on interaction behavior among organizations (resource exchange, collaboration, communication) has typically aggregated records of those behaviors over time to constitute a ‘network’ of organizational relationships. We instead directly study structural-temporal patterns in organizational exchange, focusing on the dynamics of reciprocation. Applying this lens to a community of Italian hospitals during the period 2003-2007, we observe two mechanisms of interorganizational reciprocation: organizational embedding and resource dependence. We flesh out these two mechanisms by showing how they operate in distinct time frames: Dependence operates on contemporaneous exchange structures, whereas embedding develops through longer-term historical patterns. We also show how these processes operate differently in competitive and noncompetitive contexts, operationalized in terms of market differentiation and geographic space. In noncompetitive contexts, we observe both logics of reciprocation, dependence in the short term and embedding over the long term, developing into patterns of generalized exchange in this population. In competitive contexts, we observe neither form of reciprocation and instead observe the microfoundations of status hierarchies in exchange.