Tight and simple Web graph compression

Authors: Bieniecki Wojciech; Grabowski Szymon
Publication date: 01/01/2010

Analysing Web graphs has applications in determining page ranks, fighting Web spam, detecting communities and mirror sites, and more. This study is, however, hampered by the necessity of storing a major part of huge graphs in external memory, which prevents efficient random access to edge (hyperlink) lists. A number of algorithms involving compression techniques have thus been presented to represent Web graphs succinctly while still providing random access. Those techniques are usually based on differential encodings of the adjacency lists, finding repeating nodes or node regions in successive lists, more general grammar-based transformations, or 2-dimensional representations of the binary matrix of the graph. In this paper we present two Web graph compression algorithms. The first can be seen as an engineering of the Boldi and Vigna (2004) method. We extend the notion of similarity between link lists and use a more compact encoding of residuals. The algorithm works on blocks of varying size (in the number of input lines) and sacrifices access time for a better compression ratio, achieving a more succinct graph representation than other algorithms reported in the literature. The second algorithm works on blocks of the same size (in the number of input lines), and its key mechanism is merging the block into a single ordered list. This method achieves much more attractive space-time tradeoffs.

Comment: 15 pages
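
Two ideas named in this abstract, differential encoding of adjacency lists and merging a block of lists into a single ordered list, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, and the per-node membership bitmask used in `merge_block` is an assumption made for illustration, since the abstract does not say how list membership is recorded.

```python
from typing import Dict, List, Tuple


def gaps_encode(neighbors: List[int]) -> List[int]:
    """Differential encoding: keep the first value, then successive differences.

    Assumes `neighbors` is sorted in increasing order.
    """
    gaps = []
    prev = 0
    for v in neighbors:
        gaps.append(v - prev)
        prev = v
    return gaps


def gaps_decode(gaps: List[int]) -> List[int]:
    """Inverse of gaps_encode: a running sum restores the node ids."""
    out = []
    total = 0
    for g in gaps:
        total += g
        out.append(total)
    return out


def merge_block(lists: List[List[int]]) -> List[Tuple[int, int]]:
    """Merge a block of adjacency lists into one sorted list.

    Each entry is (node, mask); bit i of mask is set when the node
    occurs in list i of the block (an assumed bookkeeping scheme).
    """
    masks: Dict[int, int] = {}
    for i, lst in enumerate(lists):
        for v in lst:
            masks[v] = masks.get(v, 0) | (1 << i)
    return sorted(masks.items())


def extract(merged: List[Tuple[int, int]], i: int) -> List[int]:
    """Recover list i of the block from the merged representation."""
    return [v for v, mask in merged if mask & (1 << i)]
```

Small gaps cost fewer bits than the raw node ids under any variable-length integer code, which is why sorted adjacency lists are stored as differences. The merged form stores a node shared by several lists of the block only once.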

arXiv.org e-Print Archive

CiteSeerX

TorusE: Knowledge Graph Embedding on a Lie Group

Authors: Ebisu Takuma; Ichise Ryutaro
Publication date: 15/11/2017

Knowledge graphs are useful for many artificial intelligence (AI) tasks. However, knowledge graphs often have missing facts. To populate the graphs, knowledge graph embedding models have been developed. Knowledge graph embedding models map entities and relations in a knowledge graph to a vector space and predict unknown triples by scoring candidate triples. TransE is the first translation-based method, and it is well known because of its simplicity and efficiency for knowledge graph completion. It employs the principle that the differences between entity embeddings represent their relations. The principle seems very simple, but it can effectively capture the rules of a knowledge graph. However, TransE has a problem with its regularization. TransE forces entity embeddings to be on a sphere in the embedding vector space. This regularization warps the embeddings and makes it difficult for them to fulfill the abovementioned principle. The regularization also adversely affects the accuracy of link prediction. On the other hand, regularization is important because, without it, entity embeddings diverge under negative sampling. This paper proposes a novel embedding model, TorusE, to solve the regularization problem. The principle of TransE can be defined on any Lie group. A torus, which is one of the compact Lie groups, can be chosen for the embedding space to avoid regularization. To the best of our knowledge, TorusE is the first model that embeds objects on a space other than a real or complex vector space, and this paper is the first to formally discuss the problem of regularization of TransE. Our approach outperforms other state-of-the-art approaches such as TransE, DistMult and ComplEx on a standard link prediction task. We show that TorusE is scalable to large-size knowledge graphs and is faster than the original TransE.

Comment: accepted for AAAI-1
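
The translation principle described in this abstract (head embedding plus relation embedding should land near the tail embedding) can be sketched as follows. This is an illustrative sketch, not the authors' code: the function names are hypothetical, and the wrap-around L1 distance is only one plausible distance on a torus, chosen here as an assumption to show why a compact space needs no norm constraint.

```python
import math
from typing import Sequence


def transe_score(h: Sequence[float], r: Sequence[float], t: Sequence[float]) -> float:
    """TransE-style score: Euclidean norm of h + r - t (lower is more plausible)."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def torus_l1_score(h: Sequence[float], r: Sequence[float], t: Sequence[float]) -> float:
    """Same principle on a torus: each coordinate lives on a circle of length 1.

    The per-coordinate distance is the shorter way around the circle, so
    it never exceeds 0.5 and embeddings cannot diverge (assumed L1 variant).
    """
    total = 0.0
    for hi, ri, ti in zip(h, r, t):
        d = (hi + ri - ti) % 1.0  # wrap the difference into [0, 1)
        total += min(d, 1.0 - d)
    return total
```

Because every coordinate is taken modulo 1, the score is bounded by half the embedding dimension, whereas the unconstrained TransE score can grow without limit unless the embeddings are normalized.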

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Buildings, spiders, and geometric Satake

Authors: Fontaine Bruce; Kamnitzer Joel; Kuperberg Greg
Publication date: 17/03/2011

Let G be a simple algebraic group. Labelled trivalent graphs called webs can be used to produce invariants in tensor products of minuscule representations. For each web, we construct a configuration space of points in the affine Grassmannian. Via the geometric Satake correspondence, we relate these configuration spaces to the invariant vectors coming from webs. In the case G = SL(3), non-elliptic webs yield a basis for the invariant spaces. The non-elliptic condition, which is equivalent to the condition that the dual diskoid of the web is CAT(0), is explained by the fact that affine buildings are CAT(0).

Comment: 49 pages; revised and to appear in Compositio Mathematica

arXiv.org e-Print Archive

eScholarship - University of California