Search CORE

5,976 research outputs found

Word Embeddings for Entity-annotated Texts

Author: A Das
A Spitz
CD Manning
D Nadeau
E Bruni
F Hill
F Hill
H Abdi
H Rubenstein
J Mitchell
J Strötgen
JG Moreno
L Maaten
P Bojanowski
P Goyal
S Deerwester
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 12/02/2020

Learned vector representations of words are useful tools for many information retrieval and natural language processing tasks due to their ability to capture lexical semantics. However, while many such tasks involve or even rely on named entities as central components, popular word embedding models have so far failed to include entities as first-class citizens. While it seems intuitive that annotating named entities in the training corpus should result in more intelligent word features for downstream tasks, performance issues arise when popular embedding approaches are naively applied to entity-annotated corpora. Not only are the resulting entity embeddings less useful than expected, but the performance of the non-entity word embeddings also degrades in comparison to those trained on the raw, unannotated corpus. In this paper, we investigate approaches to jointly train word and entity embeddings on a large corpus with automatically annotated and linked entities. We discuss two distinct approaches to the generation of such embeddings: training state-of-the-art embeddings on raw-text and annotated versions of the corpus, and computing node embeddings of a co-occurrence graph representation of the annotated corpus. We compare the performance of annotated embeddings and classical word embeddings on a variety of word similarity, analogy, and clustering evaluation tasks, and investigate their performance in entity-specific tasks. Our findings show that it takes more than training popular word embedding models on an annotated corpus to create entity embeddings with acceptable performance on common test cases. Based on these results, we discuss how and when node embeddings of the co-occurrence graph representation of the text can restore the performance.

Comment: This paper has been accepted at the 41st European Conference on Information Retrieval.
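
The second approach above trains node embeddings on a co-occurrence graph of the annotated corpus. The following is a minimal sketch of the graph-construction step only. It assumes that linked entities have already been collapsed into single tokens with a hypothetical `ENT:` prefix, and that two tokens co-occur when they are at most `window` positions apart. The paper's actual windowing and weighting scheme may differ, and the node-embedding step itself is not shown.

```python
from collections import Counter


def cooccurrence_graph(sentences, window=2):
    """Build an undirected, weighted co-occurrence graph.

    sentences: list of token lists; linked entities are assumed to be single
    tokens such as "ENT:Q76" (a hypothetical annotation format).
    window: tokens at most `window` positions apart are connected.
    Returns a Counter mapping sorted (u, v) pairs to co-occurrence counts.
    """
    edges = Counter()
    for tokens in sentences:
        for i, u in enumerate(tokens):
            # look ahead only, so each unordered pair in the window is counted once
            for v in tokens[i + 1 : i + 1 + window]:
                if u != v:
                    edges[tuple(sorted((u, v)))] += 1
    return edges
```

Widening the window adds more distant pairs. In the toy corpus used for checking, the two entities co-occur only once with `window=2` but twice with `window=3`.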

arXiv.org e-Print Archive

Crossref

Detecting highly overlapping community structure by greedy clique expansion

Author: Hurley Neil
Lee Conrad
McDaid Aaron
Reid Fergal
Publication date: 01/01/2010

In complex networks it is common for each node to belong to several communities, implying a highly overlapping community structure. Recent advances in benchmarking indicate that existing community assignment algorithms that are capable of detecting overlapping communities perform well only when the extent of community overlap is kept to modest levels. To overcome this limitation, we introduce a new community assignment algorithm called Greedy Clique Expansion (GCE). The algorithm identifies distinct cliques as seeds and expands these seeds by greedily optimizing a local fitness function. We perform extensive benchmarks on synthetic data to demonstrate that GCE's good performance is robust across diverse graph topologies. Significantly, GCE is the only algorithm to perform well on these synthetic graphs, in which every node belongs to multiple communities. Furthermore, when put to the task of identifying functional modules in protein interaction data, and college dorm assignments in Facebook friendship data, we find that GCE performs competitively.

Comment: 10 pages, 7 figures. Implementation source and binaries available at http://sites.google.com/site/greedycliqueexpansion
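
The seed-expansion step described above can be sketched as follows. The abstract does not state the local fitness function. This sketch therefore assumes the form k_in / (k_in + k_out)^alpha, a common choice for local community detection, with a hypothetical default of `alpha=1.0`. GCE's clique seeding and its removal of near-duplicate communities are omitted.

```python
def fitness(community, adj, alpha=1.0):
    """Assumed local fitness k_in / (k_in + k_out) ** alpha."""
    k_in = k_out = 0
    for u in community:
        for v in adj[u]:
            if v in community:
                k_in += 1  # each internal edge is counted from both endpoints
            else:
                k_out += 1
    total = k_in + k_out
    return k_in / total ** alpha if total else 0.0


def expand_seed(seed, adj, alpha=1.0):
    """Greedily add the frontier node that most increases fitness; stop when none does."""
    community = set(seed)
    current = fitness(community, adj, alpha)
    while True:
        frontier = {v for u in community for v in adj[u]} - community
        best_node, best_fit = None, current
        for v in sorted(frontier):  # sorted for deterministic tie-breaking
            f = fitness(community | {v}, adj, alpha)
            if f > best_fit:
                best_node, best_fit = v, f
        if best_node is None:
            return community
        community.add(best_node)
        current = best_fit
```

On two triangles joined by a single bridge edge, a seed inside one triangle grows to that triangle. It stops there because crossing the bridge lowers the fitness.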

arXiv.org e-Print Archive

CiteSeerX

Research Repository UCD

Irish Universities

Optimal network topologies: Expanders, Cages, Ramanujan graphs, Entangled networks and all that

Author: Barthelemy M
Flammini A
Bollobás B
Bollt E M
Chung F
Chung F
Davidoff G
Donetti L
Donetti L
Dorogovtsev S N
Franco Neri
Gastner M T
Newman M E J
Kirkpatrick S
Lovász L
Luca Donetti
Margulis G A
Miguel A Muñoz
Mohar B
Myrvold W
Pastor Satorras R
Pons P
Latapy M
Read R C
Reingold O
Sarnak P
Tutte W
Weisstein E W
Publication venue: 'IOP Publishing'
Publication date: 15/06/2006

We report on some recent developments in the search for optimal network topologies. First, we review some basic concepts of spectral graph theory, including adjacency and Laplacian matrices, paying special attention to the topological implications of having large spectral gaps. We also introduce related concepts such as expanders, Ramanujan graphs, and cage graphs. Afterwards, we discuss two different dynamical features of networks, synchronizability and the flow of random walkers, and show that both are optimized when the corresponding Laplacian matrix has a large spectral gap. By developing a numerical optimization algorithm, we then show that maximum synchronizability and fast random-walk spreading are obtained for a particular type of extremely homogeneous regular networks, with long loops and poor modular structure, which we call entangled networks. These turn out to be related to Ramanujan and cage graphs. We also argue that these graphs are very good finite-size approximations to Bethe lattices, and that they provide optimal or almost optimal solutions to many other problems, such as searchability in the presence of congestion or the performance of neural networks. We then study how these results are modified for dynamical processes controlled by a normalized (weighted and directed) dynamics; in this case, much more heterogeneous graphs are optimal. Finally, a critical discussion of the limitations and possible extensions of this work is presented.

Comment: 17 pages, 11 figures. Small corrections and a new reference. Accepted for publication in JSTA
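
The abstract links synchronizability and random-walk spreading to the Laplacian spectrum. Below is a minimal pure-Python sketch for small graphs. It builds L = D - A, finds its eigenvalues with cyclic Jacobi rotations, and reports the spectral gap lambda_2 and the eigenratio lambda_N / lambda_2. Using the eigenratio as the synchronizability measure (smaller is better) is an assumption here, following common usage in the synchronization literature. The paper's own optimization algorithm is not reproduced.

```python
import math


def laplacian(n, edges):
    """Graph Laplacian L = D - A of an undirected simple graph on nodes 0..n-1."""
    L = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        L[u][v] -= 1.0
        L[v][u] -= 1.0
        L[u][u] += 1.0
        L[v][v] += 1.0
    return L


def symmetric_eigenvalues(M, tol=1e-20, max_sweeps=100):
    """Sorted eigenvalues of a real symmetric matrix via cyclic Jacobi rotations."""
    A = [row[:] for row in M]
    n = len(A)
    for _ in range(max_sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(A[p][q]) < 1e-15:
                    continue
                # rotation angle (|theta| <= pi/4) that zeroes the entry A[p][q]
                diff = A[q][q] - A[p][p]
                theta = math.pi / 4 if diff == 0.0 else 0.5 * math.atan(2.0 * A[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for i in range(n):  # A <- A J (update columns p and q)
                    aip, aiq = A[i][p], A[i][q]
                    A[i][p] = c * aip - s * aiq
                    A[i][q] = s * aip + c * aiq
                for j in range(n):  # A <- J^T A (update rows p and q)
                    apj, aqj = A[p][j], A[q][j]
                    A[p][j] = c * apj - s * aqj
                    A[q][j] = s * apj + c * aqj
    return sorted(A[i][i] for i in range(n))


def spectral_summary(n, edges):
    """Return (lambda_2, lambda_N / lambda_2) of the graph Laplacian."""
    eig = symmetric_eigenvalues(laplacian(n, edges))
    gap = eig[1]  # algebraic connectivity; ~0 means the graph is disconnected
    ratio = eig[-1] / gap if gap > 1e-12 else math.inf
    return gap, ratio
```

For the 6-cycle the Laplacian eigenvalues are 2 - 2cos(2*pi*j/6), giving lambda_2 = 1 and an eigenratio of 4. The complete graph K_4 has lambda_2 = 4 and an eigenratio of 1, the most homogeneous case.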

arXiv.org e-Print Archive

Crossref

On the Expansion of Group-Based Lifts

Author: Agarwal Naman
Chandrasekaran Karthekeyan
Kolla Alexandra
Madan Vivek
Publication date: 17/12/2016

A $k$-lift of an $n$-vertex base graph $G$ is a graph $H$ on $n\times k$ vertices, where each vertex $v$ of $G$ is replaced by $k$ vertices $v_1,\ldots,v_k$ and each edge $(u,v)$ in $G$ is replaced by a matching representing a bijection $\pi_{uv}$, so that the edges of $H$ are of the form $(u_i,v_{\pi_{uv}(i)})$. Lifts have been studied as a means to efficiently construct expanders. In this work, we study lifts obtained from groups and group actions. We derive the spectrum of such lifts via the representation theory principles of the underlying group. Our main results are: (1) There is a constant $c_1$ such that for every $k\geq 2^{c_1nd}$, there does not exist an abelian $k$-lift $H$ of any $n$-vertex $d$-regular base graph with $H$ being almost Ramanujan (nontrivial eigenvalues of the adjacency matrix at most $O(\sqrt{d})$ in magnitude). This can be viewed as an analogue of the well-known no-expansion result for abelian Cayley graphs. (2) A uniform random lift in a cyclic group of order $k$ of any $n$-vertex $d$-regular base graph $G$, with the nontrivial eigenvalues of the adjacency matrix of $G$ bounded by $\lambda$ in magnitude, has the new nontrivial eigenvalues also bounded by $\lambda+O(\sqrt{d})$ in magnitude with probability $1-ke^{-\Omega(n/d^2)}$. In particular, there is a constant $c_2$ such that for every $k\leq 2^{c_2n/d^2}$, there exists a lift $H$ of every Ramanujan graph in a cyclic group of order $k$ with $H$ being almost Ramanujan. We use this to design a quasi-polynomial time algorithm to construct almost Ramanujan expanders deterministically. The existence of expanding lifts in cyclic groups of order $k=2^{O(n/d^2)}$ can be viewed as a lower bound on the order $k_0$ of the largest abelian group that produces expanding lifts. Our results show that the lower bound matches the upper bound for $k_0$ (up to $d^3$ in the exponent).
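
The lift construction defined above can be sketched for the cyclic case. Each base edge (u, v) receives a uniformly random shift s in Z_k, so the bijection is pi_uv(i) = (i + s) mod k. This is a minimal illustration only. The vertex encoding u * k + i is an illustrative choice, and the sketch builds the lift without computing its spectrum or reproducing the paper's deterministic construction.

```python
import random


def cyclic_lift(n, edges, k, rng=random):
    """Uniform random k-lift in the cyclic group Z_k.

    Each base edge (u, v) gets a random shift s, i.e. pi_uv(i) = (i + s) mod k,
    and is replaced by the matching {((u, i), (v, pi_uv(i))) : 0 <= i < k}.
    Lift vertex (u, i) is encoded as u * k + i; returns the lift's edge list
    on n * k vertices.
    """
    lifted = []
    for u, v in edges:
        s = rng.randrange(k)  # group element assigned to this base edge
        for i in range(k):
            lifted.append((u * k + i, v * k + (i + s) % k))
    return lifted
```

Because every base edge becomes a perfect matching between two fibres, a lift of a d-regular base graph is again d-regular. For example, a 5-lift of K_4 has 20 vertices, each of degree 3.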

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Lifted Worm Algorithm for the Ising Model

Author: Deng Youjin
Ding Lijie
Elçi Eren Metin
Garoni Timothy M.
Grimm Jens
Nasrawi Abrahim
Publication venue: 'American Physical Society (APS)'
Publication date: 14/11/2017

We design an irreversible worm algorithm for the zero-field ferromagnetic Ising model by using the lifting technique. We study the dynamic critical behavior of an energy estimator on both the complete graph and toroidal grids, and compare our findings with reversible algorithms such as the Prokof'ev-Svistunov worm algorithm. Our results show that the lifted worm algorithm improves the dynamic exponent of the energy estimator on the complete graph, and leads to a significant constant-factor improvement on toroidal grids.

Comment: 9 pages, 6 figures
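
The worm algorithm itself is too involved for a short example, but the lifting technique it relies on can be illustrated generically. The sketch below is a lifted Metropolis chain on a one-dimensional state space and is not the paper's algorithm. A direction variable sigma is added to the state. The chain keeps proposing moves in that direction while they are accepted and reverses only on rejection, which breaks detailed balance but keeps the target distribution stationary.

```python
import random


def lifted_metropolis(weights, steps, rng=random):
    """Lifted (irreversible) Metropolis sampler on states 0..m-1.

    weights: positive, unnormalized target weights for each state.
    The state is augmented with a direction sigma in {+1, -1}: a proposal
    x + sigma is accepted with the usual Metropolis probability; on
    rejection the walker stays put and reverses direction.
    Returns the empirical visit frequencies of x.
    """
    m = len(weights)
    x, sigma = 0, 1
    counts = [0] * m
    for _ in range(steps):
        y = x + sigma
        if 0 <= y < m and rng.random() < min(1.0, weights[y] / weights[x]):
            x = y  # accepted: keep moving in the same direction
        else:
            sigma = -sigma  # rejected (or out of range): flip the lifting variable
        counts[x] += 1
    return [c / steps for c in counts]
```

With weights [1, 2, 3, 4], the returned frequencies should approach [0.1, 0.2, 0.3, 0.4] for long runs. The persistent direction suppresses the diffusive back-and-forth of a reversible random-walk Metropolis chain.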

arXiv.org e-Print Archive

Monash University Research Portal

Fast Mixing of Parallel Glauber Dynamics and Low-Delay CSMA Scheduling

Author: Jiang Libin
Leconte Mathieu
Ni Jian
Srikant R.
Walrand Jean
Publication date: 02/08/2010

Glauber dynamics is a powerful tool to generate randomized, approximate solutions to combinatorially difficult problems. It has been used to analyze and design distributed CSMA (Carrier Sense Multiple Access) scheduling algorithms for multi-hop wireless networks. In this paper we derive bounds on the mixing time of a generalization of Glauber dynamics where multiple links are allowed to update their states in parallel and the fugacity of each link can be different. The results can be used to prove that the average queue length (and hence, the delay) under the parallel Glauber dynamics based CSMA grows polynomially in the number of links for wireless networks with bounded-degree interference graphs when the arrival rate lies in a fraction of the capacity region. We also show that in specific network topologies, the low-delay capacity region can be further improved.

Comment: 12 pages
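
Below is a minimal sketch of single-site Glauber dynamics for the hard-core model that underlies this CSMA analysis. Links are vertices of the interference graph, and a link may activate only when none of its interfering links is active. The paper analyzes a parallel generalization in which several links update at once. That variant and its mixing-time bounds are not reproduced here, and the names `adj` and `fugacity` are illustrative.

```python
import random


def glauber_csma(adj, fugacity, steps, rng=random):
    """Sequential Glauber dynamics for the hard-core model on an interference graph.

    adj[i]: links that interfere with link i.
    fugacity[i] > 0: activation parameter of link i.
    Active links always form an independent set; the stationary distribution
    is proportional to the product of fugacity[i] over active links i.
    Returns the activity vector after `steps` single-link updates.
    """
    n = len(adj)
    active = [False] * n
    for _ in range(steps):
        i = rng.randrange(n)  # pick one link uniformly at random
        if any(active[j] for j in adj[i]):
            active[i] = False  # an interfering neighbour is transmitting
        else:
            lam = fugacity[i]
            active[i] = rng.random() < lam / (1.0 + lam)
    return active
```

For an isolated link with fugacity 3, a single update activates it with probability 3 / (1 + 3) = 0.75. Larger fugacities push links to be active more often, which is how CSMA schemes typically tie access aggressiveness to queue lengths.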

arXiv.org e-Print Archive

Crossref

Caltech Authors