Word Embeddings for Entity-annotated Texts
Learned vector representations of words are useful tools for many information
retrieval and natural language processing tasks due to their ability to capture
lexical semantics. However, while many such tasks involve or even rely on named
entities as central components, popular word embedding models have so far
failed to include entities as first-class citizens. While it seems intuitive
that annotating named entities in the training corpus should result in more
intelligent word features for downstream tasks, performance issues arise when
popular embedding approaches are naively applied to entity annotated corpora.
Not only are the resulting entity embeddings less useful than expected, but one
also finds that the performance of the non-entity word embeddings degrades in
comparison to those trained on the raw, unannotated corpus. In this paper, we
investigate approaches to jointly train word and entity embeddings on a large
corpus with automatically annotated and linked entities. We discuss two
distinct approaches to the generation of such embeddings, namely the training
of state-of-the-art embeddings on raw-text and annotated versions of the
corpus, as well as node embeddings of a co-occurrence graph representation of
the annotated corpus. We compare the performance of annotated embeddings and
classical word embeddings on a variety of word similarity, analogy, and
clustering evaluation tasks, and investigate their performance in
entity-specific tasks. Our findings show that it takes more than training
popular word embedding models on an annotated corpus to create entity
embeddings with acceptable performance on common test cases. Based on these
results, we discuss how and when node embeddings of the co-occurrence graph
representation of the text can restore the performance.
Comment: This paper is accepted in 41st European Conference on Information
Retrieva
Detecting highly overlapping community structure by greedy clique expansion
In complex networks it is common for each node to belong to several
communities, implying a highly overlapping community structure. Recent advances
in benchmarking indicate that existing community assignment algorithms that are
capable of detecting overlapping communities perform well only when the extent
of community overlap is kept to modest levels. To overcome this limitation, we
introduce a new community assignment algorithm called Greedy Clique Expansion
(GCE). The algorithm identifies distinct cliques as seeds and expands these
seeds by greedily optimizing a local fitness function. We perform extensive
benchmarks on synthetic data to demonstrate that GCE's good performance is
robust across diverse graph topologies. Significantly, GCE is the only
algorithm to perform well on these synthetic graphs, in which every node
belongs to multiple communities. Furthermore, when put to the task of
identifying functional modules in protein interaction data, and college dorm
assignments in Facebook friendship data, we find that GCE performs
competitively.
Comment: 10 pages, 7 Figures. Implementation source and binaries available at
http://sites.google.com/site/greedycliqueexpansion
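The seed-and-expand step described in this abstract can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: the abstract does not state the fitness function, so the ratio k_in / (k_in + k_out)^alpha used here is an assumption, and the names `build_adj`, `fitness`, and `expand_seed` are hypothetical. Clique detection and handling of near-duplicate communities are left out.

```python
def build_adj(edges):
    """Build an undirected adjacency map {node: set(neighbours)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def fitness(adj, community, alpha=1.0):
    """Assumed local fitness: k_in / (k_in + k_out) ** alpha.

    k_in counts edge endpoints inside the community (each internal edge twice),
    k_out counts edges leaving the community.
    """
    k_in = 0
    k_out = 0
    for u in community:
        for v in adj[u]:
            if v in community:
                k_in += 1
            else:
                k_out += 1
    total = k_in + k_out
    return k_in / total ** alpha if total else 0.0


def expand_seed(adj, seed, alpha=1.0):
    """Greedily add the neighbouring node that most improves the fitness.

    Stops as soon as no neighbouring node yields a strict improvement.
    """
    community = set(seed)
    while True:
        frontier = {v for u in community for v in adj[u]} - community
        best_node = None
        best_fit = fitness(adj, community, alpha)
        for v in sorted(frontier):
            candidate_fit = fitness(adj, community | {v}, alpha)
            if candidate_fit > best_fit:
                best_node, best_fit = v, candidate_fit
        if best_node is None:
            return community
        community.add(best_node)
```

On two triangles joined by a single edge, expanding a seed taken from one triangle should stop at that triangle, because absorbing the bridge node lowers the assumed fitness.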
Optimal network topologies: Expanders, Cages, Ramanujan graphs, Entangled networks and all that
We report on some recent developments in the search for optimal network
topologies. First we review some basic concepts of spectral graph theory,
including adjacency and Laplacian matrices, paying special attention to the
topological implications of having large spectral gaps. We also introduce
related concepts such as expanders, Ramanujan graphs, and Cage graphs. Afterwards, we
discuss two different dynamical features of networks, synchronizability and
flow of random walkers, and show that they are optimized if the corresponding
Laplacian matrix has a large spectral gap. From this, we show, by developing a
numerical optimization algorithm, that maximum synchronizability and fast random
walk spreading are obtained for a particular type of extremely homogeneous
regular networks, with long loops and poor modular structure, that we call
entangled networks. These turn out to be related to Ramanujan and Cage graphs.
We argue also that these graphs are very good finite-size approximations to
Bethe lattices, and provide optimal or almost optimal solutions to many other
problems such as searchability in the presence of congestion or
performance of neural networks. Finally, we study how these results are
modified when studying dynamical processes controlled by a normalized (weighted
and directed) dynamics; much more heterogeneous graphs are optimal in this
case. We close with a critical discussion of the limitations and possible extensions
of this work.
Comment: 17 pages. 11 figures. Small corrections and a new reference. Accepted
for pub. in JSTA
On the Expansion of Group-Based Lifts
A $k$-lift of an $n$-vertex base graph $G$ is a graph $H$ on $n \times k$
vertices, where each vertex $v$ of $G$ is replaced by $k$ vertices $v_1, \ldots, v_k$
and each edge $(u,v)$ in $G$ is replaced by a matching
representing a bijection $\pi_{uv}$ so that the edges of $H$ are of the form
$(u_i, v_{\pi_{uv}(i)})$. Lifts have been studied as a means to efficiently
construct expanders. In this work, we study lifts obtained from groups and
group actions. We derive the spectrum of such lifts via the representation
theory principles of the underlying group. Our main results are:
(1) There is a constant $c_1$ such that for every $k \geq 2^{c_1 n d}$, there
does not exist an abelian $k$-lift $H$ of any $n$-vertex $d$-regular base graph
with $H$ being almost Ramanujan (nontrivial eigenvalues of the adjacency matrix
at most $O(\sqrt{d})$ in magnitude). This can be viewed as an analogue of the
well-known no-expansion result for abelian Cayley graphs.
(2) A uniform random lift in a cyclic group of order $k$ of any $n$-vertex
$d$-regular base graph $G$, with the nontrivial eigenvalues of the adjacency
matrix of $G$ bounded by $\lambda$ in magnitude, has the new nontrivial
eigenvalues also bounded by $\lambda + O(\sqrt{d})$ in magnitude with probability
$1 - k e^{-\Omega(n/d^2)}$. In particular, there is a constant $c_2$ such that for
every $k \leq 2^{c_2 n/d^2}$, there exists a lift $H$ of every Ramanujan graph in
a cyclic group of order $k$ with $H$ being almost Ramanujan. We use this to
design a quasi-polynomial time algorithm to construct almost Ramanujan
expanders deterministically.
The existence of expanding lifts in cyclic groups of order $k = 2^{O(n/d^2)}$
can be viewed as a lower bound on the order of the largest abelian group
that produces expanding lifts. Our results show that the lower bound matches
the upper bound for $k$ (up to $d^3$ in the exponent).
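As a rough illustration of the construction discussed in this abstract, a uniform random lift in a cyclic group of order k can be sketched by drawing one shift per base edge and using it as the bijection between the two fibers. The function name `cyclic_lift` and the edge-list representation are assumptions made for this example. The sketch only builds the lifted graph and does not compute its spectrum.

```python
import random


def cyclic_lift(edges, k, rng=None):
    """Return the edge list of a random lift in the cyclic group Z_k.

    Each base vertex u becomes k copies (u, 0), ..., (u, k - 1). Each base
    edge (u, v) gets a uniformly random shift s in Z_k and is replaced by the
    perfect matching (u, i) -- (v, (i + s) mod k) for i = 0, ..., k - 1.
    """
    rng = rng or random.Random()
    lifted = []
    for u, v in edges:
        s = rng.randrange(k)  # the shift defines the bijection i -> (i + s) mod k
        for i in range(k):
            lifted.append(((u, i), (v, (i + s) % k)))
    return lifted
```

Because every base edge is replaced by a perfect matching between two fibers, each lifted vertex keeps the degree of its base vertex, so a lift of a d-regular graph is again d-regular.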
Lifted Worm Algorithm for the Ising Model
We design an irreversible worm algorithm for the zero-field ferromagnetic
Ising model by using the lifting technique. We study the dynamic critical
behavior of an energy estimator on both the complete graph and toroidal grids,
and compare our findings with reversible algorithms such as the
Prokof'ev-Svistunov worm algorithm. Our results show that the lifted worm
algorithm improves the dynamic exponent of the energy estimator on the complete
graph, and leads to a significant constant improvement on toroidal grids.
Comment: 9 pages, 6 figure
Fast Mixing of Parallel Glauber Dynamics and Low-Delay CSMA Scheduling
Glauber dynamics is a powerful tool to generate randomized, approximate
solutions to combinatorially difficult problems. It has been used to analyze
and design distributed CSMA (Carrier Sense Multiple Access) scheduling
algorithms for multi-hop wireless networks. In this paper we derive bounds on
the mixing time of a generalization of Glauber dynamics where multiple links
are allowed to update their states in parallel and the fugacity of each link
can be different. The results can be used to prove that the average queue
length (and hence, the delay) under the parallel Glauber dynamics based CSMA
grows polynomially in the number of links for wireless networks with
bounded-degree interference graphs when the arrival rate lies in a fraction of
the capacity region. We also show that in specific network topologies, the
low-delay capacity region can be further improved.
Comment: 12 page
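One step of a parallel Glauber update with per-link fugacities might look like the sketch below. It rests on assumptions the abstract does not spell out: the interference graph is given as a map from each link to the set of links it conflicts with, the decision set is chosen by a simple randomized greedy rule, and a selected link with no active neighbour turns on with probability lambda / (1 + lambda). The name `parallel_glauber_step` is hypothetical.

```python
import random


def parallel_glauber_step(adj, state, fugacity, rng):
    """Perform one parallel update and return the new schedule.

    adj      -- {link: set of conflicting links} (interference graph)
    state    -- {link: 0 or 1}, assumed to be a feasible schedule
    fugacity -- {link: positive float}, may differ per link
    rng      -- a random.Random instance
    """
    # Choose a decision set that is an independent set of the interference
    # graph, so that no two conflicting links update in the same step.
    decision = set()
    for link in rng.sample(sorted(adj), len(adj)):
        if rng.random() < 0.5 and not (adj[link] & decision):
            decision.add(link)

    # Links in the decision set update in parallel; all other links keep
    # their previous state.
    new_state = dict(state)
    for link in decision:
        if any(state[nb] for nb in adj[link]):
            new_state[link] = 0  # a conflicting link is active
        else:
            lam = fugacity[link]
            new_state[link] = 1 if rng.random() < lam / (1 + lam) else 0
    return new_state
```

Since the neighbours of an updating link never update in the same step, a feasible schedule stays feasible after every step, which is the property a CSMA schedule needs.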