5,976 research outputs found

    Word Embeddings for Entity-annotated Texts

    Full text link
    Learned vector representations of words are useful tools for many information retrieval and natural language processing tasks due to their ability to capture lexical semantics. However, while many such tasks involve or even rely on named entities as central components, popular word embedding models have so far failed to include entities as first-class citizens. While it seems intuitive that annotating named entities in the training corpus should result in more intelligent word features for downstream tasks, performance issues arise when popular embedding approaches are naively applied to entity annotated corpora. Not only are the resulting entity embeddings less useful than expected, but one also finds that the performance of the non-entity word embeddings degrades in comparison to those trained on the raw, unannotated corpus. In this paper, we investigate approaches to jointly train word and entity embeddings on a large corpus with automatically annotated and linked entities. We discuss two distinct approaches to the generation of such embeddings, namely the training of state-of-the-art embeddings on raw-text and annotated versions of the corpus, as well as node embeddings of a co-occurrence graph representation of the annotated corpus. We compare the performance of annotated embeddings and classical word embeddings on a variety of word similarity, analogy, and clustering evaluation tasks, and investigate their performance in entity-specific tasks. Our findings show that it takes more than training popular word embedding models on an annotated corpus to create entity embeddings with acceptable performance on common test cases. Based on these results, we discuss how and when node embeddings of the co-occurrence graph representation of the text can restore the performance.Comment: This paper is accepted in 41st European Conference on Information Retrieva

    Detecting highly overlapping community structure by greedy clique expansion

    Get PDF
    In complex networks it is common for each node to belong to several communities, implying a highly overlapping community structure. Recent advances in benchmarking indicate that existing community assignment algorithms that are capable of detecting overlapping communities perform well only when the extent of community overlap is kept to modest levels. To overcome this limitation, we introduce a new community assignment algorithm called Greedy Clique Expansion (GCE). The algorithm identifies distinct cliques as seeds and expands these seeds by greedily optimizing a local fitness function. We perform extensive benchmarks on synthetic data to demonstrate that GCE's good performance is robust across diverse graph topologies. Significantly, GCE is the only algorithm to perform well on these synthetic graphs, in which every node belongs to multiple communities. Furthermore, when put to the task of identifying functional modules in protein interaction data, and college dorm assignments in Facebook friendship data, we find that GCE performs competitively.Comment: 10 pages, 7 Figures. Implementation source and binaries available at http://sites.google.com/site/greedycliqueexpansion

    Optimal network topologies: Expanders, Cages, Ramanujan graphs, Entangled networks and all that

    Full text link
    We report on some recent developments in the search for optimal network topologies. First we review some basic concepts on spectral graph theory, including adjacency and Laplacian matrices, and paying special attention to the topological implications of having large spectral gaps. We also introduce related concepts as ``expanders'', Ramanujan, and Cage graphs. Afterwards, we discuss two different dynamical feautures of networks: synchronizability and flow of random walkers and so that they are optimized if the corresponding Laplacian matrix have a large spectral gap. From this, we show, by developing a numerical optimization algorithm that maximum synchronizability and fast random walk spreading are obtained for a particular type of extremely homogeneous regular networks, with long loops and poor modular structure, that we call entangled networks. These turn out to be related to Ramanujan and Cage graphs. We argue also that these graphs are very good finite-size approximations to Bethe lattices, and provide almost or almost optimal solutions to many other problems as, for instance, searchability in the presence of congestion or performance of neural networks. Finally, we study how these results are modified when studying dynamical processes controlled by a normalized (weighted and directed) dynamics; much more heterogeneous graphs are optimal in this case. Finally, a critical discussion of the limitations and possible extensions of this work is presented.Comment: 17 pages. 11 figures. Small corrections and a new reference. Accepted for pub. in JSTA

    On the Expansion of Group-Based Lifts

    Get PDF
    A kk-lift of an nn-vertex base graph GG is a graph HH on n×kn\times k vertices, where each vertex vv of GG is replaced by kk vertices v1,,vkv_1,\cdots{},v_k and each edge (u,v)(u,v) in GG is replaced by a matching representing a bijection πuv\pi_{uv} so that the edges of HH are of the form (ui,vπuv(i))(u_i,v_{\pi_{uv}(i)}). Lifts have been studied as a means to efficiently construct expanders. In this work, we study lifts obtained from groups and group actions. We derive the spectrum of such lifts via the representation theory principles of the underlying group. Our main results are: (1) There is a constant c1c_1 such that for every k2c1ndk\geq 2^{c_1nd}, there does not exist an abelian kk-lift HH of any nn-vertex dd-regular base graph with HH being almost Ramanujan (nontrivial eigenvalues of the adjacency matrix at most O(d)O(\sqrt{d}) in magnitude). This can be viewed as an analogue of the well-known no-expansion result for abelian Cayley graphs. (2) A uniform random lift in a cyclic group of order kk of any nn-vertex dd-regular base graph GG, with the nontrivial eigenvalues of the adjacency matrix of GG bounded by λ\lambda in magnitude, has the new nontrivial eigenvalues also bounded by λ+O(d)\lambda+O(\sqrt{d}) in magnitude with probability 1keΩ(n/d2)1-ke^{-\Omega(n/d^2)}. In particular, there is a constant c2c_2 such that for every k2c2n/d2k\leq 2^{c_2n/d^2}, there exists a lift HH of every Ramanujan graph in a cyclic group of order kk with HH being almost Ramanujan. We use this to design a quasi-polynomial time algorithm to construct almost Ramanujan expanders deterministically. The existence of expanding lifts in cyclic groups of order k=2O(n/d2)k=2^{O(n/d^2)} can be viewed as a lower bound on the order k0k_0 of the largest abelian group that produces expanding lifts. Our results show that the lower bound matches the upper bound for k0k_0 (upto d3d^3 in the exponent)

    Lifted Worm Algorithm for the Ising Model

    Full text link
    We design an irreversible worm algorithm for the zero-field ferromagnetic Ising model by using the lifting technique. We study the dynamic critical behavior of an energy estimator on both the complete graph and toroidal grids, and compare our findings with reversible algorithms such as the Prokof'ev-Svistunov worm algorithm. Our results show that the lifted worm algorithm improves the dynamic exponent of the energy estimator on the complete graph, and leads to a significant constant improvement on toroidal grids.Comment: 9 pages, 6 figure

    Fast Mixing of Parallel Glauber Dynamics and Low-Delay CSMA Scheduling

    Full text link
    Glauber dynamics is a powerful tool to generate randomized, approximate solutions to combinatorially difficult problems. It has been used to analyze and design distributed CSMA (Carrier Sense Multiple Access) scheduling algorithms for multi-hop wireless networks. In this paper we derive bounds on the mixing time of a generalization of Glauber dynamics where multiple links are allowed to update their states in parallel and the fugacity of each link can be different. The results can be used to prove that the average queue length (and hence, the delay) under the parallel Glauber dynamics based CSMA grows polynomially in the number of links for wireless networks with bounded-degree interference graphs when the arrival rate lies in a fraction of the capacity region. We also show that in specific network topologies, the low-delay capacity region can be further improved.Comment: 12 page
    corecore