    Structural Inference of Hierarchies in Networks

    One property of networks that has received comparatively little attention is hierarchy, i.e., the property of having vertices that cluster together in groups, which then join to form groups of groups, and so forth, up through all levels of organization in the network. Here, we give a precise definition of hierarchical structure, give a generic model for generating arbitrary hierarchical structure in a random graph, and describe a statistically principled way to learn the set of hierarchical features that most plausibly explain a particular real-world network. By applying this approach to two example networks, we demonstrate its advantages for the interpretation of network data, the annotation of graphs with edge, vertex and community properties, and the generation of generic null models for further hypothesis testing. Comment: 8 pages, 8 figures
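As a rough illustration of what a generative model for hierarchical structure in a random graph could look like, the sketch below assumes a binary dendrogram whose leaves are the graph's vertices and whose internal nodes each carry a connection probability. Two vertices are joined with the probability stored at their lowest common ancestor. The abstract does not spell out these details, so the `Node` class, the function names, and the example probabilities are all hypothetical.

```python
import random


class Node:
    """Internal dendrogram node with a connection probability p (assumed form)."""

    def __init__(self, p, left, right):
        self.p = p
        self.left = left
        self.right = right


def leaves(tree):
    """Return the leaf labels (graph vertices) under a subtree."""
    if not isinstance(tree, Node):
        return [tree]
    return leaves(tree.left) + leaves(tree.right)


def sample_graph(tree, rng):
    """Sample an edge set from the dendrogram.

    Every pair with one vertex in the left subtree and one in the right
    subtree of a node is joined independently with that node's p.
    """
    if not isinstance(tree, Node):
        return set()
    edges = sample_graph(tree.left, rng) | sample_graph(tree.right, rng)
    for u in leaves(tree.left):
        for v in leaves(tree.right):
            if rng.random() < tree.p:
                edges.add((u, v))
    return edges


# Hypothetical example: two tight groups {0, 1} and {2, 3}, weakly linked.
dendrogram = Node(0.1, Node(0.9, 0, 1), Node(0.9, 2, 3))
edges = sample_graph(dendrogram, random.Random(0))
```

Placing high probabilities near the leaves and low ones near the root yields groups nested inside groups. Learning the hierarchy from an observed network would amount to searching for the dendrogram and probabilities that best explain that network, which this sketch does not attempt.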

    Siamese hierarchical attention networks for extractive summarization

    In this paper, we present an extractive approach to document summarization based on Siamese Neural Networks. Specifically, we propose the use of Hierarchical Attention Networks to select the most relevant sentences of a text to build its summary. We train Siamese Neural Networks on document-summary pairs to determine whether a summary is appropriate for a document. A sentence-level attention mechanism identifies the most relevant sentences in the document, so once the network is trained it can be used to generate extractive summaries. Experiments on the CNN/DailyMail summarization corpus show the adequacy of the proposal. In summary, we propose a novel end-to-end neural network that addresses extractive summarization as a binary classification problem and obtains promising results in line with the state of the art on the CNN/DailyMail corpus. This work has been partially supported by the Spanish MINECO and FEDER funds under project AMIC (TIN2017-85854-C4-2-R). Work of Jose-Angel Gonzalez is also financed by Universitat Politecnica de Valencia under grant PAID-01-17. González-Barba, JÁ.; Segarra Soriano, E.; García-Granada, F.; Sanchís Arnal, E.; Hurtado Oliver, LF. (2019). Siamese hierarchical attention networks for extractive summarization. Journal of Intelligent & Fuzzy Systems. 36(5):4599-4607. https://doi.org/10.3233/JIFS-179011
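To make the sentence-selection step concrete, here is a minimal sketch of turning sentence-level attention scores into an extractive summary. It assumes the trained network yields one raw score per sentence and that the summary keeps the top-k sentences in document order. The abstract does not state the selection rule, so the top-k choice, the function names, and the example scores are assumptions for illustration only.

```python
import math


def softmax(scores):
    """Turn raw scores into attention weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def extract_summary(sentences, scores, k=3):
    """Keep the k highest-weighted sentences, restored to document order."""
    weights = softmax(scores)
    ranked = sorted(range(len(sentences)), key=lambda i: weights[i], reverse=True)
    chosen = sorted(ranked[:k])
    return [sentences[i] for i in chosen]


# Hypothetical scores, standing in for a trained network's attention output.
doc = [
    "A storm hit the coast.",
    "Officials spoke.",
    "Thousands lost power.",
    "It rained.",
]
summary = extract_summary(doc, [2.0, 0.1, 1.5, -0.3], k=2)
```

Restoring document order after ranking keeps the extracted sentences readable as a short text rather than a relevance-sorted list.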

    Self-Attention Networks Can Process Bounded Hierarchical Languages

    Despite their impressive performance in NLP, self-attention networks were recently proved to be limited for processing formal languages with hierarchical structure, such as $\mathsf{Dyck}_k$, the language consisting of well-nested parentheses of $k$ types. This suggested that natural language can be approximated well with models that are too weak for formal languages, or that the role of hierarchy and recursion in natural language might be limited. We qualify this implication by proving that self-attention networks can process $\mathsf{Dyck}_{k, D}$, the subset of $\mathsf{Dyck}_{k}$ with depth bounded by $D$, which arguably better captures the bounded hierarchical structure of natural language. Specifically, we construct a hard-attention network with $D+1$ layers and $O(\log k)$ memory size (per token per layer) that recognizes $\mathsf{Dyck}_{k, D}$, and a soft-attention network with two layers and $O(\log k)$ memory size that generates $\mathsf{Dyck}_{k, D}$. Experiments show that self-attention networks trained on $\mathsf{Dyck}_{k, D}$ generalize to longer inputs with near-perfect accuracy, and also verify the theoretical memory advantage of self-attention networks over recurrent networks. Comment: ACL 2021. 19 pages with extended appendix. Fixed a small typo in the formula at the end of page 5 (thanks to Gabriel Faria). Code: https://github.com/princeton-nlp/dyck-transforme
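For readers unfamiliar with the language itself, the following sketch checks membership in the depth-bounded Dyck language with a conventional stack. It is not the attention-based construction from the paper; it only pins down what "well-nested parentheses of k types with depth at most D" means. The token encoding (pairs of `"open"` or `"close"` with a bracket type index) and the function name are assumptions made for this example.

```python
def is_dyck_k_d(tokens, k, max_depth):
    """Check whether a token sequence is well nested with k types and depth <= D.

    Each token is a pair (kind, bracket_type), where kind is "open" or
    "close" and bracket_type is an integer in range(k).
    """
    stack = []
    for kind, bracket_type in tokens:
        if not 0 <= bracket_type < k:
            return False  # unknown bracket type
        if kind == "open":
            stack.append(bracket_type)
            if len(stack) > max_depth:
                return False  # nesting exceeds the depth bound D
        elif kind == "close":
            if not stack or stack.pop() != bracket_type:
                return False  # unmatched or mismatched closing bracket
        else:
            return False  # not a valid token kind
    return not stack  # every opened bracket must be closed


nested = [("open", 0), ("open", 1), ("close", 1), ("close", 0)]
ok_at_depth_2 = is_dyck_k_d(nested, k=2, max_depth=2)
ok_at_depth_1 = is_dyck_k_d(nested, k=2, max_depth=1)
```

The same string is accepted with a depth bound of 2 and rejected with a bound of 1, which is the distinction between the unbounded and bounded languages that the abstract relies on.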