
    Siamese hierarchical attention networks for extractive summarization

    Full text link
    [EN] In this paper, we present an extractive approach to document summarization based on Siamese Neural Networks. Specifically, we propose the use of Hierarchical Attention Networks to select the most relevant sentences of a text in order to build its summary. We train Siamese Neural Networks on document-summary pairs to determine whether a summary is appropriate for its document. By means of a sentence-level attention mechanism, the most relevant sentences in the document can be identified; hence, once the network is trained, it can be used to generate extractive summaries. The experimentation carried out on the CNN/DailyMail summarization corpus shows the adequacy of the proposal. In summary, we propose a novel end-to-end neural network that addresses extractive summarization as a binary classification problem and obtains promising results in line with the state of the art on the CNN/DailyMail corpus.

    This work has been partially supported by the Spanish MINECO and FEDER funds under project AMIC (TIN2017-85854-C4-2-R). The work of Jose-Angel Gonzalez is also financed by Universitat Politecnica de Valencia under grant PAID-01-17.

    González-Barba, JÁ.; Segarra Soriano, E.; García-Granada, F.; Sanchís Arnal, E.; Hurtado Oliver, LF. (2019). Siamese hierarchical attention networks for extractive summarization. Journal of Intelligent & Fuzzy Systems, 36(5), 4599-4607. https://doi.org/10.3233/JIFS-179011
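The core idea above, that a sentence-level attention mechanism trained on document-summary pairs can rank sentences for extraction, can be illustrated with a toy sketch. This is not the authors' architecture: the embeddings, the single context vector, and dot-product scoring below are simplified stand-ins for the trained hierarchical attention layers.

```python
import numpy as np

def attention_select(sentence_vecs, context_vec, k=2):
    """Score sentences with dot-product attention against a context
    vector and return the indices of the top-k sentences, i.e. the
    sentences an attention-based extractive summarizer would keep."""
    scores = sentence_vecs @ context_vec            # one score per sentence
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                        # softmax attention weights
    top = np.argsort(weights)[::-1][:k]
    return sorted(top.tolist()), weights

# Toy document: 4 sentences embedded in a 3-d space (hypothetical values);
# the context vector stands in for what training would have learned.
sents = np.array([[0.9, 0.1, 0.0],
                  [0.1, 0.8, 0.1],
                  [0.7, 0.2, 0.1],
                  [0.0, 0.1, 0.9]])
context = np.array([1.0, 0.0, 0.0])
idx, w = attention_select(sents, context, k=2)
print(idx)   # indices of the sentences forming the extractive summary
```

In the paper's setting these weights come out of a network trained on the binary document-summary classification task; here they are computed from fixed vectors purely for illustration.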

    Multilevel and session variability compensated language recognition: ATVS-UAM systems at NIST LRE 2009

    Full text link
    J. Gonzalez-Dominguez, I. Lopez-Moreno, J. Franco-Pedroso, D. Ramos, D. T. Toledano, and J. Gonzalez-Rodriguez, "Multilevel and Session Variability Compensated Language Recognition: ATVS-UAM Systems at NIST LRE 2009," IEEE Journal of Selected Topics in Signal Processing, vol. 4, no. 6, pp. 1084-1093, December 2010.

    This work presents the systems submitted by the ATVS Biometric Recognition Group to the 2009 Language Recognition Evaluation (LRE'09) organized by NIST. The new challenges in this LRE edition can be summarized by three main differences with respect to past evaluations. First, the number of languages to be recognized grew to 23, from 14 in 2007 and 7 in 2005. Second, data variability increased through the inclusion of telephone speech excerpts extracted from Voice of America (VOA) radio broadcasts over the Internet, in addition to Conversational Telephone Speech (CTS). Third, the volume of data grew to 2 terabytes of development speech, an order of magnitude more than in past evaluations. LRE'09 thus required participants to develop robust systems able not only to face the session variability problem successfully but also to do so with reasonable computational resources. The ATVS participation consisted of state-of-the-art acoustic and high-level systems focusing on these issues. Furthermore, the problem of finding a proper combination and calibration of the information obtained at different levels of the speech signal was widely explored in this submission.
In this work, two original contributions were developed. The first was the application of a session variability compensation scheme based on Factor Analysis (FA), performed in the statistics domain, within an SVM-supervector (SVM-SV) approach. The second was a novel backend based on anchor models, used to fuse the individual systems prior to one-vs-all calibration via logistic regression. Results on both the development and evaluation corpora show the robustness and excellent performance of the submitted systems, with our system ranked 2nd in the 30-second open-set condition while using remarkably modest computational resources.

This work has been supported by the Spanish Ministry of Education under project TEC2006-13170-C02-01. Javier Gonzalez-Dominguez also thanks the Spanish Ministry of Education for supporting his doctoral research under project TEC2006-13141-C03-03. Special thanks go to Dr. David Van Leeuwen from TNO Human Factors (Utrecht, The Netherlands) for his strong collaboration, valuable discussions and ideas. The authors also thank Dr. Patrick Lucey for his final (non-target) Australian English review of the manuscript.
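The one-vs-all calibration step mentioned above can be sketched with a minimal example: an affine transform of a detector's raw scores fitted by logistic regression, so that calibrated scores behave like log-odds for "target language vs. rest". The gradient-descent fit, the Gaussian toy scores, and all parameter values below are assumptions for illustration, not the submitted systems' configuration.

```python
import numpy as np

def train_calibration(scores, labels, lr=0.1, steps=500):
    """Fit an affine calibration s' = a*s + b by logistic regression.
    labels are 1 for target-language trials and 0 for non-target trials
    (the one-vs-all setting). Returns the calibration parameters."""
    a, b = 1.0, 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(a * scores + b)))   # sigmoid
        a -= lr * np.mean((p - labels) * scores)       # gradient step on a
        b -= lr * np.mean(p - labels)                  # gradient step on b
    return a, b

# Hypothetical raw detector scores: target trials score higher on average.
rng = np.random.default_rng(0)
tar = rng.normal(2.0, 1.0, 200)
non = rng.normal(-2.0, 1.0, 200)
scores = np.concatenate([tar, non])
labels = np.concatenate([np.ones(200), np.zeros(200)])
a, b = train_calibration(scores, labels)
print(a > 0)   # a positive slope means calibration preserves score order
```

In an actual LRE submission this step would sit after the anchor-model fusion, with one such calibrator per target language.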

    Hypermedia-based discovery for source selection using low-cost linked data interfaces

    Get PDF
    Evaluating federated Linked Data queries requires consulting multiple sources on the Web. Before a client can execute queries, it must discover data sources and determine which ones are relevant. Federated query execution research focuses on the actual execution, while data source discovery is often only marginally discussed, even though it has a strong impact on selecting sources that contribute to the query results. Therefore, the authors introduce a discovery approach for Linked Data interfaces based on hypermedia links and controls, and apply it to federated query execution with Triple Pattern Fragments. In addition, the authors identify quantitative metrics to evaluate this discovery approach. This article describes generic evaluation measures and results for their concrete approach. With low-cost data summaries as seed, interfaces to eight large real-world datasets can discover each other within 7 minutes. Hypermedia-based client-side querying shows a promising gain of up to 50% in execution time, but demands algorithms that visit a larger number of interfaces to improve result completeness.
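The discovery process described above, starting from a seed interface, following hypermedia links to reach other interfaces, and using data summaries to decide which interfaces are relevant to a query, can be sketched as a breadth-first traversal. The in-memory link graph, the summary sets, and the predicate-based relevance check below are hypothetical stand-ins for the article's hypermedia controls and data summaries.

```python
from collections import deque

def discover_sources(seed, links, summaries, predicate):
    """Breadth-first traversal over hypermedia links between Linked Data
    interfaces. An interface is selected as a query source when its data
    summary advertises the predicate of the triple pattern."""
    seen = {seed}
    selected = []
    queue = deque([seed])
    while queue:
        iface = queue.popleft()
        if predicate in summaries.get(iface, set()):
            selected.append(iface)               # relevant source found
        for nxt in links.get(iface, []):         # follow hypermedia links
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return selected

# Toy federation of four interfaces (names and predicates are made up).
links = {"A": ["B", "C"], "B": ["D"], "C": [], "D": []}
summaries = {"A": {"dbo:author"}, "C": {"dbo:author"}, "D": {"foaf:name"}}
print(discover_sources("A", links, summaries, "dbo:author"))
```

A real client would fetch the links and summaries over HTTP from the interfaces themselves; the traversal logic, however, stays essentially the same.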

    Natural language processing

    Get PDF
    Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems, such as text summarization, information extraction, and information retrieval, including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of the WWW and digital libraries; and (iv) evaluation of NLP systems.

    Recent advances on recursive filtering and sliding mode design for networked nonlinear stochastic systems: A survey

    Get PDF
    Copyright © 2013 Jun Hu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

    Some recent advances on the recursive filtering and sliding mode design problems for nonlinear stochastic systems with network-induced phenomena are surveyed. The network-induced phenomena under consideration mainly include missing measurements, fading measurements, signal quantization, probabilistic sensor delays, sensor saturations, randomly occurring nonlinearities, and randomly occurring uncertainties. With respect to these network-induced phenomena, developments on the filtering and sliding mode design problems are systematically reviewed. In particular, some recent results on recursive filtering for time-varying nonlinear stochastic systems and on sliding mode design for time-invariant nonlinear stochastic systems are given, respectively. Finally, conclusions are drawn and some potential future research directions are pointed out.

    This work was supported in part by the National Natural Science Foundation of China under Grant nos. 61134009, 61329301, 61333012, 61374127 and 11301118, the Engineering and Physical Sciences Research Council (EPSRC) of the UK under Grant no. GR/S27658/01, the Royal Society of the UK, and the Alexander von Humboldt Foundation of Germany.
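One of the network-induced phenomena listed above, missing measurements, is commonly modeled with a Bernoulli arrival variable: at each step the measurement reaches the filter only with some probability. The scalar Kalman-style filter below is a toy illustration of that idea only; it is not any specific algorithm from the survey, and all model parameters are assumed values.

```python
import random

def recursive_filter(ys, a=0.9, c=1.0, q=0.01, r=0.1, p_arrive=0.8, seed=1):
    """Scalar recursive filter for x[k+1] = a*x[k] + noise, y[k] = c*x[k]
    + noise, where each measurement y[k] arrives only with probability
    p_arrive (a Bernoulli model of missing measurements). When a packet
    is lost, only the time update is performed."""
    random.seed(seed)
    x_hat, p = 0.0, 1.0
    estimates = []
    for y in ys:
        # time update (always performed)
        x_hat = a * x_hat
        p = a * a * p + q
        # measurement update, only if the measurement arrived
        if random.random() < p_arrive:
            k = p * c / (c * c * p + r)
            x_hat = x_hat + k * (y - c * x_hat)
            p = (1 - k * c) * p
        estimates.append(x_hat)
    return estimates

est = recursive_filter([1.0, 0.9, 0.8, 0.7, 0.6])
print(len(est))   # one state estimate per time step
```

The surveyed time-varying results handle matrix-valued versions of this recursion, with the arrival probability entering the gain design rather than being sampled at run time.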

    Beyond Stemming and Lemmatization: Ultra-stemming to Improve Automatic Text Summarization

    Full text link
    In Automatic Text Summarization, preprocessing is an important phase for reducing the space of textual representation. Classically, stemming and lemmatization have been widely used to normalize words. However, even with normalization on large texts, the curse of dimensionality can hurt the performance of summarizers. This paper describes a new word normalization method that further reduces the representation space: we propose to reduce each word to its initial letters, a form of Ultra-stemming. The results show that Ultra-stemming not only preserves the content of the summaries produced with this representation, but often dramatically improves system performance. Summaries on trilingual corpora were evaluated automatically with Fresa. The results confirm an increase in performance regardless of the summarization system used. Comment: 22 pages, 12 figures, 9 tables
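The normalization itself is simple enough to state in a few lines: every word is truncated to its first letters. A minimal sketch, assuming a plain whitespace tokenizer and a parameter n for how many initial letters to keep (n=1 being the most aggressive form):

```python
def ultra_stem(text, n=1):
    """Reduce every word to its first n letters, lowercased - the
    Ultra-stemming normalization: a far coarser reduction of the
    representation space than stemming or lemmatization."""
    return [w[:n].lower() for w in text.split()]

print(ultra_stem("Automatic Text Summarization reduces representation space"))
# ['a', 't', 's', 'r', 'r', 's']
```

With n=1 the vocabulary collapses to at most the alphabet size, which is exactly the drastic dimensionality reduction the abstract describes.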
