47,603 research outputs found

    Extractive summarization using siamese hierarchical transformer encoders

    Full text link
    [EN] In this paper, we present an extractive approach to document summarization, the Siamese Hierarchical Transformer Encoders system, that is based on the use of siamese neural networks and the transformer encoders which are extended in a hierarchical way. The system, trained for binary classification, is able to assign attention scores to each sentence in the document. These scores are used to select the most relevant sentences to build the summary. The main novelty of our proposal is the use of self-attention mechanisms at sentence level for document summarization, instead of using only attentions at word level. The experimentation carried out using the CNN/DailyMail summarization corpus shows promising results in-line with the state-of-the-art.This work has been partially supported by the Spanish MINECO and FEDER founds under project AMIC (TIN2017-85854-C4-2-R). Work of Jose Angel Gonzalez is also financed by Universitat Politecnica de Valencia under grant PAID-01-17.González-Barba, JÁ.; Segarra Soriano, E.; García-Granada, F.; Sanchís Arnal, E.; Hurtado Oliver, LF. (2020). Extractive summarization using siamese hierarchical transformer encoders. Journal of Intelligent & Fuzzy Systems. 39(2):2409-2419. https://doi.org/10.3233/JIFS-179901S24092419392Begum N. , Fattah M. and Ren F. , Automatic text summarization using support vector machine, 5 (2009), 1987–1996.González, J.-Á., Segarra, E., García-Granada, F., Sanchis, E., & Hurtado, L.-F. (2019). Siamese hierarchical attention networks for extractive summarization. Journal of Intelligent & Fuzzy Systems, 36(5), 4599-4607. doi:10.3233/jifs-179011Lloret, E., & Palomar, M. (2011). Text summarisation in progress: a literature review. Artificial Intelligence Review, 37(1), 1-41. doi:10.1007/s10462-011-9216-zLouis, A., & Nenkova, A. (2013). Automatically Assessing Machine Summary Content Without a Gold Standard. Computational Linguistics, 39(2), 267-300. doi:10.1162/coli_a_00123Tur G. and De Mori R. , Spoken language understanding: Systems for extracting semantic information from speech. John Wiley & Sons, 2011

    Using Textual Summaries to Describe a Set of Products

    Get PDF
    When customers are faced with the task of making a purchase in an unfamiliar product domain, it might be useful to provide them with an overview of the product set to help them understand what they can expect. In this paper we present and evaluate a method to summarise sets of products in natural language, focusing on the price range, common product features across the set, and product features that impact on price. In our study, participants reported that they found our summaries useful, but we found no evidence that the summaries influenced the selections made by participants

    Measuring academic influence: Not all citations are equal

    Get PDF
    The importance of a research article is routinely measured by counting how many times it has been cited. However, treating all citations with equal weight ignores the wide variety of functions that citations perform. We want to automatically identify the subset of references in a bibliography that have a central academic influence on the citing paper. For this purpose, we examine the effectiveness of a variety of features for determining the academic influence of a citation. By asking authors to identify the key references in their own work, we created a data set in which citations were labeled according to their academic influence. Using automatic feature selection with supervised machine learning, we found a model for predicting academic influence that achieves good performance on this data set using only four features. The best features, among those we evaluated, were those based on the number of times a reference is mentioned in the body of a citing paper. The performance of these features inspired us to design an influence-primed h-index (the hip-index). Unlike the conventional h-index, it weights citations by how many times a reference is mentioned. According to our experiments, the hip-index is a better indicator of researcher performance than the conventional h-index

    An Investigation on Text-Based Cross-Language Picture Retrieval Effectiveness through the Analysis of User Queries

    Get PDF
    Purpose: This paper describes a study of the queries generated from a user experiment for cross-language information retrieval (CLIR) from a historic image archive. Italian speaking users generated 618 queries for a set of known-item search tasks. The queries generated by user’s interaction with the system have been analysed and the results used to suggest recommendations for the future development of cross-language retrieval systems for digital image libraries. Methodology: A controlled lab-based user study was carried out using a prototype Italian-English image retrieval system. Participants were asked to carry out searches for 16 images provided to them, a known-item search task. User’s interactions with the system were recorded and queries were analysed manually quantitatively and qualitatively. Findings: Results highlight the diversity in requests for similar visual content and the weaknesses of Machine Translation for query translation. Through the manual translation of queries we show the benefits of using high-quality translation resources. The results show the individual characteristics of user’s whilst performing known-item searches and the overlap obtained between query terms and structured image captions, highlighting the use of user’s search terms for objects within the foreground of an image. Limitations and Implications: This research looks in-depth into one case of interaction and one image repository. Despite this limitation, the discussed results are likely to be valid across other languages and image repository. Value: The growing quantity of digital visual material in digital libraries offers the potential to apply techniques from CLIR to provide cross-language information access services. However, to develop effective systems requires studying user’s search behaviours, particularly in digital image libraries. The value of this paper is in the provision of empirical evidence to support recommendations for effective cross-language image retrieval system design.</p

    Learning Sentence-internal Temporal Relations

    Get PDF
    In this paper we propose a data intensive approach for inferring sentence-internal temporal relations. Temporal inference is relevant for practical NLP applications which either extract or synthesize temporal information (e.g., summarisation, question answering). Our method bypasses the need for manual coding by exploiting the presence of markers like after", which overtly signal a temporal relation. We first show that models trained on main and subordinate clauses connected with a temporal marker achieve good performance on a pseudo-disambiguation task simulating temporal inference (during testing the temporal marker is treated as unseen and the models must select the right marker from a set of possible candidates). Secondly, we assess whether the proposed approach holds promise for the semi-automatic creation of temporal annotations. Specifically, we use a model trained on noisy and approximate data (i.e., main and subordinate clauses) to predict intra-sentential relations present in TimeBank, a corpus annotated rich temporal information. Our experiments compare and contrast several probabilistic models differing in their feature space, linguistic assumptions and data requirements. We evaluate performance against gold standard corpora and also against human subjects
    corecore