580 research outputs found

    Query-Based Summarization using Rhetorical Structure Theory

    Get PDF
    Research on Question Answering is focused mainly on classifying the question type and finding the answer. Presenting the answer in a way that suits the user’s needs has received little attention. This paper shows how existing question answering systems—which aim at finding precise answers to questions—can be improved by exploiting summarization techniques to extract more than just the answer from the document in which the answer resides. This is done using a graph search algorithm which searches for relevant sentences in the discourse structure, which is represented as a graph. The Rhetorical Structure Theory (RST) is used to create a graph representation of a text document. The output is an extensive answer, which not only answers the question, but also gives the user an opportunity to assess the accuracy of the answer (is this what I am looking for?), and to find additional information that is related to the question, and which may satisfy an information need. This has been implemented in a working multimodal question answering system where it operates with two independently developed question answering modules

    Towards Exascale Scientific Metadata Management

    Full text link
    Advances in technology and computing hardware are enabling scientists from all areas of science to produce massive amounts of data using large-scale simulations or observational facilities. In this era of data deluge, effective coordination between the data production and the analysis phases hinges on the availability of metadata that describe the scientific datasets. Existing workflow engines have been capturing a limited form of metadata to provide provenance information about the identity and lineage of the data. However, much of the data produced by simulations, experiments, and analyses still need to be annotated manually in an ad hoc manner by domain scientists. Systematic and transparent acquisition of rich metadata becomes a crucial prerequisite to sustain and accelerate the pace of scientific innovation. Yet, ubiquitous and domain-agnostic metadata management infrastructure that can meet the demands of extreme-scale science is notable by its absence. To address this gap in scientific data management research and practice, we present our vision for an integrated approach that (1) automatically captures and manipulates information-rich metadata while the data is being produced or analyzed and (2) stores metadata within each dataset to permeate metadata-oblivious processes and to query metadata through established and standardized data access interfaces. We motivate the need for the proposed integrated approach using applications from plasma physics, climate modeling and neuroscience, and then discuss research challenges and possible solutions

    Argumentation Mining in User-Generated Web Discourse

    Full text link
    The goal of argumentation mining, an evolving research field in computational linguistics, is to design methods capable of analyzing people's argumentation. In this article, we go beyond the state of the art in several ways. (i) We deal with actual Web data and take up the challenges given by the variety of registers, multiple domains, and unrestricted noisy user-generated Web discourse. (ii) We bridge the gap between normative argumentation theories and argumentation phenomena encountered in actual data by adapting an argumentation model tested in an extensive annotation study. (iii) We create a new gold standard corpus (90k tokens in 340 documents) and experiment with several machine learning methods to identify argument components. We offer the data, source codes, and annotation guidelines to the community under free licenses. Our findings show that argumentation mining in user-generated Web discourse is a feasible but challenging task.Comment: Cite as: Habernal, I. & Gurevych, I. (2017). Argumentation Mining in User-Generated Web Discourse. Computational Linguistics 43(1), pp. 125-17

    Parsing Argumentation Structures in Persuasive Essays

    Full text link
    In this article, we present a novel approach for parsing argumentation structures. We identify argument components using sequence labeling at the token level and apply a new joint model for detecting argumentation structures. The proposed model globally optimizes argument component types and argumentative relations using integer linear programming. We show that our model considerably improves the performance of base classifiers and significantly outperforms challenging heuristic baselines. Moreover, we introduce a novel corpus of persuasive essays annotated with argumentation structures. We show that our annotation scheme and annotation guidelines successfully guide human annotators to substantial agreement. This corpus and the annotation guidelines are freely available for ensuring reproducibility and to encourage future research in computational argumentation.Comment: Under review in Computational Linguistics. First submission: 26 October 2015. Revised submission: 15 July 201

    Summarizing text to embed qualitative data into visualizations

    Full text link
    Qualitative data can be conveyed with strings of text. Fitting longer text into visualizations requires a) space to place the text inside the visualization; and b) appropriate text to fit the space available. For quantitative visualizations, space is available in area marks; or within visualization layouts where the marks have an implied space (e.g. bar charts). For qualitative visualizations, space is defined in common text layouts such as prose paragraphs. To fit text within these layouts is a function for emerging NLP capabilities such as summarization.Comment: 6 pages, 8 figures, accepted at NLVIZ 2022: Exploring Research Opportunities for Natural Language, Text, and Data Visualizatio

    Summarization and Evaluation; Where are we today?!

    Get PDF
    PACLIC 21 / Seoul National University, Seoul, Korea / November 1-3, 200

    Special Libraries, November 1940

    Get PDF
    Volume 31, Issue 9https://scholarworks.sjsu.edu/sla_sl_1940/1008/thumbnail.jp

    Automatic text summarization with Maximal Frequent Sequences

    Get PDF
    En las últimas dos décadas un aumento exponencial de la información electrónica ha provocado una gran necesidad de entender rápidamente grandes volúmenes de información. En este libro se desarrollan los métodos automáticos para producir un resumen. Un resumen es un texto corto que transmite la información más importante de un documento o de una colección de documentos. Los resúmenes utilizados en este libro son extractivos: una selección de las oraciones más importantes del texto. Otros retos consisten en generar resúmenes de manera independiente de lenguaje y dominio. Se describe la identificación de cuatro etapas para generación de resúmenes extractivos. La primera etapa es la selección de términos, en la que uno tiene que decidir qué unidades contarían como términos individuales. El proceso de estimación de la utilidad de los términos individuales se llama etapa de pesado de términos. El siguiente paso se denota como pesado de oraciones, donde todas las secuencias reciben alguna medida numérica de acuerdo con la utilidad de términos. Finalmente, el proceso de selección de las oraciones más importantes se llama selección de oraciones. Los diferentes métodos para generación de resúmenes extractivos pueden ser caracterizados como representan estas etapas. En este libro se describe la etapa de selección de términos, en la que la detección de descripciones multipalabra se realiza considerando Secuencias Frecuentes Maximales (sfms), las cuales adquieren un significado importante, mientras Secuencias Frecuentes (sf) no maximales, que son partes de otros sf, no deben de ser consideradas. En la motivación se consideró costo vs. beneficio: existen muchas sf no maximales, mientras que la probabilidad de adquirir un significado importante es baja. De todos modos, las sfms representan todas las sfs en el modo compacto: todas las sfs podrían ser obtenidas a partir de todas las sfms explotando cada sfm al conjunto de todas sus subsecuencias. Se presentan los nuevos métodos basados en grafos, algoritmos de agrupamiento y algoritmos genéticos, los cuales facilitan la tarea de generación de resúmenes de textos. Se ha experimentado diferentes combinaciones de las opciones de selección de términos, pesado de términos, pesado de oraciones y selección de oraciones para generar los resúmenes extractivos de textos independientes de lenguaje y dominio para una colección de noticias. Se ha analizado algunas opciones basadas en descripciones multipalabra considerándolas en los métodos de grafos, algoritmos de agrupamiento y algoritmos genéticos. Se han obtenido los resultados superiores al de estado de arte. Este libro está dirigido a los estudiantes y científicos del área de Lingüística Computacional, y también a quienes quieren saber sobre los recientes avances en las investigaciones de generación automática de resúmenes de textos.In the last two decades, an exponential increase in the available electronic information causes a big necessity to quickly understand large volumes of information. It raises the importance of the development of automatic methods for detecting the most relevant content of a document in order to produce a shorter text. Automatic Text Summarization (ats) is an active research area dedicated to generate abstractive and extractive summaries not only for a single document, but also for a collection of documents. Other necessity consists in finding method for ats in a language and domain independent way. In this book we consider extractive text summarization for single document task. We have identified that a typical extractive summarization method consists in four steps. First step is a term selection where one should decide what units will count as individual terms. The process of estimating the usefulness of the individual terms is called term weighting step. The next step denotes as sentence weighting where all the sentences receive some numerical measure according to the usefulness of its terms. Finally, the process of selecting the most relevant sentences calls sentence selection. Different extractive summarization methods can be characterized how they perform these steps. In this book, in the term selection step, we describe how to detect multiword descriptions considering Maximal Frequent Sequences (mfss), which bearing important meaning, while non-maximal frequent sequences (fss), those that are parts of another fs, should not be considered. Our additional motivation was cost vs. benefit considerations: there are too many non-maximal fss while their probability to bear important meaning is lower. In any case, mfss represent all fss in a compact way: all fss can be obtained from all mfss by bursting each mfs into a set of all its subsequences.New methods based on graph algorithms, genetic algorithms, and clustering algorithms which facilitate the text summarization task are presented. We have tested different combinations of term selection, term weighting, sentence weighting and sentence selection options for language-and domain-independent extractive single-document text summarization on a news report collection. We analyzed several options based on mfss, considering them with graph, genetic, and clustering algorithms. We obtained results superior to the existing state-ofthe- art methods. This book is addressed for students and scientists of the area of Computational Linguistics, and also who wants to know recent developments in the area of Automatic Text Generation of Summaries
    corecore