386 research outputs found

    COMPENDIUM: a text summarisation tool for generating summaries of multiple purposes, domains, and genres

    In this paper, we present a Text Summarisation tool, COMPENDIUM, capable of generating the most common types of summaries. Regarding the input, single- and multi-document summaries can be produced; as the output, the summaries can be extractive or abstractive-oriented; and finally, concerning their purpose, the summaries can be generic, query-focused, or sentiment-based. The proposed architecture for COMPENDIUM is divided into various stages, making a distinction between core and additional stages. The former constitute the backbone of the tool and are common to the generation of any type of summary, whereas the latter are used for enhancing the capabilities of the tool. The main contributions of COMPENDIUM with respect to the state-of-the-art summarisation systems are that (i) it specifically deals with the problem of redundancy, by means of textual entailment; (ii) it combines statistical and cognitive-based techniques for determining relevant content; and (iii) it proposes an abstractive-oriented approach for facing the challenge of abstractive summarisation. The evaluation performed in different domains and textual genres, comprising traditional texts, as well as texts extracted from the Web 2.0, shows that COMPENDIUM is very competitive and appropriate to be used as a tool for generating summaries. This research has been supported by the project “Desarrollo de Técnicas Inteligentes e Interactivas de Minería de Textos” (PROMETEO/2009/119) and the project reference ACOMP/2011/001 from the Valencian Government, as well as by the Spanish Government (grant no. TIN2009-13391-C04-01).

    SNaC: Coherence Error Detection for Narrative Summarization

    Progress in summarizing long texts is inhibited by the lack of appropriate evaluation frameworks. When a long summary must be produced to appropriately cover the facets of that text, that summary needs to present a coherent narrative to be understandable by a reader, but current automatic and human evaluation methods fail to identify gaps in coherence. In this work, we introduce SNaC, a narrative coherence evaluation framework rooted in fine-grained annotations for long summaries. We develop a taxonomy of coherence errors in generated narrative summaries and collect span-level annotations for 6.6k sentences across 150 book and movie screenplay summaries. Our work provides the first characterization of coherence errors generated by state-of-the-art summarization models and a protocol for eliciting coherence judgments from crowd annotators. Furthermore, we show that the collected annotations allow us to train a strong classifier for automatically localizing coherence errors in generated summaries as well as benchmarking past work in coherence modeling. Finally, our SNaC framework can support future work in long document summarization and coherence evaluation, including improved summarization modeling and post-hoc summary correction. Comment: EMNLP 202

    Inferential profiles emerging from reading for summarization and reading for translation tasks: an exploratory study

    Doctoral thesis, Universidade Federal de Santa Catarina, Centro de Comunicação e Expressão, Programa de Pós-Graduação em Inglês: Estudos Linguísticos e Literários, Florianópolis, 2014. Abstract: The present research is an exploratory study that investigates the influence of reading purpose and readers' experience on task products and inferential processes in two study conditions, namely reading for summarization and reading for translation. The theoretical background guiding this study stems from established models of discourse processing (Graesser et al., 1994; Kintsch & van Dijk, 1978; van den Broek, Risden & Husebye-Hartman, 1995; van Dijk & Kintsch, 1983). Six participants took part in this research: two were professional translators and four were undergraduate students from the seventh semester of the Letras Course at Universidade Federal de Santa Catarina (UFSC). Two narrative texts in English (L2) were read, then summarized and translated into Portuguese (L1). In addition, keylogging data stemming from the study tasks (i.e., reading for summarization and reading for translation) were collected using Translog 2006, and retrospective verbal protocols were carried out after each study task. Analytical procedures involved triangulation of quantitative data from scores of the task products and total task times recorded in Translog 2006 with qualitative data from the retrospective protocols. Verbalizations were categorized using a framework adapted from Graesser & Kreuz (1993) in order to help identify the inference types generated for narrative texts under the aforementioned study conditions. The previous experience variable indicated positive tendencies for the translators' group and some trend towards beneficial effects for both undergraduate students. Qualitative data analysis resulting in the identification of inferential profiles was carried out to help explain efficient and strategic use of inferences in narrative comprehension. The results suggest that dynamic and comprehensive inferential profiles were associated with more satisfactory processes and products in the summarization and translation tasks. The implications of these results point to pedagogical practices that foster the explicit teaching of inferences, with a view to raising students' awareness of the possibilities and functions of inferences in reading, summarizing and translating.

    Syntactic Sentence Compression for Text Summarization

    Automatic text summarization is a dynamic area in Natural Language Processing that has gained much attention in the past few decades. As a vast amount of data is accumulating and becoming available online, providing automatic summaries of specific subjects/topics has become an important user requirement. To encourage the growth of this research area, several shared tasks are held annually and different types of benchmarks are made available. Early work on automatic text summarization focused on improving the relevance of the summary content, but the trend is now towards generating more abstractive and coherent summaries. As a result, sentence simplification has become a prominent requirement in automatic summarization. This thesis presents our work on sentence compression using syntactic pruning methods in order to improve automatic text summarization. Sentence compression has several applications in Natural Language Processing, such as text simplification, topic and subtitle generation, removal of redundant information and text summarization. Effective sentence compression techniques can contribute to text summarization by simplifying texts, avoiding redundant and irrelevant information and allowing more space for useful information. In our work, we have focused on pruning individual sentences, using their phrase structure grammar representations. We have implemented several types of pruning techniques and the results were evaluated in the context of automatic summarization, using standard evaluation metrics. In addition, we have performed a series of human evaluations and a comparison with other sentence compression techniques used in automatic summarization. Our results show that our syntactic pruning techniques achieve compression rates that are similar to those of previous work and to what humans achieve. 
However, the automatic evaluation using ROUGE shows that any type of sentence compression causes a decrease in content compared to the original summary and extra content addition does not show a significant improvement in ROUGE. The human evaluation shows that our syntactic pruning techniques remove syntactic structures that are similar to what humans remove and inter-annotator content evaluation using ROUGE shows that our techniques perform well compared to other baseline techniques. However, when we evaluate our techniques with a grammar structure based F-measure, the results show that our pruning techniques perform better and seem to approximate human techniques better than baseline techniques
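The ROUGE effect described in this abstract can be illustrated with a minimal sketch. It assumes a simplified ROUGE-1 recall (unigram overlap on lowercased, whitespace-split tokens, with no stemming or stopword handling), which is not the official ROUGE toolkit and not necessarily the configuration the thesis used. The function name `rouge1_recall` and the example sentences are hypothetical.

```python
from collections import Counter


def rouge1_recall(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 recall: fraction of reference unigrams found in the candidate."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    total_ref = sum(ref_counts.values())
    if total_ref == 0:
        return 0.0
    # Clip each match by the number of times the token occurs in the candidate.
    overlap = sum(min(count, cand_counts[token]) for token, count in ref_counts.items())
    return overlap / total_ref


# Hypothetical example: pruning words from a sentence can only remove matches.
reference = "the committee approved the new budget on friday"
uncompressed = "the committee approved the new budget on friday"
compressed = "the committee approved the budget"

print(rouge1_recall(uncompressed, reference))
print(rouge1_recall(compressed, reference))
```

Because compression only deletes tokens, recall against a fixed reference cannot increase, which is in line with the decrease in content the abstract reports.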

    Academic Writing and the Pedagogical Practices of Effective Teachers

    Composition, particularly when academic register is required, is a complex task. Because cognitive flexibility theory explains how humans can spontaneously restructure knowledge and adapt to situational demands, it is ideally suited to the ill-structured domain of transactional writing. Global aspects related to paragraph and whole-text structure and local operations related to word and sentence-level features define academic writing. A mixed-methods design used quantitative methods for investigation of five corpora of 10th grade students' work. Qualitative methods were used to explore the means teachers used in promoting academic writing and the interactions they intended to promote via teaching cues, including prompts. Students' perceptions were similarly explored for contrastive purposes. Descriptive statistical and qualitative analysis of five corpora of student writing samples, high school exit exam results, surveys of students and teachers, and interviews with students and teachers were employed. This study suggests that interaction with students, while they compose, is critical to successful academic writing on the part of students. Systems are slow to change; however, this study may provide some models and descriptions of successful performance needed to encourage teachers and school systems to improve practice and academic outcomes in writing and content areas that include writing as a means of learning and assessment. Increased instructional precision may be of more value than simple prescription. Results suggest that cross-disciplinary activities may improve the uptake of academic words found on an academic word list. In addition, the type and quality of the prompts or directions for writing students are given affect the quality of students' written work. As well, students and teachers valued the cues and oral feedback provided on drafts of student compositions. 
The results of this study suggest that when students are provided a contextually rich environment, challenging writing tasks, and support with appropriate cues, they may succeed as writers and thinkers about complex topics within and across disciplines

    Aggregated search: a new information retrieval paradigm

    Traditional search engines return ranked lists of search results. It is up to the user to scroll through this list, scan within different documents and assemble the information that fulfills his/her information need. Aggregated search represents a new class of approaches where the information is not only retrieved but also assembled. This is the current evolution in Web search, where diverse content (images, videos, ...) and relational content (similar entities, features) are included in search results. In this survey, we propose a simple analysis framework for aggregated search and an overview of existing work. We start with work in related domains such as federated search, natural language generation and question answering. Then we focus on more recent trends, namely cross-vertical aggregated search and relational aggregated search, which are already present in current Web search.