2 research outputs found

    Leveraging full-text article exploration for citation analysis

    Scientific articles often include in-text citations quoting from external sources. When the cited source is an article, the citation context can be analyzed by exploring the article's full text. To quickly access the key information, researchers are often interested in identifying the sections of the cited article that are most pertinent to the text surrounding the citation in the citing article. This paper first performs a data-driven analysis of the correlation between the textual content of the sections of the cited article and the text snippet where the citation is placed. The results of the correlation analysis show that the title and abstract of the cited article are likely to include content highly similar to the citing snippet. However, the subsequent sections of the paper often include cited text snippets as well. Hence, there is a need to understand the extent to which exploring the full text of the cited article would help to gain insights into the citing snippet, considering also that full-text access could be restricted. To this end, we then propose a classification approach that automatically predicts whether the cited snippets in the full text of the paper contain a significant amount of new content beyond the abstract and title. The proposed approach could support researchers in leveraging full-text article exploration for citation analysis. The experiments conducted on real scientific articles show promising results: the classifier has a 90% chance of correctly distinguishing between the full-text exploration case and the title-and-abstract-only case.
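    The section-versus-snippet comparison described in this abstract can be illustrated with a minimal sketch. The abstract does not say which similarity measure the authors use, so the bag-of-words cosine similarity below is an assumption, and the function names (`tokenize`, `cosine_similarity`, `rank_sections`) and the toy texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the term-frequency vectors of two texts."""
    counts_a, counts_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_sections(citing_snippet, sections):
    """Return (section_name, score) pairs sorted by decreasing similarity."""
    scores = {
        name: cosine_similarity(citing_snippet, text)
        for name, text in sections.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data, not taken from the paper.
sections = {
    "title": "Graph-based summarization of news articles",
    "abstract": "We propose a graph-based method to summarize news articles.",
    "experiments": "Runtime was measured on a cluster with eight nodes.",
}
snippet = "Prior work summarizes news articles with a graph-based method [3]."
ranking = rank_sections(snippet, sections)
```

    Scores of this kind could also serve as input features for a classifier that decides whether full-text exploration is worthwhile, but that use is likewise an assumption rather than the authors' documented feature set.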

    Poli2Sum@ CL-SciSumm-19: Identify, Classify, and Summarize Cited Text Spans by means of Ensembles of Supervised Models

    This paper presents the Poli2Sum approach to the 5th Computational Linguistics Scientific Document Summarization Shared Task (BIRNDL CL-SciSumm 2019). Given a set of reference papers and the set of papers citing them, the proposed approach has a threefold aim: (1a) identify the text spans in the reference paper that are referenced by a specific citation in the citing papers; (1b) assign a facet to each citation describing the semantics behind the citation; (2) generate a summary of the reference paper consisting of the most relevant cited text spans. The Poli2Sum approach to tasks (1a) and (1b) relies on an ensemble of classification and regression models trained on the annotated pairs of cited and citing sentences. Facet assignment is based on the relative positions of the cited sentences, both locally within the corresponding section and globally within the entire paper. Task (2) is addressed by predicting the overlap (in terms of units of text) between the selected text spans and the summary generated by the domain experts. The output summary consists of the subset of sentences maximizing the predicted overlap score.
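    The final selection step of task (2) can be sketched as follows. This is only an illustration: the abstract does not describe how the subset is chosen or whether a length limit applies, so the greedy strategy, the `max_words` budget, the function name `select_summary`, and the scores below are all hypothetical. A greedy pass is an approximation and does not guarantee the best subset.

```python
def select_summary(sentences, predicted_overlap, max_words):
    """Greedily pick the highest-scoring sentences that fit a word budget.

    sentences: candidate sentences (for example, cited text spans).
    predicted_overlap: one score per sentence, e.g. from a regression model.
    max_words: hypothetical length budget for the output summary.
    """
    if len(sentences) != len(predicted_overlap):
        raise ValueError("one score per sentence is required")
    # Visit sentences from the highest to the lowest predicted score.
    order = sorted(
        range(len(sentences)),
        key=lambda i: predicted_overlap[i],
        reverse=True,
    )
    chosen, used = [], 0
    for i in order:
        length = len(sentences[i].split())
        if used + length <= max_words:
            chosen.append(i)
            used += length
    # Restore document order so the summary reads naturally.
    return [sentences[i] for i in sorted(chosen)]


# Hypothetical toy data; the scores are made up for illustration.
candidates = [
    "We introduce a new parser.",
    "The parser is trained on treebank data.",
    "Funding was provided by a grant.",
    "It improves accuracy on long sentences.",
]
scores = [0.8, 0.4, 0.1, 0.7]
summary = select_summary(candidates, scores, max_words=12)
```

    Here the two highest-scoring sentences fit the budget together, and the remaining ones are skipped because adding either would exceed it.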