
    Leveraging full-text article exploration for citation analysis

    Scientific articles often include in-text citations quoting from external sources. When the cited source is an article, the citation context can be analyzed by exploring the article's full text. To quickly access the key information, researchers are often interested in identifying the sections of the cited article that are most pertinent to the text surrounding the citation in the citing article. This paper first performs a data-driven analysis of the correlation between the textual content of the sections of the cited article and the text snippet where the citation is placed. The results of the correlation analysis show that the title and abstract of the cited article are likely to include content highly similar to the citing snippet. However, the subsequent sections of the paper often include cited text snippets as well. Hence, there is a need to understand the extent to which an exploration of the full text of the cited article would be beneficial to gain insights into the citing snippet, considering also that full-text access may be restricted. To this end, we then propose a classification approach to automatically predict whether the cited snippets in the full text of the paper contain a significant amount of new content beyond the abstract and title. The proposed approach could support researchers in leveraging full-text article exploration for citation analysis. The experiments conducted on real scientific articles show promising results: in 90% of cases, the classifier correctly distinguishes between cases where full-text exploration is needed and those where the title and abstract suffice.
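The analysis described above can be sketched in miniature: compare a citing snippet against each section of the cited article using a simple bag-of-words cosine similarity, then decide whether some body section matches the snippet noticeably better than the title and abstract alone. This is an illustrative reconstruction, not the authors' implementation; the section names, the similarity measure, and the decision threshold are all assumptions.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase word tokens; a deliberately simple stand-in tokenizer."""
    return re.findall(r"[a-z]+", text.lower())

def cosine(a, b):
    """Bag-of-words cosine similarity between two text strings."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def full_text_needed(snippet, sections, threshold=0.1):
    """Illustrative decision rule: return True if some body section matches
    the citing snippet noticeably better than the title + abstract do."""
    head = cosine(snippet, sections["title"] + " " + sections["abstract"])
    body = max((cosine(snippet, text) for name, text in sections.items()
                if name not in ("title", "abstract")), default=0.0)
    return body > head + threshold
```

In this toy form, a snippet whose vocabulary is already covered by the title and abstract is classified as not requiring full-text exploration, while a snippet echoing a methods section is flagged as needing it.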

    Testing a Citation and Text-Based Framework for Retrieving Publications for Literature Reviews

    We propose a citation- and text-based framework to conduct literature review searches. Given a small set of articles included in a literature review (i.e., seed articles), the first step of the framework retrieves articles that are connected to the seed articles in the citation network. The next step filters these retrieved articles using hybrid citation- and text-based criteria. In this paper, we evaluate a first implementation of this framework (code available at https://github.com/janinaj/lit-review-search) by comparing it to the conventional search methods used to retrieve the included studies of 6 published systematic reviews. Using different combinations of 3 seed articles, on average we retrieved 71.2% of the total included studies in the published reviews and 82.33% of the studies available in the search database (Scopus). Our best combinations retrieved 87% of the total included studies, which comprised 100% of the studies available in Scopus. In 5 of the 6 reviews, we reduced the number of results by 34–88%, which in practice would save reviewers significant time, since the overall number of search results that need to be manually screened is substantially reduced. These results suggest that our framework is a promising approach to improving the literature review search process.
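The two-step framework above can be sketched under some assumed data structures: a citation graph stored as adjacency sets and a dictionary of abstracts. The filtering rule shown (minimum citation links to the seeds, or a keyword hit in the abstract) is an illustrative stand-in for the authors' hybrid criteria; the actual implementation is in the linked repository.

```python
def expand_seeds(citations, seeds):
    """Step 1: collect every article connected to a seed in the citation
    network, i.e., articles that cite a seed or are cited by a seed."""
    pool = set()
    for paper, refs in citations.items():
        for seed in seeds:
            if seed in refs:        # paper cites a seed
                pool.add(paper)
        if paper in seeds:          # include the seed's own references
            pool.update(refs)
    return pool - set(seeds)

def hybrid_filter(pool, citations, seeds, abstracts, keywords, min_links=2):
    """Step 2 (illustrative hybrid rule): keep candidates with enough
    citation links to the seeds, or whose abstract mentions a keyword."""
    kept = set()
    for p in pool:
        links = sum(1 for s in seeds
                    if s in citations.get(p, set())
                    or p in citations.get(s, set()))
        text_hit = any(k in abstracts.get(p, "").lower() for k in keywords)
        if links >= min_links or text_hit:
            kept.add(p)
    return kept
```

The point of the second step is exactly the screening-effort reduction the abstract reports: the citation expansion casts a wide net, and the hybrid filter discards candidates that are only weakly connected to the seeds.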

    Resolving Citation Links With Neural Networks

    This work demonstrates how neural network models (NNs) can be exploited to resolve citation links in the scientific literature, which involves locating the passages in the source paper that the author had intended when citing it. We look at two kinds of models: triplet and binary. The triplet network model works by ranking potential candidates using what is generally known as the triplet loss, while the binary model turns the task into a binary decision problem, i.e., labeling each candidate as true or false according to how likely a target it is. Experiments are conducted using three datasets developed by the CL-SciSumm project from a large repository of Association for Computational Linguistics (ACL) papers. The results show that NNs are extremely susceptible to how the input is represented: they perform better on inputs expressed in binary format than on those encoded using the TF-IDF metric or neural embeddings of specific kinds. Furthermore, in response to a difficulty NNs and baselines faced in predicting the exact location of a target, we introduce the idea of approximately correct targets (ACTs), where the goal is to find a region which likely contains a true target rather than its exact location. We show that with the ACTs, NNs consistently outperform Ranking SVM and TF-IDF on the aforementioned datasets.
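Two of the ideas above can be illustrated concretely: the triplet loss used to rank candidate passages, and the approximately-correct-target (ACT) criterion. The toy vectors, distance function, and window size below are assumptions for illustration, not the paper's actual architecture or evaluation setup.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: hinge on the distance gap, pulling the true
    target (positive) closer to the citing anchor than a wrong candidate
    (negative) by at least `margin`."""
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)

def act_correct(predicted_idx, true_indices, window=2):
    """ACT-style evaluation: a predicted passage index counts as correct
    if it falls within `window` positions of any true target, rather than
    requiring an exact match."""
    return any(abs(predicted_idx - t) <= window for t in true_indices)
```

A well-separated triplet (positive much closer than negative) incurs zero loss, while a violating triplet is penalized by the size of the gap plus the margin; the ACT check then relaxes exact-location scoring to region-level scoring, which is the evaluation under which the paper reports NNs outperforming Ranking SVM and TF-IDF.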