8 research outputs found

    The Closer the Better: Similarity of Publication Pairs at Different Co-Citation Levels

    We investigate the similarities of pairs of articles that are co-cited at different co-citation levels: journal, article, section, paragraph, sentence and bracket. Our results indicate that textual similarity, intellectual overlap (shared references), author overlap (shared authors) and proximity in publication time all rise monotonically as the co-citation level becomes finer (from journal to bracket). While the main gain in similarity occurs when moving from journal to article co-citation, every level change entails an increase in similarity, especially from section to paragraph and from paragraph to sentence/bracket. We compare results from four journals over the years 2010-2015 (Cell, the European Journal of Operational Research, Physics Letters B and Research Policy), with generally consistent outcomes and some interesting differences. Our findings motivate the use of granular co-citation information, as defined by meaningful units of text, with implications for, among other things, the construction of maps of science and the retrieval of scholarly literature.
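    The abstract does not say how each pairwise similarity is computed. As a minimal sketch, assuming Jaccard overlap for shared references and shared authors and an absolute year difference for proximity in publication time, the per-pair measures and their averages per co-citation level could look like this. `Paper`, `jaccard`, `pair_similarity` and `mean_by_level` are hypothetical names, and textual similarity is left out.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Paper:
    # Hypothetical record holding only what the assumed overlap measures need.
    references: frozenset[str]
    authors: frozenset[str]
    year: int


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Size of the intersection over size of the union; 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_similarity(p: Paper, q: Paper) -> dict[str, float]:
    """Assumed measures for one co-cited pair (textual similarity omitted)."""
    return {
        "intellectual_overlap": jaccard(p.references, q.references),  # shared references
        "author_overlap": jaccard(p.authors, q.authors),  # shared authors
        "year_gap": float(abs(p.year - q.year)),  # smaller means closer in time
    }


def mean_by_level(pairs, measure: str) -> dict[str, float]:
    """Average one measure over co-cited pairs grouped by co-citation level.

    `pairs` yields (level, paper_a, paper_b) tuples, e.g. ("sentence", a, b).
    """
    buckets: dict[str, list[float]] = defaultdict(list)
    for level, p, q in pairs:
        buckets[level].append(pair_similarity(p, q)[measure])
    return {level: mean(values) for level, values in buckets.items()}


# Usage with made-up papers (illustrative only).
a = Paper(frozenset({"r1", "r2", "r3"}), frozenset({"Ann", "Bo"}), 2012)
b = Paper(frozenset({"r2", "r3", "r4"}), frozenset({"Bo", "Cy"}), 2014)
c = Paper(frozenset({"r9"}), frozenset({"Di"}), 2010)
print(pair_similarity(a, b))  # intellectual_overlap 0.5, author_overlap 1/3, year_gap 2.0
print(mean_by_level([("sentence", a, b), ("journal", a, c)], "intellectual_overlap"))
```

    Grouping by level rather than averaging all pairs together is what makes a monotone trend from journal to bracket visible.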

    A Correlation Study of Co-opinion and Co-citation Similarity Measures

    Co-citation forms a relational document network. Co-citation-based measures have been found effective in retrieving relevant documents; however, they are far from ideal and need further enhancement. The co-opinion concept was proposed and tested in previous research and was also found effective in retrieving relevant documents. The present study explores the correlation between opinion (dis)similarity measures and traditional co-citation-based measures, including the Citation Proximity Index (CPI), co-citedness and co-citation context similarity. The results show significant, though weak to medium, correlations between the variables. The correlations are direct (positive) for the co-opinion measure and inverse (negative) for the opinion distance. Accordingly, the two groups of measures are shown to represent some similar aspects of the relation between documents. Moreover, the weakness of the correlations implies that the two groups also represent different dimensions.
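    Two of the co-citation measures named above can be sketched as follows: co-citedness as the number of documents citing both items, and a CPI-style score that weights each co-citing document by how close the two citations appear. The weights (1 for the same sentence, halving at each coarser level) are an assumption modelled on common descriptions of Citation Proximity Analysis, the levels are simplified to stay inside one citing document, and summing over documents is an assumed aggregation. `cpi`, `cocitedness` and the position format are hypothetical.

```python
from itertools import product

# Assumed proximity weights, halving at each coarser level (hypothetical values;
# check them against the CPI definition you are reproducing).
WEIGHTS = {"sentence": 1.0, "paragraph": 0.5, "section": 0.25, "document": 0.125}

# A citation position as (section, paragraph, sentence) indices within one citing document.
Position = tuple[int, int, int]


def closest_level(pa: Position, pb: Position) -> str:
    """Finest structural unit shared by two citation positions."""
    if pa == pb:
        return "sentence"
    if pa[:2] == pb[:2]:
        return "paragraph"
    if pa[0] == pb[0]:
        return "section"
    return "document"


def cpi(citing_docs, a: str, b: str) -> float:
    """Sum, over documents citing both a and b, of the weight of their closest co-occurrence.

    Each document is a dict mapping a cited item's id to its list of Positions.
    """
    total = 0.0
    for doc in citing_docs:
        if a in doc and b in doc:
            total += max(WEIGHTS[closest_level(pa, pb)] for pa, pb in product(doc[a], doc[b]))
    return total


def cocitedness(citing_docs, a: str, b: str) -> int:
    """Number of citing documents that cite both a and b."""
    return sum(1 for doc in citing_docs if a in doc and b in doc)


# Usage with made-up citing documents.
docs = [
    {"A": [(0, 0, 0)], "B": [(0, 0, 0)]},  # same sentence
    {"A": [(1, 2, 0)], "B": [(1, 2, 3), (2, 0, 0)]},  # closest pair shares a paragraph
    {"A": [(0, 0, 0)]},  # cites A only
]
print(cpi(docs, "A", "B"), cocitedness(docs, "A", "B"))
```

    Correlating scores like these with opinion-based scores over many document pairs is the kind of comparison the study reports.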

    A decade of in-text citation analysis based on natural language processing and machine learning techniques: an overview of empirical studies

    In-text citation analysis is one of the most frequently used methods in research evaluation. Citation analysis based on bibliometric metadata has grown significantly, primarily owing to the availability of citation databases such as the Web of Science, Scopus, Google Scholar, Microsoft Academic and Dimensions. With better access to full-text publication corpora in recent years, information scientists have gone far beyond traditional bibliometrics, drawing on advances in full-text data processing to measure the impact of scientific publications in contextual terms. This has led to technical developments in citation classification, citation sentiment analysis, citation summarisation and citation-based recommendation. This article presents a narrative review of the studies behind these developments, focusing primarily on publications that have used natural language processing and machine learning techniques to analyse citations.

    Exploiting Semantic Similarity Between Citation Contexts For Direct Citation Weighting And Residual Citation

    This study used the semantic similarity between citation contexts to develop two schemes: one for weighting direct citations, and another for allocating residual citations to a publication from publications at its nth citation generation level. The relationship between the new direct citation weighting scheme and each of five existing schemes was investigated, while the new residual citation scheme was compared with the cascading citation scheme. Two datasets of biomedical publications were used, one each for the direct and the residual citation weighting aspects of the study. The sample for the direct citation aspect contained 100 publications that received 7,317 citations, 11,234 citation contexts, and 9,795 citation context pairs. A sample of 981 citation context pairs was given to two human experts for annotation into “similar”, “somewhat similar”, and “not similar” classes. Semantic similarity scores between the 11,234 citation contexts were obtained using the BioSent2Vec word-embedding model for biomedical publications. The residual citation aspect sample included ten base articles and five generations of citations, from which 5,272 citation context pairs were obtained. Spearman’s rank correlation coefficients between the proposed direct citation weighting scheme and each of the schemes “number of positive sentiments”, “number of multiple citation mentions”, “sum of multiple citation mentions”, “number of citations”, and “number of citation mentions” were .83, .89, .89, .93, and .99, respectively. The average residual citations received from papers at the 2nd, 3rd, 4th and 5th citation generation levels were 0.47, 0.43, 0.40, and 0.37, respectively. These averages were significantly different from the values of 0.5, 0.25, 0.125, and 0.0625 suggested by the cascading citation scheme.
    Although the proposed direct citation weighting scheme and residual citation scheme require more complex computations, they are recommended as credible alternatives to the “number of citation mentions” scheme and the cascading citation scheme, respectively.
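    The cascading values quoted above halve at each generation, i.e. a share of 0.5^(g-1) for a citing paper at generation g (so generation 1, a direct citation, would get 1.0 under this reading). The minimal sketch below encodes that arithmetic, plus a hypothetical `residual_credit` helper that totals credit over citing papers per generation so that the cascading shares can be swapped for the reported averages. Treating those averages as fixed per-paper shares is a simplification for illustration, not the study's similarity-based scheme.

```python
from collections.abc import Callable, Mapping


def cascading_share(generation: int) -> float:
    """Credit per citing paper under the halving (cascading) scheme: 1.0, 0.5, 0.25, ..."""
    if generation < 1:
        raise ValueError("generation must be >= 1")
    return 0.5 ** (generation - 1)


# Average residual citations reported in the abstract for generations 2-5.
REPORTED_AVG = {2: 0.47, 3: 0.43, 4: 0.40, 5: 0.37}


def residual_credit(counts: Mapping[int, int], share: Callable[[int], float]) -> float:
    """Hypothetical total credit to a base article from counts[g] citing papers at generation g."""
    return sum(n * share(g) for g, n in counts.items())


# Side-by-side view of the two sets of shares.
for g, observed in REPORTED_AVG.items():
    print(g, cascading_share(g), observed)

# Made-up citation counts per generation for one base article.
counts = {2: 10, 3: 4}
print(residual_credit(counts, cascading_share))  # 10 * 0.5 + 4 * 0.25
print(residual_credit(counts, REPORTED_AVG.get))  # 10 * 0.47 + 4 * 0.43
```

    Passing the share as a function keeps the totalling logic independent of which weighting scheme is being compared.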