Using citation redistribution to estimate unbiased expected citation count from a biased citation network

Abstract

Most readers can read only a fraction of the papers written on a topic. The heuristic of reading “highly cited articles first” is common, but certain types of articles are more likely to be cited without being more valid science. Moreover, various types of bias, including selection bias, sponsor bias, and contributor- and affiliation-related biases, exist in publications, and it is difficult for literature users to determine whether a particular document is biased. Therefore, we aim to create a new ranking heuristic that is based on risk of bias. As a first step, our prior work proposed a network metric, the ratio between the real and expected citation count, to select “marginalized papers” (Fu, Yuan, and Schneider, 2021). “Marginalized papers” are those that received far fewer citations than expected, perhaps in part because they contradict the dominant view or are less well known. In principle, an unbiased paper should disclose the existence of multiple points of view through citation. Therefore, our work uses citation of marginalized papers to estimate the risk of bias. Calculating the expected citation count is tricky but important. We improved our prior approach by (1) grouping papers by publication year into “generations”; (2) evenly redistributing the total citations made in one generation among all papers in previous generations; and (3) using the cumulative sum to obtain the expected citations for each paper. This new method generated more realistic expected citation counts than our prior approach. Our future work will focus on creating a method that ranks papers by how well they cite the “marginalized papers.”

Funding: University of Illinois at Urbana-Champaign Campus Research Board RB21012; NSF CAREER award 2046454
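The three redistribution steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `expected_citations`, the input shapes (a dict of publication years and a list of citing/cited pairs), and the choice to count only citations that point to strictly earlier generations are all assumptions made for illustration.

```python
from collections import defaultdict


def expected_citations(pub_year, citations):
    """Hypothetical sketch of citation redistribution.

    pub_year:  dict mapping paper id -> publication year
    citations: iterable of (citing_id, cited_id) pairs
    Returns a dict mapping paper id -> expected citation count.
    """
    # Step 1: group papers into "generations" by publication year.
    generations = defaultdict(list)
    for paper, year in pub_year.items():
        generations[year].append(paper)
    years = sorted(generations)

    # Count the citations each generation makes to earlier generations.
    # Assumption: same-year citations are ignored.
    made = defaultdict(int)
    for citing, cited in citations:
        if pub_year[cited] < pub_year[citing]:
            made[pub_year[citing]] += 1

    # Step 2: split each generation's citations evenly among all
    # papers published in earlier generations.
    share = {}
    n_earlier = 0
    for year in years:
        share[year] = made[year] / n_earlier if n_earlier else 0.0
        n_earlier += len(generations[year])

    # Step 3: cumulative sum, walking from the newest generation back,
    # so each paper collects the shares of every later generation.
    expected = {}
    running = 0.0
    for year in reversed(years):
        for paper in generations[year]:
            expected[paper] = running
        running += share[year]
    return expected


# Toy network (made-up data): C cites A; D cites A and C; B is never cited.
pub_year = {"A": 2000, "B": 2000, "C": 2005, "D": 2010}
citations = [("C", "A"), ("D", "A"), ("D", "C")]
expected = expected_citations(pub_year, citations)
```

In this toy network, A and B receive the same expected count because they belong to the same generation, while B's real count is zero. Under the ratio of real to expected citation count, B would therefore be a candidate “marginalized paper.”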
