5 research outputs found

    Citation Analysis as a Method for Selecting Domain-Specific Scientific Periodicals

    This review examines works by Soviet and foreign authors. It identifies the characteristic features of the method and argues that it is worth applying in library practice. Several variants of citation analysis are evaluated, and a new, experimentally tested modification of the method is proposed.
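The review does not spell out its procedure, so the following is only a minimal sketch of the general idea behind citation-based selection of periodicals, assuming the simplest variant: count how often each journal appears in a sample of bibliographic references, then keep the top-ranked titles until they account for a chosen share of all references. The function name `rank_periodicals`, the 80% `coverage` cutoff, and the journal names are hypothetical; this is not the modification proposed in the review.

```python
from collections import Counter

def rank_periodicals(cited_journals, coverage=0.8):
    """Return the top-ranked journals (with counts) whose references together
    reach `coverage` of all references in the sample.
    `coverage` is a hypothetical cutoff chosen for illustration."""
    counts = Counter(cited_journals)
    total = sum(counts.values())
    core, cumulative = [], 0
    # most_common() lists journals from most to least frequently cited
    for journal, n in counts.most_common():
        core.append((journal, n))
        cumulative += n
        if cumulative / total >= coverage:
            break
    return core

# Hypothetical sample: one journal title per bibliographic reference
references = ["Journal A"] * 6 + ["Journal B"] * 3 + ["Journal C"]
print(rank_periodicals(references))  # [('Journal A', 6), ('Journal B', 3)]
```

Raising `coverage` widens the selected set; with `coverage=1.0` every cited journal is kept.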

    Should citations be counted separately from each originating section?

    Articles are cited for different purposes, so distinguishing between these purposes when counting citations may give finer-grained citation count information. Although identifying and aggregating the individual reason for each citation may be impractical, recording the number of citations that originate from different article sections might illuminate the general reasons behind a citation count (e.g., 110 citations = 10 Introduction citations + 100 Methods citations). To help investigate whether this could be a practical and universal solution, this article compares 19 million citations with DOIs from six different standard sections in 799,055 PubMed Central open access articles across 21 out of 22 fields. There are apparently non-systematic differences between fields, both in which sections cite most and in the extent to which citations from one section overlap with citations from another, with some degree of overlap in most cases. Thus, at a science-wide level, section headings are partly unreliable indicators of citation context, even if they are more standard within individual fields. They may still be used within fields to help identify individual highly cited articles that have had one type of impact, especially methodological (Methods) or context-setting (Introduction), but expert judgement is needed to validate the results.
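To make section-level counting concrete, here is a minimal Python sketch, assuming citations are available as hypothetical `(citing_article, section, cited_doi)` tuples. For one cited DOI it counts distinct citing articles per section and measures the overlap between two sections with the Jaccard index. The overlap measure, the DOIs, and all function names are assumptions for illustration, not necessarily what the article used.

```python
from collections import defaultdict

def section_profile(citations, cited_doi):
    """Map each citing section to the set of articles that cite `cited_doi` from it."""
    by_section = defaultdict(set)
    for citing_article, section, doi in citations:
        if doi == cited_doi:
            by_section[section].add(citing_article)
    return dict(by_section)

def section_counts(profile):
    """Citation count per section, e.g. {'Introduction': 10, 'Methods': 100}."""
    return {section: len(articles) for section, articles in profile.items()}

def section_overlap(profile, a, b):
    """Jaccard overlap between the sets of articles citing from sections a and b."""
    sa, sb = profile.get(a, set()), profile.get(b, set())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

# Hypothetical records: (citing article, section heading, cited DOI)
citations = [
    ("p1", "Introduction", "10.1000/x"),
    ("p1", "Methods", "10.1000/x"),
    ("p2", "Methods", "10.1000/x"),
    ("p3", "Methods", "10.1000/x"),
    ("p3", "Discussion", "10.1000/y"),
]
profile = section_profile(citations, "10.1000/x")
print(section_counts(profile))  # {'Introduction': 1, 'Methods': 3}
print(section_overlap(profile, "Introduction", "Methods"))
```

Here article p1 cites the same DOI from both Introduction and Methods, which is exactly the kind of overlap that makes section headings only partly reliable as context indicators.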

    Complex Network Analysis for Scientific Collaboration Prediction and Biological Hypothesis Generation

    With the rapid growth of digitized literature, more and more knowledge is being discovered by computational approaches. This thesis addresses the problem of link prediction in co-authorship networks and protein-protein interaction networks derived from the literature. These networks (like most other types of networks) grow over time, and we assume that a machine can learn from past link creations by examining the network status at the time each link was created. Our goal is to create a computationally efficient approach to recommend new links for a node in a network (e.g., new collaborations in co-authorship networks and new interactions in protein-protein interaction networks). We treat edges in a network that satisfy certain criteria as training instances for the machine learning algorithms. We analyze the neighborhood structure of each node to derive topological features. Furthermore, when linked to the literature, each node carries rich semantic information from which semantic features can be derived. Using both types of features, we train machine learning models to predict the probability of connection for new node pairs. We apply this link prediction approach to two distinct networks: a co-authorship network and a protein-protein interaction network. We demonstrate that the novel features derived from both the network topology and the literature content help improve link prediction accuracy. We also analyze the factors involved in establishing new links and recurrent connections.
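The topological half of this pipeline can be sketched with classic neighborhood-based link-prediction scores. The features below (common neighbors, Jaccard coefficient, Adamic-Adar, preferential attachment) are standard choices assumed for illustration; the thesis's actual feature set, its semantic features, and its machine learning models are not reproduced. The network snapshots are hypothetical dicts of neighbor sets, and a currently unlinked pair is labeled 1 if it becomes linked in the later snapshot.

```python
import math

def link_features(adj, u, v):
    """Neighborhood-based features for the candidate pair (u, v).
    `adj` maps each node to the set of its neighbors in one network snapshot."""
    nu, nv = adj.get(u, set()), adj.get(v, set())
    common, union = nu & nv, nu | nv
    return {
        "common_neighbors": len(common),
        "jaccard": len(common) / len(union) if union else 0.0,
        # Adamic-Adar down-weights shared neighbors that have many links
        "adamic_adar": sum(1 / math.log(len(adj[z])) for z in common if len(adj[z]) > 1),
        "preferential_attachment": len(nu) * len(nv),
    }

def training_instances(adj_before, adj_after, candidate_pairs):
    """Label each currently unlinked pair 1 if it links in the later snapshot, else 0."""
    rows = []
    for u, v in candidate_pairs:
        if v in adj_before.get(u, set()):
            continue  # already linked: not a prediction target
        label = int(v in adj_after.get(u, set()))
        rows.append((link_features(adj_before, u, v), label))
    return rows

# Hypothetical co-authorship snapshots at times t and t+1
before = {"A": {"B", "C"}, "B": {"A", "C", "D"}, "C": {"A", "B", "D"}, "D": {"B", "C"}}
after = {node: set(neighbors) for node, neighbors in before.items()}
after["A"].add("D")
after["D"].add("A")
rows = training_instances(before, after, [("A", "D"), ("A", "B")])
print(rows[0])
```

The resulting feature dicts and labels could then be fed to any binary classifier that outputs a connection probability; which models the thesis trained is not stated here.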

    Exploiting Semantic Similarity Between Citation Contexts For Direct Citation Weighting And Residual Citation

    This study used the semantic similarity between citation contexts to develop two schemes: one for weighting direct citations, and another for allocating residual citations to a publication from publications at its nth citation generation level. The relationship between the new direct citation weighting scheme and each of five existing schemes was investigated, while the new residual citation scheme was compared with the cascading citation scheme. Two datasets of biomedical publications were used, one for the direct citation weighting aspect of the study and one for the residual citation aspect. The sample for the direct citation aspect contained 100 publications that received 7,317 citations, with 11,234 citation contexts and 9,795 citation context pairs. A sample of 981 citation context pairs was given to two human experts to annotate as “similar”, “somewhat similar”, or “not similar”. Semantic similarity scores between the 11,234 citation contexts were obtained using the BioSent2Vec word-embedding model for biomedical publications. The residual citation sample included ten base articles and five generations of citations, from which 5,272 citation context pairs were obtained. Spearman’s rank correlation tests showed that the correlation coefficients between the proposed direct citation weighting scheme and the schemes “number of positive sentiments,” “number of multiple citation mentions,” “sum of multiple citation mentions,” “number of citations,” and “number of citation mentions” were .83, .89, .89, .93, and .99, respectively. The average residual citations received from papers at the 2nd, 3rd, 4th, and 5th citation generation levels were 0.47, 0.43, 0.40, and 0.37, respectively. These averages differed significantly from the values of 0.5, 0.25, 0.125, and 0.0625 implied by the cascading citation scheme.
    Although the proposed direct citation weighting scheme and residual citation scheme require more complex computations, they are recommended for consideration as credible alternatives to the “number of citation mentions” and the cascading citation scheme, respectively.
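Two quantities in this abstract are easy to make concrete: the cascading scheme's credit for a citation from generation n, which halves with each generation (0.5 ** (n - 1), giving 0.5, 0.25, 0.125, and 0.0625 for generations 2 to 5), and Spearman's rank correlation between the scores two weighting schemes assign to the same publications. The sketch below is a minimal plain-Python illustration; it does not implement the proposed similarity-based schemes, and the function names and sample scores are hypothetical.

```python
import math

def cascading_weight(generation):
    """Cascading scheme: a generation-n citation is worth 0.5 ** (n - 1) (n = 1 is a direct citation)."""
    return 0.5 ** (generation - 1)

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

print([cascading_weight(n) for n in range(2, 6)])  # [0.5, 0.25, 0.125, 0.0625]
# Hypothetical per-publication scores under two weighting schemes
print(spearman([3.2, 1.5, 4.8, 2.0], [12, 9, 15, 5]))  # approximately 0.8
```

Computing rho as the Pearson correlation of average ranks handles ties correctly; in practice `scipy.stats.spearmanr` returns the same coefficient together with a p-value.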