6 research outputs found

    The relation between Pearson's correlation coefficient r and Salton's cosine measure

    The relation between Pearson's correlation coefficient and Salton's cosine measure is revealed based on the different possible values of the ratio of the L1-norm to the L2-norm of a vector. These different values yield a sheaf of increasingly straight lines which together form a cloud of points, being the investigated relation. The theoretical results are tested against the author co-citation relations among 24 informetricians for whom two matrices can be constructed, based on co-citations: the asymmetric occurrence matrix and the symmetric co-citation matrix. Both examples completely confirm the theoretical results. The results enable us to specify an algorithm which provides a threshold value for the cosine above which none of the corresponding Pearson correlations would be negative. Using this threshold value can be expected to optimize the visualization of the vector space.
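    As a minimal sketch of why the two measures can disagree in sign, the Python below uses the standard identity that Pearson's r equals the cosine of the mean-centered vectors. The function names and the toy vectors are illustrative assumptions, not data or code from the paper, and the paper's threshold algorithm is not reproduced here.

```python
import math

def cosine(x, y):
    """Salton's cosine: dot product divided by the product of the L2-norms."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def pearson(x, y):
    """Pearson's r, computed as the cosine of the mean-centered vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])

# Hypothetical non-negative count vectors (assumed for illustration only).
x = [3, 0, 1, 0]
y = [0, 2, 0, 4]

print(cosine(x, y))   # never negative for non-negative counts
print(pearson(x, y))  # can be negative once the means are subtracted
```

    For non-negative count data the cosine cannot fall below zero, while the centered version can, which is the kind of gap a cosine threshold is meant to account for.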

    Quantitative aspects of the management of the modern (scientific) library

    This paper and talk examine aspects of data collection for the management of a modern (scientific) library. We discuss: reports as a public relations and public awareness tool, norms and standards, data gathering and its problems in an electronic environment, indicators, complete and incomplete data (sampling) and their uses.

    Expansion of the field of informetrics: The second special issue

    How to Normalize Co-Occurrence Data? An Analysis of Some Well-Known Similarity Measures

    In scientometric research, the use of co-occurrence data is very common. In many cases, a similarity measure is employed to normalize the data. However, there is no consensus among researchers on which similarity measure is most appropriate for normalization purposes. In this paper, we theoretically analyze the properties of similarity measures for co-occurrence data, focusing in particular on four well-known measures: the association strength, the cosine, the inclusion index, and the Jaccard index. We also study the behavior of these measures empirically. Our analysis reveals that there exist two fundamentally different types of similarity measures, namely set-theoretic measures and probabilistic measures. The association strength is a probabilistic measure, while the cosine, the inclusion index, and the Jaccard index are set-theoretic measures. Both our theoretical and our empirical results indicate that co-occurrence data can best be normalized using a probabilistic measure. This provides strong support for the use of the association strength in scientometric research.
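    The four measures named in the abstract can be sketched in Python as follows, using their commonly cited definitions. The abstract itself gives no formulas, so the notation is an assumption: c_ij is the number of co-occurrences of items i and j, and s_i and s_j are the total occurrence counts of each item. The example numbers are hypothetical.

```python
import math

def association_strength(c_ij, s_i, s_j):
    """Probabilistic measure: co-occurrences relative to the product of the totals."""
    return c_ij / (s_i * s_j)

def cosine(c_ij, s_i, s_j):
    """Set-theoretic measure: co-occurrences over the geometric mean of the totals."""
    return c_ij / math.sqrt(s_i * s_j)

def inclusion_index(c_ij, s_i, s_j):
    """Set-theoretic measure: co-occurrences over the smaller of the two totals."""
    return c_ij / min(s_i, s_j)

def jaccard_index(c_ij, s_i, s_j):
    """Set-theoretic measure: co-occurrences over the size of the union."""
    return c_ij / (s_i + s_j - c_ij)

# Hypothetical counts: items i and j occur 20 and 50 times, 10 times together.
c_ij, s_i, s_j = 10, 20, 50
for measure in (association_strength, cosine, inclusion_index, jaccard_index):
    print(measure.__name__, measure(c_ij, s_i, s_j))
```

    Only the association strength divides by the full product of the two totals, which is what distinguishes it from the three set-theoretic measures in this sketch.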