The relation between Pearson's correlation coefficient r and Salton's cosine measure
The relation between Pearson's correlation coefficient and Salton's cosine
measure is revealed based on the different possible values of the ratio between
the L1-norm and the L2-norm of a vector. These different values yield a sheaf
of increasingly straight lines which together form a cloud of points that
constitutes the investigated relation. The theoretical results are tested against the author
co-citation relations among 24 informetricians for whom two matrices can be
constructed, based on co-citations: the asymmetric occurrence matrix and the
symmetric co-citation matrix. Both examples completely confirm the theoretical
results. The results enable us to specify an algorithm which provides a
threshold value for the cosine above which none of the corresponding Pearson
correlations would be negative. Using this threshold value can be expected to
optimize the visualization of the vector space.
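
To make the two measures concrete, the following is a minimal sketch rather than the paper's algorithm. It relies on the standard fact that Pearson's r equals the cosine of the mean-centered vectors. The vectors `x` and `y` are invented for illustration, and the helper `l1_over_l2` assumes one orientation of the norm ratio (L1 divided by L2), which the abstract does not specify. The threshold algorithm itself is not reproduced, since the abstract does not describe it.

```python
import math


def cosine(x, y):
    """Salton's cosine: dot product divided by the product of the L2-norms."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson(x, y):
    """Pearson's r, computed as the cosine of the mean-centered vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])


def l1_over_l2(x):
    """Ratio of the L1-norm to the L2-norm (orientation is an assumption)."""
    return sum(abs(a) for a in x) / math.sqrt(sum(a * a for a in x))


# Hypothetical non-negative count vectors, not data from the paper.
x = [3, 0, 1, 2]
y = [0, 2, 1, 0]

print("cosine :", cosine(x, y))
print("pearson:", pearson(x, y))
print("L1/L2 of x:", l1_over_l2(x))
```

For these two vectors the cosine is positive while Pearson's r is negative, which is the kind of case a cosine threshold is meant to exclude.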
Quantitative aspects of the management of the modern (scientific) library
This paper and talk examine aspects of data collection for the management of a modern (scientific) library. We discuss reports as a public relations and public awareness tool, norms and standards, data gathering and its problems in an electronic environment, indicators, and complete and incomplete data (sampling) and their uses.
How to Normalize Co-Occurrence Data? An Analysis of Some Well-Known Similarity Measures
In scientometric research, the use of co-occurrence data is very common. In many cases, a similarity measure is employed to normalize the data. However, there is no consensus among researchers on which similarity measure is most appropriate for normalization purposes. In this paper, we theoretically analyze the properties of similarity measures for co-occurrence data, focusing in particular on four well-known measures: the association strength, the cosine, the inclusion index, and the Jaccard index. We also study the behavior of these measures empirically. Our analysis reveals that there exist two fundamentally different types of similarity measures, namely set-theoretic measures and probabilistic measures. The association strength is a probabilistic measure, while the cosine, the inclusion index, and the Jaccard index are set-theoretic measures. Both our theoretical and our empirical results indicate that co-occurrence data can best be normalized using a probabilistic measure. This provides strong support for the use of the association strength in scientometric research.
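
The four measures named above can be sketched as follows. The abstract does not give formulas, so the definitions here are the commonly used ones and should be read as assumptions: `c_ij` is the number of co-occurrences of items i and j, and `s_i`, `s_j` are their total occurrence counts. The association strength is sometimes written with an additional constant factor, which is omitted here. The input numbers are invented for illustration.

```python
import math


def association_strength(c_ij, s_i, s_j):
    """Probabilistic measure: observed co-occurrences over the product of totals."""
    return c_ij / (s_i * s_j)


def cosine(c_ij, s_i, s_j):
    """Set-theoretic measure: co-occurrences over the geometric mean of totals."""
    return c_ij / math.sqrt(s_i * s_j)


def inclusion_index(c_ij, s_i, s_j):
    """Set-theoretic measure: co-occurrences over the smaller of the two totals."""
    return c_ij / min(s_i, s_j)


def jaccard_index(c_ij, s_i, s_j):
    """Set-theoretic measure: co-occurrences over the size of the union."""
    return c_ij / (s_i + s_j - c_ij)


# Hypothetical counts: items i and j occur 20 and 50 times, 10 times together.
c_ij, s_i, s_j = 10, 20, 50

for measure in (association_strength, cosine, inclusion_index, jaccard_index):
    print(f"{measure.__name__}: {measure(c_ij, s_i, s_j):.4f}")
```

The three set-theoretic measures stay within [0, 1], while the association strength depends on the absolute sizes of the totals, so its values are only comparable within one data set.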