464 research outputs found
Effective Unsupervised Author Disambiguation with Relative Frequencies
This work addresses the problem of author name homonymy in the Web of
Science. Aiming for an efficient, simple and straightforward solution, we
introduce a novel probabilistic similarity measure for author name
disambiguation based on feature overlap. Using the researcher-ID available for
a subset of the Web of Science, we evaluate the application of this measure in
the context of agglomeratively clustering author mentions. We focus on a
concise evaluation that shows clearly for which problem setups and at which
time during the clustering process our approach works best. In contrast to most
other works in this field, we are sceptical towards the performance of author
name disambiguation methods in general and compare our approach to the trivial
single-cluster baseline. Our results are presented separately for each correct
clustering size as we can explain that, when treating all cases together, the
trivial baseline and more sophisticated approaches are hardly distinguishable
in terms of evaluation results. Our model shows state-of-the-art performance
for all correct clustering sizes without any discriminative training and with
tuning only one convergence parameter.Comment: Proceedings of JCDL 201
Author disambiguation using multi-aspect similarity indicators
Key to accurate bibliometric analyses is the ability to correctly link individuals to their corpus of work, with an optimal balance between precision and recall. We have developed an algorithm that does this disambiguation task with a very high recall and precision. The method addresses the issues of discarded records due to null data fields and their resultant effect on recall, precision and F-measure results. We have implemented a dynamic approach to similarity calculations based on all available data fields. We have also included differences in author contribution and age difference between publications, both of which have meaningful effects on overall similarity measurements, resulting in significantly higher recall and precision of returned records. The results are presented from a test dataset of heterogeneous catalysis publications. Results demonstrate significantly high average F-measure scores and substantial improvements on previous and stand-alone techniques
Combining machine learning and human judgment in author disambiguation
ABSTRACT Author disambiguation in digital libraries becomes increasingly difficult as the number of publications and consequently the number of ambiguous author names keep growing. The fully automatic author disambiguation approach could not give satisfactory results due to the lack of signals in many cases. Furthermore, human judgment on the basis of automatic algorithms is also not suitable because the automatically disambiguated results are often mixed and not understandable for humans. In this paper, we propose a Labeling Oriented Author Disambiguation approach, called LOAD, to combine machine learning and human judgment together in author disambiguation. LOAD exploits a framework which consists of high precision clustering, high recall clustering, and top dissimilar clusters selection and ranking. In the framework, supervised learning algorithms are used to train the similarity functions between publications and a clustering algorithm is further applied to generate clusters. To validate the effectiveness and efficiency of the proposed LOAD approach, comprehensive experiments are conducted. Comparing to conventional author disambiguation algorithms, the LOAD yields much more accurate results to assist human labeling. Further experiments show that the LOAD approach can save labeling time dramatically
Researchers’ publication patterns and their use for author disambiguation
Over the recent years, we are witnessing an increase of the need for advanced bibliometric indicators on individual researchers and research groups, for which author disambiguation is needed. Using the complete population of university professors and researchers in the Canadian province of Québec (N=13,479), of their papers as well as the papers authored by their homonyms, this paper provides evidence of regularities in researchers’ publication patterns. It shows how these patterns can be used to automatically assign papers to individual and remove papers authored by their homonyms. Two types of patterns were found: 1) at the individual researchers’ level and 2) at the level of disciplines. On the whole, these patterns allow the construction of an algorithm that provides assignation information on at least one paper for 11,105 (82.4%) out of all 13,479 researchers—with a very low percentage of false positives (3.2%)
The Impact of Name-Matching and Blocking on Author Disambiguation
In this work, we address the problem of blocking in the context of author name disambiguation. We describe a framework that formalizes different ways of name-matching to determine which names could potentially refer to the same author. We focus on name variations that follow from specifying a name with different completeness (i.e. full first name or only initial). We extend this framework by a simple way to define traditional, new and custom blocking schemes. Then, we evaluate different old and new schemes in the Web of Science. In this context we define and compare a new type of blocking schemes. Based on these results, we discuss the question whether name-matching can be used in blocking evaluation as a replacement of annotated author identifiers. Finally, we argue that blocking can have a strong impact on the application and evaluation of author disambiguation
- …