Enhancing Author Information for CERN Document Server: Creating an Author Collection and Using Author Disambiguation Methods


Author names make up a substantial share of queries in digital libraries, and the quality of the results reflects the quality and success of the service. Ambiguous author names can confuse users and lead to inaccurate links between authorships and individual researchers. Providing a set of disambiguated authors is challenging and is closely related to data integration, since disambiguation is carried out in several ways and by different systems, both manually and automatically. Many disambiguation algorithms have been proposed in the literature, and most of them resolve ambiguities by applying machine learning techniques. However, such problems cannot be solved with 100% accuracy. Our contribution to the CERN Document Server, presented in this work, consists of two parts: first, we create and deploy an author knowledge database (collection); second, we link the authors of bibliographic records back to their authority records. For the latter, we use a library that provides machine learning tools for clustering (with trained data from INSPIRE, a High-Energy Physics literature database developed at CERN) and construct an algorithm that builds the relation based on authority id and name matching. We were able to attribute 30% of 9 million authors to almost 9'500 individuals. This result is limited by our current author collection, which contains more than 41'000 records (and counting) and is based on people affiliated with the organization.
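
The linking step described above, which relates authors of bibliographic records to authority records by authority id and name matching, can be illustrated with a minimal Python sketch. This is not the algorithm from the work itself: the record layout (`id`, `name`, `authority_id`), the function names, and the "surname plus first initial" normalization are all assumptions made for illustration, and the machine-learning clustering step is left out entirely.

```python
import unicodedata
from collections import defaultdict


def normalize_name(name):
    """Hypothetical normalization: strip accents, lowercase,
    and reduce 'Last, First' to 'last, f' (surname plus first initial)."""
    ascii_name = (
        unicodedata.normalize("NFKD", name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    last, _, first = ascii_name.partition(",")
    last = " ".join(last.lower().split())
    first = first.strip().lower()
    return f"{last}, {first[:1]}" if first else last


def link_authors(authorships, authorities):
    """Map the index of each authorship to an authority id.

    Assumed rule order: an explicit authority id wins; otherwise fall back
    to a name match, and link only when exactly one authority record shares
    the normalized name. Ambiguous or unknown names stay unlinked.
    """
    known_ids = {a["id"] for a in authorities}
    ids_by_name = defaultdict(list)
    for a in authorities:
        ids_by_name[normalize_name(a["name"])].append(a["id"])

    links = {}
    for idx, entry in enumerate(authorships):
        authority_id = entry.get("authority_id")
        if authority_id in known_ids:
            links[idx] = authority_id
            continue
        candidates = ids_by_name.get(normalize_name(entry["name"]), [])
        if len(candidates) == 1:
            links[idx] = candidates[0]
    return links


# Illustrative data only; none of these records come from the source.
authorities = [
    {"id": "A1", "name": "Müller, Anna"},
    {"id": "A2", "name": "Smith, John"},
    {"id": "A3", "name": "Smith, Jane"},
]
authorships = [
    {"name": "Muller, A."},                       # unique name match
    {"name": "Smith, J."},                        # ambiguous, left unlinked
    {"name": "Smith, J", "authority_id": "A3"},   # explicit id
    {"name": "Rossi, M."},                        # no authority record
]
print(link_authors(authorships, authorities))  # {0: 'A1', 2: 'A3'}
```

Leaving ambiguous names unlinked is a deliberately conservative choice in this sketch; it mirrors the observation that such problems cannot be solved with 100% accuracy and that coverage is bounded by the size of the author collection.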

oai:cds.cern.ch:2203031
Last time updated on 8/9/2016

This paper was published in CERN Document Server.
