"Seed+Expand": A validated methodology for creating high quality publication oeuvres of individual researchers
The study of science at the individual micro-level frequently requires the
disambiguation of author names. The creation of authors' publication oeuvres
involves matching the list of unique author names to names used in publication
databases. Despite recent progress in the development of unique author
identifiers, e.g., ORCID, VIVO, or DAI, author disambiguation remains a key
problem when it comes to large-scale bibliometric analysis using data from
multiple databases. This study introduces and validates a new methodology
called seed+expand for semi-automatic bibliographic data collection for a given
set of individual authors. Specifically, we identify the oeuvre of a set of
Dutch full professors during the period 1980-2011. In particular, we combine
author records from the National Research Information System (NARCIS) with
publication records from the Web of Science. Starting with an initial list of
8,378 names, we identify "seed publications" for each author using five
different approaches. Subsequently, we "expand" the set of publications using three
different approaches. The approaches are compared, and the resulting
oeuvres are evaluated on precision and recall using a "gold standard" dataset
of authors for which verified publications in the period 2001-2010 are
available.
Comment: Paper accepted for the ISSI 2013, small changes in the text due to
referee comments, one figure added (Fig 3
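
The evaluation step described in the abstract, scoring a retrieved oeuvre against a "gold standard" of verified publications, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `precision_recall` and the publication identifiers are hypothetical and chosen only for illustration.

```python
def precision_recall(retrieved, verified):
    """Compare one author's retrieved oeuvre with a verified publication list.

    Both arguments are iterables of publication identifiers. The identifiers
    used below are hypothetical placeholders, not data from the study.
    """
    retrieved, verified = set(retrieved), set(verified)
    true_positives = len(retrieved & verified)
    # Precision: share of retrieved publications that are verified as correct.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    # Recall: share of verified publications that were actually retrieved.
    recall = true_positives / len(verified) if verified else 0.0
    return precision, recall


# Hypothetical example: 4 publications retrieved, 5 verified, 3 in common.
retrieved = {"pub1", "pub2", "pub3", "pub4"}
verified = {"pub2", "pub3", "pub4", "pub5", "pub6"}
precision, recall = precision_recall(retrieved, verified)
```

In this made-up case, three of the four retrieved publications are verified and three of the five verified publications are retrieved, so expanding the set further would trade precision for recall.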