Towards hierarchical affiliation resolution: framework, baselines, dataset

Abstract

Author affiliations provide key information when attributing academic performance like publication counts. So far, such measures have been aggregated either manually or only to top-level institutions, such as universities. Supervised affiliation resolution requires a large number of annotated alignments between affiliation strings and known institutions, which are not readily available. We introduce the task of unsupervised hierarchical affiliation resolution, which assigns affiliations to institutions on all hierarchy levels (e.g. departments), discovering the institutions as well as their hierarchical ordering on the fly. From the corresponding requirements, we derive a simple conceptual framework based on the subset partial order that can be extended to account for the discrepancies evident in realistic affiliations from the Web of Science. We implement initial baselines and provide datasets and evaluation metrics for experimentation. Results show that mapping affiliations to known institutions and discovering lower-level institutions works well with simple baselines, whereas unsupervised top-level- and hierarchical resolution is more challenging. Our work provides structured guidance for further in-depth studies and improved methodology by identifying and discussing a number of observed difficulties and important challenges that future work needs to address

    Similar works