4 research outputs found

    Evaluación del algoritmo de desambiguación de autores de AMiner en un metabuscador académico de Ciencias de la Computación

    Author disambiguation is a problem of considerable relevance for academic information retrieval systems. The name disambiguation algorithm implemented in AMiner is currently one of the most influential Machine Learning-based approaches. This work evaluates the AMiner name disambiguation algorithm for author disambiguation in the context of an academic metasearch engine for Computer Science. Experimental results with data generated by the metasearch engine indicate an average performance similar to the reference. The experiments also identified special cases of author names for which the algorithm performs poorly compared with the average. This finding revealed an apparent association between low algorithm performance and cases in which several authors share the same name and have few publications.

    How reliable are unsupervised author disambiguation algorithms in the assessment of research organization performance?

    The paper examines the extent of bias in the performance rankings of research organisations when the assessments are based on unsupervised author-name disambiguation algorithms. It compares the outcomes of a research performance evaluation exercise of Italian universities using the unsupervised approach by Caron and van Eck (2014) for derivation of the universities' research staff, with those of a benchmark using the supervised algorithm of D'Angelo, Giuffrida, and Abramo (2011), which avails of input data. The methodology developed could be replicated for comparative analyses in other frameworks of national or international interest, giving practitioners a precise measure of the extent of distortions inherent in any evaluation exercise that uses unsupervised algorithms. This could in turn inform policy-makers' decisions on whether to invest in building national research staff databases instead of settling for unsupervised approaches with their measurement biases.

    Effect of forename string on author name disambiguation

    In author name disambiguation, author forenames are used to decide which name instances are disambiguated together and how likely they are to refer to the same author. Despite this crucial role, the effect of forenames on the performance of heuristic (string matching) and algorithmic disambiguation is not well understood. This study assesses the contributions of forenames in author name disambiguation using multiple labeled data sets under varying ratios and lengths of full forenames, reflecting real-world scenarios in which an author is represented by forename variants (synonyms) and some authors share the same forenames (homonyms). The results show that increasing the ratios of full forenames substantially improves both heuristic and machine-learning-based disambiguation. Performance gains by algorithmic disambiguation are pronounced when many forenames are initialized or homonyms are prevalent. As the ratios of full forenames increase, however, these gains become marginal compared to those of string matching. Using a small portion of forename strings does not much reduce the performance of either heuristic or algorithmic disambiguation methods compared to using full-length strings. These findings provide practical suggestions, such as restoring initialized forenames into a full-string format via record linkage for improved disambiguation performance.
    Peer Reviewed
    https://deepblue.lib.umich.edu/bitstream/2027.42/155924/1/asi24298.pdf
    https://deepblue.lib.umich.edu/bitstream/2027.42/155924/2/asi24298_am.pd

    Gender differences in science
