285 research outputs found

    Scale‐free collaboration networks: An author name disambiguation perspective

    Peer Reviewed
    https://deepblue.lib.umich.edu/bitstream/2027.42/149559/1/asi24158.pdf
    https://deepblue.lib.umich.edu/bitstream/2027.42/149559/2/asi24158_am.pd

    Improving co-authorship network structures by combining multiple data sources: evidence from Italian academic statisticians

    The aim of the present contribution is to merge bibliographic data for members of a bounded scientific community in order to derive a complete unified archive, covering both top international and nationally oriented production, as a new basis for network analysis on a unified co-authorship network. A two-step procedure is used to identify duplicate records and to disambiguate author names. Specifically, the second step draws heavily on a well-established unsupervised disambiguation method proposed in the literature, which follows a network-based approach and requires a restricted set of record attributes. Evidence from Italian academic statisticians was obtained by merging data from three bibliographic archives. Non-negligible differences in network results were observed between the disambiguated and non-disambiguated data sets, especially in network measures at the individual level.

    Author identification in bibliographic data using deep neural networks

    Author name disambiguation (AND) is a challenging task for scholars who mine bibliographic information for scientific knowledge. A constructive approach to resolving name ambiguity is to use computer algorithms to identify author names. Computer and data scientists have developed several algorithm-based disambiguation methods; among them, supervised machine learning has been reported to produce decent to very accurate disambiguation results. This paper combines principal component analysis (PCA) for feature reduction with deep neural networks (DNNs) as a supervised algorithm for classifying AND problems. The raw data is grouped into four classes: synonyms, homonyms, homonyms-synonyms, and non-homonyms-synonyms. We tuned several hyperparameters, such as learning rate, batch size, and the number of neurons and hidden units, and analyzed their impact on the accuracy of the results. To the best of our knowledge, there are no previous studies with such a scheme. The proposed DNNs are validated against other ML techniques, such as Naïve Bayes, random forest (RF), and support vector machine (SVM), to produce a good classifier. Across all data, the proposed DNN classifier outperforms the other ML techniques, with accuracy, precision, recall, and F1-score of 99.98%, 97.98%, 97.86%, and 99.99%, respectively. In the future, this approach can be easily extended to any dataset and any bibliographic records provider.

    Harnessing Historical Corrections to build Test Collections for Named Entity Disambiguation

    Matching mentions of persons to the actual persons (the name disambiguation problem) is central for several digital library applications. Scientists have been working on algorithms to create this matching for decades without finding a universal solution. One problem is that test collections for this problem are often small and specific to a certain collection. In this work, we present an approach that can create large test collections from historical metadata with minimal extra cost. We apply this approach to the DBLP collection to generate two freely available test collections. One collection focuses on the properties of defects and one on the evaluation of disambiguation algorithms.
    Comment: Preprint of a paper accepted at TPDL 201

    Exploiting citation networks for large-scale author name disambiguation

    We present a novel algorithm and validation method for disambiguating author names in very large bibliographic data sets and apply it to the full Web of Science (WoS) citation index. Our algorithm relies only upon the author and citation graphs available for the whole period covered by the WoS. A pair-wise publication similarity metric, which is based on common co-authors, self-citations, shared references and citations, is established to perform a two-step agglomerative clustering that first connects individual papers and then merges similar clusters. This parameterized model is optimized using an h-index based recall measure, favoring the correct assignment of well-cited publications, and a name-initials-based precision using WoS metadata and cross-referenced Google Scholar profiles. Despite the use of limited metadata, we reach a recall of 87% and a precision of 88% with a preference for researchers with high h-index values. 47 million articles of WoS can be disambiguated on a single machine in less than a day. We develop an h-index distribution model, confirming that the prediction is in excellent agreement with the empirical data, and yielding insight into the utility of the h-index in real academic ranking scenarios.
    Comment: 14 pages, 5 figure
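    The pairwise similarity and two-step clustering described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the weights, the two thresholds, the average-linkage merge rule, and the record fields (`coauthors`, `refs`, `cited_by`) are all assumptions chosen for the example.

```python
from itertools import combinations

# Hypothetical weights; the paper optimizes its own parameters.
WEIGHTS = {"coauthors": 1.0, "self_citation": 1.0,
           "shared_refs": 0.5, "shared_citations": 0.5}

def similarity(p, q, w=WEIGHTS):
    """Weighted count of the four signals named in the abstract."""
    score = w["coauthors"] * len(p["coauthors"] & q["coauthors"])
    score += w["shared_refs"] * len(p["refs"] & q["refs"])
    score += w["shared_citations"] * len(p["cited_by"] & q["cited_by"])
    # One paper citing the other (same author name) counts as a self-citation.
    score += w["self_citation"] * ((q["id"] in p["refs"]) + (p["id"] in q["refs"]))
    return score

def cluster(papers, link_threshold=1.0, merge_threshold=0.2):
    # Step 1: connect individual papers whose similarity reaches the
    # link threshold (connected components via union-find).
    parent = list(range(len(papers)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(papers)), 2):
        if similarity(papers[i], papers[j]) >= link_threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(papers)):
        groups.setdefault(find(i), []).append(i)
    clusters = list(groups.values())

    # Step 2: repeatedly merge cluster pairs whose average pairwise
    # similarity reaches the (lower) merge threshold.
    merged = True
    while merged:
        merged = False
        for a, b in combinations(range(len(clusters)), 2):
            pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
            avg = sum(similarity(papers[i], papers[j]) for i, j in pairs) / len(pairs)
            if avg >= merge_threshold:
                clusters[a] = clusters[a] + clusters[b]
                del clusters[b]
                merged = True
                break
    return [sorted(papers[i]["id"] for i in c) for c in clusters]

# Toy records, all sharing one ambiguous author name (made-up data).
papers = [
    {"id": "A", "coauthors": {"x"}, "refs": {"r1", "r2"}, "cited_by": set()},
    {"id": "B", "coauthors": {"x"}, "refs": {"r3"}, "cited_by": set()},
    {"id": "C", "coauthors": {"y"}, "refs": {"r9"}, "cited_by": set()},
    {"id": "D", "coauthors": {"z"}, "refs": {"r1"}, "cited_by": set()},
]
result = cluster(papers)
```

    In this toy data, A and B are linked in the first step through their common co-author, D joins them in the second step through a shared reference, and C stays on its own.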

    Identification of Indonesian Authors Using Deep Neural Networks

    Author Name Disambiguation (AND) is a problem that occurs when a set of publications contains ambiguous author names, i.e. the same author may appear under different names (synonyms) in other published papers, or different authors may share the same name (homonyms). In this final project, we design a model with a Deep Neural Network (DNN) classifier. The dataset consists of primary data sourced from the Scopus website. This research focuses on integrating data from Indonesian authors. Accuracy, sensitivity, and precision are the standard benchmarks used to determine the performance of the method applied to AND problems. The best DNN classification model achieves 99.9936% accuracy, 93.1433% sensitivity, and 94.3733% precision. The highest performance is measured in the Non Synonym-Homonym (SH) case, with 99.9967% accuracy, 96.7388% sensitivity, and 97.5102% precision.
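    For readers unfamiliar with the three benchmarks named in this abstract, the sketch below computes them from confusion-matrix counts using their standard definitions. The function name and the counts are hypothetical and are not taken from the paper.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity (recall), and precision from confusion counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
    }

# Made-up, heavily imbalanced counts: few positives, many true negatives.
metrics = binary_metrics(tp=90, fp=10, tn=9890, fn=10)
```

    With imbalanced classes like these, accuracy is dominated by the true negatives and can sit well above sensitivity and precision, which may be one reason the three figures are reported together.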

    Author Name Disambiguation via Heterogeneous Network Embedding from Structural and Semantic Perspectives

    Name ambiguity is common in academic digital libraries, for example when multiple authors have the same name. This creates challenges for academic data management and analysis, so name disambiguation becomes necessary. The procedure of name disambiguation is to divide publications with the same name into different groups, each group belonging to a unique author. The large amount of attribute information in publications makes traditional methods fall into the quagmire of feature selection. These methods always select attributes artificially and equally, which usually has a negative impact on accuracy. The proposed method is mainly based on representation learning for heterogeneous networks and clustering, and exploits self-attention to solve the problem. The representation of publications is a synthesis of structural and semantic representations. The structural representation is obtained by meta-path-based sampling and a skip-gram-based embedding method, and meta-path level attention is introduced to automatically learn the weight of each feature. The semantic representation is generated using NLP tools. Our proposal performs better in terms of name disambiguation accuracy compared with baselines, and the ablation experiments demonstrate the improvement from feature selection and the meta-path level attention in our method. The experimental results show the superiority of our new method for capturing the most attributes from publications and reducing the impact of redundant information.
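    The meta-path-based sampling step mentioned in this abstract can be illustrated with a generic meta-path-guided random walk, followed by the (center, context) pairs a skip-gram model would train on. This is only a sketch under assumptions: the graph, the node types ("P" for paper, "A" for author), the meta-path "PAP", and the window size are invented for the example, and the attention mechanism and the embedding training itself are not shown.

```python
import random

def metapath_walk(graph, node_type, start, metapath, length, rng):
    """Random walk whose node types follow the meta-path cyclically.

    graph: dict mapping node -> list of neighbours
    node_type: dict mapping node -> one-letter type
    metapath: e.g. "PAP"; first and last types must match so it can repeat
    """
    assert node_type[start] == metapath[0]
    cycle = metapath[:-1]  # "PAP" -> "PA", repeated as P, A, P, A, ...
    walk = [start]
    while len(walk) < length:
        wanted = cycle[len(walk) % len(cycle)]
        candidates = [n for n in graph[walk[-1]] if node_type[n] == wanted]
        if not candidates:
            break  # dead end: no neighbour of the required type
        walk.append(rng.choice(candidates))
    return walk

def skipgram_pairs(walk, window):
    """(center, context) pairs within the given window, as skip-gram input."""
    pairs = []
    for i, center in enumerate(walk):
        for j in range(max(0, i - window), min(len(walk), i + window + 1)):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs

# Tiny hypothetical paper-author graph.
graph = {
    "p1": ["a1"], "p2": ["a1", "a2"], "p3": ["a2"],
    "a1": ["p1", "p2"], "a2": ["p2", "p3"],
}
node_type = {"p1": "P", "p2": "P", "p3": "P", "a1": "A", "a2": "A"}

walk = metapath_walk(graph, node_type, "p1", "PAP", length=5, rng=random.Random(0))
pairs = skipgram_pairs(walk, window=1)
```

    Restricting each step to the node type the meta-path demands is what keeps the sampled contexts semantically consistent, for instance papers linked through a shared author.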