7 research outputs found

    Neural network technique with deep structure for improving author homonym and synonym classification in digital libraries

    Author name disambiguation (AND), also known as name identification, has long been seen as a challenging issue in bibliographic data. The same author may appear under different names (synonyms), or distinct authors may share similar names (homonyms). Previous research has proposed approaches to the AND problem. To the best of our knowledge, however, no study has specifically addressed synonyms and homonyms, even though such cases are at the core of the AND topic. This paper presents the classification of non-homonym-synonym, homonym-synonym, synonym, and homonym cases using the DBLP computer science bibliography dataset. The classification process is based on the DBLP raw data and uses deep neural networks (DNNs). In the classification process, the DBLP raw data are divided into five features: name, author, title, venue, and year. Twelve scenarios with different structures are designed to validate and select the best DNN model. Furthermore, this paper compares DNNs with other classifiers, such as support vector machine (SVM) and decision tree. The results show that DNNs outperform the SVM and decision tree methods in all performance metrics. The best model, a DNN with three hidden layers, achieves accuracy, sensitivity, specificity, precision, and F1-score of 98.85%, 95.95%, 99.26%, 94.80%, and 95.36%, respectively. In future work, DNNs are expected to perform better with automated feature representation in AND processing.
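For readers unfamiliar with the five metrics quoted in this abstract, the following is a minimal sketch of how they are conventionally derived from confusion-matrix counts. It is not the authors' evaluation code: the function name `binary_metrics` and the example counts are hypothetical, and for a four-class problem such as this one the counts would presumably be taken one-vs-rest per class and then averaged, which is an assumption here.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard metrics from confusion-matrix counts (one class vs. the rest).

    tp/fp/tn/fn are true positives, false positives, true negatives and
    false negatives. The counts passed in below are illustrative only.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # also called recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    # F1 is the harmonic mean of precision and sensitivity
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts, not taken from the paper
metrics = binary_metrics(tp=8, fp=2, tn=85, fn=5)
```

Specificity can sit well above sensitivity when negatives dominate the data, which is consistent with the pattern of figures reported in the abstract.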

    Author disambiguation using multi-aspect similarity indicators

    Key to accurate bibliometric analyses is the ability to correctly link individuals to their corpus of work, with an optimal balance between precision and recall. We have developed an algorithm that does this disambiguation task with a very high recall and precision. The method addresses the issues of discarded records due to null data fields and their resultant effect on recall, precision and F-measure results. We have implemented a dynamic approach to similarity calculations based on all available data fields. We have also included differences in author contribution and age difference between publications, both of which have meaningful effects on overall similarity measurements, resulting in significantly higher recall and precision of returned records. The results are presented from a test dataset of heterogeneous catalysis publications. Results demonstrate significantly high average F-measure scores and substantial improvements on previous and stand-alone techniques.
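The idea of computing similarity from whichever fields are available, rather than discarding records with null fields, can be sketched as follows. This is an illustrative assumption, not the algorithm from the paper: the field names, the weights, the token-set Jaccard measure and the function names `field_similarity` and `record_similarity` are all hypothetical.

```python
from typing import Optional


def field_similarity(a: str, b: str) -> float:
    """Jaccard overlap of lowercase whitespace tokens (a stand-in measure)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def record_similarity(rec_a: dict, rec_b: dict, weights: dict) -> Optional[float]:
    """Weighted mean similarity over fields present in BOTH records.

    Fields that are missing or null in either record are skipped and the
    weights are renormalised, so a null field does not force the pair to
    be discarded. Returns None when no field can be compared.
    """
    total, weight_sum = 0.0, 0.0
    for field, weight in weights.items():
        value_a, value_b = rec_a.get(field), rec_b.get(field)
        if not value_a or not value_b:
            continue  # skip null or empty fields instead of dropping the record
        total += weight * field_similarity(value_a, value_b)
        weight_sum += weight
    return total / weight_sum if weight_sum else None


# Hypothetical records: the affiliation is null in the first one
rec_a = {"coauthors": "smith jones", "affiliation": None, "journal": "j catal"}
rec_b = {"coauthors": "smith lee", "affiliation": "cardiff", "journal": "j catal"}
weights = {"coauthors": 1.0, "affiliation": 1.0, "journal": 1.0}
score = record_similarity(rec_a, rec_b, weights)
```

Renormalising by the weights of the compared fields keeps scores comparable between record pairs with different amounts of missing data. The abstract's further signals, author contribution and the age difference between publications, are not modelled in this sketch.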

    Authors semantic disambiguation on heterogeneous bibliographic sources

    Data ambiguity from various sources remains a complex problem that affects services provided by digital libraries. From the point of view of integrating information from different sources, the challenge of author ambiguity is one of the most important, and numerous methods have been proposed to deal with this issue using different approaches. They generally work for some scenarios, but they have important limitations, especially when dealing with heterogeneous sources. In this work, we review a group of existing methods and then propose a technique that combines some of them, also incorporating a measure of distance using semantic technologies to solve the ambiguity of authors while integrating bibliographic data from various sources. This technique has been successfully tested in disambiguating Ecuadorian authors from both internal sources (institutional repositories) and external digital libraries. Sociedad Argentina de Informática e Investigación Operativa (SADIO)

    Factors Associating with the Future Citation Impact of Published Articles: A Statistical Modelling Approach

    A thesis submitted in partial fulfilment of the requirements of the University of Wolverhampton for the degree of Doctor of Philosophy. This study investigates a range of metrics available when an article is published to see which metrics associate with its eventual citation count. The purposes are to contribute to developing a citation model and to inform policymakers about which predictor variables associate with citations in different fields of science. Despite the complex nature of reasons for citation, some attributes of a paper’s authors, journal, references, abstract, field, country and institutional affiliations, and funding source are known to associate with its citation impact. This thesis investigates some common factors previously assessed and some new factors: journal author internationality; journal citing author internationality; cited journal author internationality; cited journal citing author internationality; impact of the author(s), publishing journal, affiliated institution, and affiliated country; length of paper; abstract and title; number of references; size of the field; number of authors, institutions and countries; abstract readability; and research funding. A sample of articles and proceedings papers in the 22 Essential Science Indicators subject fields from the Web of Science constitutes the research data set. Using negative binomial hurdle models, this study simultaneously assesses the above factors using large-scale data. The study found very similar behaviours across subject categories and broad areas in terms of factors associating with more citations. Journal and reference factors are the most effective determinants of future citation counts in most subject domains. Individual and international teamwork give a citation advantage in the majority of subject areas, but inter-institutional teamwork seems not to contribute to citation impact.