
188 research outputs found

A knowledge graph embeddings based approach for author name disambiguation using literals

Author: Alam M.
Gangemi A.
Gesese G. A.
Peroni S.
Sack H.
Santini C.
Publication date: 01/01/2022

Scholarly data is growing continuously and contains information about articles from a plethora of venues, including conferences and journals. Many initiatives have been taken to make scholarly data available in the form of Knowledge Graphs (KGs). These efforts to standardize the data and make them accessible have also led to many challenges, such as the exploration of scholarly articles and ambiguous authors. This study specifically targets the problem of Author Name Disambiguation (AND) on scholarly KGs and presents a novel framework, Literally Author Name Disambiguation (LAND), which utilizes Knowledge Graph Embeddings (KGEs) using multimodal literal information generated from these KGs. The framework is based on three components: (1) multimodal KGEs, (2) a blocking procedure, and (3) hierarchical agglomerative clustering. Extensive experiments have been conducted on two newly created KGs: (i) a KG containing information from the Scientometrics journal from 1978 onwards (OC-782K), and (ii) a KG extracted from a well-known benchmark for AND provided by AMiner (AMiner-534K). The results show that the proposed architecture outperforms the baselines by 8–14% in terms of F1 score and shows competitive performance on a challenging benchmark such as AMiner. The code and the datasets are publicly available through GitHub (https://github.com/sntcristian/and-kge) and Zenodo (https://doi.org/10.5281/zenodo.6309855), respectively.

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna
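
The abstract above names a blocking step followed by hierarchical agglomerative clustering over embeddings. The sketch below illustrates that general pattern only; it is not the LAND implementation. The toy names, the hand-written three-dimensional vectors (standing in for learned multimodal KGEs), the surname-plus-initial blocking key, average linkage, cosine distance, and the 0.3 threshold are all assumptions chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations
import math


def block_key(name: str) -> str:
    """Blocking key: lowercase surname plus first initial (an assumed scheme)."""
    surname, _, forename = name.partition(",")
    forename = forename.strip()
    initial = forename[0].lower() if forename else ""
    return f"{surname.strip().lower()}_{initial}"


def cosine_distance(u, v):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def agglomerative(vectors, threshold):
    """Average-linkage agglomerative clustering, stopped at a distance threshold.

    Returns clusters as sorted lists of indices into `vectors`.
    """
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        total = sum(cosine_distance(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the closest pair of clusters under average linkage.
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        if linkage(clusters[x], clusters[y]) > threshold:
            break  # nothing left that is close enough to merge
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return sorted(clusters)


def disambiguate(records, threshold=0.3):
    """Group record ids into blocks, then cluster inside each block."""
    blocks = defaultdict(list)
    for rec_id, (name, _vec) in records.items():
        blocks[block_key(name)].append(rec_id)
    result = {}
    for key, ids in blocks.items():
        ids.sort()
        groups = agglomerative([records[i][1] for i in ids], threshold)
        result[key] = [[ids[i] for i in g] for g in groups]
    return result


# Toy records: id -> (author name string, hypothetical embedding).
records = {
    "p1": ("Smith, Jane", [1.0, 0.1, 0.0]),
    "p2": ("Smith, J.", [0.9, 0.2, 0.0]),
    "p3": ("Smith, John", [0.0, 0.1, 1.0]),
    "p4": ("Doe, Alex", [0.5, 0.5, 0.5]),
}
clusters = disambiguate(records)
print(clusters)
```

Blocking keeps the quadratic clustering cost confined to records that share a key, so only the three "Smith, J" records are compared with each other here.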

Identification of Indonesian Authors Using Deep Neural Networks

Author: Afrina Mira
Darmawahyuni Annisa
Fachrurrozi Muhammad
Fahreza Irvan
Firdaus Firdaus
Lestari Suci Dwi
Nurmaini Siti
Putra Bayu Wijaya
Rachmatullah Muhammad Naufal
Sapitri Ade Iriani
Publication venue: 'Faculty of Computer Science, Sriwijaya University'
Publication date: 01/02/2022

Author Name Disambiguation (AND) is a problem that occurs when a set of publications contains ambiguous author names: the same author may appear under different names (synonyms) in other published papers, or different authors may share the same name (homonyms). In this final project, we design a model with a Deep Neural Network (DNN) classifier. The dataset consists of primary data sourced from the Scopus website. This research focuses on integrating data from Indonesian authors. Accuracy, sensitivity, and precision are the standard benchmarks used to determine the performance of a method on AND problems. The best DNN classification model achieves 99.9936% accuracy, 93.1433% sensitivity, and 94.3733% precision. The highest performance is measured for the Non Synonym-Homonym (SH) case, with 99.9967% accuracy, 96.7388% sensitivity, and 97.5102% precision.

ComEngApp-Journal
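
The three measures reported in the abstract above follow from the standard confusion-matrix definitions. The sketch below computes them from counts; the counts are hypothetical and are not taken from the paper. They are chosen with many true negatives (an assumption about the setting, not something the abstract states) to show how accuracy can sit well above sensitivity and precision.

```python
def pair_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, sensitivity (recall), and precision from confusion counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # share of true matches that were found
        "precision": tp / (tp + fp),    # share of predicted matches that are correct
    }


# Hypothetical counts for illustration only.
m = pair_metrics(tp=90, fp=5, fn=10, tn=99895)
print(m)
```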

Deep Neural Network Structure to Improve Individual Performance based Author Classification

Author: Afrina Mira
Anshori Muhammad
Firdaus Firdaus
Nurmaini Siti
Raflesia Sarifah Putri
Zarkasi Ahmad
Publication venue: 'Faculty of Computer Science, Sriwijaya University'
Publication date: 01/02/2019

This paper proposes an improved method for the author name disambiguation problem, covering both homonyms and synonyms. The input data are the distances between each pair of authors' attributes, computed with the Levenshtein distance. Using Deep Neural Networks, we found large gains in performance. The results show an accuracy of 99.6% with a low number of hidden layers.

ComEngApp-Journal

Directory of Open Access Journals
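
The abstract above describes feeding a classifier with Levenshtein distances between pairs of author attributes. A minimal sketch of that feature step follows. The attribute names (`name`, `affiliation`), the lowercasing, and the example records are assumptions made for illustration; the abstract does not list the attributes the paper actually uses.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insert, delete, and substitute."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def pair_features(rec_a: dict, rec_b: dict, attributes: list) -> list:
    """One Levenshtein distance per attribute for a pair of records."""
    return [levenshtein(rec_a[k].lower(), rec_b[k].lower()) for k in attributes]


# Hypothetical records; the attribute names are illustrative.
a = {"name": "J. Smith", "affiliation": "Example University"}
b = {"name": "John Smith", "affiliation": "Example Univ."}
features = pair_features(a, b, ["name", "affiliation"])
print(features)
```

Each pair of records becomes one fixed-length vector of distances, which is the kind of input a pairwise classifier can consume.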

Identifying Mis-Configured Author Profiles on Google Scholar Using Deep Learning

Author: Chen Yang
Hui Pan
Sha Kewei
She Guozhen
Tang Jiaxin
Wang Xin
Wang Yi
Xu Yang
Zhang Zhenhua
Publication date: 01/07/2021

Google Scholar has been a widely used platform for academic performance evaluation and citation analysis. The issue about the mis-configuration of author profiles may seriously damage the reliability of the data, and thus affect the accuracy of analysis. Therefore, it is important to detect the mis-configured author profiles. Dealing with this issue is challenging because the scale of the dataset is large and manual annotation is time-consuming and relatively subjective. In this paper, we first collect a dataset of Google Scholar's author profiles in the field of computer science and compare the mis-configured author profiles with the reliable ones. Then, we propose an integrated model that utilizes machine learning and node embedding to automatically detect mis-configured author profiles. Additionally, we conduct two application case studies based on the data of Google Scholar, i.e., outstanding scholar searching and university ranking, to demonstrate how the improved dataset after filtering out the mis-configured author profiles will change the results. The two case studies validate the importance and meaningfulness of the detection of mis-configured author profiles. Peer reviewed.

Directory of Open Access Journals

Helsingin yliopiston digitaalinen arkisto

Effect of forename string on author name disambiguation

Author: Kim Jenna
Kim Jinseok
Publication venue: 'Wiley'
Publication date: 01/07/2020

In author name disambiguation, author forenames are used to decide which name instances are disambiguated together and how likely they are to refer to the same author. Despite this crucial role of forenames, their effect on the performance of heuristic (string matching) and algorithmic disambiguation is not well understood. This study assesses the contributions of forenames in author name disambiguation using multiple labeled data sets under varying ratios and lengths of full forenames, reflecting real‐world scenarios in which an author is represented by forename variants (synonyms) and some authors share the same forenames (homonyms). The results show that increasing the ratios of full forenames substantially improves both heuristic and machine‐learning‐based disambiguation. Performance gains by algorithmic disambiguation are pronounced when many forenames are initialized or homonyms are prevalent. As the ratios of full forenames increase, however, these gains become marginal compared to those by string matching. Using a small portion of forename strings does not much reduce the performance of either heuristic or algorithmic disambiguation methods compared to using full‐length strings. These findings provide practical suggestions, such as restoring initialized forenames into a full‐string format via record linkage for improved disambiguation performance. Peer Reviewed. https://deepblue.lib.umich.edu/bitstream/2027.42/155924/1/asi24298.pdf https://deepblue.lib.umich.edu/bitstream/2027.42/155924/2/asi24298_am.pd

arXiv.org e-Print Archive
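
The abstract above contrasts full forenames with initialized ones in heuristic (string matching) disambiguation. The sketch below shows one simple prefix-compatibility heuristic of that general kind; it is an assumed illustration, not the procedure evaluated in the study. The `max_chars` parameter is a hypothetical knob for keeping only the first few forename characters.

```python
def normalize(forename: str) -> str:
    """Lowercase a forename and keep letters only (drops periods, spaces, hyphens)."""
    return "".join(ch for ch in forename.lower() if ch.isalpha())


def forenames_compatible(f1: str, f2: str, max_chars=None) -> bool:
    """True when one normalized forename is a prefix of the other.

    An empty forename is treated as compatible with anything.
    """
    a, b = normalize(f1), normalize(f2)
    if max_chars is not None:
        a, b = a[:max_chars], b[:max_chars]
    return a.startswith(b) or b.startswith(a)


def same_author(name1: tuple, name2: tuple, max_chars=None) -> bool:
    """Heuristic match on (surname, forename): equal surname, compatible forename."""
    (s1, f1), (s2, f2) = name1, name2
    return s1.lower() == s2.lower() and forenames_compatible(f1, f2, max_chars)


# Full forenames keep two different people apart.
full = same_author(("Kim", "Jinseok"), ("Kim", "Jenna"))
# An initialized forename is compatible with either full form.
initial = same_author(("Kim", "J."), ("Kim", "Jenna"))
# Truncating both forenames to one character merges them.
truncated = same_author(("Kim", "Jinseok"), ("Kim", "Jenna"), max_chars=1)
print(full, initial, truncated)
```

With only an initial available, the heuristic cannot tell homonyms apart, which is the situation in which the abstract reports algorithmic methods gaining the most.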