Search CORE

1,584 research outputs found

Whois? Deep Author Name Disambiguation using Bibliographic Data

Author: Bahubali Nagaraj Asundi
Boukhers Zeyd
Publication date: 24/07/2022

As the number of authors has grown exponentially over the years, the number of authors sharing the same name has grown proportionally. This makes it challenging to assign newly published papers to their correct authors. Therefore, Author Name Ambiguity (ANA) is considered a critical open problem in digital libraries. This paper proposes an Author Name Disambiguation (AND) approach that links author names to their real-world entities by leveraging their co-authors and domain of research. To this end, we use a collection from the DBLP repository that contains more than 5 million bibliographic records authored by around 2.6 million co-authors. Our approach first groups authors who share the same last name and the same first-name initial. The author within each group is identified by capturing the relation with his/her co-authors and area of research, which is represented by the titles of the validated publications of the corresponding author. For this purpose, we train a neural network model that learns from the representations of the co-authors and titles. We validated the effectiveness of our approach by conducting extensive experiments on a large dataset.
Comment: Accepted for publication @ TPDL202

arXiv.org e-Print Archive
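The Whois? abstract describes a first step that groups authors sharing the same last name and first-name initial. This kind of grouping is usually called blocking. Below is a minimal Python sketch of that step. It assumes names arrive as "First [Middle] Last" strings. The function names block_key and group_by_block are hypothetical, and the abstract does not describe how the authors parse DBLP names.

```python
from collections import defaultdict


def block_key(full_name: str) -> str:
    """Return a blocking key: lower-cased last name plus first-name initial.

    Assumption: names are written "First [Middle] Last". Real DBLP names
    need more careful parsing (particles, suffixes, homonym numbers).
    """
    parts = full_name.strip().split()
    first, last = parts[0], parts[-1]
    return f"{last.lower()}_{first[0].lower()}"


def group_by_block(names):
    """Group author-name mentions that share the same blocking key."""
    blocks = defaultdict(list)
    for name in names:
        blocks[block_key(name)].append(name)
    return dict(blocks)


mentions = ["Wei Wang", "Wen Wang", "Yi Wang", "W. Wang"]
print(group_by_block(mentions))
# {'wang_w': ['Wei Wang', 'Wen Wang', 'W. Wang'], 'wang_y': ['Yi Wang']}
```

Each block is then a candidate set in which a model, such as the neural network over co-author and title representations described above, decides which mentions refer to the same person.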

Author identification in bibliographic data using deep neural networks

Author: Darmawahyuni Annisa
Firdaus Firdaus
Juliano Andre Herviant
Malik Reza Firsandaya
Nugraha Tio Artha
Nurmaini Siti
Putra Varindo Ockta Keneddi
Rachmatullah Muhammad Naufal
Publication venue: Universitas Ahmad Dahlan
Publication date: 01/06/2021

Author name disambiguation (AND) is a challenging task for scholars who mine bibliographic information for scientific knowledge. A constructive approach to resolving name ambiguity is to use computer algorithms to identify author names. Several algorithm-based disambiguation methods have been developed by computer and data scientists. Among them, supervised machine learning has been reported to produce decent to very accurate disambiguation results. This paper presents a combination of principal component analysis (PCA) for feature reduction and deep neural networks (DNNs) as a supervised algorithm for classifying AND problems. The raw data is grouped into four classes: synonyms, homonyms, homonyms-synonyms, and non-homonyms-synonyms. We tuned several hyperparameters, such as learning rate, batch size, and the number of neurons and hidden units, and analyzed their impact on the accuracy of the results. To the best of our knowledge, no previous studies have used such a scheme. The proposed DNNs are compared with other ML techniques, namely Naïve Bayes, random forest (RF), and support vector machine (SVM). Across all data, our proposed DNN classifier outperformed the other ML techniques, with accuracy, precision, recall, and F1-score of 99.98%, 97.98%, 97.86%, and 99.99%, respectively. In the future, this approach can be easily extended to any dataset and any bibliographic records provider.

Journal of Education and Learning (EduLearn)

TELKOMNIKA (Telecommunication Computing Electronics and Control)

UAD Journal Management System
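The abstract above uses PCA as a feature-reduction stage in front of the DNN classifier. The sketch below shows only that PCA stage, computed through the singular value decomposition of the mean-centred feature matrix in NumPy. It is not the authors' code. The random feature matrix and the choice of k = 5 components are placeholders, because the abstract gives neither the feature set nor the number of components kept.

```python
import numpy as np


def pca_reduce(X: np.ndarray, k: int) -> np.ndarray:
    """Project the rows of X onto their top-k principal components.

    Columns are mean-centred first. The right singular vectors of the
    centred matrix are the principal directions, ordered by decreasing
    singular value (i.e. decreasing explained variance).
    """
    X_centred = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_centred, full_matrices=False)
    # Rows of Vt[:k] are the k leading principal directions.
    return X_centred @ Vt[:k].T


rng = np.random.default_rng(0)
features = rng.normal(size=(100, 20))  # hypothetical: 100 records, 20 features
reduced = pca_reduce(features, k=5)
print(reduced.shape)  # (100, 5)
```

The reduced matrix would then be the input to whatever supervised classifier follows, such as the DNN, Naïve Bayes, RF, or SVM compared in the paper.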

Identifying Mis-Configured Author Profiles on Google Scholar Using Deep Learning

Author: Chen Yang
Hui Pan
Sha Kewei
She Guozhen
Tang Jiaxin
Wang Xin
Wang Yi
Xu Yang
Zhang Zhenhua
Publication date: 01/07/2021

Google Scholar has been a widely used platform for academic performance evaluation and citation analysis. Mis-configured author profiles may seriously damage the reliability of the data and thus affect the accuracy of analyses. Therefore, it is important to detect mis-configured author profiles. Dealing with this issue is challenging because the dataset is large and manual annotation is time-consuming and relatively subjective. In this paper, we first collect a dataset of Google Scholar author profiles in the field of computer science and compare the mis-configured author profiles with the reliable ones. Then, we propose an integrated model that utilizes machine learning and node embedding to automatically detect mis-configured author profiles. Additionally, we conduct two application case studies based on Google Scholar data, i.e., outstanding scholar searching and university ranking, to demonstrate how the results change once mis-configured author profiles are filtered out of the dataset. The two case studies validate the importance and meaningfulness of detecting mis-configured author profiles.
Peer reviewed

Directory of Open Access Journals

Helsingin yliopiston digitaalinen arkisto

Citations: Indicators of Quality? The Impact Fallacy

Author: Bornmann Lutz
Comins Jordan
Leydesdorff Loet
Milojević Staša
Publication date: 01/01/2016

We argue that citation is a composed indicator: short-term citations can be considered as currency at the research front, whereas long-term citations can contribute to the codification of knowledge claims into concept symbols. Knowledge claims at the research front are more likely to be transitory and are therefore problematic as indicators of quality. Citation impact studies focus on short-term citation, and therefore tend to measure not epistemic quality, but involvement in current discourses in which contributions are positioned by referencing. We explore this argument using three case studies: (1) citations of the journal Soziale Welt as an example of a venue that tends not to publish papers at a research front, unlike, for example, JACS; (2) Robert Merton as a concept symbol across theories of citation; and (3) the Multi-RPYS ("Multi-Referenced Publication Year Spectroscopy") of the journals Scientometrics, Gene, and Soziale Welt. We show empirically that the measurement of "quality" in terms of citations can further be qualified: short-term citation currency at the research front can be distinguished from longer-term processes of incorporation and codification of knowledge claims into bodies of knowledge. The recently introduced Multi-RPYS can be used to distinguish between short-term and long-term impacts.
Comment: accepted for publication in Frontiers in Research Metrics and Analysis; doi: 10.3389/frma.2016.0000

arXiv.org e-Print Archive

Directory of Open Access Journals

Frontiers - Publisher Connector

UvA-DARE

International Migration, Integration and Social Cohesion online publications
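Multi-RPYS builds on plain Reference Publication Year Spectroscopy (RPYS). In RPYS, cited references are counted by publication year. Each year's count is then compared with the median count over the five-year window formed by the two previous years, the current year, and the two following years. A positive deviation marks a year that is cited more than its neighbours. The Python sketch below shows that basic single-set RPYS only, not the Multi-RPYS variant used in the paper. The input years are invented for illustration, and treating years outside the observed range as zero counts is an assumption.

```python
from collections import Counter
from statistics import median


def rpys(reference_years):
    """Plain RPYS spectrum for one pooled set of cited-reference years.

    For each reference publication year t, return the number of cited
    references from year t minus the median count over t-2 .. t+2.
    Years with no cited references count as zero, including years just
    outside the observed range at the edges (an assumption).
    """
    counts = Counter(reference_years)  # missing years read as 0
    spectrum = {}
    for t in range(min(counts), max(counts) + 1):
        window = [counts[y] for y in range(t - 2, t + 3)]
        spectrum[t] = counts[t] - median(window)
    return spectrum


# Hypothetical cited-reference years pooled from a set of citing papers.
years = [1966, 1967, 1968, 1968, 1968, 1968, 1968, 1969, 1970]
print(rpys(years))  # {1966: 0, 1967: 0, 1968: 4, 1969: 0, 1970: 0}
```

Multi-RPYS, as its name suggests, repeats this kind of analysis across multiple sets of citing publications instead of pooling them into one spectrum. The abstract does not specify those details, so they are not sketched here.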

A knowledge graph embeddings based approach for author name disambiguation using literals

Author: Alam M.
Gangemi A.
Gesese G. A.
Peroni S.
Sack H.
Santini C.
Publication date: 01/01/2022

Scholarly data is growing continuously, containing information about articles from a plethora of venues, including conferences, journals, etc. Many initiatives have been taken to make scholarly data available in the form of Knowledge Graphs (KGs). These efforts to standardize the data and make it accessible have also led to many challenges, such as the exploration of scholarly articles, ambiguous authors, etc. This study specifically targets the problem of Author Name Disambiguation (AND) on scholarly KGs and presents a novel framework, Literally Author Name Disambiguation (LAND), which utilizes Knowledge Graph Embeddings (KGEs) using multimodal literal information generated from these KGs. This framework is based on three components: (1) multimodal KGEs, (2) a blocking procedure, and (3) hierarchical agglomerative clustering. Extensive experiments have been conducted on two newly created KGs: (i) a KG containing information from the Scientometrics journal from 1978 onwards (OC-782K), and (ii) a KG extracted from a well-known AND benchmark provided by AMiner (AMiner-534K). The results show that our proposed architecture outperforms our baselines by 8–14% in terms of F1 score and shows competitive performance on a challenging benchmark such as AMiner. The code and the datasets are publicly available through GitHub (https://github.com/sntcristian/and-kge) and Zenodo (https://doi.org/10.5281/zenodo.6309855), respectively.

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna
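The LAND pipeline above combines a blocking procedure with hierarchical agglomerative clustering over author embeddings. The Python sketch below shows how those two stages could fit together. It is not the released LAND code. Toy 2-D vectors stand in for the multimodal KGEs, and the block keys are assumed to be given. Single linkage with cosine distance and a 0.1 cut-off are assumptions, because the abstract does not state the linkage, the metric, or the threshold.

```python
import math
from collections import defaultdict


def cosine_distance(u, v):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def single_linkage_clusters(vectors, threshold):
    """Single-linkage agglomerative clustering cut at `threshold`.

    Cutting a single-linkage dendrogram at distance t gives the connected
    components of the graph joining points at distance <= t, so a
    union-find over all pairs is sufficient for small blocks.
    """
    parent = list(range(len(vectors)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine_distance(vectors[i], vectors[j]) <= threshold:
                parent[find(i)] = find(j)

    # Relabel cluster roots as 0, 1, 2, ... in order of first appearance.
    remap = {}
    return [remap.setdefault(find(i), len(remap)) for i in range(len(vectors))]


def disambiguate(mentions, threshold=0.1):
    """mentions: list of (block_key, embedding) pairs (hypothetical format).

    Clusters are formed only within a block, so mentions with different
    keys can never be merged. Returns (block_key, cluster_id) per mention.
    """
    by_block = defaultdict(list)
    for idx, (key, _) in enumerate(mentions):
        by_block[key].append(idx)
    result = [None] * len(mentions)
    for key, idxs in by_block.items():
        labels = single_linkage_clusters([mentions[i][1] for i in idxs], threshold)
        for i, label in zip(idxs, labels):
            result[i] = (key, label)
    return result


mentions = [("smith_j", [1.0, 0.0]), ("smith_j", [0.9, 0.1]),
            ("smith_j", [0.0, 1.0]), ("smith_j", [0.1, 0.95]),
            ("lee_k", [1.0, 0.0])]
print(disambiguate(mentions))
# [('smith_j', 0), ('smith_j', 0), ('smith_j', 1), ('smith_j', 1), ('lee_k', 0)]
```

Blocking keeps the pairwise comparison cost limited to mentions that share a key, which is the usual reason for placing it before an all-pairs clustering step.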

A knowledge graph embeddings based approach for author name disambiguation using literals

Author: Alam Mehwish
Gangemi Aldo
Gesese Genet Asefa
Peroni Silvio
Sack Harald
Santini Cristian
Publication venue: Springer Verlag
Publication date: 01/06/2022

arXiv.org e-Print Archive

KITopen

A knowledge graph embeddings based approach for author name disambiguation using literals

Author: Alam Mehwish
Gangemi Aldo
Gesese Genet Asefa
Peroni Silvio
Sack Harald
Santini Cristian
Publication date: 27/07/2022

KITopen