1,030 research outputs found

Name Disambiguation from link data in a collaboration graph using temporal and topological features

Authors: Hasan Mohammad Al; Saha Tanay Kumar; Zhang Baichuan
Publication date: 01/12/2015

In a social community, multiple persons may share the same name, phone number or some other identifying attributes. This, along with other phenomena, such as name abbreviation, name misspelling, and human error, leads to erroneous aggregation of records of multiple persons under a single reference. Such mistakes affect the performance of document retrieval, web search, database integration, and more importantly, improper attribution of credit (or blame). The task of entity disambiguation partitions the records belonging to multiple persons with the objective that each decomposed partition is composed of records of a unique person. Existing solutions to this task use either biographical attributes, or auxiliary features that are collected from external sources, such as Wikipedia. However, for many scenarios, such auxiliary features are not available, or they are costly to obtain. Besides, the attempt of collecting biographical or external data carries the risk of privacy violation. In this work, we propose a method for solving the entity disambiguation task from link information obtained from a collaboration network. Our method is non-intrusive of privacy as it uses only the time-stamped graph topology of an anonymized network. Experimental results on two real-life academic collaboration networks show that the proposed method has satisfactory performance. Comment: The short version of this paper has been accepted to ASONAM 201

arXiv.org e-Print Archive

IUPUIScholarWorks

Exploiting citation networks for large-scale author name disambiguation

Authors: Helbing Dirk; Mazloumian Amin; Penner Orion; Petersen Alexander M; Schulz Christian
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

We present a novel algorithm and validation method for disambiguating author names in very large bibliographic data sets and apply it to the full Web of Science (WoS) citation index. Our algorithm relies only upon the author and citation graphs available for the whole period covered by the WoS. A pair-wise publication similarity metric, which is based on common co-authors, self-citations, shared references and citations, is established to perform a two-step agglomerative clustering that first connects individual papers and then merges similar clusters. This parameterized model is optimized using an h-index based recall measure, favoring the correct assignment of well-cited publications, and a name-initials-based precision using WoS metadata and cross-referenced Google Scholar profiles. Despite the use of limited metadata, we reach a recall of 87% and a precision of 88% with a preference for researchers with high h-index values. 47 million articles of WoS can be disambiguated on a single machine in less than a day. We develop an h-index distribution model, confirming that the prediction is in excellent agreement with the empirical data, and yielding insight into the utility of the h-index in real academic ranking scenarios. Comment: 14 pages, 5 figure
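The two-step clustering described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The record fields (`coauthors`, `references`, `cited_by`), the equal weights, the thresholds `t_link` and `t_merge`, and the use of average pairwise similarity for the merge step are all assumptions made for illustration; the abstract does not specify them.

```python
from itertools import combinations

def similarity(p, q):
    """Hypothetical pairwise score: equal-weight sum of the four signals."""
    shared_coauthors = len(p["coauthors"] & q["coauthors"])
    shared_refs = len(p["references"] & q["references"])
    shared_citers = len(p["cited_by"] & q["cited_by"])
    # Self-citation signal: one paper appears in the other's reference list.
    self_cites = int(q["id"] in p["references"]) + int(p["id"] in q["references"])
    return shared_coauthors + shared_refs + shared_citers + self_cites

def cluster_papers(papers, t_link=2.0, t_merge=0.5):
    """Step 1 links individual papers; step 2 merges similar clusters."""
    parent = list(range(len(papers)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Step 1: connect any two papers whose similarity reaches t_link.
    for i, j in combinations(range(len(papers)), 2):
        if similarity(papers[i], papers[j]) >= t_link:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(papers)):
        groups.setdefault(find(i), []).append(i)
    clusters = list(groups.values())

    # Step 2: repeatedly merge cluster pairs whose average
    # cross-cluster similarity reaches t_merge (assumed criterion).
    merged = True
    while merged:
        merged = False
        for a, b in combinations(range(len(clusters)), 2):
            pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
            avg = sum(similarity(papers[i], papers[j]) for i, j in pairs) / len(pairs)
            if avg >= t_merge:
                clusters[a] = clusters[a] + clusters[b]
                del clusters[b]
                merged = True
                break
    return [sorted(papers[i]["id"] for i in c) for c in clusters]
```

Each returned list holds the paper ids attributed to one presumed author. The all-pairs comparison is quadratic in the number of papers sharing a name, so a sketch like this would only be practical within a single name block, not across a full index.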

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Repository for Publications and Research Data

Springer - Publisher Connector

eScholarship - University of California

IMT Institutional Repository

The Effect of Gender in the Publication Patterns in Mathematics

Authors: Mihaljević-Brandt Helena; Santamaría Lucía; Tullney Marco
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2016

Despite the increasing number of women graduating in mathematics, a systemic gender imbalance persists and is signified by a pronounced gender gap in the distribution of active researchers and professors. Especially at the level of university faculty, women mathematicians continue being drastically underrepresented, decades after the first affirmative action measures have been put into place. A solid publication record is of paramount importance for securing permanent positions. Thus, the question arises whether the publication patterns of men and women mathematicians differ in a significant way. Making use of the zbMATH database, one of the most comprehensive metadata sources on mathematical publications, we analyze the scholarly output of ~150,000 mathematicians from the past four decades whose gender we algorithmically inferred. We focus on development over time, collaboration through coauthorships, presumed journal quality and distribution of research topics -- factors known to have a strong impact on job perspectives. We report significant differences between genders which may put women at a disadvantage when pursuing an academic career in mathematics. Comment: 24 pages, 12 figure

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central

Repositorium für Naturwissenschaften und Technik

Identifying Geographic Clusters: A Network Analytic Approach

Authors: Catini Roberto; Karamshuk Dmytro; Penner Orion; Riccaboni Massimo
Publication venue: 'Elsevier BV'
Publication date: 01/01/2015

In recent years there has been a growing interest in the role of networks and clusters in the global economy. Despite being a popular research topic in economics, sociology and urban studies, geographical clustering of human activity has often been studied by means of predetermined geographical units such as administrative divisions and metropolitan areas. This approach is intrinsically time invariant and it does not allow one to differentiate between different activities. Our goal in this paper is to present a new methodology for identifying clusters that can be applied to different empirical settings. We use a graph approach based on k-shell decomposition to analyze world biomedical research clusters based on PubMed scientific publications. We identify research institutions and locate their activities in geographical clusters. Leading areas of scientific production and their top performing research institutions are consistently identified at different geographic scales.
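For readers unfamiliar with k-shell decomposition, the following is a minimal Python sketch of the standard peeling procedure on an undirected graph. How the paper builds its graph from PubMed records is not stated in the abstract, so the edge-list input and the function name `k_shell_decomposition` are assumptions for illustration only.

```python
from collections import defaultdict

def k_shell_decomposition(edges):
    """Return {node: shell index} for an undirected graph given as edge pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    degree = {n: len(nb) for n, nb in adj.items()}
    remaining = set(adj)
    shell = {}

    while remaining:
        # The current shell index is the smallest remaining degree.
        k = min(degree[n] for n in remaining)
        queue = [n for n in remaining if degree[n] <= k]
        # Peel nodes of degree <= k; removals can push neighbours
        # down to degree <= k, which puts them in the same shell.
        while queue:
            n = queue.pop()
            if n not in remaining:
                continue
            remaining.discard(n)
            shell[n] = k
            for m in adj[n]:
                if m in remaining:
                    degree[m] -= 1
                    if degree[m] <= k:
                        queue.append(m)
    return shell
```

Nodes with a high shell index sit in the densely connected core of the graph, while low-index nodes lie on the periphery.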

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Munich RePEc Personal Archive

Crossref

Archivio della ricerca della Scuola IMT Alti Studi Lucca

King's Research Portal

IMT Institutional Repository

Scale‐free collaboration networks: An author name disambiguation perspective

Author: Kim Jinseok
Publication venue: 'Wiley'
Publication date: 07/11/2018

Peer Reviewed
https://deepblue.lib.umich.edu/bitstream/2027.42/149559/1/asi24158.pdf
https://deepblue.lib.umich.edu/bitstream/2027.42/149559/2/asi24158_am.pd

arXiv.org e-Print Archive

Crossref

Deep Blue Documents at the University of Michigan