Name Disambiguation from link data in a collaboration graph using temporal and topological features
In a social community, multiple persons may share the same name, phone number
or some other identifying attributes. This, along with other phenomena, such as
name abbreviation, name misspelling, and human error leads to erroneous
aggregation of records of multiple persons under a single reference. Such
mistakes affect the performance of document retrieval, web search, and database
integration and, more importantly, lead to improper attribution of credit (or blame).
The task of entity disambiguation partitions the records belonging to multiple
persons with the objective that each decomposed partition is composed of
records of a unique person. Existing solutions to this task use either
biographical attributes, or auxiliary features that are collected from external
sources, such as Wikipedia. However, for many scenarios, such auxiliary
features are not available, or they are costly to obtain. Moreover, collecting
biographical or external data carries the risk of privacy
violation. In this work, we propose a method for solving the entity disambiguation
task using link information obtained from a collaboration network. Our method
does not intrude on privacy, as it uses only the time-stamped graph topology of an
anonymized network. Experimental results on two real-life academic
collaboration networks show that the proposed method has satisfactory
performance.
Comment: The short version of this paper has been accepted to ASONAM 201
Exploiting citation networks for large-scale author name disambiguation
We present a novel algorithm and validation method for disambiguating author
names in very large bibliographic data sets and apply it to the full Web of
Science (WoS) citation index. Our algorithm relies only upon the author and
citation graphs available for the whole period covered by the WoS. A pair-wise
publication similarity metric, which is based on common co-authors,
self-citations, shared references and citations, is established to perform a
two-step agglomerative clustering that first connects individual papers and
then merges similar clusters. This parameterized model is optimized using an
h-index based recall measure, favoring the correct assignment of well-cited
publications, and a name-initials-based precision using WoS metadata and
cross-referenced Google Scholar profiles. Despite the use of limited metadata,
we reach a recall of 87% and a precision of 88% with a preference for
researchers with high h-index values. 47 million articles of WoS can be
disambiguated on a single machine in less than a day. We develop an h-index
distribution model, confirming that the prediction is in excellent agreement
with the empirical data, and yielding insight into the utility of the h-index
in real academic ranking scenarios.
Comment: 14 pages, 5 figures
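The two-step clustering described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the unweighted similarity score, the `link_threshold` and `merge_threshold` values, the mean-similarity merge rule, and the paper records are all assumptions made for illustration.

```python
from itertools import combinations

def similarity(p, q):
    """Count shared evidence between two papers (assumed, unweighted form)."""
    score = len(p["coauthors"] & q["coauthors"])      # common co-authors
    score += len(p["references"] & q["references"])   # shared references
    score += len(p["cited_by"] & q["cited_by"])       # shared citing papers
    # Self-citation: one of the two papers cites the other.
    score += int(p["id"] in q["references"] or q["id"] in p["references"])
    return score

def cluster_papers(papers, link_threshold=2, merge_threshold=1.0):
    """Two-step clustering; both thresholds are hypothetical parameters."""
    # Step 1: connect individual papers whose similarity is high enough,
    # using a small union-find structure.
    parent = {p["id"]: p["id"] for p in papers}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for p, q in combinations(papers, 2):
        if similarity(p, q) >= link_threshold:
            parent[find(p["id"])] = find(q["id"])

    groups = {}
    for p in papers:
        groups.setdefault(find(p["id"]), []).append(p)
    clusters = list(groups.values())

    # Step 2: repeatedly merge clusters whose mean cross-cluster
    # similarity reaches the merge threshold.
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(clusters)), 2):
            a, b = clusters[i], clusters[j]
            mean = sum(similarity(p, q) for p in a for q in b) / (len(a) * len(b))
            if mean >= merge_threshold:
                clusters[i] = a + b
                del clusters[j]
                merged = True
                break
    return [sorted(p["id"] for p in c) for c in clusters]

# Hypothetical records that all carry the same ambiguous author name.
papers = [
    {"id": "p1", "coauthors": {"a", "b"}, "references": {"r1"}, "cited_by": set()},
    {"id": "p2", "coauthors": {"a", "b"}, "references": {"r2"}, "cited_by": set()},
    {"id": "p3", "coauthors": {"c"}, "references": {"r1", "r2"}, "cited_by": set()},
    {"id": "p4", "coauthors": {"d"}, "references": {"r9"}, "cited_by": set()},
]
result = cluster_papers(papers)
```

In this toy input, p1 and p2 are linked in the first step through their two common co-authors, and p3 joins them only in the second step through its shared references, which is the kind of case a cluster-level merge is meant to catch.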
The Effect of Gender in the Publication Patterns in Mathematics
Despite the increasing number of women graduating in mathematics, a systemic
gender imbalance persists and is signified by a pronounced gender gap in the
distribution of active researchers and professors. Especially at the level of
university faculty, women mathematicians continue to be drastically
underrepresented, decades after the first affirmative action measures were
put into place. A solid publication record is of paramount importance for
securing permanent positions. Thus, the question arises whether the publication
patterns of men and women mathematicians differ in a significant way. Making
use of the zbMATH database, one of the most comprehensive metadata sources on
mathematical publications, we analyze the scholarly output of ~150,000
mathematicians from the past four decades whose gender we algorithmically
inferred. We focus on development over time, collaboration through
coauthorships, presumed journal quality and distribution of research topics --
factors known to have a strong impact on job prospects. We report
significant differences between genders which may put women at a disadvantage
when pursuing an academic career in mathematics.
Comment: 24 pages, 12 figures
Identifying Geographic Clusters: A Network Analytic Approach
In recent years there has been a growing interest in the role of networks and
clusters in the global economy. Despite being a popular research topic in
economics, sociology and urban studies, geographical clustering of human
activity has often been studied by means of predetermined geographical units
such as administrative divisions and metropolitan areas. This approach is
intrinsically time invariant and it does not allow one to differentiate between
different activities. Our goal in this paper is to present a new methodology
for identifying clusters that can be applied to different empirical settings.
We use a graph approach based on k-shell decomposition to analyze world
biomedical research clusters based on PubMed scientific publications. We
identify research institutions and locate their activities in geographical
clusters. Leading areas of scientific production and their top performing
research institutions are consistently identified at different geographic
scales
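
The k-shell decomposition mentioned in this abstract can be sketched as follows. This is a generic version of the standard peeling procedure, not the paper's pipeline; the function name `k_shell_decomposition` and the toy graph are assumptions made for illustration.

```python
from collections import defaultdict

def k_shell_decomposition(edges):
    """Return {node: shell index} for an undirected graph given as edge pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:                     # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    degree = {n: len(nbrs) for n, nbrs in adj.items()}
    remaining = set(adj)
    shell = {}
    k = 0
    while remaining:
        # Nodes whose current degree is at most k belong to shell k.
        peel = [n for n in remaining if degree[n] <= k]
        if not peel:
            k += 1                     # nothing left to peel at this level
            continue
        while peel:
            n = peel.pop()
            if n not in remaining:
                continue
            remaining.discard(n)
            shell[n] = k
            # Removing n lowers its neighbours' degrees, which may push
            # them into the same shell.
            for m in adj[n]:
                if m in remaining:
                    degree[m] -= 1
                    if degree[m] <= k:
                        peel.append(m)
    return shell

# Hypothetical graph: a triangle A-B-C with one pendant node D attached to A.
shells = k_shell_decomposition([("A", "B"), ("B", "C"), ("A", "C"), ("A", "D")])
```

Here the pendant node D is peeled off in shell 1, while the triangle survives into shell 2, so densely connected groups end up in the higher shells.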
Scale-free collaboration networks: An author name disambiguation perspective
Peer Reviewed
https://deepblue.lib.umich.edu/bitstream/2027.42/149559/1/asi24158.pdf
https://deepblue.lib.umich.edu/bitstream/2027.42/149559/2/asi24158_am.pd