Study of Heterogeneous Academic Networks
Academic networks are derived from scholarly data. They are heterogeneous in the sense that they involve different types of nodes, such as papers and authors. This dissertation studies such heterogeneous networks for measuring academic influence and learning vector representations of authors. Academic influence has traditionally been measured by citation count and metrics derived from it. PageRank-based algorithms have been used to give higher weight to citations from more influential papers. A better approach is to add authors to the citation network so that the importance of authors and papers is evaluated recursively within the same framework. Based on such heterogeneous academic networks, we propose a new algorithm for ranking authors. Tested on two large networks, our method outperforms 10 other methods in terms of the number of award winners among top-ranked authors. We further improve the method by identifying and handling the long reference issue. Moreover, we identify the mutual citation issue in paper networks and the self-citation issue in author networks. Our new method can reduce the impact of these three issues and identify more rising stars. To learn efficient author representations from heterogeneous academic networks, we propose a new embedding method called Stratified Embedding for Heterogeneous Networks (SEHN), based on Skip-Gram Negative Sampling (SGNS). We conduct random walks to generate traces that represent the structure of the network, then separate the traces into different layers so that each layer contains nodes of one type only. Such stratification improves by a large margin on embeddings derived from the mixed traces. SEHN improves on the state-of-the-art Metapath2vec by up to 24% at a certain point. The efficacy of stratification is also demonstrated on two classic network embedding algorithms, DeepWalk and Node2vec. The results are validated on two heterogeneous networks. We also demonstrate that SEHN outperforms the embedding of homogeneous author networks that are induced from their corresponding heterogeneous networks.
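The stratification step described in this abstract (random walks over the mixed network, then one layer of traces per node type) can be illustrated with a minimal sketch. This is not the dissertation's SEHN implementation: the toy graph, the node names, and the helpers `random_walk` and `stratify` are assumptions made for illustration, and the SGNS training that would consume each layer is not shown.

```python
import random

def random_walk(graph, start, length, rng):
    """Uniform random walk over an adjacency-list graph (hypothetical helper)."""
    walk = [start]
    while len(walk) < length:
        neighbors = graph[walk[-1]]
        if not neighbors:
            break
        walk.append(rng.choice(neighbors))
    return walk

def stratify(walk, node_type):
    """Split one mixed-type trace into per-type traces, preserving visit order."""
    layers = {}
    for node in walk:
        layers.setdefault(node_type[node], []).append(node)
    return layers

# Toy author-paper network (made-up data): authors link to the papers they wrote.
graph = {
    "a1": ["p1", "p2"],
    "a2": ["p1"],
    "a3": ["p2"],
    "p1": ["a1", "a2"],
    "p2": ["a1", "a3"],
}
node_type = {n: ("author" if n.startswith("a") else "paper") for n in graph}

rng = random.Random(0)
walk = random_walk(graph, "a1", 9, rng)
layers = stratify(walk, node_type)

print("mixed trace: ", walk)
print("author layer:", layers["author"])
print("paper layer: ", layers["paper"])
```

Each layer contains nodes of a single type, so a skip-gram model trained on the author layer would only ever see authors as context for authors.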
Applying weighted PageRank to author citation networks
This paper aims to identify whether different weighted PageRank algorithms
can be applied to author citation networks to measure the popularity and
prestige of a scholar from a citation perspective. Information Retrieval (IR)
was selected as a test field and data from 1956-2008 were collected from Web of
Science (WOS). Weighted PageRank scores, with citation and publication counts
as weight vectors, were calculated on author citation networks. The results indicate that
both popularity rank and prestige rank were highly correlated with the weighted
PageRank. Principal Component Analysis (PCA) was conducted to detect
relationships among these different measures. For capturing prize winners
within the IR field, prestige rank outperformed all the other measures.
Comment: 19 pages, 4 figures, 5 tables
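As a rough illustration of the idea in this abstract, the sketch below runs PageRank on a tiny author citation network and replaces the uniform teleport term with a weight vector. That is one common way to build a weighted PageRank; the abstract does not spell out the exact formula, so this form is an assumption. The function name `weighted_pagerank`, the damping factor of 0.85, and the toy publication counts are also assumptions.

```python
def weighted_pagerank(edges, weights, damping=0.85, iterations=100):
    """PageRank whose teleport distribution is proportional to `weights`.

    edges:   iterable of (citing_author, cited_author) pairs
    weights: dict mapping every author to a non-negative weight
             (for example a citation or publication count)
    """
    nodes = sorted(weights)
    total = sum(weights.values())
    teleport = {n: weights[n] / total for n in nodes}

    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)

    rank = dict(teleport)
    for _ in range(iterations):
        # Rank held by authors who cite nobody is redistributed via the teleport vector.
        dangling = sum(rank[n] for n in nodes if not out[n])
        new = {n: (1 - damping + damping * dangling) * teleport[n] for n in nodes}
        for src in nodes:
            if out[src]:
                share = damping * rank[src] / len(out[src])
                for dst in out[src]:
                    new[dst] += share
        rank = new
    return rank

# Made-up example: A and C cite B, B cites A; weights are publication counts.
edges = [("A", "B"), ("C", "B"), ("B", "A")]
publications = {"A": 3, "B": 5, "C": 1}
rank = weighted_pagerank(edges, publications)

for author in sorted(rank, key=rank.get, reverse=True):
    print(author, round(rank[author], 4))
```

Swapping `publications` for a dictionary of citation counts gives the citation-weighted variant under the same assumption.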
Science Models as Value-Added Services for Scholarly Information Systems
The paper introduces scholarly Information Retrieval (IR) as a further
dimension that should be considered in the science modeling debate. The IR use
case is seen as a validation model of the adequacy of science models in
representing and predicting structure and dynamics in science. Particular
conceptualizations of scholarly activity and structures in science are used as
value-added search services to improve retrieval quality: a co-word model
depicting the cognitive structure of a field (used for query expansion), the
Bradford law of information concentration, and a model of co-authorship
networks (both used for re-ranking search results). An evaluation of
retrieval quality with these science-model-driven services showed that
the proposed models do improve retrieval quality.
From an IR perspective, the models studied are therefore verified as expressive
conceptualizations of central phenomena in science. Thus, it could be shown
that the IR perspective can significantly contribute to a better understanding
of scholarly structures and activities.
Comment: 26 pages, to appear in Scientometrics
A Systematic Identification and Analysis of Scientists on Twitter
Metrics derived from Twitter and other social media---often referred to as
altmetrics---are increasingly used to estimate the broader social impacts of
scholarship. Such efforts, however, may produce highly misleading results, as
the entities that participate in conversations about science on these platforms
are largely unknown. For instance, if altmetric activities are generated mainly
by scientists, do they really capture broader social impacts of science? Here
we present a systematic approach to identifying and analyzing scientists on
Twitter. Our method can identify scientists across many disciplines, without
relying on external bibliographic data, and be easily adapted to identify other
stakeholder groups in science. We investigate the demographics, sharing
behaviors, and interconnectivity of the identified scientists. We find that
Twitter has been employed by scholars across the disciplinary spectrum, with an
over-representation of social and computer and information scientists;
under-representation of mathematical, physical, and life scientists; and a
better representation of women compared to scholarly publishing. Analysis of
the sharing of URLs reveals a distinct imprint of scholarly sites, yet only a
small fraction of shared URLs are science-related. We find an assortative
mixing with respect to disciplines in the networks between scientists,
suggesting the maintenance of disciplinary walls in social media. Our work
contributes to the literature both methodologically and conceptually---we
provide new methods for disambiguating and identifying particular actors on
social media and describing the behaviors of scientists, thus providing
foundational information for the construction and use of indicators on the
basis of social media metrics.
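The assortative mixing by discipline mentioned in this abstract can be quantified with Newman's attribute assortativity coefficient, r = (sum_i e_ii - sum_i a_i^2) / (1 - sum_i a_i^2) for an undirected network. The sketch below is an illustration only: the abstract does not say which estimator the authors used, and the function name, user identifiers, and discipline labels are made up.

```python
from collections import Counter

def discipline_assortativity(edges, discipline):
    """Newman's attribute assortativity for an undirected network.

    Returns 1.0 when every tie stays within a discipline and a negative
    value when ties mostly cross disciplines. Undefined (division by zero)
    if all nodes share a single discipline.
    """
    pair_counts = Counter()
    for u, v in edges:
        # Count each undirected edge in both directions so the mixing matrix is symmetric.
        pair_counts[(discipline[u], discipline[v])] += 1
        pair_counts[(discipline[v], discipline[u])] += 1

    total = sum(pair_counts.values())
    labels = {label for pair in pair_counts for label in pair}

    same = sum(pair_counts[(label, label)] for label in labels) / total
    # Fraction of edge ends attached to each discipline.
    ends = {
        label: sum(c for (x, _), c in pair_counts.items() if x == label) / total
        for label in labels
    }
    expected = sum(share ** 2 for share in ends.values())
    return (same - expected) / (1 - expected)

# Made-up follower ties between four scientists in two disciplines.
discipline = {"u1": "cs", "u2": "cs", "u3": "bio", "u4": "bio"}
within = discipline_assortativity([("u1", "u2"), ("u3", "u4")], discipline)
across = discipline_assortativity([("u1", "u3"), ("u2", "u4")], discipline)
print(within, across)
```

A positive coefficient on real data would be consistent with the "disciplinary walls" reading, with values closer to 1 indicating stronger separation.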
Will This Paper Increase Your h-index? Scientific Impact Prediction
Scientific impact plays a central role in the evaluation of the output of
scholars, departments, and institutions. A widely used measure of scientific
impact is citations, with a growing body of literature focused on predicting
the number of citations obtained by any given publication. The effectiveness of
such predictions, however, is fundamentally limited by the power-law
distribution of citations, whereby publications with few citations are
extremely common and publications with many citations are relatively rare.
Given this limitation, in this work we instead address a related question asked
by many academic researchers in the course of writing a paper, namely: "Will
this paper increase my h-index?" Using a real academic dataset with over 1.7
million authors, 2 million papers, and 8 million citation relationships from
the premier online academic service ArnetMiner, we formalize a novel scientific
impact prediction problem to examine several factors that can drive a paper to
increase the primary author's h-index. We find that the researcher's authority
on the publication topic and the venue in which the paper is published are
crucial factors to the increase of the primary author's h-index, while the
topic popularity and the co-authors' h-indices are of surprisingly little
relevance. By leveraging relevant factors, we find a greater than 87.5%
potential predictability for whether a paper will contribute to an author's
h-index within five years. As a further experiment, we generate a
self-prediction for this paper, estimating that there is a 76% probability that
it will contribute to the h-index of the co-author with the highest current
h-index in five years. We conclude that our findings on the quantification of
scientific impact can help researchers to expand their influence and more
effectively leverage their position of "standing on the shoulders of giants."
Comment: Proc. of the 8th ACM International Conference on Web Search and Data
Mining (WSDM'15)
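For readers unfamiliar with the metric, the sketch below computes an h-index and checks whether a new paper would end up counting toward it. The criterion used here (the paper's citation count is at least the author's resulting h-index) is one plausible reading of "contribute to the h-index", not necessarily the exact definition in the paper. The function names and citation counts are made up.

```python
def h_index(citations):
    """Largest h such that at least h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for position, count in enumerate(ranked, start=1):
        if count >= position:
            h = position
        else:
            break
    return h

def contributes_to_h_index(other_papers, new_paper_citations):
    """True if the new paper's citations reach the author's resulting h-index.

    Assumed criterion for illustration; both arguments are citation counts
    measured at the end of the prediction window.
    """
    resulting_h = h_index(list(other_papers) + [new_paper_citations])
    return new_paper_citations >= resulting_h

# Made-up citation counts for an author's other papers after five years.
other_papers = [10, 8, 5, 4, 3]
print(h_index(other_papers))
print(contributes_to_h_index(other_papers, 6))
print(contributes_to_h_index(other_papers, 2))
```

A predictor of the kind described in the abstract would estimate the outcome of `contributes_to_h_index` before the citation counts are known, using features such as author authority and venue.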