17,834 research outputs found
A critical cluster analysis of 44 indicators of author-level performance
This paper explores the relationship between author-level bibliometric
indicators and the researchers they "measure", exemplified across five academic
seniorities and four disciplines. Using cluster methodology, the disciplinary
and seniority appropriateness of author-level indicators is examined.
Publication and citation data for 741 researchers across Astronomy,
Environmental Science, Philosophy and Public Health was collected in Web of
Science (WoS). Forty-four indicators of individual performance were computed
using the data. A two-step cluster analysis using IBM SPSS version 22 was
performed, followed by a risk analysis and ordinal logistic regression to
explore cluster membership. Indicator scores were contextualized using the
individual researcher's curriculum vitae. Four different clusters based on
indicator scores ranked researchers as low, middle, high and extremely high
performers. The results show that different indicators were appropriate in
demarcating ranked performance in different disciplines: the h2 indicator in
Astronomy, sum pp top prop in Environmental Science, Q2 in Philosophy and the
e-index in Public Health. The regression and odds analysis showed individual
level indicator scores were primarily dependent on the number of years since
the researcher's first publication registered in WoS, number of publications
and number of citations. Seniority classification was secondary; therefore, no
seniority appropriate indicators were confidently identified. Cluster
methodology proved useful in identifying disciplinary appropriate indicators,
provided the preliminary data preparation was thorough, but needed to be
supplemented by other analyses to validate the results. A general disconnection
between the performance of the researcher on their curriculum vitae and the
performance of the researcher based on bibliometric indicators was observed.
Comment: 28 pages, 7 tables, 2 figures, 2 appendices
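The abstract above names several author-level indicators without defining them. The following is a minimal sketch of three of them, assuming the standard published definitions: the h-index, the e-index (excess citations in the h-core), and, as an assumption about what "h2" refers to here, the h(2)-index in which the k-th ranked paper needs at least k squared citations. The function names and the example citation counts are hypothetical and are not taken from the paper.

```python
import math


def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def h2_index(citations):
    """Largest k such that k papers have at least k**2 citations each.

    Assumption: this is the indicator the abstract calls "h2".
    """
    ranked = sorted(citations, reverse=True)
    k = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank * rank:
            k = rank
        else:
            break
    return k


def e_index(citations):
    """Square root of the h-core citations in excess of h**2."""
    ranked = sorted(citations, reverse=True)
    h = h_index(ranked)
    return math.sqrt(sum(ranked[:h]) - h * h)


# Hypothetical citation counts for one researcher.
example = [25, 8, 5, 3, 3, 1, 0]
print(h_index(example), h2_index(example), round(e_index(example), 3))
```

Two researchers with the same h-index can differ widely on the e-index, which is one reason a single indicator may rank performance differently across disciplines.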
Easing Embedding Learning by Comprehensive Transcription of Heterogeneous Information Networks
Heterogeneous information networks (HINs) are ubiquitous in real-world
applications. In the meantime, network embedding has emerged as a convenient
tool to mine and learn from networked data. As a result, it is of interest to
develop HIN embedding methods. However, the heterogeneity in HINs introduces
not only rich information but also potentially incompatible semantics, which
poses special challenges to embedding learning in HINs. With the intention to
preserve the rich yet potentially incompatible information in HIN embedding, we
propose to study the problem of comprehensive transcription of heterogeneous
information networks. The comprehensive transcription of HINs also provides an
easy-to-use approach to unleash the power of HINs, since it requires no
additional supervision, expertise, or feature engineering. To cope with the
challenges in the comprehensive transcription of HINs, we propose the HEER
algorithm, which embeds HINs via edge representations that are further coupled
with properly-learned heterogeneous metrics. To corroborate the efficacy of
HEER, we conducted experiments on two large-scale real-world datasets with an
edge reconstruction task and multiple case studies. Experiment results
demonstrate the effectiveness of the proposed HEER model and the utility of
edge representations and heterogeneous metrics. The code and data are available
at https://github.com/GentleZhu/HEER.
Comment: 10 pages. In Proceedings of the 24th ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, London, United Kingdom,
ACM, 201
Quantifying preferential trading in the e-MID interbank market
Interbank markets allow credit institutions to exchange capital for purposes of liquidity management. These markets are among the most liquid markets in the financial system. However, liquidity of interbank markets dropped during the 2007-2008 financial crisis, and such a lack of liquidity influenced the entire economic system. In this paper, we analyze transaction data from the e-MID market, which is the only electronic interbank market in the Euro Area and US, over a period of eleven years (1999-2009). We adapt a method developed to detect statistically validated links in a network in order to reveal preferential trading in a directed network. Preferential trading between banks is detected by comparing empirically observed trading relationships with a null hypothesis that assumes random trading among banks doing a heterogeneous number of transactions. Preferential trading patterns are revealed at time windows of three maintenance periods. We show that preferential trading is observed throughout the whole period of analysis and that the number of preferential trading links does not show any significant trend in time, in spite of a decreasing trend in the number of pairs of banks making transactions. We observe that preferential trading connections typically involve large trading volumes. During the crisis, we also observe that transactions occurring between banks with a preferential connection occur at higher interest rates than the complement set, an effect that is not observed before the crisis.
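The abstract describes testing each lender-borrower pair against a null hypothesis of random trading among banks with heterogeneous activity, but does not spell out the test. The following is a minimal sketch under the assumption that the null is hypergeometric: given the total number of transactions, the number lent by bank i and the number borrowed by bank j, it computes the probability of seeing at least the observed number of i-to-j transactions by chance. The Bonferroni threshold is also an assumption, and all function and parameter names are hypothetical.

```python
from math import comb


def preferential_p_value(total, lent_by_i, borrowed_by_j, observed_ij):
    """Upper-tail probability P(X >= observed_ij) under a hypergeometric null.

    Assumption: X ~ Hypergeometric(total, lent_by_i, borrowed_by_j), i.e. the
    borrowed_by_j transactions of bank j are drawn at random from all
    transactions, of which lent_by_i originate from bank i.
    """
    upper = min(lent_by_i, borrowed_by_j)
    denominator = comb(total, borrowed_by_j)
    tail = sum(
        comb(lent_by_i, k) * comb(total - lent_by_i, borrowed_by_j - k)
        for k in range(observed_ij, upper + 1)
    )
    return tail / denominator


def is_preferential(p_value, alpha, number_of_tests):
    """Bonferroni-corrected decision (the correction is an assumption here)."""
    return p_value < alpha / number_of_tests


# Hypothetical window: 10 transactions, bank i lends 3, bank j borrows 3,
# and all 3 of them are i-to-j.
p = preferential_p_value(total=10, lent_by_i=3, borrowed_by_j=3, observed_ij=3)
print(p, is_preferential(p, alpha=0.01, number_of_tests=1))
```

Because the null conditions on each bank's own number of transactions, a very active pair is not flagged merely for trading often; only an excess over what their activity levels would produce at random counts as preferential.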
A multi-class approach for ranking graph nodes: models and experiments with incomplete data
After the phenomenal success of the PageRank algorithm, many researchers have
extended the PageRank approach to ranking graphs with richer structures besides
the simple linkage structure. In some scenarios we have to deal with
multi-parameter data where each node has additional features and there are
relationships between such features.
This paper stems from the need for a systematic approach when dealing with
multi-parameter data. We propose models and ranking algorithms which can be
used with minor adjustments for a large variety of networks (bibliographic
data, patent data, twitter and social data, healthcare data). In this paper we
focus on several aspects which have not been addressed in the literature: (1)
we propose different models for ranking multi-parameter data and a class of
numerical algorithms for efficiently computing the ranking score of such
models, (2) by analyzing the stability and convergence properties of the
numerical schemes we tune a fast and stable technique for the ranking problem,
(3) we consider the issue of the robustness of our models when data are
incomplete. The comparison of the rank on the incomplete data with the rank on
the full structure shows that our models compute consistent rankings whose
correlation is up to 60% when just 10% of the links of the attributes are
maintained, suggesting the suitability of our model also when the data are
incomplete.
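For readers unfamiliar with the baseline that this abstract extends, the following is a minimal sketch of plain PageRank computed by power iteration on a simple directed graph. It is not the multi-class model proposed in the paper; the damping value, tolerance, function name and toy graph are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Plain PageRank by power iteration.

    links maps each node to the list of nodes it points to. Nodes with no
    outgoing links spread their score uniformly over all nodes.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Score held by dangling nodes is redistributed to every node.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            targets = links.get(v, [])
            for t in targets:
                new[t] += damping * rank[v] / len(targets)
        # Stop once the L1 change between iterations is negligible.
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank


# Hypothetical toy graph: a and b both point to c, and c points back to a.
scores = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
print(scores)
```

The stopping rule on the change between iterations is the kind of convergence check whose stability the abstract says it analyzes for its richer models.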
Will This Paper Increase Your h-index? Scientific Impact Prediction
Scientific impact plays a central role in the evaluation of the output of
scholars, departments, and institutions. A widely used measure of scientific
impact is citations, with a growing body of literature focused on predicting
the number of citations obtained by any given publication. The effectiveness of
such predictions, however, is fundamentally limited by the power-law
distribution of citations, whereby publications with few citations are
extremely common and publications with many citations are relatively rare.
Given this limitation, in this work we instead address a related question asked
by many academic researchers in the course of writing a paper, namely: "Will
this paper increase my h-index?" Using a real academic dataset with over 1.7
million authors, 2 million papers, and 8 million citation relationships from
the premier online academic service ArnetMiner, we formalize a novel scientific
impact prediction problem to examine several factors that can drive a paper to
increase the primary author's h-index. We find that the researcher's authority
on the publication topic and the venue in which the paper is published are
crucial factors to the increase of the primary author's h-index, while the
topic popularity and the co-authors' h-indices are of surprisingly little
relevance. By leveraging relevant factors, we find a greater than 87.5%
potential predictability for whether a paper will contribute to an author's
h-index within five years. As a further experiment, we generate a
self-prediction for this paper, estimating that there is a 76% probability that
it will contribute to the h-index of the co-author with the highest current
h-index in five years. We conclude that our findings on the quantification of
scientific impact can help researchers to expand their influence and more
effectively leverage their position of "standing on the shoulders of giants."
Comment: Proc. of the 8th ACM International Conference on Web Search and Data
Mining (WSDM'15)
Geo-Spotting: Mining Online Location-based Services for Optimal Retail Store Placement
The problem of identifying the optimal location for a new retail store has
been the focus of past research, especially in the field of land economy, due
to its importance in the success of a business. Traditional approaches to the
problem have factored in demographics, revenue and aggregated human flow
statistics from nearby or remote areas. However, the acquisition of relevant
data is usually expensive. With the growth of location-based social networks,
fine grained data describing user mobility and popularity of places has
recently become attainable.
In this paper we study the predictive power of various machine learning
features on the popularity of retail stores in the city through the use of a
dataset collected from Foursquare in New York. The features we mine are based
on two general signals: geographic, where features are formulated according to
the types and density of nearby places, and user mobility, which includes
transitions between venues or the incoming flow of mobile users from distant
areas. Our evaluation suggests that the best performing features are common
across the three different commercial chains considered in the analysis,
although variations may exist too, as explained by heterogeneities in the way
retail facilities attract users. We also show that performance improves
significantly when combining multiple features in supervised learning
algorithms, suggesting that the retail success of a business may depend on
multiple factors.
Comment: Proceedings of the 19th ACM SIGKDD international conference on
Knowledge discovery and data mining, Chicago, 2013, Pages 793-80
Ranking users, papers and authors in online scientific communities
The ever-increasing quantity and complexity of scientific production have
made it difficult for researchers to keep track of advances in their own
fields. This, together with growing popularity of online scientific
communities, calls for the development of effective information filtering
tools. We propose here a method to simultaneously compute reputation of users
and quality of scientific artifacts in an online scientific community.
Evaluation on artificially-generated data and real data from the Econophysics
Forum is used to determine the method's best-performing variants. We show that
when the method is extended by considering author credit, its performance
improves on multiple levels. In particular, top papers have higher citation
count and top authors have higher h-index than top papers and top authors
chosen by other algorithms.
Comment: 7 pages, 3 figures, 3 tables