Identifying Geographic Clusters: A Network Analytic Approach
In recent years there has been a growing interest in the role of networks and
clusters in the global economy. Despite being a popular research topic in
economics, sociology and urban studies, geographical clustering of human
activity has often been studied by means of predetermined geographical units
such as administrative divisions and metropolitan areas. This approach is
intrinsically time invariant and it does not allow one to differentiate between
different activities. Our goal in this paper is to present a new methodology
for identifying clusters that can be applied to different empirical settings.
We use a graph approach based on k-shell decomposition to analyze world
biomedical research clusters based on PubMed scientific publications. We
identify research institutions and locate their activities in geographical
clusters. Leading areas of scientific production and their top performing
research institutions are consistently identified at different geographic
scales.
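As an illustration of the k-shell decomposition mentioned in the abstract above, here is a minimal Python sketch. It is not the authors' implementation: the function name `k_shell_decomposition` and the edge-list input are assumptions made for the example, and the paper's graph construction from PubMed data is not reproduced.

```python
from collections import defaultdict


def k_shell_decomposition(edges):
    """Return {node: shell index} for an undirected simple graph.

    Nodes are peeled in rounds: for k = 1, 2, ... every node whose
    remaining degree is <= k is removed (repeatedly, since removals
    lower the degree of neighbours) and assigned to shell k.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    degree = {n: len(nb) for n, nb in adj.items()}
    remaining = set(adj)
    shell = {}
    k = 0
    while remaining:
        peel = [n for n in remaining if degree[n] <= k]
        if not peel:
            k += 1  # nothing left to peel at this level
            continue
        while peel:
            n = peel.pop()
            if n not in remaining:
                continue  # already removed via a duplicate entry
            shell[n] = k
            remaining.discard(n)
            for m in adj[n]:
                if m in remaining:
                    degree[m] -= 1
                    if degree[m] <= k:
                        peel.append(m)
    return shell


# A triangle (a, b, c) with one pendant node d attached to a.
shells = k_shell_decomposition([("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")])
```

In this toy graph the pendant node falls in shell 1 and the triangle in shell 2, which is the sense in which inner shells mark the densely connected core of a network.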
Inequality and cumulative advantage in science careers: a case study of high-impact journals
Analyzing a large data set of publications drawn from the most competitive journals in the natural and social sciences we show that research careers exhibit the broad distributions of individual achievement characteristic of systems in which cumulative advantage plays a key role. While most researchers are personally aware of the competition implicit in the publication process, little is known about the levels of inequality at the level of individual researchers. Here we analyzed both productivity and impact measures for a large set of researchers publishing in high-impact journals, accounting for censoring biases in the publication data by using distinct researcher cohorts defined over non-overlapping time periods. For each researcher cohort we calculated Gini inequality coefficients, with average Gini values around 0.48 for total publications and 0.73 for total citations. For perspective, these observed values are well in excess of the inequality levels observed for personal income in developing countries. Investigating possible sources of this inequality, we identify two potential mechanisms that act at the level of the individual that may play defining roles in the emergence of the broad productivity and impact distributions found in science. First, we show that the average time interval between a researcher's successive publications in top journals decreases with each subsequent publication. Second, after controlling for the time dependent features of citation distributions, we compare the citation impact of subsequent publications within a researcher's publication record. We find that as researchers continue to publish in top journals, there is more likely to be a decreasing trend in the relative citation impact with each subsequent publication.
This pattern highlights the difficulty of repeatedly producing research findings in the highest citation-impact echelon, as well as the role played by finite career and knowledge life-cycles, and the intriguing possibility that confirmation bias plays a role in the evaluation of scientific careers.
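For readers unfamiliar with the Gini coefficient reported in the abstract above, the following Python sketch shows one standard way to compute it from a list of non-negative values such as per-researcher publication or citation counts. The function name `gini` and the sample inputs are assumptions for illustration; the paper's cohort construction and censoring corrections are not reproduced.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = perfect equality).

    Uses the sorted-rank formula
        G = 2 * sum_i(i * x_i) / (n * sum_i(x_i)) - (n + 1) / n
    with x_1 <= ... <= x_n and ranks i = 1..n.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one value and a positive total")
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


equal = gini([1, 1, 1, 1])         # everyone has the same count
concentrated = gini([0, 0, 0, 1])  # one researcher holds everything
```

With this formula a fully concentrated sample of size n gives (n - 1) / n rather than exactly 1, so values computed on small cohorts are slightly bounded away from 1.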
Commentary: The case for caution in predicting scientists' future impact
We stress-test the career predictability model proposed by Acuna et al.
[Nature 489, 201-202 (2012)] by applying their model to a longitudinal career
data set of 100 assistant professors in physics, two from each of the top 50
physics departments in the US. The Acuna model claims to predict h(t+\Delta t),
a scientist's h-index \Delta t years into the future, using a linear
combination of 5 cumulative career measures taken at career age t. Here we
investigate how the "predictability" depends on the aggregation of career data
across multiple age cohorts. We confirm that the Acuna model does a respectable
job of predicting h(t+\Delta t) up to roughly 6 years into the future when
aggregating all age cohorts together. However, when calculated using subsets of
specific age cohorts (e.g. using data for only t=3), we find that the model's
predictive power significantly decreases, especially when applied to early
career years. For young careers, the model does a much worse job of predicting
future impact, and hence, exposes a serious limitation. The limitation is
particularly concerning as early career decisions make up a significant
portion, if not the majority, of cases where quantitative approaches are likely
to be applied.
Comment: 2 pages, 1 figure
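The quantity being predicted in the commentary above is the h-index. As a reminder of its definition (the largest h such that h papers have at least h citations each), here is a minimal Python sketch. The function name `h_index` and the sample citation counts are assumptions for illustration; the Acuna et al. regression itself is not reproduced here.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have >= rank citations
        else:
            break
    return h


example = h_index([10, 8, 5, 4, 3])
```

Because the h-index is cumulative and can never decrease over a career, part of any apparent predictability of h(t+\Delta t) comes from h(t) itself.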
The evolution of networks of innovators within and across borders: Evidence from patent data
Recent studies on the geography of knowledge networks have documented a negative impact of physical distance and institutional borders upon research and development (R&D) collaborations. Though it is widely recognized that geographic constraints and national borders impede the diffusion of knowledge, less attention has been devoted to the temporal evolution of these constraints. In this study we use data on patents filed with the European Patent Office (EPO) for OECD countries to analyze the impact of physical distance and country borders on inter-regional links in four different networks over the period 1988-2009: (1) co-inventorship, (2) patent citations, (3) inventor mobility and (4) the location of R&D laboratories. We find that the constraint imposed by country borders and distance decreased until the mid-1990s and then started to grow, particularly for distance. We further investigate the role of large innovation "hubs" as attractors of new collaboration opportunities and the impact of region size and locality on the evolution of cross-border patenting activities. The intensity of European cross-country inventor collaborations increased at a higher pace than their non-European counterparts until 2004, with no significant relative progress thereafter. Moreover, when analyzing networks of geographical mobility, multinational R&D activities and patent citations we cannot detect any substantial progress in European research integration above and beyond the common global trend.
Networks of innovators within and across borders. Evidence from patent data
Recent studies on the geography of knowledge networks have documented a negative impact of physical distance and institutional borders upon research and development (R&D) collaborations. Though it is widely recognized that geographic constraints hamper the diffusion of knowledge, less attention has been devoted to the temporal evolution of these constraints. In this study we use data on patents filed with the European Patent Office (EPO) for 50 countries to analyze the impact of physical distance and country borders on inter-regional links in four different networks over the period 1988-2009: (1) co-inventorship, (2) patent citations, (3) inventor mobility and (4) the location of R&D laboratories. We find that the constraint imposed by country borders and distance decreased until the mid-1990s and then started to grow, particularly for distance. The intensity of European cross-country inventor collaborations increased at a higher pace than their non-European counterparts until 2004, with no significant relative progress afterwards. Moreover, when analyzing networks of geographical mobility, multinational R&D activities and patent citations we do not detect any substantial progress in European research integration aside from the influence of common global trends.
Exploiting citation networks for large-scale author name disambiguation
We present a novel algorithm and validation method for disambiguating author
names in very large bibliographic data sets and apply it to the full Web of
Science (WoS) citation index. Our algorithm relies only upon the author and
citation graphs available for the whole period covered by the WoS. A pair-wise
publication similarity metric, which is based on common co-authors,
self-citations, shared references and citations, is established to perform a
two-step agglomerative clustering that first connects individual papers and
then merges similar clusters. This parameterized model is optimized using an
h-index based recall measure, favoring the correct assignment of well-cited
publications, and a name-initials-based precision using WoS metadata and
cross-referenced Google Scholar profiles. Despite the use of limited metadata,
we reach a recall of 87% and a precision of 88% with a preference for
researchers with high h-index values. 47 million articles of WoS can be
disambiguated on a single machine in less than a day. We develop an h-index
distribution model, confirming that the prediction is in excellent agreement
with the empirical data, and yielding insight into the utility of the h-index
in real academic ranking scenarios.
Comment: 14 pages, 5 figures
Reputation and Impact in Academic Careers
Reputation is an important social construct in science, which enables
informed quality assessments of both publications and careers of scientists in
the absence of complete systemic information. However, the relation between
reputation and career growth of an individual remains poorly understood,
despite recent proliferation of quantitative research evaluation methods. Here
we develop an original framework for measuring how a publication's citation
rate \Delta c depends on the reputation of its central author i, in
addition to its net citation count c. To estimate the strength of the
reputation effect, we perform a longitudinal analysis on the careers of 450
highly-cited scientists, using the total citations C_i of each scientist as
his/her reputation measure. We find a citation crossover c_\times which
distinguishes the strength of the reputation effect. For publications with
c < c_\times, the author's reputation is found to dominate the annual citation
rate. Hence, a new publication may gain a significant early advantage
corresponding to roughly a 66% increase in the citation rate for each tenfold
increase in C_i. However, the reputation effect becomes negligible for
highly cited publications, meaning that for c \geq c_\times the citation rate
measures scientific impact more transparently. In addition we have developed a
stochastic reputation model, which is found to reproduce numerous statistical
observations for real careers, thus providing insight into the microscopic
mechanisms underlying cumulative advantage in science.
Comment: Final published version of the main manuscript including additional
analysis: 9 pages, 4 figures, 1 table, and full reference list, including
those in the Supplementary Information. For the SI Appendix, see
http://physics.bu.edu/~amp17/webpage_files/MyPapers/Reputation_SI.pd
Node similarity within subgraphs of protein interaction networks
We propose a biologically motivated quantity, twinness, to evaluate local
similarity between nodes in a network. The twinness of a pair of nodes is the
number of connected, labeled subgraphs of size n in which the two nodes possess
identical neighbours. The graph animal algorithm is used to estimate twinness
for each pair of nodes (for subgraph sizes n=4 to n=12) in four different
protein interaction networks (PINs). These include an Escherichia coli PIN and
three Saccharomyces cerevisiae PINs -- each obtained using state-of-the-art
high throughput methods. In almost all cases, the average twinness of node
pairs is vastly higher than expected from a null model obtained by switching
links. For all n, we observe a difference in the ratio of type A twins (which
are unlinked pairs) to type B twins (which are linked pairs) distinguishing the
prokaryote E. coli from the eukaryote S. cerevisiae. Interaction similarity is
expected due to gene duplication, and whole genome duplication paralogues in S.
cerevisiae have been reported to co-cluster into the same complexes. Indeed, we
find that these paralogous proteins are over-represented as twins compared to
pairs chosen at random. These results indicate that twinness can detect
ancestral relationships from currently available PIN data.
Comment: 10 pages, 5 figures. Edited for typos, clarity, figures improved for
readability