
    Clustering citation histories in the Physical Review

    We investigate publications through their citation histories -- the history events are the citations given to the article by younger publications, and the time of each event is the publication date of the citing article. We propose a methodology, based on spectral clustering, to group citation histories, and the corresponding publications, into communities, and apply multinomial logistic regression to provide the revealed communities with semantics in terms of publication features. We study the case of publications from the full Physical Review archive, covering 120 years of physics in all its domains. We discover two clear archetypes of publications -- marathoners and sprinters -- that deviate from the average middle-of-the-road behaviour, and discuss some publication features, like the age of references and the type of publication, that are correlated with the membership of a publication in a certain community.
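    The clustering pipeline described above starts from per-paper citation histories. A minimal sketch of that preprocessing step, assuming citation events are given as the publication years of the citing papers (the input format and the 10-year horizon are illustrative assumptions; the resulting vectors could then feed a spectral clustering, e.g. scikit-learn's SpectralClustering):

```python
from collections import Counter

def citation_history(pub_year, citing_years, horizon=10):
    """Yearly citation counts for one paper, indexed by age since
    publication; events outside the horizon are dropped."""
    counts = Counter(y - pub_year for y in citing_years
                     if 0 <= y - pub_year < horizon)
    return [counts.get(age, 0) for age in range(horizon)]

# Example: a paper published in 2000, cited in 2001 (twice) and 2005.
vec = citation_history(2000, [2001, 2001, 2005])
# -> [0, 2, 0, 0, 0, 1, 0, 0, 0, 0]
```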

    Unraveling the dynamics of growth, aging and inflation for citations to scientific articles from specific research fields

    We analyze the time evolution of citations acquired by articles from journals of the American Physical Society (PRA, PRB, PRC, PRD, PRE and PRL). The observed change over time in the number of papers published in each journal is considered an exogenously caused variation in citability that is accounted for by a normalization. The appropriately inflation-adjusted citation rates are found to be separable into a preferential-attachment-type growth kernel and a purely obsolescence-related (i.e., monotonically decreasing as a function of time since publication) aging function. Variations in the empirically extracted parameters of the growth kernels and aging functions associated with different journals point to research-field-specific characteristics of citation intensity and knowledge flow. Comparison with analogous results for the citation dynamics of technology-disaggregated cohorts of patents provides deeper insight into the basic principles of information propagation as indicated by citing behavior.
    Comment: 13 pages, 6 figures, Elsevier style, v2: revised version to appear in J. Informetrics
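    The separability found above means the citation rate factors as rate(c, t) = growth(c) x aging(t), with c the accumulated citations and t the time since publication. A toy sketch of such a factored rate; the specific functional forms and the parameters c0 and tau are illustrative assumptions, not the paper's fitted values:

```python
import math

def citation_rate(c, t, c0=1.0, tau=5.0):
    """Separable citation rate: a preferential-attachment-type growth
    kernel times a monotonically decreasing aging function."""
    growth = c + c0             # attractiveness grows with accumulated citations
    aging = math.exp(-t / tau)  # obsolescence: decays with time since publication
    return growth * aging
```

    Under this form, well-cited papers attract citations faster at every age, while every paper's rate decays as it grows older.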

    Search for Evergreens in Science: A Functional Data Analysis

    Evergreens in science are papers that display a continual rise in annual citations without decline, at least within a sufficiently long time period. Aiming to better understand evergreens in particular and patterns of citation trajectory in general, this paper develops a functional data analysis method to cluster the citation trajectories of a sample of 1699 research papers published in 1980 in the American Physical Society (APS) journals. We propose a functional Poisson regression model for individual papers' citation trajectories, and fit the model to the observed 30-year citations of individual papers by functional principal component analysis and maximum likelihood estimation. Based on the estimated paper-specific coefficients, we apply the K-means clustering algorithm to group papers, uncovering general types of citation trajectories. The result demonstrates the existence of an evergreen cluster of papers that do not exhibit any decline in annual citations over 30 years.
    Comment: 40 pages, 9 figures
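    The fitted model above represents each paper's log citation intensity as a combination of basis functions, with paper-specific coefficients estimated by maximum likelihood. A minimal sketch of the Poisson log-likelihood being maximized; the basis functions here are illustrative placeholders for the FPCA-derived eigenfunctions:

```python
import math

def poisson_loglik(counts, coeffs, basis):
    """Log-likelihood of yearly citation counts under a functional
    Poisson model: log-intensity at year t is sum_k coeffs[k]*basis[k](t)."""
    ll = 0.0
    for t, n in enumerate(counts):
        log_lam = sum(c * b(t) for c, b in zip(coeffs, basis))
        lam = math.exp(log_lam)
        # Poisson log-pmf: n*log(lam) - lam - log(n!)
        ll += n * log_lam - lam - math.lgamma(n + 1)
    return ll
```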

    A multiple k-means cluster ensemble framework for clustering citation trajectories

    Citation maturity time varies across articles, yet the impact of all articles is measured within a fixed window. Clustering their citation trajectories helps us understand the knowledge diffusion process and reveals that not all articles gain immediate success after publication. Moreover, clustering trajectories is necessary for paper impact recommendation algorithms. It is a challenging problem because citation time series exhibit significant variability due to non-linear and non-stationary characteristics. Prior works propose sets of arbitrary thresholds and fixed rule-based approaches, all of which are primarily parameter-dependent. Consequently, this leads to inconsistencies in defining similar trajectories and to ambiguities regarding their specific number, and most studies capture only extreme trajectories. Thus, a generalised clustering framework is required. This paper proposes a feature-based multiple k-means cluster ensemble framework with linear run time. 195,783 and 41,732 well-cited articles from the Microsoft Academic Graph data are considered for clustering short-term (10-year) and long-term (30-year) trajectories, respectively. Four distinct trajectories are obtained: Early Rise Rapid Decline (2.2%), Early Rise Slow Decline (45%), Delayed Rise No Decline (53%), and Delayed Rise Slow Decline (0.8%). Individual trajectory differences for the two spans are studied. Most papers exhibit the Early Rise Slow Decline and Delayed Rise No Decline patterns. The growth and decay times, cumulative citation distribution, and peak characteristics of the individual trajectories are redefined empirically. A detailed comparative study reveals that our proposed methodology can detect all distinct trajectory classes.
    Comment: 29 pages
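    One standard way to combine the repeated k-means runs in an ensemble like the one above is a co-association matrix: each entry records how often two trajectories land in the same cluster across base clusterings, and the final clustering is derived from this matrix. A minimal sketch with toy label assignments (the paper's feature extraction and base clusterings are omitted):

```python
def coassociation(labelings):
    """Co-association matrix for a cluster ensemble: the fraction of
    base clusterings (e.g., repeated k-means runs) that assign items
    i and j to the same cluster."""
    n = len(labelings[0])
    m = len(labelings)
    return [[sum(lab[i] == lab[j] for lab in labelings) / m
             for j in range(n)] for i in range(n)]

# Two base clusterings of four trajectories:
C = coassociation([[0, 0, 1, 1], [0, 1, 1, 1]])
# C[2][3] == 1.0 (always together), C[0][1] == 0.5 (together once)
```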

    Does deep learning help topic extraction? A kernel k-means clustering method with word embedding

    Topic extraction presents challenges for the bibliometric community, and its performance still depends on human intervention and the application area. This paper proposes a novel kernel k-means clustering method incorporating a word embedding model to effectively extract topics from bibliometric data. The experimental results of a comparison of this method with four clustering baselines (i.e., k-means, fuzzy c-means, principal component analysis, and topic models) on two bibliometric datasets demonstrate its effectiveness both across a relatively broad range of disciplines and within a given domain. An empirical study on bibliometric topic extraction from articles published by three top-tier bibliometric journals between 2000 and 2017, supported by expert knowledge-based evaluations, provides supplemental evidence of the method's ability to extract topics. Additionally, this empirical analysis reveals insights into both the overlapping and the diverse research interests among the three journals that would benefit journal publishers, editorial boards, and research communities.
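    Kernel k-means operates on pairwise similarities rather than raw coordinates; with word embeddings, a natural choice is the cosine kernel between averaged word vectors. A minimal sketch of those two ingredients, where the token-to-vector mapping `word_vecs` is an assumed input (e.g., from a trained word2vec model), not part of the paper's method as stated:

```python
import math

def doc_embedding(tokens, word_vecs):
    """Average word-embedding representation of a document;
    out-of-vocabulary tokens are skipped."""
    vecs = [word_vecs[t] for t in tokens if t in word_vecs]
    dim = len(next(iter(word_vecs.values())))
    return [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]

def cosine_kernel(u, v):
    """Cosine similarity, usable as the kernel in kernel k-means."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)
```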

    Hierarchical topic tree: A hybrid model comprising network analysis and density peak search

    Topic hierarchies can help researchers develop a quick and concise understanding of the main themes and concepts in a field of interest. This is especially useful for newcomers to a field or those with a passing need for basic knowledge of a research landscape. Yet, despite a plethora of studies into hierarchical topic identification, there is still no model comprehensive and adaptive enough to extract the topics from a corpus, deal with the concepts shared by multiple topics, arrange the topics in a hierarchy, and give each topic an appropriate name. Hence, this paper presents a one-stop framework for generating fully conceptualized hierarchical topic trees. First, we generate a co-occurrence network based on key terms extracted from a corpus of documents. Then a density peak search algorithm is developed and applied to identify the core topic terms, which are subsequently used as topic labels. An overlapping community allocation algorithm follows to detect topics and possible overlaps between them. Lastly, the density peak search and overlapping community allocation algorithms run recursively to structure the topics into a hierarchical tree. The feasibility, reliability, and extensibility of the proposed framework are demonstrated through a case study in the field of computer science.
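    Density peak search, as used above to find core topic terms, scores each point by its local density rho and its distance delta to the nearest denser point; points with both high rho and high delta are cluster cores. A minimal sketch over generic points (the distance function and cutoff are illustrative assumptions, not the paper's co-occurrence-network specifics):

```python
def density_peaks(points, dist, cutoff):
    """Return (rho, delta): rho[i] counts neighbors within cutoff;
    delta[i] is the distance to the nearest higher-density point
    (or the farthest point, for the globally densest item)."""
    n = len(points)
    rho = [sum(1 for j in range(n)
               if j != i and dist(points[i], points[j]) < cutoff)
           for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist(points[i], points[j]) for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else
                     max(dist(points[i], points[j]) for j in range(n) if j != i))
    return rho, delta
```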

    Social media metrics for new research evaluation

    This chapter approaches, from both a theoretical and a practical perspective, the most important principles and conceptual frameworks that can be considered in the application of social media metrics to scientific evaluation. We propose conceptually valid uses for social media metrics in research evaluation. The chapter discusses frameworks and uses of these metrics as well as principles and recommendations for the consideration and application of current (and potentially new) metrics in research evaluation.
    Comment: Forthcoming in Glänzel, W., Moed, H.F., Schmoch, U., Thelwall, M. (2018). Springer Handbook of Science and Technology Indicators. Springer

    Studying the accumulation velocity of altmetric data tracked by Altmetric.com

    This paper investigates the data accumulation velocity of 12 Altmetric.com data sources. The DOI creation date recorded by Crossref and the altmetric event posting date tracked by Altmetric.com are combined to reflect altmetric data accumulation patterns over time and to compare the data accumulation velocity of the various data sources through three proposed indicators: the Velocity Index, the altmetric half-life, and the altmetric time delay. Results show that altmetric data sources exhibit different data accumulation velocities. Some sources accumulate data very fast within the first few days after publication, such as Reddit, Twitter, News, Facebook, Google+, and Blogs. At the opposite end of the spectrum, research outputs accrue data at a relatively slow pace on sources like Policy documents, Peer review, Q&A, Wikipedia, Video, and F1000Prime. The velocity of most altmetric data sources also varies with document type, subject field, and research topic. The type Review is slower in receiving altmetric mentions than Article, while Editorial Material and Letter are typically faster. In general, most altmetric data sources show higher velocity values in the fields of Physical Sciences and Engineering and Life and Earth Sciences. Within each field, there also exist some research topics that attract social attention faster than others.
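    Of the three indicators, the altmetric half-life admits a simple formalization. The sketch below assumes it is the delay (in days after publication) by which at least half of a source's tracked events have accrued; the paper's exact operational definition may differ:

```python
def half_life(event_days):
    """Smallest event delay (days since publication) by which at least
    half of all tracked events for a source have accrued, i.e. the
    lower median of the sorted delays."""
    days = sorted(event_days)
    return days[(len(days) - 1) // 2]

# A fast source: most events land in the first days, one straggler.
hl = half_life([0, 1, 2, 3, 100])  # -> 2
```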

    Unraveling the capabilities that enable digital transformation: A data-driven methodology and the case of artificial intelligence

    Digital transformation (DT) is prevalent in businesses today. However, current studies that guide DT are mostly qualitative, resulting in a strong call for quantitative evidence of exactly what DT is and which capabilities are needed to enable it successfully. Aiming to fill these gaps, this paper presents a novel bibliometric framework that unearths clues from scientific articles and patents. The framework incorporates scientific evolutionary pathways and a hierarchical topic tree to quantitatively identify the evolutionary patterns and hierarchies of the topics at play in DT research. Our results include a comprehensive definition of DT from the perspective of bibliometrics and a systematic categorization of the capabilities required to enable DT, distilled from 10,179 academic papers on DT. To further yield practical insights on technological capabilities, the paper also includes a case study of 9,454 patents focusing on one of the emerging technologies, artificial intelligence (AI). We summarize the outcomes in a four-level AI capabilities model. The paper ends with a discussion of its contributions: presenting a quantitative account of DT research, introducing a process-based understanding of DT, offering a list of major capabilities enabling DT, and alerting managers to the capabilities needed when undertaking their DT journey.

    Ranking in evolving complex networks

    Complex networks have emerged as a simple yet powerful framework to represent and analyze a wide range of complex systems. The problem of ranking the nodes and the edges in complex networks is critical for a broad range of real-world problems because it affects how we access online information and products, how success and talent are evaluated in human activities, and how scarce resources are allocated by companies and policymakers, among others. This calls for a deep understanding of how existing ranking algorithms perform and of the possible biases that may impair their effectiveness. Many popular ranking algorithms (such as Google's PageRank) are static in nature and, as a consequence, exhibit important shortcomings when applied to real networks that rapidly evolve in time. At the same time, recent advances in the understanding and modeling of evolving networks have enabled the development of a wide and diverse range of ranking algorithms that take the temporal dimension into account. The aim of this review is to survey the existing ranking algorithms, both static and time-aware, and their applications to evolving networks. We emphasize both the impact of network evolution on well-established static algorithms and the benefits of including the temporal dimension for tasks such as prediction of network traffic, prediction of future links, and identification of significant nodes.
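    As a baseline for the static algorithms surveyed above, PageRank can be computed by power iteration. Time-aware variants typically reweight edges or the teleportation term by recency; this sketch (assuming an adjacency-list input in which every node appears as a key) omits that and shows only the static core:

```python
def pagerank(adj, d=0.85, iters=100):
    """Static PageRank by power iteration on an adjacency list
    {node: [out-neighbors]}; d is the damping factor."""
    nodes = list(adj)
    n = len(nodes)
    r = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        nxt = {u: (1 - d) / n for u in nodes}
        for u, outs in adj.items():
            if outs:
                share = d * r[u] / len(outs)
                for v in outs:
                    nxt[v] += share
            else:  # dangling node: redistribute its mass uniformly
                for v in nodes:
                    nxt[v] += d * r[u] / n
        r = nxt
    return r
```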