    Runaway Events Dominate the Heavy Tail of Citation Distributions

    Statistical distributions with heavy tails are ubiquitous in natural and social phenomena. Because the entries in the heavy tail have disproportionate significance, knowing the tail's exact shape is very important. Citations of scientific papers form one of the best-known heavy-tailed distributions. Even in this case, there is considerable debate over whether the citation distribution is better described by a log-normal or a power-law fit. The goal of our study is to resolve this debate by measuring the citation distribution for a very large and homogeneous dataset. We measured the citation distribution for 418,438 Physics papers published in 1980-1989 and cited by 2008. While the log-normal fit deviates too strongly from the data, the discrete power-law function with the exponent γ = 3.15 does better and fits 99.955% of the data. However, the extreme tail of the distribution deviates upward even from the power-law fit and exhibits a dramatic "runaway" behavior. The onset of the runaway regime is revealed macroscopically once a paper garners 1000-1500 citations; however, microscopic measurements of the autocorrelation in citation rates are able to predict this behavior in advance. Comment: 6 pages, 5 figures
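The abstract does not say how the power-law exponent was estimated. As a minimal sketch only, the following Python function uses the widely cited closed-form approximation to the maximum-likelihood exponent of a discrete power law, γ ≈ 1 + n / Σ ln(x_i / (x_min − 0.5)). The function name, the cutoff `x_min`, and the sample counts are illustrative assumptions, not values from the paper, and the approximation is usually considered reliable only for cutoffs of roughly 6 or more.

```python
import math


def estimate_discrete_powerlaw_exponent(citations, x_min):
    """Approximate MLE of gamma for p(x) ~ x**(-gamma) on integers x >= x_min.

    Uses the closed-form approximation
        gamma ~= 1 + n / sum(ln(x_i / (x_min - 0.5))).
    This is an illustrative estimator, not necessarily the paper's procedure.
    """
    if x_min < 1:
        raise ValueError("x_min must be a positive integer")
    # Keep only the observations in the tail being fitted.
    tail = [c for c in citations if c >= x_min]
    if not tail:
        raise ValueError("no observations at or above x_min")
    log_sum = sum(math.log(c / (x_min - 0.5)) for c in tail)
    return 1.0 + len(tail) / log_sum


if __name__ == "__main__":
    # Hypothetical citation counts, used only to show the call.
    sample = [12, 15, 10, 40, 11, 230, 18, 10, 13, 75]
    print(estimate_discrete_powerlaw_exponent(sample, x_min=10))
```

A fit like this describes only the tail above `x_min`. Testing it against a log-normal alternative, or detecting an upward "runaway" deviation, would need a separate goodness-of-fit step.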

    Application of author bibliographic coupling analysis and author keywords ranking in identifying research fronts of Indian Neurosciences research

    Identifying research fronts yields informative results in any field. Citation analysis is an established approach to this task, and Author Co-citation Analysis (ACA) and Author Bibliographic Coupling Analysis (ABCA) are among its more successful variants. The current study combines author bibliographic coupling network analysis with author keywords to explore and graphically represent the evolution of prominent research areas in the Indian Neuroscience research domain over the study period. Hierarchical clustering was applied to the author bibliographic coupling networks for all non-overlapping consecutive years in the study period, and the results were analysed in the VOSviewer mapping software. The LinLog/modularity normalization was chosen for determining distance-based similarity while clustering the network units. The study revealed ten prominent research subfields, with the greatest emphasis on ‘Epilepsy’ and ‘Parkinson’s disease’ research. Depression was identified as one of the emerging prominent areas in recent years. Besides its relevance to framing national-level mental health policies, the study also shows ABCA to be an effective method for identifying prominent research areas
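Author bibliographic coupling links two authors when their publications cite the same references. As a rough sketch, the coupling strength of a pair can be taken as the number of distinct references the two authors share. The function name and the toy data below are hypothetical, and counting shared references is only one of several possible strength definitions. The normalization and clustering that VOSviewer applies are not reproduced here.

```python
from itertools import combinations


def author_coupling_strengths(author_refs):
    """Map each author pair to the number of cited references they share.

    author_refs: dict mapping an author name to an iterable of reference IDs
    cited across that author's publications. Pairs with no shared reference
    are omitted from the result.
    """
    strengths = {}
    for a, b in combinations(sorted(author_refs), 2):
        shared = len(set(author_refs[a]) & set(author_refs[b]))
        if shared:
            strengths[(a, b)] = shared
    return strengths


if __name__ == "__main__":
    # Hypothetical authors and reference IDs, for illustration only.
    toy = {
        "Author A": {"r1", "r2", "r3"},
        "Author B": {"r2", "r3", "r4"},
        "Author C": {"r5"},
    }
    print(author_coupling_strengths(toy))
```

The resulting weighted pairs form the edges of a coupling network, which could then be exported to a mapping tool for clustering.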

    Peer assessment or promotion by numbers? A comparative study of different measures of researcher performance within the UK Library and Information Science research community

    Hirsch’s h-index, Egghe’s g-index, total citation and publication counts, and five proposed new metrics were correlated with one another using Spearman’s Rank Correlation for one hundred randomly selected academics and researchers working in UK Library and Information Science departments. Metrics were compared for individuals of different genders and at institutions awarded different RAE (2001) grades. Individuals’ metrics were rank-correlated against academic ranks and RAE (2001) grades of their employing departments. Metrics calculated using Web of Science and Google Scholar data were compared. Peer- and h-index metric-ranked orders of researchers were rank-correlated. Citation behaviour and attitudes towards peer and citation-based assessment of 263 academics and researchers were investigated by factor analysis of online attitudinal survey responses. h increased curvilinearly with total citation and publication counts, suggesting that h was constrained by the activity in the field preventing individuals producing enough heavily cited publications to increase their h-index scores. Most individuals therefore shared similar h-index scores, making interpersonal comparisons difficult. Total citation counts and Bihui’s a-index scores distinguished between more individuals, though whether they could confidently identify differences between individuals is uncertain. Both databases arbitrarily omitted individuals and publications, systematically biasing citation metrics calculated using them. In contrast to studies of larger fields, no citation metrics correlated with RAE grade, academic rank, or direct peer-assessment, suggesting that citation-based assessment is unsuitable for research fields with relatively little research activity. No gender bias was evident in academic rank, esteem or citedness. At least nine independent factors influence citation behaviour. Mertonian factors dominated. 
The independence of the factors suggested different individuals have different combinations of non-Mertonian motivations. The overriding meaning of citations was confirmed as signals of relevance and reward. Recommendations for future research include a need to develop simple, robust methods to identify subfields and normalise citations across subfields, to quantify the impact of random bias and to determine whether it varies across subfields, and to study the rate of accumulation of citations and citation distribution changes for individuals (and departments) over time to determine whether career age can be controlled for, in particular
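For readers unfamiliar with the metrics compared above, the following Python sketch computes three of them from a list of per-paper citation counts, using their standard definitions. The h-index is the largest h such that h papers have at least h citations each. The g-index is the largest g such that the top g papers together have at least g² citations, capped here at the number of papers, which is one common convention. The a-index is the mean citation count of the papers in the h-core. The function names and sample counts are illustrative, and the study's five proposed new metrics are not covered.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    # In descending order, c >= rank holds exactly for the first h papers.
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)


def g_index(citations):
    """Largest g (at most the paper count) with top-g citations >= g**2."""
    ranked = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, c in enumerate(ranked, start=1):
        total += c
        if total >= rank * rank:
            g = rank
    return g


def a_index(citations):
    """Mean number of citations of the papers in the h-core."""
    h = h_index(citations)
    if h == 0:
        return 0.0
    ranked = sorted(citations, reverse=True)
    return sum(ranked[:h]) / h


if __name__ == "__main__":
    counts = [10, 8, 5, 4, 3]  # hypothetical citation counts
    print(h_index(counts), g_index(counts), a_index(counts))
```

Because h can only take integer values bounded by the number of papers, many researchers in a low-activity field end up with the same score, which is consistent with the clustering of h-index values reported above.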

    Analyzing and Modeling Real-World Phenomena with Complex Networks: A Survey of Applications

    The success of new scientific areas can be assessed by their potential for contributing to new theoretical approaches and to applications to real-world problems. Complex networks have fared extremely well in both of these aspects, with a sound theoretical basis developed over the years and a variety of applications. In this survey, after an introduction to the main concepts and models, we analyze the applications of complex networks to real-world problems and data, with emphasis on representation, analysis and modeling. A diversity of phenomena are surveyed, which may be classified into no fewer than 22 areas, providing a clear indication of the impact of the field of complex networks. Comment: 103 pages, 3 figures and 7 tables. A working manuscript, suggestions are welcome

    Clustering of scientific fields by integrating text mining and bibliometrics.

    The increasing dissemination of scientific and technological publications via the internet, and their availability in large-scale bibliographic databases, create enormous opportunities for mapping science and technology. The continuing growth of available computing power and the development of new algorithms also contribute to this. Important challenges remain, however. This dissertation confirms the hypothesis that the accuracy of both clustering scientific fields and classifying publications can still be improved by integrating text mining and bibliometrics. Both the textual and the bibliometric approach have advantages and disadvantages, and each offers a different view of a corpus of scientific publications or patents. On the one hand, such documents contain a wealth of textual information; on the other hand, the citations among them form large networks that provide additional information. We integrate both viewpoints and show how existing textual and bibliometric methods can be improved. The dissertation consists of three parts. First, we discuss the use of text mining techniques for information retrieval and for mapping the knowledge contained in texts. We introduce and demonstrate the text mining framework, as well as the use of agglomerative hierarchical clustering. We further investigate the relationship between clustering performance on the one hand and the desired number of clusters and the number of factors in latent semantic indexing on the other. In addition, we describe a composite, semi-automatic strategy for determining the number of clusters in a collection of documents. Second, we treat networks consisting of citations between scientific documents and networks arising from collaborations among authors.
Such networks can be analysed with techniques from bibliometrics and graph theory, with the aim of ranking relevant entities, clustering, and discovering communities. Third, we demonstrate the complementarity of text mining and bibliometrics and propose ways to integrate both worlds correctly. The performance of unsupervised clustering and of classification improves significantly when the textual content of scientific publications is combined with the structure of citation networks. A method based on statistical meta-analysis achieves the best results and outperforms methods based on text or citations alone. Our integrated or hybrid strategies for information retrieval and clustering are demonstrated in two domain studies. The aim of the first study is to unravel and visualise the concept structure of information science and to test the added value of the hybrid method. The second study covers the cognitive structure, bibliometric properties, and dynamics of bioinformatics. We develop a method for dynamic and integrated clustering of evolving bibliographic corpora. This method compares and tracks clusters over time. In summary, for the complementary text and network worlds we design a hybrid clustering method that takes both paradigms into account simultaneously. We also show that the integrated view yields a better understanding of the structure and evolution of scientific fields.
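To make the idea of combining textual and citation information concrete, here is a deliberately simple Python sketch that blends a text cosine similarity with a bibliographic-coupling cosine through a weighted sum. This is not the meta-analysis-based method that the abstract reports as best. It is a simpler linear combination swapped in for illustration. The function names, the `weight` parameter, and the `tokens`/`refs` document fields are all assumptions.

```python
import math
from collections import Counter


def cosine_text_similarity(tokens_a, tokens_b):
    """Cosine similarity between raw term-frequency vectors of two token lists."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def coupling_similarity(refs_a, refs_b):
    """Cosine-normalised bibliographic coupling: |A & B| / sqrt(|A| * |B|)."""
    a, b = set(refs_a), set(refs_b)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def hybrid_similarity(doc_a, doc_b, weight=0.5):
    """Weighted sum of text and citation similarity (weight applies to text)."""
    text = cosine_text_similarity(doc_a["tokens"], doc_b["tokens"])
    cites = coupling_similarity(doc_a["refs"], doc_b["refs"])
    return weight * text + (1.0 - weight) * cites


if __name__ == "__main__":
    # Hypothetical documents, for illustration only.
    d1 = {"tokens": ["gene", "cluster", "text"], "refs": {"r1", "r2"}}
    d2 = {"tokens": ["gene", "network", "text"], "refs": {"r2", "r3"}}
    print(hybrid_similarity(d1, d2, weight=0.5))
```

A pairwise similarity of this kind could feed an agglomerative hierarchical clustering step. In practice, how the two similarity distributions are scaled before combination matters a great deal.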

    The impact of the Online Knowledge Library (b-on) on the usage and Portuguese scientific output (2000-2010)

    In recent years, several initiatives have sought to promote universal access to the Information and Knowledge Society. It was in this context that the Online Knowledge Library (b-on) was launched in Portugal in 2004; it celebrates its 10th anniversary this year. With b-on, it became easier to access the full text of international scientific publications. This study presents and analyses statistical and bibliometric indicators of Portuguese scientific output and relates them to b-on. Our aim was to understand the impact of b-on on both usage and scientific output. We analysed the usage of b-on electronic resources by the public universities that are members of the consortium from 2004 to 2010, choosing as our sample the five universities with the most downloads per FTE (full-time equivalent). We analysed the evolution of the number of downloads, the most used content suppliers, and the most used titles. In addition to the consortium's usage data, we used the Web of Science (WoS) to identify the articles indexed between 2000 and 2010 with an affiliation in Portugal and, individually, in each of the five universities in our sample. Through a quantitative and bibliometric methodology, we identified the research areas with the largest number of articles, the scientific journals with the highest number of published articles, language, and international co-authorship, among other indicators. We then identified the authors with the largest number of indexed articles and sent them an electronic questionnaire about the impact of b-on on their research practices; the results show the relation between consumption and scientific output.
Beyond this quantitative analysis, we interviewed some of the key actors involved in the creation of b-on at the political, operational and collaborative levels. This triangulation of methods gave us richer data and allowed a more complete analysis of b-on and its impact on the national academic and scientific community. The overall download totals at the universities studied showed a steady upward trend in the use of electronic content. There is clearly an increase in the consumption of this content by the Portuguese academic and scientific community, particularly at the universities. National scientific output has also grown in recent years. We conclude that the availability of and access to electronic resources contribute to the increased scientific productivity of the universities, and that the study and analysis of their use and output are essential. Today b-on is a success story, considered by many teachers and researchers to be a fundamental tool for accessing and producing scientific content