112 research outputs found

    Clustering More than Two Million Biomedical Publications: Comparing the Accuracies of Nine Text-Based Similarity Approaches

    Get PDF
    We investigate the accuracy of different similarity approaches for clustering over two million biomedical documents. Clustering large sets of text documents is important for a variety of information needs and applications such as collection management and navigation, summary and analysis. The few comparisons of clustering results from different similarity approaches have focused on small literature sets and have given conflicting results. Our study was designed to seek a robust answer to the question of which similarity approach would generate the most coherent clusters of a biomedical literature set of over two million documents.We used a corpus of 2.15 million recent (2004-2008) records from MEDLINE, and generated nine different document-document similarity matrices from information extracted from their bibliographic records, including titles, abstracts and subject headings. The nine approaches were comprised of five different analytical techniques with two data sources. The five analytical techniques are cosine similarity using term frequency-inverse document frequency vectors (tf-idf cosine), latent semantic analysis (LSA), topic modeling, and two Poisson-based language models--BM25 and PMRA (PubMed Related Articles). The two data sources were a) MeSH subject headings, and b) words from titles and abstracts. Each similarity matrix was filtered to keep the top-n highest similarities per document and then clustered using a combination of graph layout and average-link clustering. Cluster results from the nine similarity approaches were compared using (1) within-cluster textual coherence based on the Jensen-Shannon divergence, and (2) two concentration measures based on grant-to-article linkages indexed in MEDLINE.PubMed's own related article approach (PMRA) generated the most coherent and most concentrated cluster solution of the nine text-based similarity approaches tested, followed closely by the BM25 approach using titles and abstracts. Approaches using only MeSH subject headings were not competitive with those based on titles and abstracts

    Topic identification challenge

    Get PDF
    Merit, Expertise and Measuremen

    The Citation Field of Evolutionary Economics

    Get PDF
    Evolutionary economics has developed into an academic field of its own, institutionalized around, amongst others, the Journal of Evolutionary Economics (JEE). This paper analyzes the way and extent to which evolutionary economics has become an interdisciplinary journal, as its aim was: a journal that is indispensable in the exchange of expert knowledge on topics and using approaches that relate naturally with it. Analyzing citation data for the relevant academic field for the Journal of Evolutionary Economics, we use insights from scientometrics and social network analysis to find that, indeed, the JEE is a central player in this interdisciplinary field aiming mostly at understanding technological and regional dynamics. It does not, however, link firmly with the natural sciences (including biology) nor to management sciences, entrepreneurship, and organization studies. Another journal that could be perceived to have evolutionary acumen, the Journal of Economic Issues, does relate to heterodox economics journals and is relatively more involved in discussing issues of firm and industry organization. The JEE seems most keen to develop theoretical insights

    Science Models as Value-Added Services for Scholarly Information Systems

    Full text link
    The paper introduces scholarly Information Retrieval (IR) as a further dimension that should be considered in the science modeling debate. The IR use case is seen as a validation model of the adequacy of science models in representing and predicting structure and dynamics in science. Particular conceptualizations of scholarly activity and structures in science are used as value-added search services to improve retrieval quality: a co-word model depicting the cognitive structure of a field (used for query expansion), the Bradford law of information concentration, and a model of co-authorship networks (both used for re-ranking search results). An evaluation of the retrieval quality when science model driven services are used turned out that the models proposed actually provide beneficial effects to retrieval quality. From an IR perspective, the models studied are therefore verified as expressive conceptualizations of central phenomena in science. Thus, it could be shown that the IR perspective can significantly contribute to a better understanding of scholarly structures and activities.Comment: 26 pages, to appear in Scientometric

    An Analysis of the Abstracts Presented at the Annual Meetings of the Society for Neuroscience from 2001 to 2006

    Get PDF
    Annual meeting abstracts published by scientific societies often contain rich arrays of information that can be computationally mined and distilled to elucidate the state and dynamics of the subject field. We extracted and processed abstract data from the Society for Neuroscience (SFN) annual meeting abstracts during the period 2001–2006 in order to gain an objective view of contemporary neuroscience. An important first step in the process was the application of data cleaning and disambiguation methods to construct a unified database, since the data were too noisy to be of full utility in the raw form initially available. Using natural language processing, text mining, and other data analysis techniques, we then examined the demographics and structure of the scientific collaboration network, the dynamics of the field over time, major research trends, and the structure of the sources of research funding. Some interesting findings include a high geographical concentration of neuroscience research in the north eastern United States, a surprisingly large transient population (66% of the authors appear in only one out of the six studied years), the central role played by the study of neurodegenerative disorders in the neuroscience community, and an apparent growth of behavioral/systems neuroscience with a corresponding shrinkage of cellular/molecular neuroscience over the six year period. The results from this work will prove useful for scientists, policy makers, and funding agencies seeking to gain a complete and unbiased picture of the community structure and body of knowledge encapsulated by a specific scientific domain

    Communities and patterns of scientific collaboration in Business and Management

    Get PDF
    This is the author's accepted version of this article deposited at arXiv (arXiv:1006.1788v2 [physics.soc-ph]) and subsequently published in Scientometrics October 2011, Volume 89, Issue 1, pp 381-396. The final publication is available at link.springer.com http://link.springer.com/article/10.1007%2Fs11192-011-0439-1Author's note: 17 pages. To appear in special edition of Scientometrics. Abstract on arXiv meta-data a shorter version of abstract on actual paper (both in journal and arXiv full pape

    Community structure and patterns of scientific collaboration in Business and Management

    Get PDF
    This is the author's accepted version of this article deposited at arXiv (arXiv:1006.1788v2 [physics.soc-ph]) and subsequently published in Scientometrics October 2011, Volume 89, Issue 1, pp 381-396. The final publication is available at link.springer.com http://link.springer.com/article/10.1007%2Fs11192-011-0439-1Author's note: 17 pages. To appear in special edition of Scientometrics. Abstract on arXiv meta-data a shorter version of abstract on actual paper (both in journal and arXiv full pape
    • …
    corecore