150 research outputs found

    Toward predicting research proposal success

    © 2017, Akadémiai Kiadó, Budapest, Hungary. Citation analysis and discourse analysis of 369 R01 NIH proposals are used to discover possible predictors of proposal success. We focused on two issues: the Matthew effect in science (Merton’s claim that eminent scientists have an inherent advantage in the competition for funds) and quality of writing or clarity. Our results suggest that a clearly articulated proposal is more likely to be funded than a proposal with lower quality of discourse. We also find that proposal success is correlated with a high level of topical overlap between the proposal references and the applicant’s prior publications. Implications associated with the analysis of proposal data are discussed. https://deepblue.lib.umich.edu/bitstream/2027.42/150071/2/Predicting_Proposal_Success_rev0_hdr.pdf

    Consistency and trends of technological innovations: a network approach to the international patent classification data

    Classifying patents by the technology areas they pertain to is important to enable information search and facilitate policy analysis and socio-economic studies. Based on the OECD Triadic Patent Family database, this study constructs a cohort network based on the grouping of IPC subclasses in the same patent families, and a citation network based on citations between subclasses of patent families citing each other. This paper presents a systematic analysis approach which obtains naturally formed network clusters identified using a Lumped Markov Chain method, extracts community keys traceable over time, and investigates two important community characteristics: consistency and changing trends. The results are verified against several other methods, including a recent study measuring patent text similarity. The proposed method contributes to the literature a network-based approach to studying the endogenous community properties of an exogenously devised classification system. The application of this method may improve the accuracy and efficiency of the IPC search platform and help detect the emergence of new technologies.

    Knowledge Integration and Diffusion: Measures and Mapping of Diversity and Coherence

    I present a framework based on the concepts of diversity and coherence for the analysis of knowledge integration and diffusion. Visualisations that help to interpret the resulting insights are also introduced. The key novelty offered by this framework compared to previous approaches is the inclusion of cognitive distance (or proximity) between the categories that characterise the body of knowledge under study. I briefly discuss the different methods for mapping the cognitive dimension.
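To make the role of cognitive distance concrete, here is a minimal sketch of a distance-weighted diversity measure. It assumes the Rao-Stirling form (the sum of p_i * p_j * d_ij over ordered pairs of distinct categories); the abstract does not state which formula the framework uses, so the function name `rao_stirling`, the category labels, and the distance values are all illustrative assumptions.

```python
def rao_stirling(proportions, distance):
    """Distance-weighted diversity: sum over ordered pairs i != j of p_i * p_j * d_ij.

    proportions: dict mapping category -> share of the body of knowledge (sums to 1).
    distance:    function (i, j) -> cognitive distance between categories, in [0, 1].
    Assumption: this Rao-Stirling form is one common choice, not necessarily
    the exact measure used in the listed paper.
    """
    categories = list(proportions)
    total = 0.0
    for i in categories:
        for j in categories:
            if i != j:
                total += proportions[i] * proportions[j] * distance(i, j)
    return total


# Hypothetical distances between three made-up categories.
DISTANCES = {
    frozenset(("physics", "chemistry")): 0.2,
    frozenset(("physics", "sociology")): 0.9,
    frozenset(("chemistry", "sociology")): 0.8,
}


def example_distance(i, j):
    """Look up the symmetric, hypothetical distance between two categories."""
    return DISTANCES[frozenset((i, j))]


if __name__ == "__main__":
    close_mix = {"physics": 0.5, "chemistry": 0.5}
    distant_mix = {"physics": 0.5, "sociology": 0.5}
    print(rao_stirling(close_mix, example_distance))
    print(rao_stirling(distant_mix, example_distance))
```

Under this form, two equally balanced mixes receive different scores when their categories lie at different cognitive distances, which is the property the abstract highlights.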

    Determinants of the impact factor of publications: A panel model for journals indexed in Scopus 2017

    This article aims to establish which variables explain the behavior of the SJR between 2014 and 2016 for journals indexed in Scopus. To do this, journals with an SJR value greater than eight in 2016 were selected, that is, 103 of the 22,231. For the analysis, a panel-corrected standard errors model was used, which yielded a coefficient of determination of 81%, together with a feasible generalized least squares model. From these it was possible to establish that variables such as open access, the number of areas in which the publication is registered, and the language of publication are not significant in explaining the impact of a publication. On the contrary, variables such as belonging to health sciences or social sciences

    Clustering More than Two Million Biomedical Publications: Comparing the Accuracies of Nine Text-Based Similarity Approaches

    We investigate the accuracy of different similarity approaches for clustering over two million biomedical documents. Clustering large sets of text documents is important for a variety of information needs and applications such as collection management and navigation, summary and analysis. The few comparisons of clustering results from different similarity approaches have focused on small literature sets and have given conflicting results. Our study was designed to seek a robust answer to the question of which similarity approach would generate the most coherent clusters of a biomedical literature set of over two million documents. We used a corpus of 2.15 million recent (2004-2008) records from MEDLINE, and generated nine different document-document similarity matrices from information extracted from their bibliographic records, including titles, abstracts and subject headings. The nine approaches combined five different analytical techniques with two data sources. The five analytical techniques are cosine similarity using term frequency-inverse document frequency vectors (tf-idf cosine), latent semantic analysis (LSA), topic modeling, and two Poisson-based language models: BM25 and PMRA (PubMed Related Articles). The two data sources were a) MeSH subject headings, and b) words from titles and abstracts. Each similarity matrix was filtered to keep the top-n highest similarities per document and then clustered using a combination of graph layout and average-link clustering. Cluster results from the nine similarity approaches were compared using (1) within-cluster textual coherence based on the Jensen-Shannon divergence, and (2) two concentration measures based on grant-to-article linkages indexed in MEDLINE. PubMed's own related article approach (PMRA) generated the most coherent and most concentrated cluster solution of the nine text-based similarity approaches tested, followed closely by the BM25 approach using titles and abstracts. Approaches using only MeSH subject headings were not competitive with those based on titles and abstracts.
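Three of the steps named in this abstract (tf-idf cosine similarity, top-n filtering, and the Jensen-Shannon divergence) can be illustrated with a small standard-library sketch. This is not the study's pipeline: the tf-idf weighting variant, the tie-breaking rule, the function names, and the toy documents are assumptions, and LSA, topic modeling, BM25, PMRA, and the clustering step are not covered.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse tf-idf vector (dict term -> weight) per tokenized document.

    Assumption: raw term count times log(N / document frequency); the study's
    exact weighting is not given in the abstract.
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({t: c * math.log(n_docs / doc_freq[t]) for t, c in counts.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_n_neighbors(vectors, n):
    """For each document keep only its n most similar other documents.

    Returns a list of [(neighbor_index, similarity), ...]; ties go to the lower index.
    """
    kept = []
    for i, u in enumerate(vectors):
        sims = [(cosine(u, v), j) for j, v in enumerate(vectors) if j != i]
        sims.sort(key=lambda pair: (-pair[0], pair[1]))
        kept.append([(j, s) for s, j in sims[:n]])
    return kept


def jensen_shannon(p_counts, q_counts):
    """Jensen-Shannon divergence (base 2, range 0..1) between two term-count dicts."""
    p_total = sum(p_counts.values())
    q_total = sum(q_counts.values())
    jsd = 0.0
    for term in set(p_counts) | set(q_counts):
        p = p_counts.get(term, 0) / p_total
        q = q_counts.get(term, 0) / q_total
        m = (p + q) / 2
        if p:
            jsd += 0.5 * p * math.log2(p / m)
        if q:
            jsd += 0.5 * q * math.log2(q / m)
    return jsd


if __name__ == "__main__":
    # Toy token lists standing in for title and abstract words.
    docs = [
        ["gene", "expression", "cancer"],
        ["gene", "expression", "tumor"],
        ["patent", "citation", "network"],
    ]
    vectors = tfidf_vectors(docs)
    print(top_n_neighbors(vectors, 1))
    print(jensen_shannon(Counter(docs[0]), Counter(docs[1])))
```

A low divergence between a document's word distribution and that of its cluster suggests textual coherence; how the study aggregates divergences within a cluster is not described in this abstract.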

    CiteScore of publications indexed in Scopus: an implementation of panel data

    This article is intended to establish the variables that explain the behavior of the CiteScore metrics from 2014 to 2016, for journals indexed in Scopus in 2017. To this end, journals with a CiteScore value greater than 11 in any of the periods were selected, that is, 133 journals. For the data analysis, a panel-corrected standard errors model was used, from which a coefficient of determination of 77% was obtained. From the results, it was possible to state that journals of arts and humanities; business, administration, and accounting; economics, econometrics, and finance; immunology and microbiology; medicine; and social sciences have the greatest impact. Corporación Universitaria Minuto de Dios, Fundación Universitaria Konrad Lorenz, Universidad de La Habana, Universidad de la Costa

    The Impact of Boundary Spanning Scholarly Publications and Patents

    Human knowledge and innovation are recorded in two media: scholarly publications and patents. These records not only document a new scientific insight or new method developed, but they also carefully cite prior work upon which the innovation is built. We quantify the impact of information flow across fields using two large citation datasets: one spanning over a century of scholarly work in the natural sciences, social sciences and humanities, and a second spanning a quarter century of United States patents. We find that a publication's citing across disciplines is tied to its subsequent impact. In the case of patents and natural science publications, those that are cited at least once are cited slightly more when they draw on research outside of their area. In contrast, in the social sciences, citing within one's own field tends to be positively correlated with impact.

    An Analysis of the Abstracts Presented at the Annual Meetings of the Society for Neuroscience from 2001 to 2006

    Annual meeting abstracts published by scientific societies often contain rich arrays of information that can be computationally mined and distilled to elucidate the state and dynamics of the subject field. We extracted and processed abstract data from the Society for Neuroscience (SFN) annual meeting abstracts during the period 2001–2006 in order to gain an objective view of contemporary neuroscience. An important first step in the process was the application of data cleaning and disambiguation methods to construct a unified database, since the data were too noisy to be of full utility in the raw form initially available. Using natural language processing, text mining, and other data analysis techniques, we then examined the demographics and structure of the scientific collaboration network, the dynamics of the field over time, major research trends, and the structure of the sources of research funding. Some interesting findings include a high geographical concentration of neuroscience research in the northeastern United States, a surprisingly large transient population (66% of the authors appear in only one of the six studied years), the central role played by the study of neurodegenerative disorders in the neuroscience community, and an apparent growth of behavioral/systems neuroscience with a corresponding shrinkage of cellular/molecular neuroscience over the six-year period. The results from this work will prove useful for scientists, policy makers, and funding agencies seeking to gain a complete and unbiased picture of the community structure and body of knowledge encapsulated by a specific scientific domain.
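The "transient population" statistic quoted in this abstract can be read as the share of distinct authors who appear in exactly one meeting year. A minimal sketch of that calculation follows; it assumes author names have already been cleaned and disambiguated into stable identifiers, and the function name, record layout, and sample data are hypothetical.

```python
from collections import defaultdict


def transient_fraction(records):
    """Return the share of authors who appear in exactly one year.

    records: iterable of (author_id, year) pairs, one per authored abstract.
    Assumption: author_id is already disambiguated; the abstract notes that
    the raw data needed cleaning before this kind of count was meaningful.
    """
    years_by_author = defaultdict(set)
    for author_id, year in records:
        years_by_author[author_id].add(year)
    if not years_by_author:
        return 0.0
    single_year = sum(1 for years in years_by_author.values() if len(years) == 1)
    return single_year / len(years_by_author)


if __name__ == "__main__":
    # Made-up records: author "a" returns in a second year, "b" and "c" do not.
    sample = [("a", 2001), ("a", 2002), ("b", 2003), ("c", 2004), ("c", 2004)]
    print(transient_fraction(sample))
```

Collecting years into a set means an author with several abstracts in the same year still counts as appearing in one year.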