21 research outputs found

    Scienceography: the study of how science is written

    Scientific literature has itself been the subject of much scientific study, for a variety of reasons: understanding how results are communicated and how ideas spread, and assessing the influence of areas or individuals. However, most prior work has focused on extracting and analyzing citation and stylistic patterns. In this work, we introduce the notion of 'scienceography', which focuses on the writing of science. We provide a first large-scale study using data derived from the arXiv e-print repository. Crucially, our data includes the "source code" of scientific papers (the LaTeX source), which enables us to study features not present in the "final product", such as the tools used and private comments between authors. Our study identifies broad patterns and trends in two example areas, computer science and mathematics, and highlights key differences in the way that science is written in these fields. Finally, we outline future directions to extend the new topic of scienceography. Comment: 13 pages, 16 figures. Sixth International Conference on FUN WITH ALGORITHMS, 201

    The Bayesian Echo Chamber: Modeling Social Influence via Linguistic Accommodation

    We present the Bayesian Echo Chamber, a new Bayesian generative model for social interaction data. By modeling the evolution of people's language usage over time, this model discovers latent influence relationships between them. Unlike previous work on inferring influence, which has primarily focused on simple temporal dynamics evidenced via turn-taking behavior, our model captures more nuanced influence relationships, evidenced via linguistic accommodation patterns in interaction content. The model, which is based on a discrete analog of the multivariate Hawkes process, permits a fully Bayesian inference algorithm. We validate our model's ability to discover latent influence patterns using transcripts of arguments heard by the US Supreme Court and the movie "12 Angry Men." We showcase our model's capabilities by using it to infer latent influence patterns from Federal Open Market Committee meeting transcripts, demonstrating state-of-the-art performance at uncovering social dynamics in group discussions. Comment: 14 pages, 7 figures, to appear in AISTATS 2015. Fixed minor formatting issue

    Measuring national capability over big science's multidisciplinarity: A case study of nuclear fusion research

    In the era of big science, countries allocate large research and development budgets to large scientific facilities that boost collaboration and research capability. A nuclear fusion device called the "tokamak" is of great interest to many countries because it could, ideally, generate sustainable energy that is expected to solve the energy crisis in the future. Here, to explore the scientific effects of tokamaks, we map a country's research capability in nuclear fusion research using normalized revealed comparative advantage on five topical clusters (material, plasma, device, diagnostics, and simulation) detected through a dynamic topic model. Our approach captures not only the growth of China, India, and the Republic of Korea but also the decline of Canada, Japan, Sweden, and the Netherlands. The time points of their rise and fall are related to tokamak operation, highlighting the importance of large facilities in big science. The gravity model indicates that two countries collaborate less in device, diagnostics, and plasma research if they have comparative advantages in different topics. This relation is a unique feature of nuclear fusion compared to other science fields. Our results can be used and extended when building national policies for big science.

    Understanding Topic Models in Context: A Mixed-Methods Approach to the Meaningful Analysis of Large Document Collections

    In recent years, we have witnessed an unprecedented proliferation of large document collections. This development has created a need for appropriate analytical means. In particular, to capture the thematic composition of large document collections, researchers increasingly draw on quantitative topic models. Among the most prominent of these is Latent Dirichlet Allocation (LDA). Yet these models have significant drawbacks, e.g., the generated topics lack context and thus meaningfulness. Prior research has rarely addressed this limitation through the lens of mixed-methods research. We address this gap by proposing a structured mixed-methods approach to the meaningful analysis of large document collections. In particular, we draw on qualitative coding and quantitative hierarchical clustering to validate and enhance topic models through re-contextualization. To illustrate the proposed approach, we conduct a case study of the thematic composition of the AIS Senior Scholars' Basket of Journals

    Citation recommendation via proximity full-text citation analysis and supervised topical prior

    Many publications are now available electronically and online, which has had a significant effect while also bringing several challenges. With the objective of enhancing citation recommendation based on innovative text and graph mining algorithms along with full-text citation analysis, we utilized proximity-based citation contexts extracted from a large number of full-text publications, and then used a publication/citation topic distribution to generate a novel citation graph for calculating publication topical importance. The importance score can be utilized as a new means to enhance recommendation performance. Experiments with full-text citation data showed that the novel method could significantly (p < 0.001) enhance citation recommendation performance

    Exploring social representations of adapting to climate change using topic modeling and Bayesian networks

    When something unfamiliar emerges, or when something familiar does something unexpected, people need to make sense of what is emerging or going on in order to act. Social representations theory suggests how individuals and society make sense of the unfamiliar and hence how the resultant social representations (SRs) cognitively, emotionally, and actively orient people and enable communication. SRs are social constructions that emerge through individual and collective engagement with media and with everyday conversations among people. Recent developments in text analysis techniques, and in particular topic modeling, provide a potentially powerful analytical method to examine the structure and content of SRs using large samples of narrative or text. In this paper I describe the methods and results of applying topic modeling to 660 micronarratives collected from Australian academics/researchers, government employees, and members of the public in 2010-2011. The narrative fragments focused on adaptation to climate change (CC) and hence provide an example of Australian society making sense of an emerging and conflict-ridden phenomenon. The results of the topic modeling reflect elements of SRs of adaptation to CC that are consistent with findings in the literature and are reasonably robust predictors of classes of action in response to CC. Bayesian Network (BN) modeling was used to identify relationships among the topics (SR elements), and in particular relationships among topics, sentiment, and action. Finally, the resulting model and topic modeling results are used to highlight differences in the salience of SR elements among social groups. Linking topic modeling and BN modeling offers a new and encouraging approach to analysis for ongoing research on SRs