201 research outputs found

    Recreating the Network of Early Modern Natural Philosophy: A Mono- and Multilingual Text Data Vectorization Method

    How could one create a network representation of a book corpus spanning over two hundred years? In this paper, we present a method based on text data vectorization for a complex and multifaceted network representation of an early modern corpus of 239 natural philosophy textbooks published in Latin, French, and English. On the one hand, we use unsupervised methods (namely topic modeling, term frequency – inverse document frequency, and multilingual word embeddings) to represent the broader features of this corpus, such as the homogeneity in the style and linguistic usages, both among works written in the same language, and across multiple languages. On the other hand, we use the collocate analysis of specific keywords to explore how certain concepts were understood, reshaped, and disseminated in the corpus. We call this the ‘semantic dimension.’ Each of these two dimensions provides a different way of correlating the books via text data vectorization and representing them as a network. Since each of these dimensions is in itself complex and multifaceted, the network we construct for each of them is a multiplex one, made of several layer-graphs. Furthermore, provided that there is enough information available about the authors of the works included in our inventory, this research offers the grounds for further expanding the described network representation in such a way as to create a third multiplex, one that explores some of the social features of the authors in question.

    Multiscale Snapshots: Visual Analysis of Temporal Summaries in Dynamic Graphs

    The overview-driven visual analysis of large-scale dynamic graphs poses a major challenge. We propose Multiscale Snapshots, a visual analytics approach to analyze temporal summaries of dynamic graphs at multiple temporal scales. First, we recursively generate temporal summaries to abstract overlapping sequences of graphs into compact snapshots. Second, we apply graph embeddings to the snapshots to learn low-dimensional representations of each sequence of graphs to speed up specific analytical tasks (e.g., similarity search). Third, we visualize the evolving data from coarse to fine-granular snapshots to semi-automatically analyze temporal states, trends, and outliers. The approach makes it possible to discover similar temporal summaries (e.g., recurring states), reduces the temporal data to speed up automatic analysis, and supports exploring both structural and temporal properties of a dynamic graph. We demonstrate the usefulness of our approach by a quantitative evaluation and the application to a real-world dataset. Comment: IEEE Transactions on Visualization and Computer Graphics (TVCG), to appear.

    Computational approaches to semantic change (Volume 6)

    Semantic change (how the meanings of words change over time) has preoccupied scholars since well before modern linguistics emerged in the late 19th and early 20th centuries, ushering in a new methodological turn in the study of language change. Compared to changes in sound and grammar, semantic change is the least understood. Ever since, the study of semantic change has progressed steadily, accumulating a vast store of knowledge for over a century, encompassing many languages and language families. Historical linguists also early on realized the potential of computers as research tools, with papers at the very first international conferences in computational linguistics in the 1960s. Such computational studies still tended to be small-scale, method-oriented, and qualitative. However, recent years have witnessed a sea-change in this regard. Big-data empirical quantitative investigations are now coming to the forefront, enabled by enormous advances in storage capability and processing power. Diachronic corpora have grown beyond imagination, defying exploration by traditional manual qualitative methods, and language technology has become increasingly data-driven and semantics-oriented. These developments present a golden opportunity for the empirical study of semantic change over both long and short time spans.

    On the Evolution of Knowledge Graphs: A Survey and Perspective

    Knowledge graphs (KGs) are structured representations of diversified knowledge. They are widely used in various intelligent applications. In this article, we provide a comprehensive survey on the evolution of various types of knowledge graphs (i.e., static KGs, dynamic KGs, temporal KGs, and event KGs) and techniques for knowledge extraction and reasoning. Furthermore, we introduce the practical applications of different types of KGs, including a case study in financial analysis. Finally, we propose our perspective on the future directions of knowledge engineering, including the potential of combining the power of knowledge graphs and large language models (LLMs), and the evolution of knowledge extraction, reasoning, and representation.

    MODELING THE LEADERSHIP OF LANGUAGE CHANGE FROM DIACHRONIC TEXT

    Natural languages constantly change over time. These changes are modulated by social factors, such as influence, that are not always directly observable. However, large-scale computational modeling of language change using timestamped text can uncover the latent organization and social structure. In turn, the social dynamics of language change can potentially illuminate our understanding of innovation, influence, and identity: Who leads? Who follows? Who diverges? This thesis contributes to the growing body of research on using computational methods to model language change, with a focus on quantifying linguistic leadership of change. A series of studies highlights the unique contributions of this thesis: methods that scale to huge volumes of data; measures that quantify leadership at the level of individuals or in aggregate; and analysis that links linguistic leadership to other forms of influence. First, temporal and predictive models of event cascades on a network of millions of Twitter users are used to show that lexical change spreads in the form of a contagion and that influence from densely embedded ties is crucial for the adoption of non-standard terms. A Granger-causal test for detecting social influence in event cascades on a network is then presented, which is robust to the presence of confounds such as homophily and can be applied to model either linguistic or non-linguistic change in a network. Next, a novel scheme to score and identify documents that lead semantic change in progress is introduced. This linguistic measure of influence on the documents is strongly predictive of their influence in terms of the number of citations that they receive, for both US court opinions and scientific articles. Subsequently, a measure of lead on any semantic change between a pair of document sources (e.g., newspapers) and a method to aggregate multiple lead-lag relationships into a network are presented. Analysis of an induced network of nineteenth-century abolitionist newspapers, following the proposed method, reveals the important yet understated role of women and Black editors in shaping the discourse on abolitionism. Finally, a method to induce an aggregate semantic leadership network using contextual word representations is proposed to investigate the link between semantic leadership and influence in the form of citations among publication venues that are part of the Association for Computational Linguistics. Taken together, these studies illustrate the utility of finding leaders of language change to gain insights in sociolinguistics and for applications in social science and digital humanities. Ph.D.