
    Learning Effective Embeddings for Dynamic Graphs and Quantifying Graph Embedding Interpretability

    Graph representation learning has been a very active research area in recent years. The goal of graph representation learning is to generate representation vectors that accurately capture the structure and features of large graphs. This is especially important because the quality of these vectors directly affects their performance in downstream tasks such as node classification and link prediction. Many techniques have been proposed for generating effective graph representation vectors, and they can be applied to both static and dynamic graphs. A static graph is a single fixed graph, while a dynamic graph evolves over time, with nodes and edges added or deleted. We survey graph embedding methods for both static and dynamic graphs. The majority of existing graph embedding methods are developed for static graphs; since most real-world graphs are dynamic, developing novel graph embedding methods suitable for evolving graphs is therefore essential. This dissertation proposes three dynamic graph embedding models. In previous dynamic methods, the inputs were mainly adjacency matrices, which are not memory efficient and may not capture the neighbourhood structure of graphs effectively. We therefore developed Dynnode2vec, a random-walk-based method built on the static model Node2vec. Dynnode2vec generates node embeddings in each snapshot by initializing the current model with the previous embedding vectors and training it on a set of random walks obtained for the nodes in that snapshot. Our second model, LSTM-Node2vec, is also based on random walks; it leverages an LSTM to capture long-range dependencies between nodes, in combination with Node2vec, to generate node embeddings. Finally, inspired by the importance of substructures in graphs, our third model, TGR-Clique, generates node embeddings by considering the effects of a node's neighbours within the maximal cliques containing that node. Experiments on real-world datasets demonstrate the effectiveness of our proposed methods in comparison to state-of-the-art models. In addition, motivated by the lack of proper measures for quantifying and comparing graph embedding interpretability, we propose two interpretability measures for graph embeddings based on the centrality properties of graphs.
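
    The abstract above describes Dynnode2vec as warm-starting each snapshot's model with the previous snapshot's embedding vectors and retraining it on random walks drawn from the new snapshot. The sketch below is a minimal, hypothetical illustration of that incremental idea using networkx and gensim; it uses plain uniform random walks rather than Node2vec's biased walks, and none of the names or parameters come from the authors' code.

        # Minimal sketch (not the authors' implementation): warm-start a skip-gram model
        # with the previous snapshot's embeddings and retrain it on walks from the new
        # snapshot. Uniform walks stand in for Node2vec's biased walks for brevity.
        import random
        import networkx as nx
        from gensim.models import Word2Vec

        def random_walks(graph, walk_length=20, walks_per_node=10):
            """Uniform random walks over a networkx graph, as lists of string node ids."""
            walks = []
            for _ in range(walks_per_node):
                for start in graph.nodes():
                    walk = [start]
                    while len(walk) < walk_length:
                        neighbours = list(graph.neighbors(walk[-1]))
                        if not neighbours:
                            break
                        walk.append(random.choice(neighbours))
                    walks.append([str(n) for n in walk])
            return walks

        def embed_snapshots(snapshots, dim=128):
            """Train on the first snapshot, then incrementally update on later ones."""
            model = None
            for graph in snapshots:
                walks = random_walks(graph)
                if model is None:
                    model = Word2Vec(sentences=walks, vector_size=dim, window=5,
                                     min_count=0, sg=1, workers=4)
                else:
                    # Keep previously learned vectors; extend the vocabulary with new nodes.
                    model.build_vocab(walks, update=True)
                    model.train(walks, total_examples=len(walks), epochs=model.epochs)
            return model

        # Toy usage: two random-graph snapshots standing in for an evolving network.
        snapshots = [nx.erdos_renyi_graph(50, 0.1, seed=s) for s in (0, 1)]
        model = embed_snapshots(snapshots)
        print(model.wv["0"][:5])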

    Data and Methods for Reference Resolution in Different Modalities

    One foundational goal of artificial intelligence is to build intelligent agents that interact with humans; to do so, they must have the capacity to infer from human communication which concept is being referred to by a span of symbols. They should be able, like humans, to map these representations to perceptual inputs, visual or otherwise. In NLP, the problem of discovering which spans of text refer to the same real-world entity is called Coreference Resolution. This dissertation expands this problem beyond text and maps concepts referred to by text spans to concepts represented in images. It also investigates the complex and hard nature of real-world coreference resolution. Lastly, it expands the definition of references to include abstractions referred to by non-contiguous text distributions. A central theme throughout this thesis is the paucity of data for solving hard problems of reference, which it addresses by designing several datasets. To investigate hard textual coreference, this dissertation analyses a domain of coreference-heavy text, namely the questions in the trivia game of quiz bowl, and creates a novel dataset. Solving quiz bowl questions requires robust coreference resolution and world knowledge, something humans possess but current models do not; this work uses distributional semantics for world knowledge. It also addresses sub-problems of coreference such as mention detection. Next, to investigate complex visual representations of concepts, this dissertation uses the domain of paintings. Mapping spans of text in descriptions of paintings to the regions of the paintings being described is a non-trivial problem because paintings are considerably harder than natural images; distributional semantics are again used here. Finally, to discover prototypical concepts present in distributed rather than contiguous spans of text, this dissertation investigates a source rich in prototypical concepts, namely movie scripts. All movie narratives, character arcs, and character relationships are distilled into sequences of interconnected prototypical concepts, which are discovered using unsupervised deep learning models, again relying on distributional semantics. I conclude this dissertation by discussing potential future research on downstream tasks that can be aided by the discovery of referring multi-modal concepts.
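
    Since the abstract defines coreference resolution as discovering which spans of text refer to the same real-world entity, a tiny, invented example may make the task concrete: mentions are character spans, and resolution groups the spans that co-refer. The quiz-bowl-style sentence and clusters below are illustrative only and are not taken from the dissertation's datasets.

        # Toy illustration of the coreference task: mentions are character spans, and
        # resolution groups the spans that refer to the same real-world entity.
        # The sentence and gold clusters are invented for illustration.
        from dataclasses import dataclass

        @dataclass(frozen=True)
        class Mention:
            start: int   # character offset where the span begins
            end: int     # character offset where the span ends (exclusive)
            text: str

        sentence = "This author wrote Ulysses; he also wrote Dubliners."

        # Gold clusters: each inner list holds the mentions of one entity.
        clusters = [
            [Mention(0, 11, "This author"), Mention(27, 29, "he")],  # the writer
            [Mention(18, 25, "Ulysses")],                            # first work
            [Mention(41, 50, "Dubliners")],                          # second work
        ]

        for i, cluster in enumerate(clusters):
            print(f"entity {i}: " + ", ".join(m.text for m in cluster))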

    Academia/Industry DynAmics (AIDA): A Knowledge Graph within the Scholarly Domain and its Applications

    Scholarly knowledge graphs are a form of knowledge representation that aims to capture and organize the information and knowledge contained in scholarly publications such as research papers, books, patents, and datasets. They can provide a comprehensive and structured view of the scholarly domain, covering aspects such as authors, affiliations, research topics, methods, results, citations, and impact, and they enable applications and services that facilitate and enhance scholarly communication, including information retrieval, data analysis, recommendation systems, semantic search, and knowledge discovery. However, constructing and maintaining scholarly knowledge graphs is a challenging task that requires dealing with large-scale, heterogeneous, and dynamic data sources. Moreover, extracting and integrating the relevant information and knowledge from unstructured or semi-structured text is not trivial, as it involves natural language processing, machine learning, ontology engineering, and semantic web technologies. Furthermore, ensuring the quality and validity of scholarly knowledge graphs is essential for their usability and reliability.
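
    As a concrete, hypothetical illustration of the structured view described above, the sketch below expresses a paper, its author, an affiliation, and a research topic as RDF triples with rdflib. The namespace and property names are invented for the example and are not AIDA's actual schema.

        # Toy scholarly knowledge graph: entities (paper, author, affiliation, topic)
        # connected by typed relations. The vocabulary below is invented, not AIDA's.
        from rdflib import Graph, Literal, Namespace
        from rdflib.namespace import RDF

        EX = Namespace("http://example.org/scholarly/")

        g = Graph()
        paper = EX["paper/123"]
        author = EX["author/jane-doe"]

        g.add((paper, RDF.type, EX.Paper))
        g.add((paper, EX.title, Literal("A Study of Knowledge Graphs")))
        g.add((paper, EX.hasAuthor, author))
        g.add((author, EX.affiliatedWith, EX["org/example-university"]))
        g.add((paper, EX.hasTopic, EX["topic/knowledge-graphs"]))

        # Query the graph: which topics does each paper cover?
        for s, _, o in g.triples((None, EX.hasTopic, None)):
            print(s, "->", o)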

    Proceedings of the Eighth Italian Conference on Computational Linguistics CLiC-it 2021

    The eighth edition of the Italian Conference on Computational Linguistics (CLiC-it 2021) was held at Università degli Studi di Milano-Bicocca from 26th to 28th January 2022. After the 2020 edition, which was held in fully virtual mode due to the health emergency related to Covid-19, CLiC-it 2021 was the first opportunity for the Italian computational linguistics research community to meet in person after more than one year of full or partial lockdown.

    24th Nordic Conference on Computational Linguistics (NoDaLiDa)


    Geographic information extraction from texts

    A large volume of unstructured text containing valuable geographic information is available online. This information – provided implicitly or explicitly – is useful not only for scientific studies (e.g., spatial humanities) but also for many practical applications (e.g., geographic information retrieval). Although considerable progress has been made in geographic information extraction from texts, unsolved challenges and issues remain, ranging from methods, systems, and data to applications and privacy. This workshop therefore provides a timely opportunity to discuss recent advances, new ideas, and concepts, and to identify research gaps in geographic information extraction.
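
    As a minimal, hypothetical sketch of the task the workshop addresses, the snippet below recognises place names in free text with off-the-shelf spaCy NER and resolves them against a toy gazetteer; a real system would instead query a geocoding service or a resource such as GeoNames, and the coordinates below are illustrative only.

        # Basic geographic information extraction: detect place-like entities, then
        # look them up in a (toy) gazetteer to obtain coordinates.
        import spacy

        nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

        # Toy gazetteer: place name -> (latitude, longitude)
        GAZETTEER = {
            "Paris": (48.8566, 2.3522),
            "Berlin": (52.5200, 13.4050),
        }

        def extract_places(text):
            """Return (mention, coordinates) pairs for recognised place entities."""
            doc = nlp(text)
            return [(ent.text, GAZETTEER.get(ent.text))
                    for ent in doc.ents
                    if ent.label_ in ("GPE", "LOC")]

        print(extract_places("The workshop moved from Paris to Berlin last year."))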

    Analytics of student interactions: towards theory-driven, actionable insights

    The field of learning analytics arose as a response to the vast quantities of data that are increasingly generated about students, their engagement with learning resources, and their learning and future career outcomes. While the field began as a collage, adopting methods and theories from a variety of disciplines, it has now become a major area of research, and has had a substantial impact on practice, policy, and decision-making. Although the field supports the collection and analysis of a wide array of data, existing work has predominantly focused on the digital traces generated through interactions with technology, learning content, and other students. Yet for any analyses to support students and teachers, the measures derived from these data must (1) offer practical and actionable insight into learning processes and outcomes, and (2) be theoretically grounded. As the field has matured, a number of challenges related to these criteria have become apparent. For instance, concerns have been raised that the literature prioritises predictive modeling over ensuring that these models are capable of informing constructive actions. Furthermore, the methodological validity of much of this work has been challenged, as a swathe of recent research has found that many of these models fail to replicate in novel contexts. The work presented in this thesis addresses both of these concerns. In doing so, our research is pervaded by three key concerns: firstly, ensuring that any measures developed are both structurally valid and generalise across contexts; secondly, providing actionable insight with regard to student engagement; and finally, providing representations of student interactions that are predictive of student outcomes, namely, grades and students’ persistence in their studies.

    This research programme is heavily indebted to the work of Vincent Tinto, who conceptually distinguishes between the interactions students have with the academic and social domains present within their educational institution. This model has been subjected to extensive empirical validation, using a range of methods and data. For instance, while some studies have relied upon survey responses, others have used social network metrics, demographic variables, and students’ time spent in class together to evaluate Tinto’s claims. This model provides a foundation for the thesis, and the work presented may be categorised into two distinct veins aligning with the academic and social aspects of integration that Tinto proposes. These two domains, Tinto argues, continually modify a student’s goals and commitments, resulting in persistence or eventual disengagement and dropout.

    In the former, academic domain, we present a series of novel methodologies developed for modeling student engagement with academic resources. In doing so, we assessed how an individual student’s behaviour may be modeled using hidden Markov models (HMMs) to provide representations that enable actionable insight. However, in the face of considerable individual differences and cross-course variation, the validity of such methods may be called into question. Accordingly, ensuring that any measurements of student engagement are both structurally valid and generalise across course contexts and disciplines became a central concern. To address this, we developed our model of student engagement using sticky-HMMs, emphasised the more interpretable insight such an approach provides compared to competing models, demonstrated its cross-course generality, and assessed its structural validity through the successful prediction of student dropout.

    In the social domain, a critical concern was to ensure that any analyses conducted were valid. Accordingly, we assessed how the diversity of social tie definitions may undermine the validity of subsequent modeling practices. We then modeled students’ social integration using graph embedding techniques, and found that student embeddings are predictive not only of their final grades but also of their persistence in their educational institution. In keeping with Tinto’s model, our research has focused on academic and social interactions separately, but both avenues of investigation have led to the question of student disengagement and dropout, and how this may be represented and remedied through the provision of actionable insight.
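
    The abstract above models individual engagement traces with (sticky) hidden Markov models. The sketch below is a minimal, hypothetical illustration of the plain-HMM version of that idea using hmmlearn on synthetic weekly activity counts; it omits the self-transition bias that makes the thesis's sticky-HMMs "sticky", and all data and parameters are invented.

        # Fit a plain Gaussian HMM to a synthetic student's weekly activity counts and
        # decode the latent state sequence; a sustained switch into the low-activity
        # state is the kind of signal that might flag disengagement. Sticky-HMMs, as
        # used in the thesis, additionally bias states towards self-transitions.
        import numpy as np
        from hmmlearn import hmm

        rng = np.random.default_rng(0)

        # One row per week: [logins, videos watched, forum posts] for one synthetic student.
        engaged = rng.poisson(lam=[9.0, 6.0, 3.0], size=(10, 3))
        disengaged = rng.poisson(lam=[2.0, 1.0, 0.2], size=(6, 3))
        X = np.vstack([engaged, disengaged]).astype(float)

        # Two latent states fitted by EM.
        model = hmm.GaussianHMM(n_components=2, covariance_type="diag",
                                n_iter=200, random_state=0)
        model.fit(X)

        print(model.predict(X))  # most likely state per week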