10,334 research outputs found

    Revisiting Embeddings for Graph Neural Networks

    Full text link
    Current graph representation learning techniques use Graph Neural Networks (GNNs) to extract features from dataset embeddings. In this work, we examine the quality of these embeddings and assess how changing them can affect the accuracy of GNNs. We explore different embedding extraction techniques for both images and texts, and find that the performance of different GNN architectures depends on the embedding style used. We observe a prevalence of bag-of-words (BoW) embeddings and text classification tasks in available graph datasets. Given the impact embeddings have on GNN performance, this leads to a phenomenon of GNNs being optimised for BoW vectors.
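    For concreteness, a minimal sketch (toy graph and toy node texts, not the paper's datasets or models) of the idea that the embedding style chosen for node features is what a GNN actually propagates. The gcn_layer helper implements the standard symmetric GCN normalisation, and the BoW and TF-IDF extractors are hypothetical stand-ins for the embedding styles being compared:

    ```python
    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

    # Toy node texts and a toy undirected graph over the four documents.
    texts = ["graphs model relations", "neural networks learn features",
             "graph neural networks", "bag of words baseline"]
    A = np.array([[0, 1, 1, 0],
                  [1, 0, 1, 0],
                  [1, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)

    def gcn_layer(A, X):
        """One GCN propagation step: D^{-1/2} (A + I) D^{-1/2} X."""
        A_hat = A + np.eye(A.shape[0])          # add self-loops
        d = A_hat.sum(axis=1)                   # node degrees
        D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # symmetric normalisation
        return D_inv_sqrt @ A_hat @ D_inv_sqrt @ X

    # Swap the feature extractor to change the "embedding style" the GNN sees.
    for name, vec in [("BoW", CountVectorizer()), ("TF-IDF", TfidfVectorizer())]:
        X = vec.fit_transform(texts).toarray()  # node feature matrix
        H = gcn_layer(A, X)                     # propagated node representations
        print(name, H.shape)
    ```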

    Representation Learning for Texts and Graphs: A Unified Perspective on Efficiency, Multimodality, and Adaptability

    Get PDF
    [...] This thesis is situated between natural language processing and graph representation learning and investigates selected connections. First, we introduce matrix embeddings as an efficient text representation sensitive to word order. [...] Experiments with ten linguistic probing tasks, 11 supervised, and five unsupervised downstream tasks reveal that vector and matrix embeddings have complementary strengths and that a jointly trained hybrid model outperforms both. Second, a popular pretrained language model, BERT, is distilled into matrix embeddings. [...] The results on the GLUE benchmark show that these models are competitive with other recent contextualized language models while being more efficient in time and space. Third, we compare three model types for text classification: bag-of-words, sequence-, and graph-based models. Experiments on five datasets show that, surprisingly, a wide multilayer perceptron on top of a bag-of-words representation is competitive with recent graph-based approaches, questioning the necessity of graphs synthesized from the text. [...] Fourth, we investigate the connection between text and graph data in document-based recommender systems for citations and subject labels. Experiments on six datasets show that the title as side information improves the performance of autoencoder models. [...] We find that the meaning of item co-occurrence is crucial for the choice of input modalities and an appropriate model. Fifth, we introduce a generic framework for lifelong learning on evolving graphs in which new nodes, edges, and classes appear over time. [...] The results show that by reusing previous parameters in incremental training, it is possible to employ smaller history sizes with only a slight decrease in accuracy compared to training with complete history. Moreover, weighting the binary cross-entropy loss function is crucial to mitigate the problem of class imbalance when detecting newly emerging classes. [...]
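    To illustrate the first contribution, here is a minimal sketch (hypothetical toy vocabulary and random initialisation, not the thesis implementation) of why matrix embeddings are sensitive to word order: composing word matrices by matrix multiplication is non-commutative, so reordering words changes the representation, unlike a bag-of-words sum:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    d = 4  # each word embeds as a d x d matrix (toy size)

    # Initialise near the identity so products stay numerically stable.
    vocab = ["dog", "bites", "man"]
    word_matrix = {w: np.eye(d) + 0.1 * rng.standard_normal((d, d)) for w in vocab}

    def embed(tokens):
        """Compose word matrices left-to-right by matrix multiplication."""
        m = np.eye(d)
        for t in tokens:
            m = m @ word_matrix[t]
        return m

    a = embed(["dog", "bites", "man"])
    b = embed(["man", "bites", "dog"])
    print(np.allclose(a, b))  # False: order changes the representation;
                              # a bag-of-words sum of the same matrices would not.
    ```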

    COVID-19 Outbreak through Tweeters' Words: Monitoring Italian Social Media Communication about COVID-19 with Text Mining and Word Embeddings

    Get PDF
    In this paper we analyze Italian social media communication about COVID-19 through a Twitter dataset collected over two months. The text corpus was studied in terms of its sensitivity to the social changes affecting people's lives during this crisis. In addition, the results of a sentiment analysis performed with two lexicons were compared, and word embedding vectors were created from the available plain texts. We then tested the informative effectiveness of the word embeddings and compared them to a bag-of-words approach in terms of text classification accuracy. First results showed a certain potential of these textual data for describing the different phases of the outbreak. However, a different strategy is needed for more reliable sentiment labeling, as the labels proposed by the two lexicons were discordant. Finally, although they yielded interesting results in terms of semantic similarity, word embeddings did not show a predictive ability higher than that of the term frequency vectors.
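    A minimal sketch of the comparison described above (hypothetical toy corpus and labels, not the paper's tweets or lexicons), assuming gensim and scikit-learn are available: averaged word-embedding vectors and raw term-frequency vectors are fed to the same classifier:

    ```python
    import numpy as np
    from gensim.models import Word2Vec
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression

    # Toy documents with toy sentiment-like labels (placeholders only).
    texts = ["stay home stay safe", "hospitals are full", "lockdown extended again",
             "vaccine trials look promising", "cases are dropping fast", "schools reopen soon"]
    labels = [0, 0, 0, 1, 1, 1]
    tokens = [t.split() for t in texts]

    # Word-embedding features: mean of the word vectors in each document.
    w2v = Word2Vec(sentences=tokens, vector_size=25, window=3, min_count=1,
                   epochs=50, seed=0, workers=1)
    X_emb = np.array([np.mean([w2v.wv[w] for w in doc], axis=0) for doc in tokens])

    # Frequency features: plain term counts.
    X_bow = CountVectorizer().fit_transform(texts).toarray()

    # Same classifier on both feature sets, as in the comparison above.
    for name, X in [("word embeddings", X_emb), ("term frequencies", X_bow)]:
        clf = LogisticRegression(max_iter=1000).fit(X, labels)
        print(name, clf.score(X, labels))
    ```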