Search CORE

283 research outputs found

TouchUp-G: Improving Feature Representation through Graph-Centric Finetuning

Author: Faloutsos Christos
Ioannidis Vassilis N.
Koutra Danai
Song Xiang
Zhu Jing
Publication venue
Publication date: 25/09/2023
Field of study

How can we enhance the node features acquired from Pretrained Models (PMs) to better suit downstream graph learning tasks? Graph Neural Networks (GNNs) have become the state-of-the-art approach for many high-impact, real-world graph applications. For feature-rich graphs, a prevalent practice involves utilizing a PM directly to generate features, without incorporating any domain adaptation techniques. Nevertheless, this practice is suboptimal because the node features extracted from PM are graph-agnostic and prevent GNNs from fully utilizing the potential correlations between the graph structure and node features, leading to a decline in GNNs performance. In this work, we seek to improve the node features obtained from a PM for downstream graph tasks and introduce TOUCHUP-G, which has several advantages. It is (a) General: applicable to any downstream graph task, including link prediction which is often employed in recommender systems; (b) Multi-modal: able to improve raw features of any modality (e.g. images, texts, audio); (c) Principled: it is closely related to a novel metric, feature homophily, which we propose to quantify the potential correlations between the graph structure and node features and we show that TOUCHUP-G can effectively shrink the discrepancy between the graph structure and node features; (d) Effective: achieving state-of-the-art results on four real-world datasets spanning different tasks and modalities.Comment: preprint, ongoing wor

arXiv.org e-Print Archive

Local Citation Recommendation with Hierarchical-Attention Text Encoder and SciBERT-based Reranking

Author: Gao Yingqiang
Gu Nianlong
Hahnloser Richard H R
Publication venue
Publication date: 01/01/2021
Field of study

The goal of local citation recommendation is to recommend a missing reference from the local citation context and optionally also from the global context. To balance the tradeoff between speed and accuracy of citation recommendation in the context of a large-scale paper database, a viable approach is to first prefetch a limited number of relevant documents using efficient ranking methods and then to perform a fine-grained reranking using more sophisticated models. In that vein, BM25 has been found to be a tough-to-beat approach to prefetching, which is why recent work has focused mainly on the reranking step. Even so, we explore prefetching with nearest neighbor search among text embeddings constructed by a hierarchical attention network. When coupled with a SciBERT reranker fine-tuned on local citation recommendation tasks, our hierarchical Attention encoder (HAtten) achieves high prefetch recall for a given number of candidates to be reranked. Consequently, our reranker requires fewer prefetch candidates to rerank, yet still achieves state-of-the-art performance on various local citation recommendation datasets such as ACL-200, FullTextPeerRead, RefSeer, and arXiv

ZORA

Latent space transformers for generalizing deep networks

Author: Bernardo L.
Campos L. M.
Farkhari H.
Mehta P. L.
Mihovska A.
Nidhi
Sebastião P.
Viana J.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2021
Field of study

Sharing information between deep networks is not a simple task nowadays. In a traditional approach, researchers change and train layers at the end of a pretrained deep network while the other layers remain the same to adapt it to their purposes or develop a new deep network. In this paper, we propose a novel concept for interoperability in deep networks. Generalizing such networks’ usability will facilitate the creation of new hybrid models promoting innovation and disruptive use cases for deep networks in the fifth generation of wireless communications (5G) networks and increasing the accessibility, usability, and affordability for these products. The main idea is to use standard latent space transformation to share information between such networks. First, each deep network should be split into two parts by creators. After that, they should provide access to standard latent space. As each deep network should do that, we suggest the standard for the procedure. By adding the latent space, we can combine two deep networks using the latent transformer block, the only block that needs to train while connecting different pretrained deep networks. The results from the combination create a new network with a unique ability. This paper contributes to a concept related to the generalization of deep networks using latent transformers, optimizing the utilization of the edge and cloud in 5G telecommunication, controlling load balancing, saving bandwidth, and decreasing the latency caused by cumbersome computations. We provide a review of the current standardization associated with deep networks and Artificial Intelligence in general. Lastly, we present some use cases in 5G supporting the proposed concept.info:eu-repo/semantics/acceptedVersio

Repositório Institucional do ISCTE-IUL

Representation Learning for Texts and Graphs: A Unified Perspective on Efficiency, Multimodality, and Adaptability

Author: Galke Lukas Paul Achatius
Publication venue: Universitatsbibliothek Kiel
Publication date: 01/01/2023
Field of study

[...] This thesis is situated between natural language processing and graph representation learning and investigates selected connections. First, we introduce matrix embeddings as an efficient text representation sensitive to word order. [...] Experiments with ten linguistic probing tasks, 11 supervised, and five unsupervised downstream tasks reveal that vector and matrix embeddings have complementary strengths and that a jointly trained hybrid model outperforms both. Second, a popular pretrained language model, BERT, is distilled into matrix embeddings. [...] The results on the GLUE benchmark show that these models are competitive with other recent contextualized language models while being more efficient in time and space. Third, we compare three model types for text classification: bag-of-words, sequence-, and graph-based models. Experiments on five datasets show that, surprisingly, a wide multilayer perceptron on top of a bag-of-words representation is competitive with recent graph-based approaches, questioning the necessity of graphs synthesized from the text. [...] Fourth, we investigate the connection between text and graph data in document-based recommender systems for citations and subject labels. Experiments on six datasets show that the title as side information improves the performance of autoencoder models. [...] We find that the meaning of item co-occurrence is crucial for the choice of input modalities and an appropriate model. Fifth, we introduce a generic framework for lifelong learning on evolving graphs in which new nodes, edges, and classes appear over time. [...] The results show that by reusing previous parameters in incremental training, it is possible to employ smaller history sizes with only a slight decrease in accuracy compared to training with complete history. Moreover, weighting the binary cross-entropy loss function is crucial to mitigate the problem of class imbalance when detecting newly emerging classes. [...

MACAU: Open Access Repository of Kiel University

Recommended from our members

Modeling the Multi-mode Distribution in Self-Supervised Language Models

Author: Chang Haw-Shiuan
Publication venue: ScholarWorks@UMass Amherst
Publication date: 26/10/2022
Field of study

Self-supervised large language models (LMs) have become a highly-influential and foundational tool for many NLP models. For this reason, their expressivity is an important topic of study. In near-universal practice, given the language context, the model predicts a word from the vocabulary using a single embedded vector representation of both context and dictionary entries. Note that the context sometimes implies that the distribution over predicted words should be multi-modal in embedded space. However, the context’s single-vector representation provably fails to capture such a distribution. To address this limitation, we propose to represent context with multiple vector embeddings, which we term facets. This is distinct from previous work on multi-sense vocabulary embeddings, which employs multiple vectors for the dictionary entries, not the context. In this dissertation, we first present the theoretical limitations of the single context embedding in LMs and how the theoretical analyses suggest new alternative softmax layers that encode a context as multiple embeddings. The proposed alternatives achieve better perplexity than the mixture of softmax (MoS), especially given an ambiguous context, without adding significant computational cost to LMs. Our approaches also let GPT-2 learn to properly copy the entities from the context, which increases the coherence of the generated text without requiring any labels. In addition to predicting the next word, we also use multiple CLS embeddings to improve state-of-the-art pretraining methods for BERT on natural language understanding (NLU) benchmarks without introducing significant extra parameters or computations, especially when the training datasets are small. Furthermore, we show that our multi-facet embeddings improve the sequential recommendation, scientific paper embeddings, measurement of sentence similarity, distantly supervised relation extraction, unsupervised text pattern entailment detection, and cold-start citation recommendation. Finally, we use the multiple vector embeddings to predict the future topics of a context, and build on the basis, we propose a novel interactive language generation framework

ScholarWorks@UMass Amherst

Towards Knowledge-Based Personalized Product Description Generation in E-commerce

Author: Asghar Nabiha
Devlin Jacob
Gehring Jonas
Kingma Diederik P
Lample Guillaume
Li Jiwei
Li Jiwei
Nair Vinod
Sun Maosong
van der Maaten Laurens
Vaswani Ashish
Wang Jinpeng
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 05/06/2019
Field of study

Quality product descriptions are critical for providing competitive customer experience in an e-commerce platform. An accurate and attractive description not only helps customers make an informed decision but also improves the likelihood of purchase. However, crafting a successful product description is tedious and highly time-consuming. Due to its importance, automating the product description generation has attracted considerable interests from both research and industrial communities. Existing methods mainly use templates or statistical methods, and their performance could be rather limited. In this paper, we explore a new way to generate the personalized product description by combining the power of neural networks and knowledge base. Specifically, we propose a KnOwledge Based pErsonalized (or KOBE) product description generation model in the context of e-commerce. In KOBE, we extend the encoder-decoder framework, the Transformer, to a sequence modeling formulation using self-attention. In order to make the description both informative and personalized, KOBE considers a variety of important factors during text generation, including product aspects, user categories, and knowledge base, etc. Experiments on real-world datasets demonstrate that the proposed method out-performs the baseline on various metrics. KOBE can achieve an improvement of 9.7% over state-of-the-arts in terms of BLEU. We also present several case studies as the anecdotal evidence to further prove the effectiveness of the proposed approach. The framework has been deployed in Taobao, the largest online e-commerce platform in China.Comment: KDD 2019 Camera-ready. Website: https://sites.google.com/view/kobe201

arXiv.org e-Print Archive

Crossref

Evaluating the Performance of Transformer architecture over Attention architecture on Image Captioning

Author: Balasubramaniam Deepti
Publication venue: Technological University Dublin
Publication date: 01/01/2021
Field of study

Over the last few decades computer vision and Natural Language processing has shown tremendous improvement in different tasks such as image captioning, video captioning, machine translation etc using deep learning models. However, there were not much researches related to image captioning based on transformers and how it outperforms other models that were implemented for image captioning. In this study will be designing a simple encoder-decoder model, attention model and transformer model for image captioning using Flickr8K dataset where will be discussing about the hyperparameters of the model, type of pre-trained model used and how long the model has been trained. Furthermore, will be comparing the captions generated by attention model and transformer model using BLEU score metrics, which will be further analysed using human evaluation conducted using intrinsic approach. After analysis of results obtained using statistical test conducted on BLEU score metrics and human evaluation it was found that transformer model with multi-head attention has outperformed attention model in image captioning

Arrow@TUDublin

OpenMSD: Towards Multilingual Scientific Documents Similarity Measurement

Author: Alon Dana
Gao Yang
Hall Keith
Korotkov Ivan
Ma Ji
Metzler Don
Publication venue
Publication date: 19/09/2023
Field of study

We develop and evaluate multilingual scientific documents similarity measurement models in this work. Such models can be used to find related works in different languages, which can help multilingual researchers find and explore papers more efficiently. We propose the first multilingual scientific documents dataset, Open-access Multilingual Scientific Documents (OpenMSD), which has 74M papers in 103 languages and 778M citation pairs. With OpenMSD, we pretrain science-specialized language models, and explore different strategies to derive "related" paper pairs to fine-tune the models, including using a mixture of citation, co-citation, and bibliographic-coupling pairs. To further improve the models' performance for non-English papers, we explore the use of generative language models to enrich the non-English papers with English summaries. This allows us to leverage the models' English capabilities to create better representations for non-English papers. Our best model significantly outperforms strong baselines by 7-16% (in mean average precision).Comment: Scripts for constructing the OpenMSD dataset is available at: https://github.com/google-research/google-research/tree/master/OpenMS

arXiv.org e-Print Archive