283 research outputs found
TouchUp-G: Improving Feature Representation through Graph-Centric Finetuning
How can we enhance the node features acquired from Pretrained Models (PMs) to
better suit downstream graph learning tasks? Graph Neural Networks (GNNs) have
become the state-of-the-art approach for many high-impact, real-world graph
applications. For feature-rich graphs, a prevalent practice involves utilizing
a PM directly to generate features, without incorporating any domain adaptation
techniques. Nevertheless, this practice is suboptimal because the node features
extracted from a PM are graph-agnostic and prevent GNNs from fully utilizing the
potential correlations between the graph structure and node features, leading
to a decline in GNN performance. In this work, we seek to improve the node
features obtained from a PM for downstream graph tasks and introduce TOUCHUP-G,
which has several advantages. It is (a) General: applicable to any downstream
graph task, including link prediction, which is often employed in recommender
systems; (b) Multi-modal: able to improve raw features of any modality (e.g.,
images, texts, audio); (c) Principled: it is closely related to a novel metric,
feature homophily, which we propose to quantify the potential correlations
between the graph structure and node features and we show that TOUCHUP-G can
effectively shrink the discrepancy between the graph structure and node
features; (d) Effective: achieving state-of-the-art results on four real-world
datasets spanning different tasks and modalities. Comment: preprint, ongoing work
Local Citation Recommendation with Hierarchical-Attention Text Encoder and SciBERT-based Reranking
The goal of local citation recommendation is to recommend a missing reference from the local citation context and optionally also from the global context. To balance the tradeoff between speed and accuracy of citation recommendation in the context of a large-scale paper database, a viable approach is to first prefetch a limited number of relevant documents using efficient ranking methods and then to perform a fine-grained reranking using more sophisticated models. In that vein, BM25 has been found to be a tough-to-beat approach to prefetching, which is why recent work has focused mainly on the reranking step. Even so, we explore prefetching with nearest neighbor search among text embeddings constructed by a hierarchical attention network. When coupled with a SciBERT reranker fine-tuned on local citation recommendation tasks, our hierarchical Attention encoder (HAtten) achieves high prefetch recall for a given number of candidates to be reranked. Consequently, our reranker requires fewer prefetch candidates to rerank, yet still achieves state-of-the-art performance on various local citation recommendation datasets such as ACL-200, FullTextPeerRead, RefSeer, and arXiv
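The two-stage pipeline described in this abstract (cheap prefetching followed by finer reranking) can be illustrated with a minimal sketch. This is not the HAtten encoder or the SciBERT reranker: the toy embeddings, the `toy_scores` table, and the function names `prefetch` and `rerank` are all hypothetical stand-ins chosen for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def prefetch(query_vec, doc_vecs, k):
    """Stage 1: ids of the k documents nearest to the query embedding."""
    ranked = sorted(doc_vecs,
                    key=lambda d: cosine(query_vec, doc_vecs[d]),
                    reverse=True)
    return ranked[:k]

def rerank(candidates, score_fn):
    """Stage 2: reorder the prefetched candidates with a costlier scorer."""
    return sorted(candidates, key=score_fn, reverse=True)

# Toy embeddings (hypothetical); a real system would use encoder outputs.
doc_vecs = {
    "paper_a": [1.0, 0.0],
    "paper_b": [0.8, 0.6],
    "paper_c": [0.0, 1.0],
}
query_vec = [1.0, 0.1]

# Only k candidates reach the expensive second stage.
candidates = prefetch(query_vec, doc_vecs, k=2)

# Hypothetical stand-in for a fine-tuned reranker's relevance scores.
toy_scores = {"paper_a": 0.2, "paper_b": 0.9, "paper_c": 0.5}
final = rerank(candidates, lambda d: toy_scores[d])
```

The smaller `k` can be made without losing the correct reference in stage 1, the fewer documents the slower reranker has to score, which is the speed and accuracy tradeoff the abstract refers to.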
Latent space transformers for generalizing deep networks
Sharing information between deep networks is not a simple task nowadays. In the traditional approach, researchers adapt a pretrained deep network to their purposes, or develop a new one, by changing and training its final layers while leaving the other layers unchanged. In this paper, we propose a novel concept for interoperability in deep networks. Generalizing such networks’ usability will facilitate the creation of new hybrid models, promoting innovation and disruptive use cases for deep networks in the fifth generation of wireless communications (5G) networks and increasing the accessibility, usability, and affordability of these products. The main idea is to use a standard latent space transformation to share information between such networks. First, each deep network should be split into two parts by its creators. After that, they should provide access to a standard latent space. Because every deep network should do this, we propose a standard for the procedure. By adding the latent space, we can combine two deep networks using the latent transformer block, the only block that needs to be trained when connecting different pretrained deep networks. The combination yields a new network with a unique ability. This paper contributes to a concept related to the generalization of deep networks using latent transformers, optimizing the utilization of the edge and cloud in 5G telecommunication, controlling load balancing, saving bandwidth, and decreasing the latency caused by cumbersome computations. We provide a review of the current standardization associated with deep networks and Artificial Intelligence in general. Lastly, we present some use cases in 5G supporting the proposed concept.
Representation Learning for Texts and Graphs: A Unified Perspective on Efficiency, Multimodality, and Adaptability
[...] This thesis is situated between natural language processing and graph representation learning and investigates selected connections. First, we introduce matrix embeddings as an efficient text representation sensitive to word order. [...] Experiments with ten linguistic probing tasks, 11 supervised, and five unsupervised downstream tasks reveal that vector and matrix embeddings have complementary strengths and that a jointly trained hybrid model outperforms both. Second, a popular pretrained language model, BERT, is distilled into matrix embeddings. [...] The results on the GLUE benchmark show that these models are competitive with other recent contextualized language models while being more efficient in time and space. Third, we compare three model types for text classification: bag-of-words, sequence-, and graph-based models. Experiments on five datasets show that, surprisingly, a wide multilayer perceptron on top of a bag-of-words representation is competitive with recent graph-based approaches, questioning the necessity of graphs synthesized from the text. [...] Fourth, we investigate the connection between text and graph data in document-based recommender systems for citations and subject labels. Experiments on six datasets show that the title as side information improves the performance of autoencoder models. [...] We find that the meaning of item co-occurrence is crucial for the choice of input modalities and an appropriate model. Fifth, we introduce a generic framework for lifelong learning on evolving graphs in which new nodes, edges, and classes appear over time. [...] The results show that by reusing previous parameters in incremental training, it is possible to employ smaller history sizes with only a slight decrease in accuracy compared to training with complete history. Moreover, weighting the binary cross-entropy loss function is crucial to mitigate the problem of class imbalance when detecting newly emerging classes. [...
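The abstract's remark about weighting the binary cross-entropy loss can be made concrete with a small sketch. The thesis's exact weighting scheme is not given here, so the single `pos_weight` factor on the positive class is an assumption, as are the function and parameter names.

```python
import math

def weighted_bce(y_true, y_prob, pos_weight=1.0, eps=1e-12):
    """Mean binary cross-entropy with the positive term scaled by pos_weight.

    pos_weight > 1 makes missed positives (e.g. rare or newly emerging
    classes) cost more than false alarms. This is an assumed scheme.
    """
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(pos_weight * y * math.log(p)
                   + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

# One positive and one negative example, both predicted at 0.5.
loss_pos = weighted_bce([1], [0.5], pos_weight=2.0)
loss_neg = weighted_bce([0], [0.5], pos_weight=2.0)
```

With `pos_weight=2.0`, the uncertain positive example contributes twice the loss of the equally uncertain negative one, which is how the weighting counteracts class imbalance.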
Modeling the Multi-mode Distribution in Self-Supervised Language Models
Self-supervised large language models (LMs) have become a highly-influential and foundational tool for many NLP models. For this reason, their expressivity is an important topic of study. In near-universal practice, given the language context, the model predicts a word from the vocabulary using a single embedded vector representation of both context and dictionary entries. Note that the context sometimes implies that the distribution over predicted words should be multi-modal in embedded space. However, the context’s single-vector representation provably fails to capture such a distribution. To address this limitation, we propose to represent context with multiple vector embeddings, which we term facets. This is distinct from previous work on multi-sense vocabulary embeddings, which employs multiple vectors for the dictionary entries, not the context.
In this dissertation, we first present the theoretical limitations of the single context embedding in LMs and how the theoretical analyses suggest new alternative softmax layers that encode a context as multiple embeddings. The proposed alternatives achieve better perplexity than the mixture of softmax (MoS), especially given an ambiguous context, without adding significant computational cost to LMs. Our approaches also let GPT-2 learn to properly copy the entities from the context, which increases the coherence of the generated text without requiring any labels.
In addition to predicting the next word, we also use multiple CLS embeddings to improve state-of-the-art pretraining methods for BERT on natural language understanding (NLU) benchmarks without introducing significant extra parameters or computations, especially when the training datasets are small. Furthermore, we show that our multi-facet embeddings improve sequential recommendation, scientific paper embeddings, measurement of sentence similarity, distantly supervised relation extraction, unsupervised text pattern entailment detection, and cold-start citation recommendation. Finally, we use the multiple vector embeddings to predict the future topics of a context and, building on this, propose a novel interactive language generation framework.
Towards Knowledge-Based Personalized Product Description Generation in E-commerce
Quality product descriptions are critical for providing competitive customer
experience in an e-commerce platform. An accurate and attractive description
not only helps customers make an informed decision but also improves the
likelihood of purchase. However, crafting a successful product description is
tedious and highly time-consuming. Due to its importance, automating
product description generation has attracted considerable interest from both
research and industrial communities. Existing methods mainly use templates or
statistical methods, and their performance could be rather limited. In this
paper, we explore a new way to generate personalized product descriptions by
combining the power of neural networks and a knowledge base. Specifically, we
propose a KnOwledge Based pErsonalized (or KOBE) product description generation
model in the context of e-commerce. In KOBE, we extend the encoder-decoder
framework, the Transformer, to a sequence modeling formulation using
self-attention. In order to make the description both informative and
personalized, KOBE considers a variety of important factors during text
generation, including product aspects, user categories, and the knowledge
base. Experiments on real-world datasets demonstrate that the proposed method
outperforms the baseline on various metrics. KOBE can achieve an improvement
of 9.7% over the state of the art in terms of BLEU. We also present several case
studies as anecdotal evidence to further demonstrate the effectiveness of the
proposed approach. The framework has been deployed in Taobao, the largest
online e-commerce platform in China. Comment: KDD 2019 Camera-ready. Website:
https://sites.google.com/view/kobe201
Evaluating the Performance of Transformer architecture over Attention architecture on Image Captioning
Over the last few decades, computer vision and natural language processing have shown tremendous improvement in tasks such as image captioning, video captioning, and machine translation using deep learning models. However, there has been little research on transformer-based image captioning and on how it compares with other models implemented for image captioning. This study designs a simple encoder-decoder model, an attention model, and a transformer model for image captioning on the Flickr8K dataset, and discusses the hyperparameters of each model, the type of pre-trained model used, and how long each model was trained. Furthermore, it compares the captions generated by the attention model and the transformer model using BLEU scores, which are further analysed through a human evaluation conducted using an intrinsic approach. Statistical tests on the BLEU scores and the human evaluation show that the transformer model with multi-head attention outperformed the attention model in image captioning.
OpenMSD: Towards Multilingual Scientific Documents Similarity Measurement
We develop and evaluate multilingual scientific documents similarity
measurement models in this work. Such models can be used to find related works
in different languages, which can help multilingual researchers find and
explore papers more efficiently. We propose the first multilingual scientific
documents dataset, Open-access Multilingual Scientific Documents (OpenMSD),
which has 74M papers in 103 languages and 778M citation pairs. With OpenMSD, we
pretrain science-specialized language models, and explore different strategies
to derive "related" paper pairs to fine-tune the models, including using a
mixture of citation, co-citation, and bibliographic-coupling pairs. To further
improve the models' performance for non-English papers, we explore the use of
generative language models to enrich the non-English papers with English
summaries. This allows us to leverage the models' English capabilities to
create better representations for non-English papers. Our best model
significantly outperforms strong baselines by 7-16% (in mean average
precision). Comment: Scripts for constructing the OpenMSD dataset are available at:
https://github.com/google-research/google-research/tree/master/OpenMS
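For readers unfamiliar with the metric quoted in this abstract, the following is a minimal sketch of mean average precision over ranked retrieval results. It uses the common textbook definition; the paper's exact evaluation protocol is not stated here, and the function names and toy document ids are hypothetical.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average of precision@rank taken at each rank where a relevant item appears."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    # Normalize by the number of relevant items, so unretrieved ones count as misses.
    return precision_sum / len(relevant)

def mean_average_precision(runs):
    """Mean of average precision over (ranked_ids, relevant_ids) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Two toy queries with hypothetical document ids.
runs = [
    (["a", "b", "c"], ["a", "c"]),  # AP = (1/1 + 2/3) / 2
    (["x", "y"], ["y"]),            # AP = (1/2) / 1
]
map_score = mean_average_precision(runs)
```

Because each query's score depends on where the relevant papers land in the ranking, a model that moves related papers higher up the list raises the metric even if the set of retrieved papers is unchanged.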