Self-Supervised Pretraining for Heterogeneous Hypergraph Neural Networks
Recently, pretraining methods for Graph Neural Networks (GNNs) have been
successful at learning effective representations from unlabeled graph data.
However, most of these methods rely on pairwise relations in the graph and do
not capture the underlying higher-order relations between entities. Hypergraphs
are versatile and expressive structures that can effectively model higher-order
relationships among entities in the data. Despite the efforts to adapt GNNs to
hypergraphs (HyperGNNs), there are currently no fully self-supervised
pretraining methods for HyperGNNs on heterogeneous hypergraphs. In this paper,
we present SPHH, a novel self-supervised pretraining framework for
heterogeneous HyperGNNs. Our method is able to effectively capture higher-order
relations among entities in the data in a self-supervised manner. SPHH
consists of two self-supervised pretraining tasks that aim to simultaneously
learn both local and global representations of the entities in the hypergraph
by using informative representations derived from the hypergraph structure.
Overall, our work presents a significant advancement in the field of
self-supervised pretraining of HyperGNNs, and has the potential to improve the
performance of various graph-based downstream tasks such as node classification
and link prediction tasks, which are mapped to a hypergraph configuration. Our
experiments on two real-world benchmarks using four different HyperGNN models
show that our proposed SPHH framework consistently outperforms state-of-the-art
baselines in various downstream tasks. The results demonstrate that SPHH is
able to improve the performance of various HyperGNN models in various
downstream tasks, regardless of their architecture or complexity, which
highlights the robustness of our framework.
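The abstract does not spell out SPHH's two pretraining tasks; the following is a purely illustrative sketch of the generic pattern it describes, jointly optimizing a local and a global self-supervised objective on top of a hypergraph encoder. The encoder interface, the loss choices, and all names are assumptions, not SPHH's actual design.

```python
# Hypothetical sketch only: combines a "local" and a "global" self-supervised
# loss on node embeddings produced by a generic HyperGNN encoder.
import torch.nn as nn

class PretrainWrapper(nn.Module):
    def __init__(self, encoder: nn.Module, hidden_dim: int):
        super().__init__()
        self.encoder = encoder                          # any HyperGNN producing node embeddings
        self.local_head = nn.Linear(hidden_dim, hidden_dim)
        self.global_head = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, hypergraph):
        z = self.encoder(hypergraph)                    # [num_nodes, hidden_dim]
        return self.local_head(z), self.global_head(z)

def pretrain_step(model, hypergraph, local_target, global_target, optimizer, alpha=0.5):
    """One step jointly minimizing a local and a global self-supervised loss."""
    z_local, z_global = model(hypergraph)
    loss = alpha * nn.functional.mse_loss(z_local, local_target) \
        + (1 - alpha) * nn.functional.mse_loss(z_global, global_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Here local_target and global_target stand in for whatever structure-derived supervision signals a concrete method would compute from the hypergraph.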
Is Meta-Learning the Right Approach for the Cold-Start Problem in Recommender Systems?
Recommender systems have become fundamental building blocks of modern online
products and services, and have a substantial impact on user experience. In the
past few years, deep learning methods have attracted a lot of research
attention and are now heavily used in modern real-world recommender systems.
Nevertheless, dealing with recommendations in the cold-start setting, e.g., when
a user has had only limited interactions with the system, remains a problem that is far from
solved. Meta-learning techniques, and in particular optimization-based
meta-learning, have recently become the most popular approaches in the academic
research literature for tackling the cold-start problem in deep learning models
for recommender systems. However, current meta-learning approaches are not
practical for real-world recommender systems, which have billions of users and
items, and strict latency requirements. In this paper we show that it is
possible to obtain similar, or higher, performance on commonly used
benchmarks for the cold-start problem without using meta-learning techniques.
In more detail, we show that, when tuned correctly, standard and widely adopted
deep learning models perform just as well as newer meta-learning models. We
further show that an extremely simple modular approach using common
representation learning techniques can perform comparably to meta-learning
techniques specifically designed for the cold-start setting while being much
more easily deployable in real-world applications.
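The abstract does not detail the modular approach; as a hedged illustration of one common representation-learning technique for cold-start users, the sketch below represents a new user as the mean of the embeddings of their few observed items and scores candidates by dot product. All names are hypothetical, and this is not necessarily the paper's method.

```python
# Illustrative cold-start scoring: average the embeddings of the few items a
# new user has interacted with, then rank all items by inner product.
import numpy as np

def cold_start_scores(item_embeddings: np.ndarray, observed_items: list) -> np.ndarray:
    """item_embeddings: [num_items, dim]; observed_items: indices of the few
    items the new user has interacted with. Returns one score per item."""
    user_vector = item_embeddings[observed_items].mean(axis=0)
    return item_embeddings @ user_vector

# Usage sketch: top-5 recommendations for a user with two observed interactions.
# scores = cold_start_scores(item_embeddings, [12, 845])
# top5 = np.argsort(-scores)[:5]
```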
Concept Matching for Low-Resource Classification
We propose a model to tackle classification tasks in the presence of very
little training data. To this aim, we approximate the notion of exact match
with a theoretically sound mechanism that computes a probability of matching in
the input space. Importantly, the model learns to focus on elements of the
input that are relevant for the task at hand; by leveraging highlighted
portions of the training data, an error boosting technique guides the learning
process. In practice, it increases the error associated with relevant parts of
the input by a given factor. Remarkable results on text classification tasks
confirm the benefits of the proposed approach in both balanced and unbalanced
cases, making it of practical use when labeling new examples is expensive. In
addition, by inspecting its weights, it is often possible to gather insights
into what the model has learned.
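As a minimal sketch of the error-boosting idea described above, assuming token-level highlight masks and a hypothetical boost_factor, the per-token loss on highlighted portions of the input is simply scaled by a constant; the names are illustrative and not taken from the paper.

```python
# Hypothetical error boosting: scale the loss contribution of highlighted
# (task-relevant) input tokens by a given factor.
import torch

def boosted_loss(token_losses: torch.Tensor,
                 highlight_mask: torch.Tensor,
                 boost_factor: float = 2.0) -> torch.Tensor:
    """token_losses: [batch, seq_len] per-token losses (e.g. cross-entropy).
    highlight_mask: [batch, seq_len], 1.0 where the input was highlighted."""
    weights = 1.0 + (boost_factor - 1.0) * highlight_mask  # boosted where highlighted
    return (weights * token_losses).mean()
```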
How Decoding Strategies Affect the Verifiability of Generated Text
Recent progress in pre-trained language models has led to systems that are able
to generate text of an increasingly high quality. While several works have
investigated the fluency and grammatical correctness of such models, it is
still unclear to what extent the generated text is consistent with factual
world knowledge. Here, we go beyond fluency and also investigate the
verifiability of text generated by state-of-the-art pre-trained language
models. A generated sentence is verifiable if it can be corroborated or
disproved by Wikipedia, and we find that the verifiability of generated text
strongly depends on the decoding strategy. In particular, we discover a
tradeoff between factuality (i.e., the ability to generate Wikipedia-corroborated
text) and repetitiveness. While decoding strategies such as top-k
and nucleus sampling lead to less repetitive generations, they also produce
less verifiable text. Based on these findings, we introduce a simple and
effective decoding strategy which, in comparison to previously used decoding
strategies, produces less repetitive and more verifiable text.
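For context on the decoding strategies compared above, the following is a generic sketch of top-k and nucleus (top-p) sampling over a vector of next-token logits; it illustrates the standard strategies discussed in the abstract, not the paper's proposed decoding method.

```python
# Standard top-k and nucleus (top-p) sampling over next-token logits.
import torch

def top_k_sample(logits: torch.Tensor, k: int = 50) -> int:
    """Sample from the k most probable tokens."""
    topk_logits, topk_idx = torch.topk(logits, k)
    probs = torch.softmax(topk_logits, dim=-1)
    return topk_idx[torch.multinomial(probs, 1)].item()

def nucleus_sample(logits: torch.Tensor, p: float = 0.9) -> int:
    """Sample from the smallest set of tokens whose cumulative probability exceeds p."""
    probs = torch.softmax(logits, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    keep = (cumulative - sorted_probs) < p        # preceding mass below p; keeps >= 1 token
    kept_probs = sorted_probs * keep
    choice = torch.multinomial(kept_probs / kept_probs.sum(), 1)
    return sorted_idx[choice].item()
```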
KILT: a Benchmark for Knowledge Intensive Language Tasks
Challenging problems such as open-domain question answering, fact checking,
slot filling and entity linking require access to large, external knowledge
sources. While some models do well on individual tasks, developing general
models is difficult as each task might require computationally expensive
indexing of custom knowledge sources, in addition to dedicated infrastructure.
To catalyze research on models that condition on specific information in large
textual resources, we present a benchmark for knowledge-intensive language
tasks (KILT). All tasks in KILT are grounded in the same snapshot of Wikipedia,
reducing engineering turnaround through the re-use of components, as well as
accelerating research into task-agnostic memory architectures. We test both
task-specific and general baselines, evaluating downstream performance in
addition to the ability of the models to provide provenance. We find that a
shared dense vector index coupled with a seq2seq model is a strong baseline,
outperforming more tailor-made approaches for fact checking, open-domain
question answering and dialogue, and yielding competitive results on entity
linking and slot filling, by generating disambiguated text. KILT data and code
are available at https://github.com/facebookresearch/KILT.
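The strong baseline mentioned above couples a shared dense vector index with a seq2seq model; the following is a minimal sketch of that retrieve-then-generate pattern. encode_query, encode_passage, and seq2seq_generate are placeholders for real models (e.g. dense passage encoders and a fine-tuned seq2seq generator) and are assumptions, not the KILT codebase's API.

```python
# Generic retrieve-then-generate sketch: a shared dense index over Wikipedia
# passages, queried with a dense question encoder, with the retrieved evidence
# fed to a seq2seq model.
import numpy as np

def build_index(passages, encode_passage):
    """Precompute one dense vector per passage (the shared index)."""
    return np.stack([encode_passage(p) for p in passages])

def retrieve(query, index, passages, encode_query, k=5):
    """Return the k passages with the highest inner-product score."""
    scores = index @ encode_query(query)
    top = np.argsort(-scores)[:k]
    return [passages[i] for i in top]

def answer(query, index, passages, encode_query, seq2seq_generate, k=5):
    """Condition the seq2seq model on the query plus retrieved evidence."""
    evidence = retrieve(query, index, passages, encode_query, k)
    return seq2seq_generate(query + " [SEP] " + " ".join(evidence))
```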
ARCOMEM Crawling Architecture
The World Wide Web is the largest information repository available today. However, this information is very volatile, and Web archiving is essential to preserve it for the future. Existing approaches to Web archiving are based on simple definitions of the scope of Web pages to crawl and are limited to basic interactions with Web servers. The aim of the ARCOMEM project is to overcome these limitations and to provide flexible, adaptive and intelligent content acquisition, relying on social media to create topical Web archives. In this article, we focus on ARCOMEM’s crawling architecture. We introduce the overall architecture and describe its modules, such as the online analysis module, which computes a priority for the Web pages to be crawled, and the Application-Aware Helper, which takes into account the type of Web sites and applications to extract structure from crawled content. We also describe a large-scale distributed crawler that has been developed, as well as the modifications we have implemented to adapt Heritrix, an open-source crawler, to the needs of the project. Our experimental results from real crawls show that ARCOMEM’s crawling architecture is effective in acquiring focused information about a topic and leveraging the information from social media.
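As an illustration of the priority-driven, focused crawling pattern described in the abstract, the sketch below keeps a frontier of URLs in a priority queue and scores each fetched page for topical relevance before enqueueing its outlinks. fetch, extract_links, and score_relevance are hypothetical placeholders, not ARCOMEM modules.

```python
# Illustrative focused crawler: URLs are dequeued by priority, each fetched
# page is scored online for topical relevance, and outlinks inherit that score.
import heapq

def focused_crawl(seed_urls, fetch, extract_links, score_relevance, max_pages=1000):
    frontier = [(-1.0, url) for url in seed_urls]      # max-heap via negated priority
    heapq.heapify(frontier)
    seen, archive = set(seed_urls), []
    while frontier and len(archive) < max_pages:
        _, url = heapq.heappop(frontier)
        page = fetch(url)
        relevance = score_relevance(page)              # "online analysis" step
        if relevance > 0.5:                            # archive only on-topic pages
            archive.append((url, page))
        for link in extract_links(page):
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-relevance, link))  # outlinks inherit parent's score
    return archive
```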