Grammatical information in BERT sentence embeddings as two-dimensional arrays
Sentence embeddings induced with various transformer architectures encode much semantic and syntactic information in a distributed manner in a one-dimensional array. We investigate whether specific grammatical information can be accessed in these distributed representations. Using data from a task developed to test rule-like generalizations, our experiments on detecting subject-verb agreement yield several promising results. First, we show that while the usual sentence representations encoded as one-dimensional arrays do not easily support extraction of rule-like regularities, a two-dimensional reshaping of these vectors allows various learning architectures to access such information. Next, we show that various architectures can detect patterns in these reshaped two-dimensional sentence embeddings and successfully learn a model from smaller amounts of simpler training data that performs well on more complex test data. This indicates that current sentence embeddings contain regularly distributed information that can be captured when the embeddings are reshaped into higher-dimensional arrays. Our results cast light on the representations produced by language models and help move towards developing few-shot learning approaches.
Comment: Published in RepL4NLP 202
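To make the reshaping step concrete, here is a minimal sketch in Python. It is an illustration, not the authors' code: the mean-pooled bert-base-uncased embedding, the 24x32 target shape, and the small convolutional probe are all assumptions made for this example; the paper experiments with various reshapings and learning architectures.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

sentence = "The keys to the cabinet are on the table."
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the token vectors into a single 1-D sentence embedding of
# dimension 768 (pooling choice is an assumption for this sketch).
embedding = outputs.last_hidden_state.mean(dim=1).squeeze(0)

# Reshape the 1-D vector into a 2-D array; 24 * 32 == 768. The specific
# shape is arbitrary here.
grid = embedding.reshape(24, 32)

# A 2-D convolution can now scan the reshaped embedding for local
# patterns that a flat 1-D view does not expose as easily.
conv = torch.nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3)
features = conv(grid.unsqueeze(0).unsqueeze(0))
print(features.shape)  # torch.Size([1, 8, 22, 30])
```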
Finding Semantic Associations on Express Lane
This paper introduces a new codification scheme for the efficient computation of measures in semantic networks. The scheme is particularly useful for fast computation of semantic associations between words and for implementing an information-retrieval operator for efficient search in semantic spaces. Other applications may also be possible.
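The abstract does not detail the codification scheme itself, so the sketch below only illustrates the underlying task it accelerates: computing a path-based association score between words in a small, invented semantic network. It is a generic baseline, not the paper's method.

```python
import networkx as nx

# Hypothetical mini semantic network: nodes are words, edges are
# semantic relations (e.g., hypernymy).
net = nx.Graph()
net.add_edges_from([
    ("cat", "animal"), ("dog", "animal"),
    ("animal", "organism"), ("tree", "organism"),
])

def association(a: str, b: str) -> float:
    """Path-based association: words closer in the network score higher."""
    try:
        d = nx.shortest_path_length(net, a, b)
    except nx.NetworkXNoPath:
        return 0.0
    return 1.0 / (1.0 + d)

print(association("cat", "dog"))   # 1 / (1 + 2) ~= 0.33
print(association("cat", "tree"))  # 1 / (1 + 3) == 0.25
```

A naive shortest-path query like this is what such codification schemes aim to avoid recomputing from scratch at query time.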
Building Multilingual Semantic Networks with Non-Expert Contributions over the Web
This paper discusses building multilingual semantic networks from non-expert contributions collected over the Web.
Taxonomy Induction using Hypernym Subsequences
We propose a novel, semi-supervised approach to domain taxonomy induction from an input vocabulary of seed terms. Unlike all previous approaches, which typically extract direct hypernym edges for terms, our approach uses a novel probabilistic framework to extract hypernym subsequences. Taxonomy induction from the extracted subsequences is cast as an instance of the minimum-cost flow problem on a carefully designed directed graph. Through experiments, we demonstrate that our approach outperforms state-of-the-art taxonomy induction approaches across four languages. Importantly, we also show that our approach is robust to noise in the input vocabulary. To the best of our knowledge, no previous approach has been empirically shown to be robust to noise in the input vocabulary.
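The minimum-cost flow formulation can be illustrated on a toy instance. The sketch below uses networkx with an invented three-term vocabulary and hypothetical edge costs; the paper's actual graph construction over hypernym subsequences is considerably more elaborate.

```python
import networkx as nx

G = nx.DiGraph()
# The root supplies two units of flow; each leaf term demands one unit.
G.add_node("entity", demand=-2)
G.add_node("apple", demand=1)
G.add_node("dog", demand=1)

# Candidate hypernym edges; 'weight' encodes cost, so confident edges
# are cheap and noisy candidates are expensive (scores are made up).
G.add_edge("entity", "fruit", weight=1, capacity=2)
G.add_edge("entity", "animal", weight=1, capacity=2)
G.add_edge("fruit", "apple", weight=1, capacity=1)
G.add_edge("animal", "dog", weight=1, capacity=1)
G.add_edge("fruit", "dog", weight=5, capacity=1)  # noisy candidate

# Minimum-cost flow routes each unit along cheap, confident edges;
# edges carrying flow form the induced taxonomy skeleton.
flow = nx.min_cost_flow(G)
for u, targets in flow.items():
    for v, f in targets.items():
        if f > 0:
            print(f"{u} -> {v}")
# entity -> fruit, entity -> animal, fruit -> apple, animal -> dog
```

The noisy edge fruit -> dog is priced out of the optimal flow, which is the intuition behind the approach's robustness to noise.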