270,845 research outputs found
Manipulating the Label Space for In-Context Classification
After pre-training by generating the next word conditional on previous words,
the Language Model (LM) acquires the ability of In-Context Learning (ICL) that
can learn a new task conditional on the context of the given in-context
examples (ICEs). Similarly, visually-conditioned Language Modelling is also
used to train Vision-Language Models (VLMs) with ICL ability. However, such
VLMs typically exhibit weaker classification abilities compared to contrastive
learning-based models like CLIP, since the Language Modelling objective does
not directly contrast whether an object is paired with a text. A
straightforward way to improve in-context classification is to use more ICEs to
provide more knowledge. However, this may greatly increase the selection time, and
more importantly, the inclusion of additional in-context images tends to extend
the length of the in-context sequence beyond the processing capacity of a VLM.
To alleviate these limitations, we propose to manipulate the label space of
each ICE to increase its knowledge density, allowing for fewer ICEs to convey
as much information as a larger set would. Specifically, we propose two
strategies, Label Distribution Enhancement and Visual Descriptions
Enhancement, to improve in-context classification performance on diverse
datasets, including the classic ImageNet and more fine-grained datasets like
CUB-200. Using our approach on ImageNet, we increase accuracy
from 74.70% in a 4-shot setting to 76.21% with just 2 shots, surpassing CLIP
by 0.67%. On CUB-200, our method raises 1-shot accuracy from 48.86% to
69.05%, 12.15% higher than CLIP. The code is available at
https://anonymous.4open.science/r/MLS_ICC
Compositional Morphology for Word Representations and Language Modelling
This paper presents a scalable method for integrating compositional
morphological representations into a vector-based probabilistic language model.
Our approach is evaluated in the context of log-bilinear language models,
rendered suitably efficient for implementation inside a machine translation
decoder by factoring the vocabulary. We perform both intrinsic and extrinsic
evaluations, presenting results on a range of languages which demonstrate that
our model learns morphological representations that both perform well on word
similarity tasks and lead to substantial reductions in perplexity. When used
for translation into morphologically rich languages with large vocabularies,
our models obtain improvements of up to 1.2 BLEU points relative to a baseline
system using back-off n-gram models.
Comment: Proceedings of the 31st International Conference on Machine Learning
(ICML
Improving Language Modelling with Noise-contrastive estimation
Neural language models do not scale well when the vocabulary is large.
Noise-contrastive estimation (NCE) is a sampling-based method that allows for
fast learning with large vocabularies. Although NCE has shown promising
performance in neural machine translation, it was considered to be an
unsuccessful approach for language modelling. A sufficient investigation of the
hyperparameters in the NCE-based neural language models was also missing. In
this paper, we showed that NCE can be a successful approach in neural language
modelling when the hyperparameters of a neural network are tuned appropriately.
We introduced the 'search-then-converge' learning rate schedule for NCE and
designed a heuristic that specifies how to use this schedule. The impact of the
other important hyperparameters, such as the dropout rate and the weight
initialisation range, was also demonstrated. We showed that appropriate tuning
of NCE-based neural language models outperforms the state-of-the-art
single-model methods on a popular benchmark.
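The abstract names a 'search-then-converge' learning rate schedule but does not give its formula. As a rough illustration only, the sketch below uses one common textbook form of such a schedule, lr(t) = base_lr / (1 + t / tau). The function name, the parameter names `base_lr` and `tau`, and their default values are assumptions made for this example and are not taken from the paper, whose exact schedule and heuristic may differ.

```python
def search_then_converge(step, base_lr=1.0, tau=1000.0):
    """Learning rate at a given training step (illustrative form only).

    While step is much smaller than tau the rate stays close to base_lr
    (the "search" phase); once step is much larger than tau it decays
    roughly as base_lr * tau / step (the "converge" phase).
    base_lr and tau are hypothetical names and values for this sketch.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    if tau <= 0:
        raise ValueError("tau must be positive")
    return base_lr / (1.0 + step / tau)


if __name__ == "__main__":
    # Print the rate at a few steps to show the two phases.
    for step in (0, 100, 1000, 10000):
        print(step, search_then_converge(step))
```

In this form `tau` sets where the schedule switches from the near-constant phase to the decaying phase, so it would be tuned together with the initial rate.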
Topically Driven Neural Language Model
Language models are typically applied at the sentence level, without access
to the broader document context. We present a neural language model that
incorporates document context in the form of a topic model-like architecture,
thus providing a succinct representation of the broader document context
outside of the current sentence. Experiments over a range of datasets
demonstrate that our model outperforms a pure sentence-based model in terms of
language model perplexity, and leads to topics that are potentially more
coherent than those produced by a standard LDA topic model. Our model also has
the ability to generate related sentences for a topic, providing another way to
interpret topics.
Comment: 11 pages, Proceedings of the 55th Annual Meeting of the Association
for Computational Linguistics (ACL 2017) (to appear
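Several of the abstracts in this listing report results as language model perplexity. As a reminder of what that number measures, the sketch below computes perplexity as the exponential of the average negative log-probability per token. The function name and the input format (a list of natural-log token probabilities) are assumptions for this example and are not taken from any of the listed papers.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from per-token natural-log probabilities.

    token_log_probs is a hypothetical input format: one log-probability
    (base e) per token, as assigned by some language model.
    Lower perplexity means the model found the text less surprising.
    """
    if not token_log_probs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


if __name__ == "__main__":
    # A model that gives every token probability 1/4 is as uncertain
    # as a uniform choice among four options.
    print(perplexity([math.log(0.25)] * 10))
```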
The Construction of Verification Models for Embedded Systems
The usefulness of verification hinges on the quality of the verification model. Verification is useful if it increases our confidence that an artefact behaves as expected. As modelling inherently contains non-formal elements, the quality of models cannot be captured by purely formal means. Still, we argue that modelling is not an act of irrationalism and unpredictable geniality, but follows rational arguments that often remain implicit. In this paper we try to identify the tacit rationalism in the model construction as performed by most people doing modelling for verification. By explicating the different phases, arguments, and design decisions in the model construction, we try to develop guidelines that help to improve the process of model construction and the quality of models.