From Characters to Words: Hierarchical Pre-trained Language Model for Open-vocabulary Language Understanding
Current state-of-the-art models for natural language understanding require a
preprocessing step to convert raw text into discrete tokens. This process,
known as tokenization, relies on a pre-built vocabulary of words or sub-word
morphemes. This fixed vocabulary limits the model's robustness to spelling
errors and its capacity to adapt to new domains. In this work, we introduce a
novel open-vocabulary language model that adopts a hierarchical two-level
approach: one at the word level and another at the sequence level. Concretely,
we design an intra-word module that uses a shallow Transformer architecture to
learn word representations from their characters, and a deep inter-word
Transformer module that contextualizes each word representation by attending to
the entire word sequence. Our model thus directly operates on character
sequences with explicit awareness of word boundaries, but without a biased
sub-word or word-level vocabulary. Experiments on various downstream tasks show
that our method outperforms strong baselines. We also demonstrate that our
hierarchical model is robust to textual corruption and domain shift.
Comment: Accepted to ACL 2023 Main Conference
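The two-level design described above maps onto a compact sketch: a shallow
Transformer pools each word's characters into a word vector, and a deep
Transformer then contextualizes those vectors across the sentence. Below is a
minimal PyTorch sketch, assuming illustrative layer counts, mean pooling over
characters, and pre-split fixed-length words; none of these details come from
the abstract itself.

```python
import torch
import torch.nn as nn

class HierarchicalCharWordEncoder(nn.Module):
    """Two-level open-vocabulary encoder: a shallow intra-word Transformer
    builds a vector for each word from its characters, and a deep inter-word
    Transformer contextualizes those vectors across the whole sentence.
    Layer counts, pooling, and sizes here are illustrative assumptions."""

    def __init__(self, n_chars=256, d_model=256, intra_layers=2, inter_layers=12):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, d_model)
        self.intra_word = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=intra_layers)   # shallow: characters within one word
        self.inter_word = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=inter_layers)   # deep: words within one sentence

    def forward(self, char_ids):
        # char_ids: (batch, n_words, chars_per_word); words are pre-split, so
        # the model sees explicit word boundaries but needs no fixed vocabulary.
        b, w, c = char_ids.shape
        chars = self.char_emb(char_ids).view(b * w, c, -1)
        word_vecs = self.intra_word(chars).mean(dim=1)    # pool chars -> word
        return self.inter_word(word_vecs.view(b, w, -1))  # contextualize words

enc = HierarchicalCharWordEncoder()
batch = torch.randint(0, 256, (2, 5, 8))  # 2 sentences, 5 words, 8 chars each
print(enc(batch).shape)                   # torch.Size([2, 5, 256])
```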
Investigating Linguistic Pattern Ordering in Hierarchical Natural Language Generation
Natural language generation (NLG) is a critical component of a spoken dialogue
system and can be divided into two phases: (1) sentence planning, which decides
the overall sentence structure, and (2) surface realization, which determines
specific word forms and flattens the sentence structure into a string. With the rise
of deep learning, most modern NLG models are based on a sequence-to-sequence
(seq2seq) model, which basically contains an encoder-decoder structure; these
NLG models generate sentences from scratch by jointly optimizing sentence
planning and surface realization. However, such a simple encoder-decoder
architecture usually fails to generate complex and long sentences, because the
decoder has difficulty learning all the grammar and diction knowledge well. This
paper introduces an NLG model with a hierarchical attentional decoder, where
the hierarchy focuses on leveraging linguistic knowledge in a specific order.
The experiments show that the proposed method significantly outperforms the
traditional seq2seq model with a smaller model size, and the design of the
hierarchical attentional decoder can be applied to various NLG systems.
Furthermore, different generation strategies based on linguistic patterns are
investigated and analyzed in order to guide future NLG research.
Comment: accepted by the 7th IEEE Workshop on Spoken Language Technology (SLT
2018). arXiv admin note: text overlap with arXiv:1808.0274
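The hierarchical attentional decoder sketched in this abstract can be
approximated with a two-stage decoder: a first recurrent stage drafts hidden
states while attending to the encoder, and a second stage refines them
conditioned on the first. The teacher-forced PyTorch sketch below assumes
GRUs, dot-product attention, and exactly two levels; the paper's actual
linguistic-pattern ordering and layer design are not specified here.

```python
import torch
import torch.nn as nn

class TwoStageAttentionalDecoder(nn.Module):
    """Seq2seq model with a two-level decoder: stage 1 drafts hidden states
    while attending to the encoder; stage 2 refines them, conditioned on
    stage 1's output. Which linguistic pattern each level handles is the
    ordering question the paper studies; the sizes and the dot-product
    attention used here are illustrative assumptions."""

    def __init__(self, vocab=1000, d=128):
        super().__init__()
        self.emb = nn.Embedding(vocab, d)
        self.encoder = nn.GRU(d, d, batch_first=True)
        self.stage1 = nn.GRU(2 * d, d, batch_first=True)  # token emb + context
        self.stage2 = nn.GRU(3 * d, d, batch_first=True)  # ... + stage-1 state
        self.out = nn.Linear(d, vocab)

    @staticmethod
    def attend(queries, keys):
        # Dot-product attention: one context vector per decoder position.
        weights = torch.softmax(queries @ keys.transpose(1, 2), dim=-1)
        return weights @ keys

    def forward(self, src_ids, tgt_ids):  # teacher forcing during training
        enc, _ = self.encoder(self.emb(src_ids))             # (B, S, d)
        tgt = self.emb(tgt_ids)                              # (B, T, d)
        h1, _ = self.stage1(torch.cat([tgt, self.attend(tgt, enc)], -1))
        h2, _ = self.stage2(torch.cat([tgt, self.attend(h1, enc), h1], -1))
        return self.out(h2)                                  # (B, T, vocab)

dec = TwoStageAttentionalDecoder()
logits = dec(torch.randint(0, 1000, (2, 6)), torch.randint(0, 1000, (2, 9)))
print(logits.shape)  # torch.Size([2, 9, 1000])
```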