Disentangled Phonetic Representation for Chinese Spelling Correction
Chinese Spelling Correction (CSC) aims to detect and correct erroneous
characters in Chinese texts. Although efforts have been made to introduce
phonetic information (Hanyu Pinyin) in this task, they typically merge phonetic
representations with character representations, which tends to weaken the
representation of normal, error-free text. In this work, we propose to disentangle
the two types of features to allow for direct interaction between textual and
phonetic information. To learn useful phonetic representations, we introduce a
pinyin-to-character objective to ask the model to predict the correct
characters based solely on phonetic information, where a separation mask is
imposed to disable attention from phonetic input to text. To avoid overfitting
the phonetics, we further design a self-distillation module to ensure that
semantic information plays a major role in the prediction. Extensive
experiments on three CSC benchmarks demonstrate the superiority of our method
in using phonetic information.
Comment: Accepted to ACL 2023 Main Conference
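The separation mask described above can be sketched as a plain attention mask over a concatenated [text; pinyin] sequence. This is a minimal illustration, not the paper's implementation; the function name and the convention that rows attend to columns are assumptions.

```python
import numpy as np

def separation_mask(n_text, n_pinyin):
    """Attention mask for a concatenated [text; pinyin] input.

    Entry [i, j] is True when position i may attend to position j.
    Pinyin positions are blocked from attending to text positions, so a
    pinyin-to-character objective must predict characters from phonetic
    information alone.
    """
    n = n_text + n_pinyin
    mask = np.ones((n, n), dtype=bool)
    # disable attention from phonetic input (rows) to text (columns)
    mask[n_text:, :n_text] = False
    return mask

mask = separation_mask(n_text=3, n_pinyin=2)
# text rows still see all positions; pinyin rows see only pinyin columns
```

In a Transformer this mask would be applied to the attention logits before the softmax (blocked entries set to negative infinity).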
Autoregressive Entity Generation for End-to-End Task-Oriented Dialog
Task-oriented dialog (TOD) systems often require interaction with an external
knowledge base (KB) to retrieve necessary entity information (e.g., a restaurant) to
support the response generation. Most current end-to-end TOD systems either
retrieve the KB information explicitly or embed it into model parameters for
implicit access. While the former approach demands scanning the KB at each turn
of response generation, which is inefficient when the KB scales up, the latter
approach shows higher flexibility and efficiency. In either approach, the
systems may generate a response with conflicting entity information. To address
this issue, we propose to generate the entity autoregressively first and
leverage it to guide the response generation in an end-to-end system. To ensure
entity consistency, we impose a trie constraint on entity generation. We also
introduce a logit concatenation strategy to facilitate gradient backpropagation
for end-to-end training. Experiments on MultiWOZ 2.1 single and CAMREST show
that our system generates higher-quality, entity-consistent responses.
Comment: Accepted to COLING 2022
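A trie constraint on entity generation can be sketched as follows: entity names from the KB are indexed as a prefix tree, and at each decoding step the model may only emit tokens that extend a valid entity prefix. This is a toy sketch under assumed names (`build_trie`, `constrained_decode`, and the `score` stand-in for model logits are all hypothetical, not the paper's API).

```python
def build_trie(entities):
    """Index tokenized entity names as a trie: prefix -> allowed next tokens."""
    trie = {}
    for tokens in entities:
        node = trie
        for tok in tokens:
            node = node.setdefault(tok, {})
    return trie

def allowed_next(trie, prefix):
    """Tokens the decoder may emit after `prefix` while staying in the trie."""
    node = trie
    for tok in prefix:
        if tok not in node:
            return []
        node = node[tok]
    return sorted(node)

def constrained_decode(trie, score):
    """Greedy decoding restricted to trie paths.

    `score(prefix, tok)` is a stand-in for the model's logit for `tok`
    given the generated prefix.
    """
    prefix = []
    while True:
        options = allowed_next(trie, prefix)
        if not options:
            return prefix
        prefix.append(max(options, key=lambda tok: score(prefix, tok)))

# toy KB of tokenized restaurant names
trie = build_trie([["pizza", "hut"], ["pizza", "express"], ["nandos"]])
entity = constrained_decode(trie, score=lambda prefix, tok: len(tok))
```

Because every decoding path ends at a complete KB entry, the generated entity can never mix tokens from two different KB records, which is the consistency property the abstract targets.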
DialogXL: All-in-One XLNet for Multi-Party Conversation Emotion Recognition
This paper presents our pioneering effort for emotion recognition in
conversation (ERC) with pre-trained language models. Unlike regular documents,
conversational utterances appear alternately from different parties and are
usually organized as hierarchical structures in previous work. Such structures
are not conducive to the application of pre-trained language models such as
XLNet. To address this issue, we propose an all-in-one XLNet model, namely
DialogXL, with enhanced memory to store longer historical context and
dialog-aware self-attention to deal with the multi-party structures.
Specifically, we first modify the recurrence mechanism of XLNet from
segment-level to utterance-level in order to better model the conversational
data. Second, we introduce dialog-aware self-attention in place of the
vanilla self-attention in XLNet to capture useful intra- and inter-speaker
dependencies. Extensive experiments are conducted on four ERC benchmarks with
mainstream models presented for comparison. The experimental results show that
the proposed model outperforms the baselines on all the datasets. Several other
experiments such as ablation study and error analysis are also conducted and
the results confirm the role of the critical modules of DialogXL.
Comment: Accepted by AAAI 2021 Main Conference
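The intra- and inter-speaker dependencies above can be sketched as two complementary attention masks derived from per-token speaker ids. This is a simplified illustration of the idea, not DialogXL's actual implementation; the function name and the head-to-mask assignment are assumptions.

```python
import numpy as np

def speaker_masks(speakers):
    """Split full self-attention into intra- and inter-speaker masks.

    `speakers[i]` is the speaker id of token i. Heads assigned the
    intra-speaker mask attend only within one speaker's utterances;
    heads assigned the inter-speaker mask attend only across speakers.
    """
    s = np.asarray(speakers)
    same = s[:, None] == s[None, :]  # True where tokens share a speaker
    return same, ~same

# five tokens from two speakers in a multi-party conversation
intra, inter = speaker_masks([0, 0, 1, 0, 1])
```

Combining heads that use different masks lets a single XLNet-style encoder model both kinds of dependency without an explicit hierarchical structure.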