Paraphrastic language models
Natural languages are known for their expressive richness. Many sentences can be used to represent the same underlying meaning.
Only modelling the observed surface word sequence can result in poor context coverage and generalization, for example, when using
n-gram language models (LMs). This paper proposes a novel form of language model, the paraphrastic LM, that addresses these
issues. A phrase level paraphrase model statistically learned from standard text data with no semantic annotation is used to generate
multiple paraphrase variants. LM probabilities are then estimated by maximizing the marginal probability over these paraphrase variants. Multi-level language
models estimated at both the word level and the phrase level are combined. An efficient weighted finite state transducer (WFST)
based paraphrase generation approach is also presented. Significant error rate reductions of 0.5–0.6% absolute were obtained over the
baseline n-gram LMs on two state-of-the-art recognition tasks for English conversational telephone speech and Mandarin Chinese
broadcast speech using a paraphrastic multi-level LM modelling both word and phrase sequences. When it is further combined with
word and phrase level feed-forward neural network LMs, significant error rate reductions of 0.9% absolute (9% relative) and 0.5% absolute (5% relative) were obtained over the baseline n-gram and neural network LMs respectively. The research leading to these results was supported by EPSRC grant EP/I031022/1 (Natural Speech Technology) and DARPA under the Broad Operational Language Translation (BOLT) program.
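The marginalization step lends itself to a compact illustration. The sketch below is an illustration of the idea only, not the authors' implementation: n-gram statistics are accumulated as expected counts over weighted paraphrase variants of each training sentence and then normalized into probabilities, with a toy variant generator standing in for the statistically learned phrase-level paraphrase model.

```python
# Toy illustration (not the paper's implementation): estimate bigram
# probabilities from expected counts accumulated over weighted
# paraphrase variants of each training sentence.
from collections import defaultdict

def expected_bigram_probs(sentences, paraphrase_variants):
    """sentences: list of token lists.
    paraphrase_variants(sent) -> list of (variant_tokens, weight),
    with weights summing to 1 for each sentence."""
    counts = defaultdict(float)
    context_totals = defaultdict(float)
    for sent in sentences:
        for variant, weight in paraphrase_variants(sent):
            padded = ["<s>"] + variant + ["</s>"]
            for prev, word in zip(padded, padded[1:]):
                counts[(prev, word)] += weight
                context_totals[prev] += weight
    return {bg: c / context_totals[bg[0]] for bg, c in counts.items()}

def toy_variants(sent):
    """Stand-in for a learned paraphrase model: swap one synonym pair."""
    swap = {"purchase": "buy", "buy": "purchase"}
    variant = [swap.get(w, w) for w in sent]
    return [(sent, 0.7), (variant, 0.3)] if variant != sent else [(sent, 1.0)]

probs = expected_bigram_probs([["i", "purchase", "books"]], toy_variants)
```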
Paraphrastic neural network language models
Expressive richness in natural languages presents a significant challenge for statistical language models (LMs). As multiple word sequences can represent the same underlying meaning, only modelling the observed surface word sequence can lead to poor context coverage. To handle this issue, paraphrastic LMs were previously proposed to improve the generalization of back-off n-gram LMs. Paraphrastic neural network LMs (NNLMs) are investigated in this paper. Using a paraphrastic multi-level feedforward NNLM modelling both word and phrase sequences, significant error rate reductions of 1.3% absolute (8% relative) and 0.9% absolute (5.5% relative) were obtained over the baseline n-gram and NNLM systems respectively on a state-of-the-art conversational telephone speech recognition system trained on 2000 hours of audio and 545 million words of text. The research leading to these results was supported by EPSRC grant EP/I031022/1 (Natural Speech Technology) and DARPA under the Broad Operational Language Translation (BOLT) program.
Paraphrastic recurrent neural network language models
Recurrent neural network language models (RNNLMs) have become an increasingly popular choice for state-of-the-art speech recognition systems. Linguistic factors influencing the realization of surface word sequences, for example, expressive richness, are only implicitly learned by RNNLMs. Observed sentences and their associated alternative paraphrases representing the same meaning are not explicitly related during training. In order to improve context coverage and generalization, paraphrastic RNNLMs are investigated in this paper. Multiple paraphrase variants were automatically generated and used in paraphrastic RNNLM training. Using a paraphrastic multi-level RNNLM modelling both word and phrase sequences, a significant error rate reduction of 0.6% absolute and a perplexity reduction of 10% relative were obtained over the baseline RNNLM on a large vocabulary conversational telephone speech recognition system trained on 2000 hours of audio and 545 million words of text. The overall improvement over the baseline n-gram LM was increased from 8.4% to 11.6% relative. The research leading to these results was supported by EPSRC grant EP/I031022/1 (Natural Speech Technology) and DARPA under the Broad Operational Language Translation (BOLT) and RATS programs. The paper does not necessarily reflect the position or the policy of the US Government and no official endorsement should be inferred. Xie Chen is supported by Toshiba Research Europe Ltd, Cambridge Research Lab.
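One reasonable way to realize the data side of this recipe (an assumption for illustration, not the authors' exact pipeline) is to augment the training text with sampled paraphrase variants so the recurrent model is exposed to several surface forms of each sentence:

```python
# Sketch of paraphrase-augmented training data preparation; the
# paraphrase generator is assumed to be provided externally.
import random

def augment_with_paraphrases(sentences, generate_variants, n_samples=2, seed=0):
    """generate_variants(sent) -> list of (variant, weight) with weights summing to 1."""
    rng = random.Random(seed)
    augmented = []
    for sent in sentences:
        augmented.append(sent)  # always keep the observed surface form
        variants, weights = zip(*generate_variants(sent))
        augmented.extend(rng.choices(variants, weights=weights, k=n_samples))
    return augmented
```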
Paraphrastic language models and combination with neural network language models
In natural languages, multiple word sequences can represent the same underlying meaning. Only modelling the observed surface word sequence can result in poor context coverage, for example, when using n-gram language models (LMs). To handle this issue, paraphrastic LMs were proposed in previous research and successfully applied to a US English conversational telephone speech transcription
task. In order to exploit the complementary characteristics of paraphrastic LMs and neural network LMs (NNLM), the combination
between the two is investigated in this paper. To investigate paraphrastic LMs’ generalization ability to other languages, experiments
are conducted on a Mandarin Chinese broadcast speech transcription task. Using a paraphrastic multi-level LM modelling both word
and phrase sequences, significant error rate reductions of 0.9% absolute (9% relative) and 0.5% absolute (5% relative) were obtained
over the baseline n-gram and NNLM systems respectively, after combination with word and phrase level NNLMs. The research leading to these results was supported by EPSRC Programme Grant EP/I031022/1 (Natural Speech Technology).
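The combination of the component models is typically a linear interpolation of their word-level probabilities; the sketch below shows that generic recipe with an illustrative interpolation weight (the weights and any phrase-level handling used in the papers above are not reproduced here).

```python
# Generic linear interpolation of two LMs' per-word probabilities,
# working in log space; lam is an illustrative interpolation weight.
import math

def interpolate_log_prob(log_p_a, log_p_b, lam=0.5):
    """log of lam*P_a + (1-lam)*P_b for a single word."""
    return math.log(lam * math.exp(log_p_a) + (1.0 - lam) * math.exp(log_p_b))

def sentence_log_prob(word_log_probs_a, word_log_probs_b, lam=0.5):
    return sum(interpolate_log_prob(a, b, lam)
               for a, b in zip(word_log_probs_a, word_log_probs_b))
```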
On the Evaluation of Semantic Phenomena in Neural Machine Translation Using Natural Language Inference
We propose a process for investigating the extent to which sentence
representations arising from neural machine translation (NMT) systems encode
distinct semantic phenomena. We use these representations as features to train
a natural language inference (NLI) classifier based on datasets recast from
existing semantic annotations. In applying this process to a representative NMT
system, we find its encoder appears most suited to supporting inferences at the
syntax-semantics interface, as compared to anaphora resolution requiring
world-knowledge. We conclude with a discussion on the merits and potential
deficiencies of the existing process, and how it may be improved and extended
as a broader framework for evaluating semantic coverage. Comment: To be presented at NAACL 2018 - 11 pages.
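A minimal version of this probing setup can be sketched as follows, with a stand-in `encode` function in place of a real NMT encoder and a simple logistic-regression classifier; the paper's recast datasets and exact classifier are not reproduced here.

```python
# Sketch: fixed sentence representations used as features for an NLI
# classifier. The encoder here is a random stand-in, not an NMT system.
import numpy as np
from sklearn.linear_model import LogisticRegression

def encode(sentence: str) -> np.ndarray:
    """Placeholder for an NMT encoder producing a fixed-size vector."""
    rng = np.random.default_rng(abs(hash(sentence)) % (2**32))
    return rng.normal(size=128)

def featurize(premise: str, hypothesis: str) -> np.ndarray:
    p, h = encode(premise), encode(hypothesis)
    return np.concatenate([p, h, np.abs(p - h), p * h])

pairs = [("A man is sleeping.", "A person is asleep.", 1),
         ("A man is sleeping.", "A man is running.", 0)]
X = np.stack([featurize(p, h) for p, h, _ in pairs])
y = np.array([label for _, _, label in pairs])
clf = LogisticRegression(max_iter=1000).fit(X, y)
```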
ParaNMT-50M: Pushing the Limits of Paraphrastic Sentence Embeddings with Millions of Machine Translations
We describe ParaNMT-50M, a dataset of more than 50 million English-English
sentential paraphrase pairs. We generated the pairs automatically by using
neural machine translation to translate the non-English side of a large
parallel corpus, following Wieting et al. (2017). Our hope is that ParaNMT-50M
can be a valuable resource for paraphrase generation and can provide a rich
source of semantic knowledge to improve downstream natural language
understanding tasks. To show its utility, we use ParaNMT-50M to train
paraphrastic sentence embeddings that outperform all supervised systems on
every SemEval semantic textual similarity competition, in addition to showing
how it can be used for paraphrase generation.
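The construction recipe described above reduces to a short loop; the sketch below is illustrative, with `translate_to_english` standing in for a trained NMT system rather than any tooling released with the dataset.

```python
# Sketch of back-translation-based paraphrase pair construction:
# translate the non-English side of a parallel corpus into English and
# pair the output with the original English reference.
def build_paraphrase_pairs(parallel_corpus, translate_to_english):
    """parallel_corpus: iterable of (english_ref, foreign_sentence) pairs."""
    for english_ref, foreign_sentence in parallel_corpus:
        back_translation = translate_to_english(foreign_sentence)
        # Skip trivial pairs where the back-translation merely copies the reference.
        if back_translation.strip().lower() != english_ref.strip().lower():
            yield english_ref, back_translation
```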
Revisiting Recurrent Networks for Paraphrastic Sentence Embeddings
We consider the problem of learning general-purpose, paraphrastic sentence
embeddings, revisiting the setting of Wieting et al. (2016b). While they found
LSTM recurrent networks to underperform word averaging, we present several
developments that together produce the opposite conclusion. These include
training on sentence pairs rather than phrase pairs, averaging states to
represent sequences, and regularizing aggressively. These improve LSTMs in both
transfer learning and supervised settings. We also introduce a new recurrent
architecture, the Gated Recurrent Averaging Network, that is inspired by
averaging and LSTMs while outperforming them both. We analyze our learned
models, finding evidence of preferences for particular parts of speech and
dependency relations. Comment: Published as a long paper at ACL 2017.
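The two encoders being contrasted can be sketched in a few lines of PyTorch; the dimensions and embedding table below are illustrative, and this is not the paper's exact configuration.

```python
# Word averaging vs. averaging LSTM hidden states over time, the latter
# being one of the changes credited with making LSTMs competitive.
import torch
import torch.nn as nn

vocab_size, emb_dim, hidden_dim = 1000, 50, 50
embed = nn.Embedding(vocab_size, emb_dim)
lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)

def word_average(token_ids):            # (batch, seq) -> (batch, emb_dim)
    return embed(token_ids).mean(dim=1)

def lstm_state_average(token_ids):      # (batch, seq) -> (batch, hidden_dim)
    states, _ = lstm(embed(token_ids))  # states: (batch, seq, hidden_dim)
    return states.mean(dim=1)

tokens = torch.randint(0, vocab_size, (2, 7))
sentence_vectors = lstm_state_average(tokens)
```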
It Is Not Easy To Detect Paraphrases: Analysing Semantic Similarity With Antonyms and Negation Using the New SemAntoNeg Benchmark
We investigate to what extent a hundred publicly available, popular neural language models capture meaning systematically. Sentence embeddings obtained from pretrained or fine-tuned language models can be used to perform particular tasks, such as paraphrase detection, semantic textual similarity assessment or natural language inference. Common to all of these tasks is that paraphrastic sentences, that is, sentences that carry (nearly) the same meaning, should have (nearly) the same embeddings regardless of surface form. We demonstrate that performance varies greatly across different language models when a specific type of meaning-preserving transformation is applied: two sentences should be identified as paraphrastic if one of them contains a negated antonym in relation to the other one, such as “I am not guilty” versus “I am innocent”. We introduce and release SemAntoNeg, a new test suite containing 3152 entries for probing paraphrasticity in sentences incorporating negation and antonyms. Among other things, we show that language models fine-tuned for natural language inference outperform other types of models, especially the ones fine-tuned to produce general-purpose sentence embeddings, on the test suite. Furthermore, we show that most models designed explicitly for paraphrasing are rather mediocre in our task.
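The kind of check the benchmark performs can be reproduced in a few lines with the sentence-transformers library; the model named below is an arbitrary off-the-shelf choice for illustration, not necessarily one of the models evaluated in the paper.

```python
# Rank candidate sentences by cosine similarity to an anchor; a model
# that handles negation and antonymy should place the negated-antonym
# paraphrase above the non-paraphrastic distractors.
import torch.nn.functional as F
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice
anchor = "I am not guilty."
candidates = ["I am innocent.", "I am guilty.", "I am not innocent."]

emb_anchor = model.encode(anchor, convert_to_tensor=True)
emb_cands = model.encode(candidates, convert_to_tensor=True)
scores = F.cosine_similarity(emb_anchor.unsqueeze(0), emb_cands)
for sentence, score in sorted(zip(candidates, scores.tolist()), key=lambda x: -x[1]):
    print(f"{score:.3f}  {sentence}")
```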
Automatic extraction of paraphrastic phrases from medium size corpora
This paper presents a versatile system intended to acquire paraphrastic
phrases from a representative corpus. In order to decrease the time spent on
the elaboration of resources for NLP systems (for example Information Extraction, IE hereafter), we suggest using a machine learning system that helps define new templates and associated resources. This knowledge is automatically derived from the text collection, in interaction with a large semantic network.