4,143 research outputs found
Matching Natural Language Sentences with Hierarchical Sentence Factorization
Semantic matching of natural language sentences or identifying the
relationship between two sentences is a core research problem underlying many
natural language tasks. Depending on whether training data is available, prior
research has proposed both unsupervised distance-based schemes and supervised
deep learning schemes for sentence matching. However, previous approaches
either omit or fail to fully utilize the ordered, hierarchical, and flexible
structures of language objects, as well as the interactions between them. In
this paper, we propose Hierarchical Sentence Factorization, a technique that
factorizes a sentence into a hierarchical representation, with the components
at each scale reordered into a "predicate-argument" form. This factorization
yields: 1) a new unsupervised distance metric that computes the semantic
distance between a
pair of text snippets by solving a penalized optimal transport problem while
preserving the logical relationship of words in the reordered sentences, and 2)
new multi-scale deep learning models for supervised semantic matching, based on
factorized sentence hierarchies. We apply our techniques to text-pair
similarity estimation and text-pair relationship classification tasks on
multiple datasets, including STSbenchmark, the Microsoft Research Paraphrase
identification (MSRP) dataset, and the SICK dataset. Extensive experiments
show that the proposed hierarchical sentence factorization significantly
improves the performance of existing unsupervised distance-based metrics as
well as multiple supervised deep learning models based on convolutional
neural networks (CNN) and long short-term memory (LSTM) networks.
Comment: Accepted by WWW 2018, 10 pages
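As a rough illustration of the unsupervised metric described above, the sketch below computes an order-penalized transport distance between two sequences of word embeddings using entropy-regularized (Sinkhorn) optimal transport. It is a minimal stand-in under stated assumptions, not the paper's implementation: the positional penalty, its weight `lam`, the uniform word masses, and all function names are illustrative.

```python
import numpy as np

def sinkhorn_cost(C, a, b, reg=0.5, n_iters=200):
    """Entropy-regularized optimal transport cost via Sinkhorn iterations."""
    K = np.exp(-C / reg)                      # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)                     # scale columns to match b
        u = a / (K @ v)                       # scale rows to match a
    P = u[:, None] * K * v[None, :]           # transport plan
    return float(np.sum(P * C))

def ordered_sentence_distance(emb_a, emb_b, lam=0.5, reg=0.5):
    """Toy order-penalized distance between two (reordered) sentences.

    emb_a: (n, d) word embeddings of sentence A
    emb_b: (m, d) word embeddings of sentence B
    lam:   weight of the positional penalty (an assumption)
    """
    n, m = len(emb_a), len(emb_b)
    # Semantic cost: pairwise Euclidean distance between embeddings.
    sem = np.linalg.norm(emb_a[:, None, :] - emb_b[None, :, :], axis=-1)
    # Positional penalty: discourage matching words whose relative
    # positions differ, loosely preserving the reordered structure.
    pos_a = np.arange(n) / max(n - 1, 1)
    pos_b = np.arange(m) / max(m - 1, 1)
    pos = np.abs(pos_a[:, None] - pos_b[None, :])
    C = sem + lam * pos
    a = np.full(n, 1.0 / n)                   # uniform word masses
    b = np.full(m, 1.0 / m)
    return sinkhorn_cost(C, a, b, reg=reg)

# Example with random stand-ins for word embeddings:
rng = np.random.default_rng(0)
print(ordered_sentence_distance(rng.normal(size=(5, 50)),
                                rng.normal(size=(7, 50))))
```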
Factorising Meaning and Form for Intent-Preserving Paraphrasing
We propose a method for generating paraphrases of English questions that
retain the original intent but use a different surface form. Our model combines
a careful choice of training objective with a principled information
bottleneck, to induce a latent encoding space that disentangles meaning and
form. We train an encoder-decoder model to reconstruct a question from a
paraphrase with the same meaning and an exemplar with the same surface form,
leading to separated encoding spaces. We use a Vector-Quantized Variational
Autoencoder to represent the surface form as a set of discrete latent
variables, allowing us to use a classifier to select a different surface form
at test time. Crucially, our method does not require access to an external
source of target exemplars. Extensive experiments and a human evaluation show
that we are able to generate paraphrases with a better tradeoff between
semantic preservation and syntactic novelty compared to previous methods.
Comment: Accepted at ACL 2021
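The discrete representation of surface form in this abstract is a vector-quantized bottleneck. Below is a minimal sketch of such a bottleneck in the VQ-VAE style with a straight-through gradient estimator; the codebook size, dimensionality, and commitment loss are generic assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    """Minimal VQ bottleneck: snap a continuous encoding to its
    nearest codebook entry, yielding a discrete latent variable."""

    def __init__(self, num_codes=256, dim=64):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z):                          # z: (batch, dim)
        d = torch.cdist(z, self.codebook.weight)   # (batch, num_codes)
        idx = d.argmin(dim=-1)                     # discrete form ids
        q = self.codebook(idx)                     # quantized vectors
        # Straight-through: forward pass uses q, gradients flow to z.
        q_st = z + (q - z).detach()
        commit_loss = ((q.detach() - z) ** 2).mean()
        return q_st, idx, commit_loss

vq = VectorQuantizer()
q, idx, loss = vq(torch.randn(8, 64))
```

Holding the meaning encoding fixed while swapping the discrete form indices is what allows a classifier to select a different surface form at test time.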
Hierarchical Sketch Induction for Paraphrase Generation
We propose a generative model of paraphrase generation that encourages
syntactic diversity by conditioning on an explicit syntactic sketch. We
introduce Hierarchical Refinement Quantized Variational Autoencoders (HRQ-VAE),
a method for learning decompositions of dense encodings as a sequence of
discrete latent variables that make iterative refinements of increasing
granularity. This hierarchy of codes is learned through end-to-end training,
and represents fine-to-coarse grained information about the input. We use
HRQ-VAE to encode the syntactic form of an input sentence as a path through the
hierarchy, allowing us to more easily predict syntactic sketches at test time.
Extensive experiments, including a human evaluation, confirm that HRQ-VAE
learns a hierarchical representation of the input space, and generates
paraphrases of higher quality than previous systems.
Comment: Accepted at ACL 2022
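As a toy sketch of the hierarchical refinement idea, the residual quantizer below picks one code per level, each level quantizing what the previous levels left unexplained, so the chosen indices form a path through a hierarchy of codes. The depth, codebook sizes, and straight-through trick are illustrative assumptions, not HRQ-VAE's exact formulation.

```python
import torch
import torch.nn as nn

class HierarchicalQuantizer(nn.Module):
    """Residual quantization: a sequence of discrete codes that
    iteratively refine the reconstruction of a dense encoding."""

    def __init__(self, depth=3, num_codes=16, dim=64):
        super().__init__()
        self.levels = nn.ModuleList(
            [nn.Embedding(num_codes, dim) for _ in range(depth)]
        )

    def forward(self, z):                          # z: (batch, dim)
        residual = z
        recon = torch.zeros_like(z)
        path = []
        for codebook in self.levels:
            d = torch.cdist(residual, codebook.weight)
            idx = d.argmin(dim=-1)                 # code at this level
            q = codebook(idx)
            recon = recon + q                      # running refinement
            residual = residual - q                # pass remainder down
            path.append(idx)
        # Straight-through estimator, as in standard VQ training.
        recon_st = z + (recon - z).detach()
        return recon_st, torch.stack(path, dim=-1) # (batch, depth) ids

enc = HierarchicalQuantizer()
recon, codes = enc(torch.randn(8, 64))             # codes: the "path"
```

Predicting a syntactic sketch then reduces to predicting this short sequence of discrete ids rather than a dense vector.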