870 research outputs found
Joint Syntacto-Discourse Parsing and the Syntacto-Discourse Treebank
Discourse parsing has long been treated as a stand-alone problem independent
from constituency or dependency parsing. Most attempts at this problem are
pipelined rather than end-to-end, sophisticated, and not self-contained: they
assume gold-standard text segmentations (Elementary Discourse Units), and use
external parsers for syntactic features. In this paper we propose the first
end-to-end discourse parser that jointly parses in both syntax and discourse
levels, as well as the first syntacto-discourse treebank by integrating the
Penn Treebank with the RST Treebank. Built upon our recent span-based
constituency parser, this joint syntacto-discourse parser requires no
preprocessing whatsoever (such as segmentation or feature extraction), achieves
the state-of-the-art end-to-end discourse parsing accuracy.Comment: Accepted at EMNLP 201
Two Approaches for Building an Unsupervised Dependency Parser and their Other Applications
Much work has been done on building a parser for natural languages, but most of this work has concentrated on supervised parsing. Unsupervised parsing is a less explored area, and unsupervised dependency parser has hardly been tried. In this paper we present two approaches for building an unsupervised dependency parser. One approach is based on learning dependency relations and the other on learning subtrees. We also propose some other applications of these approaches
Latent Tree Language Model
In this paper we introduce Latent Tree Language Model (LTLM), a novel
approach to language modeling that encodes syntax and semantics of a given
sentence as a tree of word roles.
The learning phase iteratively updates the trees by moving nodes according to
Gibbs sampling. We introduce two algorithms to infer a tree for a given
sentence. The first one is based on Gibbs sampling. It is fast, but does not
guarantee to find the most probable tree. The second one is based on dynamic
programming. It is slower, but guarantees to find the most probable tree. We
provide comparison of both algorithms.
We combine LTLM with 4-gram Modified Kneser-Ney language model via linear
interpolation. Our experiments with English and Czech corpora show significant
perplexity reductions (up to 46% for English and 49% for Czech) compared with
standalone 4-gram Modified Kneser-Ney language model.Comment: Accepted to EMNLP 201
Learning Grammars for Architecture-Specific Facade Parsing
International audienceParsing facade images requires optimal handcrafted grammar for a given class of buildings. Such a handcrafted grammar is often designed manually by experts. In this paper, we present a novel framework to learn a compact grammar from a set of ground-truth images. To this end, parse trees of ground-truth annotated images are obtained running existing inference algorithms with a simple, very general grammar. From these parse trees, repeated subtrees are sought and merged together to share derivations and produce a grammar with fewer rules. Furthermore, unsupervised clustering is performed on these rules, so that, rules corresponding to the same complex pattern are grouped together leading to a rich compact grammar. Experimental validation and comparison with the state-of-the-art grammar-based methods on four diff erent datasets show that the learned grammar helps in much faster convergence while producing equal or more accurate parsing results compared to handcrafted grammars as well as grammars learned by other methods. Besides, we release a new dataset of facade images from Paris following the Art-deco style and demonstrate the general applicability and extreme potential of the proposed framework
- …