145 research outputs found
A Re-ranking Model for Dependency Parser with Recursive Convolutional Neural Network
In this work, we address the problem to model all the nodes (words or
phrases) in a dependency tree with the dense representations. We propose a
recursive convolutional neural network (RCNN) architecture to capture syntactic
and compositional-semantic representations of phrases and words in a dependency
tree. Different with the original recursive neural network, we introduce the
convolution and pooling layers, which can model a variety of compositions by
the feature maps and choose the most informative compositions by the pooling
layers. Based on RCNN, we use a discriminative model to re-rank a -best list
of candidate dependency parsing trees. The experiments show that RCNN is very
effective to improve the state-of-the-art dependency parsing on both English
and Chinese datasets
Neural Techniques for German Dependency Parsing
Syntactic parsing is the task of analyzing the structure of a sentence based on some predefined formal assumption. It is a key component in many natural language processing (NLP) pipelines and is of great benefit for natural language understanding (NLU) tasks such as information retrieval or sentiment analysis. Despite achieving very high results with neural network techniques, most syntactic parsing research pays attention to only a few prominent languages (such as English or Chinese) or language-agnostic settings. Thus, we still lack studies that focus on just one language and design specific parsing strategies for that language with regards to its linguistic properties.
In this thesis, we take German as the language of interest and develop more accurate methods for German dependency parsing by combining state-of-the-art neural network methods with techniques that address the specific challenges posed by the language-specific properties of German. Compared to English, German has richer morphology, semi-free word order, and case syncretism. It is the combination of those characteristics that makes parsing German an interesting and challenging task.
Because syntactic parsing is a task that requires many levels of language understanding, we propose to study and improve the knowledge of parsing models at each level in order to improve syntactic parsing for German. These levels are: (sub)word level, syntactic level, semantic level, and sentence level.
At the (sub)word level, we look into a surge in out-of-vocabulary words in German data caused by compounding. We propose a new type of embeddings for compounds that is a compositional model of the embeddings of individual components. Our experiments show that character-based embeddings are superior to word and compound embeddings in dependency parsing, and compound embeddings only outperform word embeddings when the part-of-speech (POS) information is unavailable. Thus, we conclude that it is the morpho-syntactic information of unknown compounds, not the semantic one, that is crucial for parsing German.
At the syntax level, we investigate challenges for local grammatical function labeler that are caused by case syncretism. In detail, we augment the grammatical function labeling component in a neural dependency parser that labels each head-dependent pair independently with a new labeler that includes a decision history, using Long Short-Term Memory networks (LSTMs). All our proposed models significantly outperformed the baseline on three languages: English, German and Czech. However, the impact of the new models is not the same for all languages: the improvement for English is smaller than for the non-configurational languages (German and Czech). Our analysis suggests that the success of the history-based models is not due to better handling of long dependencies but that they are better in dealing with the uncertainty in head direction.
We study the interaction of syntactic parsing with the semantic level via the problem of PP attachment disambiguation. Our motivation is to provide a realistic evaluation of the task where gold information is not available and compare the results of disambiguation systems against the output of a strong neural parser. To our best knowledge, this is the first time that PP attachment disambiguation is evaluated and compared against neural dependency parsing on predicted information. In addition, we present a novel approach for PP attachment disambiguation that uses biaffine attention and utilizes pre-trained contextualized word embeddings as semantic knowledge. Our end-to-end system outperformed the previous pipeline approach on German by a large margin simply by avoiding error propagation caused by predicted information. In the end, we show that parsing systems (with the same semantic knowledge) are in general superior to systems specialized for PP attachment disambiguation.
Lastly, we improve dependency parsing at the sentence level using reranking techniques. So far, previous work on neural reranking has been evaluated on English and Chinese only, both languages with a configurational word order and poor morphology. We re-assess the potential of successful neural reranking models from the literature on English and on two morphologically rich(er) languages, German and Czech. In addition, we introduce a new variation of a discriminative reranker based on graph convolutional networks (GCNs). Our proposed reranker not only outperforms previous models on English but is the only model that is able to improve results over the baselines on German and Czech. Our analysis points out that the failure is due to the lower quality of the k-best lists, where the gold tree ratio and the diversity of the list play an important role
Efficient Beam Tree Recursion
Beam Tree Recursive Neural Network (BT-RvNN) was recently proposed as a
simple extension of Gumbel Tree RvNN and it was shown to achieve
state-of-the-art length generalization performance in ListOps while maintaining
comparable performance on other tasks. However, although not the worst in its
kind, BT-RvNN can be still exorbitantly expensive in memory usage. In this
paper, we identify the main bottleneck in BT-RvNN's memory usage to be the
entanglement of the scorer function and the recursive cell function. We propose
strategies to remove this bottleneck and further simplify its memory usage.
Overall, our strategies not only reduce the memory usage of BT-RvNN by
- times but also create a new state-of-the-art in ListOps while
maintaining similar performance in other tasks. In addition, we also propose a
strategy to utilize the induced latent-tree node representations produced by
BT-RvNN to turn BT-RvNN from a sentence encoder of the form into a sequence contextualizer of the
form . Thus, our
proposals not only open up a path for further scalability of RvNNs but also
standardize a way to use BT-RvNNs as another building block in the deep
learning toolkit that can be easily stacked or interfaced with other popular
models such as Transformers and Structured State Space models
Gradient-based Inference for Networks with Output Constraints
Practitioners apply neural networks to increasingly complex problems in
natural language processing, such as syntactic parsing and semantic role
labeling that have rich output structures. Many such structured-prediction
problems require deterministic constraints on the output values; for example,
in sequence-to-sequence syntactic parsing, we require that the sequential
outputs encode valid trees. While hidden units might capture such properties,
the network is not always able to learn such constraints from the training data
alone, and practitioners must then resort to post-processing. In this paper, we
present an inference method for neural networks that enforces deterministic
constraints on outputs without performing rule-based post-processing or
expensive discrete search. Instead, in the spirit of gradient-based training,
we enforce constraints with gradient-based inference (GBI): for each input at
test-time, we nudge continuous model weights until the network's unconstrained
inference procedure generates an output that satisfies the constraints. We
study the efficacy of GBI on three tasks with hard constraints: semantic role
labeling, syntactic parsing, and sequence transduction. In each case, the
algorithm not only satisfies constraints but improves accuracy, even when the
underlying network is state-of-the-art.Comment: AAAI 201
- …