4,497 research outputs found
Efficient Multi-Template Learning for Structured Prediction
Conditional random field (CRF) and Structural Support Vector Machine
(Structural SVM) are two state-of-the-art methods for structured prediction
which captures the interdependencies among output variables. The success of
these methods is attributed to the fact that their discriminative models are
able to account for overlapping features on the whole input observations. These
features are usually generated by applying a given set of templates on labeled
data, but improper templates may lead to degraded performance. To alleviate
this issue, in this paper, we propose a novel multiple template learning
paradigm to learn structured prediction and the importance of each template
simultaneously, so that hundreds of arbitrary templates could be added into the
learning model without caution. This paradigm can be formulated as a special
multiple kernel learning problem with exponential number of constraints. Then
we introduce an efficient cutting plane algorithm to solve this problem in the
primal, and its convergence is presented. We also evaluate the proposed
learning paradigm on two widely-studied structured prediction tasks,
\emph{i.e.} sequence labeling and dependency parsing. Extensive experimental
results show that the proposed method outperforms CRFs and Structural SVMs due
to exploiting the importance of each template. Our complexity analysis and
empirical results also show that our proposed method is more efficient than
OnlineMKL on very sparse and high-dimensional data. We further extend this
paradigm for structured prediction using generalized -block norm
regularization with , and experiments show competitive performances when
A Re-ranking Model for Dependency Parser with Recursive Convolutional Neural Network
In this work, we address the problem to model all the nodes (words or
phrases) in a dependency tree with the dense representations. We propose a
recursive convolutional neural network (RCNN) architecture to capture syntactic
and compositional-semantic representations of phrases and words in a dependency
tree. Different with the original recursive neural network, we introduce the
convolution and pooling layers, which can model a variety of compositions by
the feature maps and choose the most informative compositions by the pooling
layers. Based on RCNN, we use a discriminative model to re-rank a -best list
of candidate dependency parsing trees. The experiments show that RCNN is very
effective to improve the state-of-the-art dependency parsing on both English
and Chinese datasets
SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization
Transfer learning has fundamentally changed the landscape of natural language
processing (NLP) research. Many existing state-of-the-art models are first
pre-trained on a large text corpus and then fine-tuned on downstream tasks.
However, due to limited data resources from downstream tasks and the extremely
large capacity of pre-trained models, aggressive fine-tuning often causes the
adapted model to overfit the data of downstream tasks and forget the knowledge
of the pre-trained model. To address the above issue in a more principled
manner, we propose a new computational framework for robust and efficient
fine-tuning for pre-trained language models. Specifically, our proposed
framework contains two important ingredients: 1. Smoothness-inducing
regularization, which effectively manages the capacity of the model; 2. Bregman
proximal point optimization, which is a class of trust-region methods and can
prevent knowledge forgetting. Our experiments demonstrate that our proposed
method achieves the state-of-the-art performance on multiple NLP benchmarks.Comment: The 58th annual meeting of the Association for Computational
Linguistics (ACL 2020
- …