A Sub-Character Architecture for Korean Language Processing
We introduce a novel sub-character architecture that exploits a unique
compositional structure of the Korean language. Our method decomposes each
character into a small set of primitive phonetic units called jamo letters from
which character- and word-level representations are induced. The jamo letters
divulge syntactic and semantic information that is difficult to access with
conventional character-level units. They greatly alleviate the data sparsity
problem, reducing the observation space to 1.6% of the original while
increasing accuracy in our experiments. We apply our architecture to dependency
parsing and achieve dramatic improvement over strong lexical baselines.
Comment: EMNLP 2017
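Decomposing a precomposed Hangul syllable into jamo requires no dictionary: syllable blocks occupy a contiguous Unicode range and encode their (lead, vowel, tail) indices arithmetically. The sketch below illustrates this standard Unicode decomposition; it is a minimal illustration, not the paper's implementation.

```python
# Decompose precomposed Hangul syllables into jamo using standard
# Unicode arithmetic (syllable blocks occupy U+AC00..U+D7A3).
# Illustrative sketch only, not the paper's code.

LEADS, VOWELS, TAILS = 19, 21, 28  # choseong, jungseong, jongseong (incl. "none")
BASE = 0xAC00

def to_jamo(char):
    """Return (lead, vowel, tail) indices for a Hangul syllable, else None."""
    code = ord(char) - BASE
    if not 0 <= code < LEADS * VOWELS * TAILS:
        return None  # not a precomposed Hangul syllable
    lead, rest = divmod(code, VOWELS * TAILS)
    vowel, tail = divmod(rest, TAILS)
    return lead, vowel, tail

print(to_jamo("한"))  # (18, 0, 4): the jamo ㅎ + ㅏ + ㄴ
```

Since 19 + 21 + 28 jamo indices cover every syllable block, the unit inventory is tiny compared with the full syllable vocabulary, which is the source of the sparsity reduction the abstract reports.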
Spectral Methods for Natural Language Processing
Many state-of-the-art results in natural language processing (NLP) are achieved with statistical models involving latent variables. Unfortunately, computational problems associated with such models (for instance, finding the optimal parameter values) are typically intractable, forcing practitioners to rely on heuristic methods without strong guarantees. While heuristics are often sufficient for empirical purposes, their de-emphasis on theoretical aspects has certain negative ramifications. First, it can impede the development of rigorous theoretical understanding which can generate new ideas and algorithms. Second, it can lead to black art solutions that are unreliable and difficult to reproduce.
In this thesis, we argue that spectral methods---that is, methods that use singular value decomposition or other similar matrix or tensor factorization---can effectively remedy these negative ramifications. To this end, we develop spectral methods for two unsupervised language processing tasks. The first task is learning lexical representations from unannotated text (e.g., hierarchical clustering of a vocabulary). The second task is estimating parameters of latent-variable models used in NLP applications (e.g., for unsupervised part-of-speech tagging). We show that our spectral algorithms have the following advantages over previous methods:
1. The algorithms provide a new theoretical framework that is amenable to rigorous analysis. In particular, they are shown to be statistically consistent.
2. The algorithms are simple to implement, efficient, and scalable to large amounts of data. They also yield results that are competitive with the state of the art.
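As a concrete, greatly simplified illustration of the first task, the sketch below induces lexical representations by factorizing a word-context co-occurrence matrix with an SVD. The context window and the square-root scaling are assumptions made here for illustration; the sketch does not capture the specific algorithms or the consistency analysis developed in the thesis.

```python
# Minimal sketch: lexical representations from a truncated SVD of a
# word-context co-occurrence matrix. Window size and sqrt scaling are
# illustrative assumptions, not the thesis's algorithm.
from collections import Counter
import numpy as np

def svd_embeddings(sentences, dim=50):
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    counts = Counter()
    for s in sentences:
        for i, w in enumerate(s):
            for c in s[max(0, i - 2):i] + s[i + 1:i + 3]:  # +/-2 window
                counts[index[w], index[c]] += 1
    M = np.zeros((len(vocab), len(vocab)))
    for (i, j), n in counts.items():
        M[i, j] = n
    # Scale counts, factorize, and keep the top-`dim` components.
    U, S, _ = np.linalg.svd(np.sqrt(M), full_matrices=False)
    return vocab, U[:, :dim] * S[:dim]  # rows are word representations
```

Hierarchical clustering of the returned rows then yields a vocabulary hierarchy in the spirit of the first task above.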
Seq2seq is All You Need for Coreference Resolution
Existing works on coreference resolution suggest that task-specific models
are necessary to achieve state-of-the-art performance. In this work, we present
compelling evidence that such models are not necessary. We finetune a
pretrained seq2seq transformer to map an input document to a tagged sequence
encoding the coreference annotation. Despite the extreme simplicity, our model
outperforms or closely matches the best coreference systems in the literature
on an array of datasets. We also propose an especially simple seq2seq approach
that generates only tagged spans rather than the spans interleaved with the
original text. Our analysis shows that the model size, the amount of
supervision, and the choice of sequence representations are key factors in
performance.
Comment: EMNLP 2023
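To make the reduction concrete, the sketch below serializes a document's mention clusters into a tagged target sequence of the kind a pretrained seq2seq model could be finetuned to generate. The bracket tags and cluster-id format are a hypothetical encoding chosen for illustration, not the paper's exact scheme.

```python
# Illustrative sketch of coreference as seq2seq: serialize mention
# clusters into a tagged target sequence. The <m{id}> tag format is
# hypothetical, not the paper's scheme.
def to_tagged_sequence(tokens, clusters):
    """tokens: list of words; clusters: (start, end, cluster_id) triples
    with inclusive token indices."""
    opens, closes = {}, {}
    for start, end, cid in clusters:
        opens.setdefault(start, []).append(cid)
        closes.setdefault(end, []).append(cid)
    out = []
    for i, tok in enumerate(tokens):
        out += [f"<m{cid}>" for cid in opens.get(i, [])]
        out.append(tok)
        out += [f"</m{cid}>" for cid in closes.get(i, [])]
    return " ".join(out)

tokens = "John said he was tired".split()
print(to_tagged_sequence(tokens, [(0, 0, 1), (2, 2, 1)]))
# <m1> John </m1> said <m1> he </m1> was tired
```

The simpler variant described in the abstract would emit only the tagged spans rather than the full interleaved document, shortening the target sequence.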