4,545 research outputs found
A Sub-Character Architecture for Korean Language Processing
We introduce a novel sub-character architecture that exploits a unique
compositional structure of the Korean language. Our method decomposes each
character into a small set of primitive phonetic units called jamo letters from
which character- and word-level representations are induced. The jamo letters
divulge syntactic and semantic information that is difficult to access with
conventional character-level units. They greatly alleviate the data sparsity
problem, reducing the observation space to 1.6% of the original while
increasing accuracy in our experiments. We apply our architecture to dependency
parsing and achieve dramatic improvement over strong lexical baselines.Comment: EMNLP 201
Universal Dependencies Parsing for Colloquial Singaporean English
Singlish can be interesting to the ACL community both linguistically as a
major creole based on English, and computationally for information extraction
and sentiment analysis of regional social media. We investigate dependency
parsing of Singlish by constructing a dependency treebank under the Universal
Dependencies scheme, and then training a neural network model by integrating
English syntactic knowledge into a state-of-the-art parser trained on the
Singlish treebank. Results show that English knowledge can lead to 25% relative
error reduction, resulting in a parser of 84.47% accuracies. To the best of our
knowledge, we are the first to use neural stacking to improve cross-lingual
dependency parsing on low-resource languages. We make both our annotation and
parser available for further research.Comment: Accepted by ACL 201
The CoNLL 2007 shared task on dependency parsing
The Conference on Computational Natural Language Learning features a shared task, in which participants train and test their learning systems on the same data sets. In 2007, as in 2006, the shared task has been devoted to dependency parsing, this year with both a multilingual track and a domain adaptation track. In this paper, we define the tasks of the different tracks and describe how the data sets were created from existing treebanks for ten languages. In addition, we characterize the different approaches of the participating systems, report the test results, and provide a first analysis of these results
- …