Dependency parsing of Turkish

Author: Eryiğit Gülşen
Nivre Joakim
Oflazer Kemal
Publication venue: 'MIT Press - Journals'
Publication date: 01/09/2006

The suitability of different parsing methods for different languages is an important topic in syntactic parsing. Lesser-studied languages in particular, typologically different from the languages for which methods were originally developed, pose interesting challenges in this respect. This article presents an investigation of data-driven dependency parsing of Turkish, an agglutinative, free constituent order language that can be seen as representative of a wider class of languages of similar type. Our investigations show that morphological structure plays an essential role in finding syntactic relations in such a language. In particular, we show that employing sublexical representations called inflectional groups, rather than word forms, as the basic parsing units improves parsing accuracy. We compare two different parsing methods, one based on a probabilistic model with beam search, the other based on discriminative classifiers and a deterministic parsing strategy, and show that the usefulness of sublexical units holds regardless of parsing method. We examine the impact of morphological and lexical information in detail and show that, properly used, this kind of information can improve parsing accuracy substantially. Applying the techniques presented in this article, we achieve the highest reported accuracy for parsing the Turkish Treebank.

CiteSeerX

Crossref

Sabanci University Research Database

Character-Aware Neural Language Models

Author: Jernite Yacine
Kim Yoon
Rush Alexander M.
Sontag David
Publication date: 01/12/2015

We describe a simple neural language model that relies only on character-level inputs. Predictions are still made at the word level. Our model employs a convolutional neural network (CNN) and a highway network over characters, whose output is given to a long short-term memory (LSTM) recurrent neural network language model (RNN-LM). On the English Penn Treebank the model is on par with the existing state of the art despite having 60% fewer parameters. On languages with rich morphology (Arabic, Czech, French, German, Spanish, Russian), the model outperforms word-level/morpheme-level LSTM baselines, again with fewer parameters. The results suggest that on many languages, character inputs are sufficient for language modeling. Analysis of word representations obtained from the character composition part of the model reveals that the model is able to encode, from characters only, both semantic and orthographic information. Comment: AAAI 201

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Syntactic annotation of non-canonical linguistic structures

Author: Doolittle Seanna
Hirschmann Hagen
Lüdeling Anke
Publication date: 27/10/2009

This paper deals with the syntactic annotation of corpora that contain both ‘canonical’ and ‘non-canonical’ sentences.

Hochschulschriftenserver - Universität Frankfurt am Main

Adversarial Generation of Natural Language

Author: Courville Aaron
Dutil Francis
Pal Christopher
Rajeswar Sai
Subramanian Sandeep
Publication date: 01/01/2017

Generative Adversarial Networks (GANs) have gathered a lot of attention from the computer vision community, yielding impressive results for image generation. Advances in the adversarial generation of natural language from noise, however, are not commensurate with the progress made in generating images, and still lag far behind likelihood-based methods. In this paper, we take a step towards generating natural language with a GAN objective alone. We introduce a simple baseline that addresses the discrete output space problem without relying on gradient estimators and show that it is able to achieve state-of-the-art results on a Chinese poem generation dataset. We present quantitative results on generating sentences from context-free and probabilistic context-free grammars, and qualitative language modeling results. A conditional version is also described that can generate sequences conditioned on sentence characteristics. Comment: 11 pages, 3 figures, 5 table

arXiv.org e-Print Archive

Crossref

Experiments with discourse-level choices and readability

Author: Osman Liesl
Reiter Ehud
Williams Sandra
Publication date: 01/01/2003

This paper reports on pilot experiments that are being used, together with corpus analysis, in the development of a Natural Language Generation (NLG) system, GIRL (Generator for Individual Reading Levels). GIRL generates reports for individuals after a literacy assessment. We tested GIRL's output on adult learner readers and good readers. Our aim was to find out whether choices the system makes at the discourse level have an impact on readability. Our preliminary results indicate that such choices do indeed appear to be important for learner readers. These will be investigated further in future larger-scale experiments. Ultimately we intend to use the results to develop a mechanism that makes discourse-level choices that are appropriate for individuals' reading skills.

CiteSeerX

Open Research Online