Extracting Parallel Fragments from Comparable Corpora for Data-to-text Generation

By Anja Belz and Eric Kow

Abstract

Building NLG systems, in particular statistical ones, requires parallel data (paired inputs and outputs), which do not generally occur naturally. In this paper, we investigate the idea of automatically extracting parallel resources for data-to-text generation from comparable corpora obtained from the Web. We describe our comparable corpus of data and texts relating to British hills and the techniques we have developed so far for extracting paired input/output fragments.

Year: 2010
OAI identifier: oai:CiteSeerX.psu:10.1.1.180.3640
Provided by: CiteSeerX