
Real-Time Identification of Parallel Texts from Bilingual Newsfeed

By David Nadeau and George Foster

Abstract

Parallel texts are documents that are translations of one another. This paper describes a simple method that can be deployed on a real-time news feed to create a continuously growing source of parallel texts in French and English. Our experiment was conducted on the Canada Newswire news feed. Given some of the feed's intrinsic properties, it was possible to deploy relatively simple text-matching techniques that rely on language-independent cognates such as numbers, capitalized words, punctuation, and newline characters. On three weeks of press releases, our system correctly identified the vast majority of parallel press releases. It committed only minor errors on repeated news items.
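The cognate-based matching idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the token pattern, the Dice overlap score, the 0.6 threshold, and the names cognates, dice, and best_match are all assumptions introduced here for illustration. The sketch extracts numbers, capitalized words, punctuation, and newlines from each document, then pairs a document with the candidate whose cognate profile overlaps the most.

```python
import re
from collections import Counter

# Assumed token pattern (not from the paper): digit runs, words starting
# with an ASCII capital, single punctuation marks, and newline characters.
COGNATE_RE = re.compile(r"\d+|\b[A-Z]\w*|[^\w\s]|\n")


def cognates(text):
    """Return a multiset of language-independent tokens found in text."""
    return Counter(COGNATE_RE.findall(text))


def dice(a, b):
    """Dice coefficient between two token multisets (0.0 if both are empty)."""
    overlap = sum((a & b).values())
    total = sum(a.values()) + sum(b.values())
    return 2.0 * overlap / total if total else 0.0


def best_match(doc, candidates, threshold=0.6):
    """Return the key of the candidate most similar to doc, or None.

    candidates maps an identifier to the text of a document in the other
    language. The threshold is a hypothetical cutoff, not a value reported
    by the authors.
    """
    profile = cognates(doc)
    scored = [(dice(profile, cognates(text)), key)
              for key, text in candidates.items()]
    if not scored:
        return None
    score, key = max(scored)
    return key if score >= threshold else None
```

In use, each incoming French release would be compared against recent English releases (or the reverse), for example best_match(french_text, {"release-1": english_text_1, "release-2": english_text_2}). A real deployment would likely also restrict candidates by time window or source, which this sketch does not attempt.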

Topics: Language
Year: 2004
OAI identifier: oai:cogprints.org:4397
Sorry, we are unable to provide the full text but you may find it at the following location(s):
  • http://cogprints.org/4397/1/NR... (external link)
  • http://cogprints.org/4397/ (external link)

