A Pattern Matching method for finding Noun and Proper Noun Translations from Noisy Parallel Corpora
We present a pattern matching method for compiling a bilingual lexicon of
nouns and proper nouns from unaligned, noisy parallel texts of
Asian/Indo-European language pairs. Tagging information of one language is
used. Word frequency and position information for high and low frequency words
are represented in two different vector forms for pattern matching. New anchor
point finding and noise elimination techniques are introduced. We obtained a
73.1% precision. We also show how the results can be used in the compilation
of domain-specific noun phrases.
Comment: 8 pages, uuencoded compressed postscript file. To appear in the
Proceedings of the 33rd AC
MT and Proper Nouns: how a German Model Became a Boat Operator
Writers and translators have difficulties treating proper nouns correctly. These designations represent concepts that are very likely not common knowledge. While humans can research, machines can only apply data provided. It is therefore important that proper nouns are documented in term bases and made available to MT engines.
Properhood
A history of the notion of PROPERHOOD in philosophy and linguistics is given. Two long-standing ideas, (i) that proper names have no sense, and (ii) that they are expressions whose purpose is to refer to individuals, cannot be made to work comprehensively while PROPER is understood as a subcategory of linguistic units, whether of lexemes or phrases. Phrases of the type the old vicarage, which are potentially ambiguous with regard to properhood, encourage the suggestion that PROPER is best understood as a mode of reference contrasting with SEMANTIC reference; in the former, the intension/sense of any lexical items within the referring expression, and any entailments they give rise to, are canceled. PROPER NAMES are all those expressions that refer nonintensionally. Linguistic evidence is given that this opposition can be grammaticalized, speculation is made about its neurological basis, and psycholinguistic evidence is adduced in support. The PROPER NOUN, as a lexical category, is argued to be epiphenomenal on proper names as newly defined. Some consequences of the view that proper names have no sense in the act of reference are explored; they are not debarred from having senses (better: synchronic etymologies) accessible during other (meta)linguistic activities.
Recognition and translation Arabic-French of Named Entities: case of the Sport places
The recognition of Arabic Named Entities (NE) is a problem in different
domains of Natural Language Processing (NLP), such as automatic translation.
Indeed, NE translation allows access to multilingual information. This
translation doesn't always lead to the expected result, especially when the NE
contains a person name. For this reason, and in order to improve translation,
we can transliterate some parts of the NE. In this context, we propose a method
that integrates translation and transliteration. We used the linguistic
NooJ platform, which is based on local grammars and transducers. In this paper,
we focus on the sport domain. We first suggest a refinement of the
typological model presented at the MUC Conferences; we then describe the
integration of an Arabic transliteration module into a translation system.
Finally, we detail our method and give the results of the evaluation.
On the Similarities Between Native, Non-native and Translated Texts
We present a computational analysis of three language varieties: native,
advanced non-native, and translation. Our goal is to investigate the
similarities and differences between non-native language productions and
translations, contrasting both with native language. Using a collection of
computational methods we establish three main results: (1) the three types of
texts are easily distinguishable; (2) non-native language and translations are
closer to each other than each of them is to native language; and (3) some of
these characteristics depend on the source or native language, while others do
not, reflecting, perhaps, unified principles that similarly affect translations
and non-native language.
Comment: ACL2016, 12 page