66 research outputs found
Word-to-Word Models of Translational Equivalence
Parallel texts (bitexts) have properties that distinguish them from other
kinds of parallel data. First, most words translate to only one other word.
Second, bitext correspondence is noisy. This article presents methods for
biasing statistical translation models to reflect these properties. Analysis of
the expected behavior of these biases in the presence of sparse data predicts
that they will result in more accurate models. The prediction is confirmed by
evaluation with respect to a gold standard -- translation models that are
biased in this fashion are significantly more accurate than a baseline
knowledge-poor model. This article also shows how a statistical translation
model can take advantage of various kinds of pre-existing knowledge that might
be available about particular language pairs. Even the simplest kinds of
language-specific knowledge, such as the distinction between content words and
function words, are shown to reliably boost translation model performance on
some tasks. Statistical models that are informed by pre-existing knowledge
about the model domain combine the best of both the rationalist and empiricist
traditions.
Quantifying Cross-lingual Semantic Similarity for Natural Language Processing Applications
Translation and cross-lingual access to information are key technologies in a global economy. Even though the quality of machine translation (MT) output is still far from the level of human translation, many real-world applications have emerged in which MT can be employed. Machine translation supports human translators in computer-assisted translation (CAT), providing the opportunity to improve translation systems based on human interaction and feedback. In addition, many tasks that involve natural language processing operate in a cross-lingual setting, where there is no need for perfectly fluent translations and the transfer of meaning can be modeled by employing MT technology. This thesis describes cumulative work in the field of cross-lingual natural language processing in a user-oriented setting. A common denominator of the presented approaches is their anchoring in an alignment between texts in two different languages to quantify the similarity of their content.
Toponym Disambiguation in Information Retrieval
In recent years, geography has acquired great importance in the context of
Information Retrieval (IR) and, in general, of the automated processing of
information in text. Mobile devices that are able to surf the web and at the
same time inform about their position are now a common reality, together
with applications that can exploit this data to provide users with locally
customised information, such as directions or advertisements. Therefore,
it is important to deal properly with the geographic information that is
included in electronic texts. Most of this information takes the form of
place names, or toponyms.
Toponym ambiguity represents an important issue in Geographical Information
Retrieval (GIR), due to the fact that queries are geographically constrained.
There has been a struggle to find specific geographical IR methods
that actually outperform traditional IR techniques. Toponym ambiguity
may constitute a relevant factor in the inability of current GIR systems to
take advantage of geographical knowledge. Recently, some Ph.D. theses
have dealt with Toponym Disambiguation (TD) from different perspectives,
from the development of resources for the evaluation of Toponym Disambiguation
(Leidner (2007)) to the use of TD to improve geographical scope
resolution (Andogah (2010)). The Ph.D. thesis presented here introduces
a TD method based on WordNet and carries out a detailed study of the
relationship of Toponym Disambiguation to some IR applications, such as
GIR, Question Answering (QA) and Web retrieval.
The work presented in this thesis starts with an introduction to the applications
in which TD may prove useful, together with an analysis of the
ambiguity of toponyms in news collections. It would not be possible to
study the ambiguity of toponyms without studying the resources that are
used as placename repositories; these resources are the equivalent of language
dictionaries, which provide the different meanings of a given word.
Buscaldi, D. (2010). Toponym Disambiguation in Information Retrieval [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8912