Language technologies for a multilingual Europe
This volume of the series “Translation and Multilingual Natural Language Processing” includes most of the papers presented at the Workshop “Language Technology for a Multilingual Europe”, held at the University of Hamburg on September 27, 2011 in the framework of the conference GSCL 2011 with the topic “Multilingual Resources and Multilingual Applications”, along with several additional contributions. In addition to an overview article on Machine Translation and two contributions on the European initiatives META-NET and Multilingual Web, the volume includes six full research articles. Our intention with this workshop was to bring together various groups concerned with the umbrella topics of multilingualism and language technology, especially multilingual technologies. This encompassed, on the one hand, representatives from research and development in the field of language technologies, and, on the other hand, users from diverse areas such as, among others, industry, administration and funding agencies. The Workshop “Language Technology for a Multilingual Europe” was co-organised by the two GSCL working groups “Text Technology” and “Machine Translation” (http://gscl.info) as well as by META-NET (http://www.meta-net.eu)
Angļu-latviešu leksikogrāfiskās tradīcijas kritiska analīze (A critical analysis of the English-Latvian lexicographic tradition)
The English-Latvian lexicographic tradition began in 1924 with the publication of the
first English-Latvian dictionary; since then, approximately twenty-eight dictionaries of
various sizes and structural complexity have been compiled. At present,
English-Latvian lexicography is governed by a stable and well-established
tradition that determines the features of the dictionaries’ mega-, macro- and
microstructure. However, even though the volume of lexicographic material is
ample, the dictionaries are often compiled using obsolete methods and outdated
lexicographic sources.
The aim of the study is to review the stages of development of the English-Latvian
lexicographic tradition, considering the various extra-linguistic factors that have
influenced its development; to single out the typical features of English-Latvian
dictionaries traced throughout the tradition at the levels of their mega-,
macro- and microstructure; to pinpoint the problematic aspects of English-Latvian
lexicography; and to offer theoretically grounded solutions, already applied in
lexicographic practice worldwide, for improving the quality
of English-Latvian dictionaries
Cross-cultural lacunarity and translation techniques: a corpus-based study of English, Russian and Spanish
Lexicalisation patterns varying across languages reveal lexical gaps or lacunae emerging due to structural misalignments between linguistic systems. Lacunae, manifesting themselves as the absence of one-to-one equivalents in one of the contrasting languages, represent a serious translation challenge since they often conceal conceptual discrepancies. Translation of lexemes with no direct equivalents nearly always results in the loss of a certain amount of culture-specific information. This research seeks to provide insight into how speakers’ mental representations diverge in three typologically diverse languages – English, Russian and Spanish – and to investigate ways of overcoming such divergences in translation in a corpus-based study.
This research identifies English lexemes which have no equivalents in Russian and Spanish primarily with the help of the Oxford English Dictionary advanced search tools. Using the Historical Thesaurus of English, their semantic neighbourhood is then investigated to explore the mechanisms of formation and evolution of lacunae. The findings from lexicographic data are further corroborated by corpus evidence. Film subtitles, containing lacunar items, and their translations into Russian and Spanish, are retrieved from online contextual dictionaries and used as parallel corpora to identify how lacunae are handled in actual translation practice.
This study combines three interrelated research strands. The theoretical strand presents a data-driven model offering a nuanced interpretation of a lexical lacuna. The lexicographic strand overviews the lifecycle of lexical lacunae, outlining the mechanisms of their formation and pathways along which they become filled. Finally, the corpus strand discusses 26 identified techniques for tackling lacunae. These are systematically classified into three main translation strategies: formal, semantic and explicative transformations. The corpus-based strand also offers a breakdown of translation solutions appropriate for each type of lacuna. The presented evidence demonstrates that although translation of lacunar items typically entails deviation of varying degrees from the source text, lexical gaps can and should be bridged in translation to prevent them from turning into cultural gaps
SIMuLLDA: a Multilingual Lexical Database Application using a Structured Interlingua
It is commonly accepted that there are about five to six thousand languages.
For many pairs of languages, there is no dictionary X->Y or Y->X;
there are only dictionaries for the pairs X->English/French/Spanish, and
English/French/Spanish->Y. There is a clear need for dictionaries
translating between languages without the intervention of a small number of
Western European languages with a colonial past. Also from a theoretical
point of view, such a need can be defended.
The creation of a dictionary of good quality takes a lot of time, and given
the fact that 5000-6000 languages yield 25-30 million pairs of languages, it
is important to have a database that provides the possibility to translate
directly between pairs of languages. This thesis highlights some problems
that play a role in the creation of such a database, attempts to solve some
of them, and tries to show that some other problems cannot be solved.
A well-known problem is that words are often hard to match across languages:
different words from different languages do not have the same range of
meanings, not all words from one language have an equivalent in the other,
etc. In this thesis, a sketch is given of a database in which most of these
problems are solved. Crucial in this set-up is the structure of the
interlingua, which provides the possibility to relate non-corresponding
meanings in a structural way. The structure of the interlingua is provided
by a logical framework called Formal Concept Analysis. With the set-up
proposed in this thesis it is possible to generate a descriptive translation
for words in the source language that lack a direct translation in the
target language. This should ease the work of a lexicographer making a
dictionary for a new pair of languages
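
The idea of an interlingua structured by Formal Concept Analysis can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the thesis: the toy word list, the semantic attributes, the invented target language "xx", and the helper names `extent`, `intent`, `formal_concepts` and `describe_translation`. The sketch builds the formal concepts of a small context and then produces a descriptive translation by picking the closest more general target word and listing the attributes it lacks.

```python
from itertools import combinations

# Hypothetical toy formal context: (language, word) -> semantic attributes.
# "xx" is an invented target language; none of this data comes from the thesis.
CONTEXT = {
    ("en", "horse"): frozenset({"equine"}),
    ("en", "foal"): frozenset({"equine", "young"}),
    ("en", "filly"): frozenset({"equine", "young", "female"}),
    ("en", "mare"): frozenset({"equine", "female", "adult"}),
    ("xx", "hors"): frozenset({"equine"}),
    ("xx", "fol"): frozenset({"equine", "young"}),
    ("xx", "mar"): frozenset({"equine", "female", "adult"}),
}

ALL_ATTRIBUTES = frozenset().union(*CONTEXT.values())


def extent(attributes):
    """Words that carry every attribute in `attributes`."""
    return frozenset(obj for obj, attrs in CONTEXT.items() if attributes <= attrs)


def intent(objects):
    """Attributes shared by all `objects`; every attribute if `objects` is empty."""
    if not objects:
        return ALL_ATTRIBUTES
    return frozenset.intersection(*(CONTEXT[obj] for obj in objects))


def formal_concepts():
    """All (extent, intent) pairs, found by closing every attribute subset."""
    concepts = set()
    ordered = sorted(ALL_ATTRIBUTES)
    for size in range(len(ordered) + 1):
        for subset in combinations(ordered, size):
            objects = extent(frozenset(subset))
            concepts.add((objects, intent(objects)))
    return concepts


def describe_translation(word, source, target):
    """Return (target_word, missing_attributes) for a source word.

    Picks the target word whose attributes form the largest subset of the
    source word's attributes; the leftover attributes are the description
    a lexicographer would have to add. Returns None if nothing matches.
    """
    attrs = CONTEXT[(source, word)]
    candidates = [
        (w, a) for (lang, w), a in CONTEXT.items() if lang == target and a <= attrs
    ]
    if not candidates:
        return None
    best_word, best_attrs = max(candidates, key=lambda c: len(c[1]))
    return best_word, sorted(attrs - best_attrs)


if __name__ == "__main__":
    print(len(formal_concepts()), "formal concepts")
    print(describe_translation("filly", "en", "xx"))
```

In this toy context, "filly" has no exact counterpart in the invented language, so the sketch falls back to the more general word plus the attribute it lacks. Enumerating every attribute subset is only workable for very small contexts; a realistic lexical database would need a dedicated concept-lattice algorithm.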
TC3 III
This volume of the series “Translation and Multilingual Natural Language Processing” includes most of the papers presented at the Workshop “Language Technology for a Multilingual Europe”, held at the University of Hamburg on September 27, 2011 in the framework of the conference GSCL 2011 with the topic “Multilingual Resources and Multilingual Applications”, along with several additional contributions. In addition to an overview article on Machine Translation and two contributions on the European initiatives META-NET and Multilingual Web, the volume includes six full research articles. Our intention with this workshop was to bring together various groups concerned with the umbrella topics of multilingualism and language technology, especially multilingual technologies. This encompassed, on the one hand, representatives from research and development in the field of language technologies, and, on the other hand, users from diverse areas such as, among others, industry, administration and funding agencies. The Workshop “Language Technology for a Multilingual Europe” was co-organised by the two GSCL working groups “Text Technology” and “Machine Translation” (http://gscl.info) as well as by META-NET (http://www.meta-net.eu)
First International Workshop on Lexical Resources
Lexical resources are one of the main sources of linguistic information for research and applications in Natural Language Processing and related fields. In recent years advances have been achieved in both symbolic aspects of lexical resource development (lexical formalisms, rule-based tools) and statistical techniques for the acquisition and enrichment of lexical resources, both monolingual and multilingual. The latter have allowed for faster development of large-scale morphological, syntactic and/or semantic resources, for widely-used as well as resource-scarce languages. Moreover, the notion of dynamic lexicon is used increasingly for taking into account the fact that the lexicon undergoes a permanent evolution. This workshop aims at sketching a large picture of the state of the art in the domain of lexical resource modeling and development. It is also dedicated to research on the application of lexical resources for improving corpus-based studies and language processing tools, both in NLP and in other language-related fields, such as linguistics, translation studies, and didactics
Cross-Lingual Link Discovery for Under-Resourced Languages
CC BY-NC 4.0
In this paper, we provide an overview of current technologies for cross-lingual link discovery, and we discuss challenges,
experiences and prospects of their application to under-resourced languages. We first introduce the goals of cross-lingual
linking and associated technologies, and in particular, the role that the Linked Data paradigm (Bizer et al., 2011) applied
to language data can play in this context. We define under-resourced languages with a specific focus on languages actively
used on the internet, i.e., languages with a digitally versatile speaker community, but limited support in terms of language
technology. We argue that for languages for which considerable amounts of textual data and (at least) a bilingual word list are
available, techniques for cross-lingual linking can be readily applied, and that these enable the implementation of downstream
applications for under-resourced languages via the localisation and adaptation of existing technologies and resources