402 research outputs found

    Language technologies for a multilingual Europe

    This volume of the series “Translation and Multilingual Natural Language Processing” includes most of the papers presented at the Workshop “Language Technology for a Multilingual Europe”, held at the University of Hamburg on September 27, 2011 in the framework of the conference GSCL 2011 with the topic “Multilingual Resources and Multilingual Applications”, along with several additional contributions. In addition to an overview article on Machine Translation and two contributions on the European initiatives META-NET and Multilingual Web, the volume includes six full research articles. Our intention with this workshop was to bring together various groups concerned with the umbrella topics of multilingualism and language technology, especially multilingual technologies. This encompassed, on the one hand, representatives from research and development in the field of language technologies, and, on the other hand, users from diverse areas such as, among others, industry, administration and funding agencies. The Workshop “Language Technology for a Multilingual Europe” was co-organised by the two GSCL working groups “Text Technology” and “Machine Translation” (http://gscl.info) as well as by META-NET (http://www.meta-net.eu)

    A critical analysis of the English-Latvian lexicographic tradition (Angļu-latviešu leksikogrāfiskās tradīcijas kritiska analīze)

    The English-Latvian lexicographic tradition starts in 1924 with the publication of the first English-Latvian dictionary; approximately twenty-eight dictionaries of various sizes and structural complexity have been compiled in the course of the tradition. At present, English-Latvian lexicography is governed by a stable and well-established tradition that determines the features of the dictionaries' mega-, macro- and microstructure. However, even though the volume of lexicographic material is considerable, the dictionaries are often compiled using obsolete methods and outdated lexicographic sources. The aim of the study is to analyse the English-Latvian lexicographic tradition across the stages of its development, considering the various external factors which have influenced it; to single out the typical mega-, macro- and microstructural features of English-Latvian dictionaries observed throughout the tradition; to identify the problem areas of English-Latvian lexicography; and to offer theoretically grounded solutions, applied in lexicographic practice worldwide, for improving the quality of English-Latvian dictionaries.

    Cross-cultural lacunarity and translation techniques: a corpus-based study of English, Russian and Spanish

    Lexicalisation patterns varying across languages reveal lexical gaps, or lacunae, which emerge from structural misalignments between linguistic systems. Lacunae, manifesting themselves as the absence of one-to-one equivalents in one of the contrasted languages, represent a serious translation challenge since they often conceal conceptual discrepancies. Translation of lexemes with no direct equivalents nearly always results in the loss of a certain amount of culture-specific information. This research seeks to provide insight into how speakers' mental representations diverge in three typologically diverse languages (English, Russian and Spanish) and to investigate ways of overcoming such divergences in translation in a corpus-based study. It identifies English lexemes which have no equivalents in Russian and Spanish, primarily with the help of the Oxford English Dictionary advanced search tools. Using the Historical Thesaurus of English, their semantic neighbourhood is then investigated to explore the mechanisms of formation and evolution of lacunae. The findings from lexicographic data are further corroborated by corpus evidence. Film subtitles containing lacunar items, and their translations into Russian and Spanish, are retrieved from online contextual dictionaries and used as parallel corpora to identify how lacunae are handled in actual translation practice. This study combines three interrelated research strands. The theoretical strand presents a data-driven model offering a nuanced interpretation of a lexical lacuna. The lexicographic strand gives an overview of the lifecycle of lexical lacunae, outlining the mechanisms of their formation and the pathways along which they become filled. Finally, the corpus strand discusses 26 identified techniques for tackling lacunae. These are systematically classified into three main translation strategies: formal, semantic and explicative transformations. The corpus-based strand also offers a breakdown of translation solutions appropriate for each type of lacuna. The presented evidence demonstrates that although translation of lacunar items typically entails deviation of varying degrees from the source text, lexical gaps can and should be bridged in translation to prevent them from turning into cultural gaps.

    SIMuLLDA : a Multilingual Lexical Database Application using a Structured Interlingua

    It is commonly accepted that there are about five to six thousand languages. For many pairs of languages, there is no dictionary X->Y or Y->X; there are only dictionaries for the pairs X->English/French/Spanish and English/French/Spanish->Y. There is a clear need for dictionaries that translate between languages without the intervention of a small number of Western European languages with a colonial past. Such a need can also be defended from a theoretical point of view. The creation of a good-quality dictionary takes a lot of time, and given that 5000-6000 languages yield 25-30 million pairs of languages, it is important to have a database that makes it possible to translate directly between pairs of languages. This thesis highlights some problems that play a role in the creation of such a database, attempts to solve some of them, and tries to show that some other problems cannot be solved. A well-known problem is that words are often hard to match across languages: different words from different languages do not have the same range of meanings, not all words from one language have an equivalent in the other, etc. In this thesis, a sketch is given of a database in which most of these problems are solved. Crucial in this set-up is the structure of the interlingua, which makes it possible to relate non-corresponding meanings in a structural way. The structure of the interlingua is provided by a logical framework called Formal Concept Analysis. With the set-up proposed in this thesis it is possible to generate a descriptive translation for words in the source language that lack a direct translation in the target language. This should ease the work of a lexicographer making a dictionary for a new pair of languages.
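
The idea of relating non-corresponding meanings through Formal Concept Analysis can be illustrated with a small sketch. This is a generic toy example of the technique, not SIMuLLDA's actual data model: the words, the features and the function names (`extent`, `intent`, `describe`) are all hypothetical and chosen only for illustration. Objects are word senses, attributes are definitional features, and a descriptive translation is built from the closest more general word in the target language plus the features it lacks.

```python
# Hypothetical toy formal context: word senses (objects) mapped to
# definitional features (attributes). Not taken from the thesis.
CONTEXT = {
    "en:horse": {"equine"},
    "en:stallion": {"equine", "male", "adult"},
    "en:mare": {"equine", "female", "adult"},
    "en:foal": {"equine", "young"},
    "fr:cheval": {"equine"},
    "fr:poulain": {"equine", "young", "male"},
}

ALL_ATTRIBUTES = frozenset().union(*CONTEXT.values())


def extent(attrs):
    """All objects that have every attribute in attrs."""
    return frozenset(o for o, feats in CONTEXT.items() if set(attrs) <= feats)


def intent(objs):
    """All attributes shared by every object in objs."""
    if not objs:
        # By convention, the empty set of objects shares all attributes.
        return ALL_ATTRIBUTES
    return frozenset(set.intersection(*(CONTEXT[o] for o in objs)))


def describe(word, target_lang):
    """Return (closest more general target word, features still to be added).

    A target word is a candidate if its features are a subset of the source
    word's features; the candidate with the most features is the closest.
    Returns None if the target language has no such word in the context.
    """
    feats = CONTEXT[word]
    prefix = target_lang + ":"
    candidates = [
        (w, f) for w, f in CONTEXT.items()
        if w.startswith(prefix) and f <= feats
    ]
    if not candidates:
        return None
    best, best_feats = max(candidates, key=lambda c: len(c[1]))
    return best, sorted(feats - best_feats)


if __name__ == "__main__":
    # A formal concept is a pair (extent, intent) closed under both maps.
    objs = extent({"young"})
    print(sorted(objs), sorted(intent(objs)))
    print(describe("fr:poulain", "en"))
    print(describe("en:stallion", "fr"))
```

In this toy context, `describe("fr:poulain", "en")` yields `("en:foal", ["male"])`, which could be rendered as "male foal". A real system would need a much richer feature inventory and a principled way to verbalise the leftover features.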

    TC3 III

    This volume of the series “Translation and Multilingual Natural Language Processing” includes most of the papers presented at the Workshop “Language Technology for a Multilingual Europe”, held at the University of Hamburg on September 27, 2011 in the framework of the conference GSCL 2011 with the topic “Multilingual Resources and Multilingual Applications”, along with several additional contributions. In addition to an overview article on Machine Translation and two contributions on the European initiatives META-NET and Multilingual Web, the volume includes six full research articles. Our intention with this workshop was to bring together various groups concerned with the umbrella topics of multilingualism and language technology, especially multilingual technologies. This encompassed, on the one hand, representatives from research and development in the field of language technologies, and, on the other hand, users from diverse areas such as, among others, industry, administration and funding agencies. The Workshop “Language Technology for a Multilingual Europe” was co-organised by the two GSCL working groups “Text Technology” and “Machine Translation” (http://gscl.info) as well as by META-NET (http://www.meta-net.eu)

    First International Workshop on Lexical Resources

    Lexical resources are one of the main sources of linguistic information for research and applications in Natural Language Processing and related fields. In recent years advances have been achieved in both symbolic aspects of lexical resource development (lexical formalisms, rule-based tools) and statistical techniques for the acquisition and enrichment of lexical resources, both monolingual and multilingual. The latter have allowed for faster development of large-scale morphological, syntactic and/or semantic resources, for widely-used as well as resource-scarce languages. Moreover, the notion of a dynamic lexicon is increasingly used to take into account the fact that the lexicon undergoes permanent evolution. This workshop aims at sketching a broad picture of the state of the art in the domain of lexical resource modeling and development. It is also dedicated to research on the application of lexical resources for improving corpus-based studies and language processing tools, both in NLP and in other language-related fields, such as linguistics, translation studies, and didactics.

    Cross-Lingual Link Discovery for Under-Resourced Languages

    CC BY-NC 4.0. In this paper, we provide an overview of current technologies for cross-lingual link discovery, and we discuss challenges, experiences and prospects of their application to under-resourced languages. We first introduce the goals of cross-lingual linking and associated technologies, and in particular, the role that the Linked Data paradigm (Bizer et al., 2011) applied to language data can play in this context. We define under-resourced languages with a specific focus on languages actively used on the internet, i.e., languages with a digitally versatile speaker community, but limited support in terms of language technology. We argue that, for languages for which considerable amounts of textual data and (at least) a bilingual word list are available, techniques for cross-lingual linking can be readily applied, and that these enable the implementation of downstream applications for under-resourced languages via the localisation and adaptation of existing technologies and resources.