2,704 research outputs found

    Proceedings of the COLING 2004 Post Conference Workshop on Multilingual Linguistic Ressources MLR2004

    In an ever-expanding information society, most information systems now face the "multilingual challenge". Multilingual language resources play an essential role in modern information systems. Such resources need to provide information on many languages in a common framework and should be (re)usable in many applications (for automatic or human use). Many centres have been involved in national and international projects dedicated to building harmonised language resources and creating expertise in the maintenance and further development of standardised linguistic data. These resources include dictionaries, lexicons, thesauri, word-nets, and annotated corpora developed along the lines of best practices and recommendations. However, since the late 1990s, most efforts in scaling up these resources have remained the responsibility of the local authorities, usually with very low funding (if any) and few opportunities for academic recognition of this work. Hence, it is not surprising that many of the resource holders and developers have become reluctant to give free access to the latest versions of their resources, and their actual status is therefore currently rather unclear. The goal of this workshop is to study problems involved in the development, management and reuse of lexical resources in a multilingual context. Moreover, this workshop provides a forum for reviewing the present state of language resources. The workshop is meant to bring to the international community qualitative and quantitative information about the most recent developments in the area of linguistic resources and their use in applications. The impressive number of submissions to this workshop (38) and to other workshops and conferences dedicated to similar topics shows that dealing with multilingual linguistic resources has become a very hot topic in the Natural Language Processing community.
To cope with the number of submissions, the workshop organising committee decided to accept 16 papers from 10 countries based on the reviewers' recommendations. Six of these papers will be presented in a poster session. The papers constitute a representative selection of current trends in research on Multilingual Language Resources, such as multilingual aligned corpora, bilingual and multilingual lexicons, and multilingual speech resources. The papers also represent a characteristic set of approaches to the development of multilingual language resources, such as automatic extraction of information from corpora, combination and re-use of existing resources, online collaborative development of multilingual lexicons, and use of the Web as a multilingual language resource. The development and management of multilingual language resources is a long-term activity in which collaboration among researchers is essential. We hope that this workshop will gather many researchers involved in such developments and will give them the opportunity to discuss, exchange, and compare their approaches and strengthen their collaborations in the field. The organisation of this workshop would have been impossible without the hard work of the program committee, which managed to provide accurate reviews on time, on a rather tight schedule. We would also like to thank the Coling 2004 organising committee that made this workshop possible. Finally, we hope that this workshop will yield fruitful results for all participants.

    TectoMT – a deep-linguistic core of the combined Chimera MT system

    Chimera is a machine translation system that combines the TectoMT deep-linguistic core with the phrase-based MT system Moses. For the English–Czech pair it also uses the Depfix post-correction system. All the components run on Unix/Linux platforms and are open source (available from the Perl repository CPAN and the LINDAT/CLARIN repository). The main website is https://ufal.mff.cuni.cz/tectomt. The development is currently supported by the QTLeap 7th FP project (http://qtleap.eu).

    Modeling information structure in a cross-linguistic perspective

    This study makes substantial contributions to both the theoretical and computational treatment of information structure, with a specific focus on creating natural language processing applications such as multilingual machine translation systems. The study first provides cross-linguistic findings regarding information structure meanings and markings. Building upon these findings, the current model represents information structure within the HPSG/MRS framework using Individual Constraints. The primary goal of the study is to create a multilingual grammar model of information structure for the LinGO Grammar Matrix system. The study explores the construction of a grammar library for creating customized grammars incorporating information structure and illustrates how the information structure-based model improves the performance of transfer-based machine translation.

    Getting Past the Language Gap: Innovations in Machine Translation

    In this chapter, we review state-of-the-art machine translation systems and discuss innovative methods for machine translation, highlighting the most promising techniques and applications. Machine translation (MT) has benefited from a revitalization in the last 10 years or so, after a period of relatively slow activity. In 2005 the field received a jumpstart when a powerful, complete experimental package for building MT systems from scratch became freely available as a result of the unified efforts of the MOSES international consortium. Around the same time, hierarchical methods were introduced by Chinese researchers, which allowed the introduction and use of syntactic information in translation modeling. Furthermore, advances in the related field of computational linguistics, which made off-the-shelf taggers and parsers readily available, helped give MT an additional boost. Yet there is still more progress to be made. For example, MT will be enhanced greatly when both syntax and semantics are on board: this still presents a major challenge, though many advanced research groups are currently pursuing ways to meet this challenge head-on. The next generation of MT will consist of a collection of hybrid systems. It also augurs well for the mobile environment, as we look forward to more advanced and improved technologies that enable Speech-To-Speech machine translation on hand-held devices, i.e. speech recognition and speech synthesis. We review all of these developments and point out in the final section some of the most promising research avenues for the future of MT.

    Borrowings and Adaptations in Vietnamese Culture


    Feature-based Transfer of Multilingual Sentence Representations to Cross-lingual Tasks

    Universal sentence representations and multilingual language modelling are hot topics in language technology, specifically in the area of natural language understanding. A sentence embedding is a numerical representation of a sequence of words corresponding to a whole phrase or sentence, specifically as the output of an encoder in machine learning. These representations are needed for automated tasks in language technology that require an understanding of the meaning of a whole sentence, as opposed to combinations of the meanings of individual words. Such tasks include, for example, natural language inference (whether a pair of sentences is logically related) and sentiment analysis. Universality refers to encoded meaning that is general enough to benefit other related tasks, such as classification. There is a call for clearer consensus on the strategies used to assess the quality of these embeddings, either by directly examining their linguistic properties or by using them as features (independent variables) in related models. Because it is costly to create high-quality resources and maintain sophisticated systems for all the languages used in the world, there is also great interest in scaling modern systems up to low-resource languages. The idea behind this is the so-called transfer of knowledge not only between different tasks but also between different languages. Although the need for cross-lingual transfer methods is recognised in the research community, evaluation tools and benchmarks are still at an early stage. SentEval is an existing tool for evaluating sentence embeddings with a special emphasis on their universality.
The aim of this thesis project is an attempt to extend this tool to support simultaneous evaluation on new tasks that cover several different languages. The evaluation approach builds on the strategy of letting encoded sentences serve as features in so-called downstream tasks and observing whether the results improve. A modern multilingual model based on the so-called transformer architecture is evaluated on an established inference task as well as a new emotion detection task, both of which comprise data in a variety of languages. Although the practical implementation remained experimental to a large extent, some tentative results are reported in this thesis.

    Martial arts fiction : translational migrations east and west

    This thesis was motivated by Robert Chard's puzzlement over the translational phenomenon of martial arts fiction in the West. It proposes to address how the translational migration of martial arts fiction took place, first to other Asian countries in the 1920s, but to the West only after a lapse of a few decades, beginning in the early 1990s. Adopting a descriptive approach as described by Gideon Toury, the thesis is intended to add to the limited inventory of case studies urgently needed to test the polysystem theory propounded by Even-Zohar. The thesis is made up of two parts. Part I is a macro-level study of martial arts fiction, intended to contribute to testing the limits of the polysystem theory. After examining Chinese fiction as a low form in the Chinese literary polysystem and its weak function as translated literature in the Western literary polysystem, the study explores the translational phenomenon of martial arts fiction in the West. It also explores the concurrent question of why so little martial arts fiction has been translated into Western languages compared to the copious amount translated into other Asian languages, where translation went so far as to stimulate a new literary genre or the (re)writing of martial arts fiction in indigenous languages in Indonesia, Vietnam and Korea, sinicized countries or countries boasting large overseas Chinese communities. Issues and problems related to these translational activities and cultural phenomena are presented as tools to test the limits of the polysystem theory. Part II is a micro-level study focussing on the specifics of rendering Fox Volant of the Snowy Mountain by Jin Yong into English. I will argue, in the main, that many difficulties, inherent in both the translating and reading processes, can be constructed within the theoretical framework of Andre Lefevere's concept of "constraint", particularly that of the universe of discourse.
Lefevere's connotation of the universe of discourse will be expanded to embrace different cultural presuppositions and literary assumptions underlying two divergent world cultures, hence different reader expectations in the reading process. It is hoped that the findings and results of this descriptive case history of martial arts fiction as a literary genre in translational migrations will contribute to the accumulation of knowledge.