51 research outputs found

    Knowledge Expansion of a Statistical Machine Translation System using Morphological Resources

    The translation capability of a Phrase-Based Statistical Machine Translation (PBSMT) system depends mostly on parallel data, and phrases that are not present in the training data are not translated correctly. This paper describes a method that efficiently expands the existing knowledge of a PBSMT system without adding more parallel data, using external morphological resources instead. A set of new phrase associations is added to the translation and reordering models; each corresponds to a morphological variation of the source phrase, the target phrase, or both phrases of an existing association. New associations are generated using a string similarity score based on morphosyntactic information. We tested our approach on En-Fr and Fr-En translation, and the results showed improved performance in terms of automatic scores (BLEU and Meteor) and a reduction in out-of-vocabulary (OOV) words. We believe that our knowledge expansion framework is generic and could be used to add different types of information to the model. JRC.G.2-Global security and crisis managemen

    Volume 2 of The Nebraska Educator: Full Issue

    "It is merely ridiculous to be deaf; it is sad to be blind. One can thus measure the difference between visible nature and men who speak." Jules Renard, Journal, 1898. The existence of human beings apparently devoid of language has always aroused curiosity, fascination, and even unease. In the collective imagination, they represent man in his simplest expression, left to himself, in the most complete destitution, deprived of all relati..

    Chinese elements: a bridge of the integration between Chinese-English translation and linguaculture transnational mobility

    [Abstract] Given the popularity of Chinese elements in the redesigned translation section of the Chinese CET, we realized that Chinese elements have become a bridge between linguaculture transnational mobility and Chinese-English translation. Chinese students' translation skills should therefore be substantially improved, for example through a better understanding of Chinese culture and, in particular, of its meaning. Five important secrets of skillful translation are introduced to improve students' translation skills.

    Corpus-Based Research on Chinese Language and Linguistics

    This volume collects papers presenting corpus-based research on Chinese language and linguistics, from both a synchronic and a diachronic perspective. The contributions cover different fields of linguistics, including syntax and pragmatics, semantics, morphology and the lexicon, sociolinguistics, and corpus building. There is now considerable emphasis on the reliability of linguistic data: the studies presented here are all grounded in the tenet that corpora, intended as collections of naturally occurring texts produced by a variety of speakers/writers, provide a more robust, statistically significant foundation for linguistic analysis. The volume explores not only the potential of using corpora as tools allowing access to authentic language material, but also the challenges involved in corpus interrogation, analysis, and building

    Tune your brown clustering, please

    Brown clustering, an unsupervised hierarchical clustering technique based on n-gram mutual information, has proven useful in many NLP applications. However, most uses of Brown clustering employ the same default configuration; the appropriateness of this configuration has gone largely unexplored. Accordingly, we present information for practitioners on the behaviour of Brown clustering in order to assist hyper-parameter tuning, in the form of a theoretical model of Brown clustering utility. This model is then evaluated empirically in two sequence labelling tasks over two text types. We explore the dynamic between the input corpus size, the chosen number of classes, and the quality of the resulting clusters, which has an impact on any approach using Brown clustering. In every scenario that we examine, our results reveal that the values most commonly used for the clustering are sub-optimal.
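As a rough illustration of the "n-gram mutual information" mentioned in the abstract above, the sketch below computes the average mutual information between adjacent class labels, which is the quantity Brown clustering is generally described as trying to keep high while merging classes. It is not the paper's utility model; the function name `average_mutual_information` and the toy corpus are assumptions made for this example.

```python
import math
from collections import Counter


def average_mutual_information(tokens, word_to_class):
    """Average mutual information (in bits) between adjacent class labels.

    `tokens` is a list of words; `word_to_class` maps every word to a
    class label. Both names are hypothetical, chosen for this sketch.
    """
    classes = [word_to_class[t] for t in tokens]
    bigrams = Counter(zip(classes, classes[1:]))
    total = sum(bigrams.values())

    # Marginal counts of a class appearing as the left or right element.
    left, right = Counter(), Counter()
    for (c1, c2), n in bigrams.items():
        left[c1] += n
        right[c2] += n

    ami = 0.0
    for (c1, c2), n in bigrams.items():
        p_joint = n / total
        p_left = left[c1] / total
        p_right = right[c2] / total
        ami += p_joint * math.log2(p_joint / (p_left * p_right))
    return ami


# Toy corpus (an assumption, not data from the paper).
tokens = "the cat sat the dog sat".split()
one_class_per_word = {w: w for w in set(tokens)}
single_class = {w: "ALL" for w in set(tokens)}

fine = average_mutual_information(tokens, one_class_per_word)
coarse = average_mutual_information(tokens, single_class)
```

Collapsing every word into a single class drives the score to zero, which gives an intuition for why the chosen number of classes interacts with corpus size and cluster quality.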

    An investigation into the translation of Isixhosa kinship lexical items into English

    This study investigates the problem of non-equivalence in the translation of isiXhosa kinship lexical items or concepts into English. Venuti (2012:5) says translation can be seen as ‘a set of changing relationships between the relative autonomy of the translated text…and…equivalence and function.’ Equivalence, of which non-equivalence is the antithesis, includes ‘accuracy,’ ‘adequacy,’ ‘correctness,’ ‘correspondence,’ or ‘identity’. A variable notion, it indicates how translation is connected to the source text. A break in that connection results in non-equivalence. Therein lies the problem. The current study examines these notions, among other things, as they apply to the translation of isiXhosa kinship lexical items into English. As Venuti (2010) points out, translation rests on particular assumptions about language use. These assumptions draw on two particular, emerging theories or approaches, namely the ‘instrumental’ and the ‘hermeneutic’, as will be discussed.
    Isicatshulwa: Olu phando lumalunga nengxaki yokungafani kwentsingiselo yamagama okanye ingqikelelo xa kuguqulelwa esiNgesini amagama okuzalana esiXhoseni. UVenuti (2012:5) uthi inguqulo ingajongwa njengokujika konxulumano oluguqukayo phakathi kokungaxhomekeki kwesiqendwana esiguqulwayo nonxulumano. Unxulumano oluphikisana nokunganxulumani, lubandakanya ‘ukuchaneka,’ ‘ukufanela,’ ‘ukulunga,’ ‘ukungqinelana,’ okanye ‘ukufana twatse.’ Le ngcingane iguquguqukayo ibonisa indlela inguqulo ihambelana ngayo nesiqendwana esiguqulwayo. Xa oko kuhambelana kuthe kwangabikho, loo nto izala ukunganxulumani. Ilapho ke ingxaki. Olu phando luphonononga ezi ngcingane, phakathi kwezinye zezinto, njengoko zisebenza kwinguqulelo esiNgesini kwamagama okuzalana esiXhoseni. Njengoko uVenuti (2010) abonisayo, inguqulo ingqiyame ngeengcinga ezithile ezimbini malunga nokusetyenziswa kolwimi. Ezi ngcinga ziphenjelelwa ziinkcazo eziziingcingane ezithile zamva nje. Zibizwa ‘i-instrumental,’ ‘ne-hermeneutic,’ njengoko uphando olu luza kuxoxa ngazo.

    Towards the automatic analysis of sentiments in Basque: the creation of basic resources and the identification of valence shifters in different language levels

    243 pp. (Basque), 139 pp. (English). This thesis takes the first steps in sentiment analysis for Basque from the perspective of applied linguistics. The thesis project had two main goals. On the one hand, we created basic resources for sentiment analysis in Basque. Specifically, we developed the Basque Opinion Corpus (Euskarazko Iritzi Corpusa), a Basque sentiment lexicon called Sentitegi, and a document-level sentiment classifier. The corpus contains 240 opinion texts from six domains. Its discourse information is annotated following the RST approach, and the semantic orientation of the opinion texts is annotated as well. The sentiment lexicon consists of 1,237 words, and its entries carry a sentiment valence between -5 and +5. We followed a specific translation methodology to create the lexicon. Finally, we also developed a document-level sentiment classifier. The tool is based on the aforementioned sentiment lexicon and also includes a number of additional rules. On the other hand, we also carried out theoretical work on sentiment analysis. If sentiment classification is to be lexicon-based, knowing the sentiment valence of words is not enough, because texts contain phenomena that affect the sentiment valence of those words. These are called contextual valence shifters, and we studied how they appear in Basque. We examined one type of valence shifter for each level of grammar: in phonology, expressive palatalization; in morphology, morphemes; in syntax, negation markers; and in discourse, discourse relations and the central unit. The results show that valence shifters strengthen or weaken the sentiment valence of words or phrases. Depending on the intensity of that weakening, the sign of the sentiment valence may change, turning positive into negative or vice versa. Finally, in some cases the valence shifter has no effect.
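A minimal sketch of lexicon-based document classification with a negation-style valence shifter, of the general kind the abstract above describes. Everything here is an assumption for illustration: the lexicon entries are invented English words (not taken from Sentitegi), the names `document_valence` and `classify` are hypothetical, and negation is simplified to a plain sign flip, whereas the thesis reports that shifters may strengthen, weaken, flip, or leave the valence unchanged.

```python
# Hypothetical toy lexicon with valences in the -5..+5 range.
HYPOTHETICAL_LEXICON = {"good": 3, "great": 4, "bad": -3, "awful": -4}

# Hypothetical negation markers acting as valence shifters.
HYPOTHETICAL_NEGATORS = {"not", "never"}


def document_valence(tokens, lexicon=HYPOTHETICAL_LEXICON,
                     negators=HYPOTHETICAL_NEGATORS):
    """Sum word valences; a negator flips the next sentiment-bearing word."""
    total = 0
    negate = False
    for token in tokens:
        word = token.lower()
        if word in negators:
            negate = True
            continue
        valence = lexicon.get(word, 0)
        if valence != 0:
            total += -valence if negate else valence
            negate = False  # the shifter applies to one sentiment word only
    return total


def classify(tokens):
    """Map the summed valence to a document-level label."""
    score = document_valence(tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


label = classify("the film was not good".split())
```

A graded shifter could be modelled by multiplying the valence by a weight instead of negating it, so that a mild weakening reduces the magnitude and only a strong one changes the sign.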