12 research outputs found

    ULiS: An Expert System on Linguistics to Support Multilingual Management of Interlingual Knowledge bases

    We are interested in bridging the world of natural language and the world of the semantic web, in particular to support multilingual access to the web of data and multilingual management of interlingual knowledge bases. In this paper we introduce the ULiS project, which aims at designing a pivot-based NLP technique called Universal Linguistic System, relying entirely on semantic web formalisms and compliant with the Meaning-Text theory. Through ULiS, a user could interact with an interlingual knowledge base (IKB) in controlled natural language. Linguistic resources themselves are part of a specific IKB, the Universal Lexical Knowledge base (ULK), so that actors may enhance their controlled natural language through requests in controlled natural language. In this paper we propose a basic interaction scenario at the system level, and then give an overview of the layered architecture of ULiS: meta-ontology, ontology, facts; and ontology, interlingual knowledge, situational knowledge.

    ULiS: An Expert System on Linguistics to Support Multilingual Management of Interlingual Semantic Web Knowledge bases

    We are interested in bridging the world of natural language and the world of the semantic web, in particular to support multilingual access to the web of data. In this paper we introduce the ULiS project, which aims at designing a pivot-based NLP technique called Universal Linguistic System, relying entirely on semantic web formalisms and compliant with the Meaning-Text theory. Through ULiS, a user could interact with an interlingual knowledge base (IKB) in controlled natural language. Linguistic resources themselves are part of a specific IKB, the Universal Lexical Knowledge base (ULK), so that actors may enhance their controlled natural language through requests in controlled natural language. We describe a basic interaction scenario at the system level, and provide an overview of the architecture of ULiS. We then introduce the core of ULiS: the interlingual lexical ontology (ILexicOn), in which each interlingual lexical unit class (ILUc) supports the projection of its semantic decomposition on itself. We validate our model with a standalone ILexicOn, and introduce and explain a concise human-readable notation for it.

    Handling Translation Divergences in Generation-Heavy Hybrid Machine Translation

    This paper describes a novel approach for handling translation divergences in a Generation-Heavy Hybrid Machine Translation (GHMT) system. The approach depends on the existence of rich target language resources such as word lexical semantics, including information about categorial variations and subcategorization frames. These resources are used to generate multiple structural variations from a target-glossed lexico-syntactic representation of the source language sentence. The multiple structural variations account for different translation divergences. The overgeneration of the approach is constrained by a target-language model using corpus-based statistics. The exploitation of target language resources (symbolic and statistical) to handle a problem usually reserved to Transfer and Interlingual MT is useful for translation from structurally divergent source languages with scarce linguistic resources. A preliminary evaluation on the application of this approach to Spanish-English MT proves this approach extremely promising. The approach however is not limited to MT as it can be extended to monolingual NLG applications such as summarization. (Also UMIACS-TR-2002-23; also LAMP-TR-08)

    Aspects of the collocational analysis of meaning with special reference to some Biblical Hebrew anatomical idioms

    Although the biblical data presented can be properly assessed only by a Hebraist/Old Testament exegete, I have attempted to make the work a little more accessible to linguistic scientists without specialization in Hebrew through provision of English glosses of Hebrew passages (rarely of more than a biblical verse in length). Typically these glosses are from NEB, although where NEB's rendering does not closely match the Hebrew sequence (e.g., if NEB omits certain Hebrew phrases because they would be redundant or cumbersome in English, or adopts substantial emendations of MT, or is, in my opinion, erroneous in respect of a particular translation) I have utilized JB, or, occasionally, AV. Italicized sequences (marking expressions not directly expressed in the Hebrew original) in AV (and in the translation of Rashi) are not thus distinguished in my quotations, and I have used 'Lord' for AV and NEB 'LORD'. NEB has been chosen as the primary source because at a semantic, if not a stylistic, level it provides an 'idiomatic' translation, and because its emendations are easy to trace (through Brockington's work). The few times that I wish to make a translation point particularly strongly or where I feel none of the aforementioned translations to be adequate I provide my own glosses. Such renderings, unlike those quoted from other sources, are not accompanied by a citation of source. Within glosses words representing a collocation or other expression being discussed are capitalized. BHK/S is used as the source of quotations from the Hebrew Bible, although its division of cola is not displayed; the caesura (athnach) is sometimes indicated by the use of a new line, or, if only one line of text is displayed, by a double space within this line. In 'citation-forms' of Hebrew text, we utilize a 'plene' orthography. Chapter and verse references are always to the Hebrew Bible. ..

    Italian VerbNet: A Construction-based Approach to Italian Verb Classification

    This thesis proposes a new verb classification for Italian, based on the authoritative English model of VerbNet. The method developed, which is the central point of the research, was designed to allow the creation of classes that are compatible with the English model yet at the same time autonomous and grounded in independent theoretical criteria. An explanatory part is followed by a presentation of the data accompanied by comments.

    Sprachwandel (Seminar)

    The seminar examines language change from various perspectives.

    The Translation of God's Names in the Quran: A Descriptive Study

    This thesis explores the translation of God’s names in the Quran. It centres around many of the common issues that the translators of divine attributes face. Since these are sensitive cultural items, translators should ideally give special treatment to divine designations. God’s names are not just stock names but rather they are nominalized adjectives with a descriptive content. As such divine names can enter into a variety of semantic relations such as synonymy, polysemy, hyponymy and hyperonymy (also termed ‘hypernymy’ and ‘superordinateness’). Divine names’ highly-nuanced semantic, syntactic and morphological makeup means that they require delicate treatment on the part of translators. Quran translators realize that God’s names are culture-bound terms and employ different techniques to give faithful renditions. Often they make use of an amalgamation of strategies to accurately reflect their meaning(s) and offset any loss thereof. By and large, literal translation seems to take a rather safe precedence over any other strategy, which gives a safeguard against any misrepresentation of divine attributes. Sometimes the presence of recognized or cultural equivalents is a sufficient warrant to depart from literal matches. This thesis shows how selected Quran translators exhibit varying degrees of consistency in their renditions of divine names, which may be attributable to the absence of hard-and-fast rules for the interlingual transfer of culturally laden lexemes. A convoluted issue that Quran translators face is how to tackle near-synonymous expressions. The situation is aggravated when they deal with divine names where near-synonymy exists in abundance. Quite often, the selected translators in this study have not been able to successfully replicate the more pronounced differences between near-synonymous divine names. Finding matchable polysemous items between languages is a familiar quandary that interpreters have to grapple with. 
Data in this study demonstrates how it is a taxing task trying to find a single item in English that bears the range of senses that a polysemous divine name has. Quran translators are often confronted with the task of picking up a single sense out of the multiple senses that the divine name can designate; the onus in such a pursuit is typically on the Quran exegeses. Usually, the primary (or literal) sense is the translators’ first port of call to the exclusion of any other secondary sense. It is uncommon to find a translator who is keen on conveying the semantic polyvalence of God’s appellations. In this way, Quran translators, inadvertently, do not do justice to the richness of the Quran text despite many readers’ eagerness to become illuminated about the various meanings of their Sacred Book. It is perhaps translators’ proclivity for brevity that is the overriding factor that has stopped them in their tracks. It is reasonable to assume that the brushing aside of (intended) secondary meanings of divine names by many Quran translators to chase ‘structural fidelity’ has come at the expense of more accurate glosses.

    WORD SENSE DISAMBIGUATION WITHIN A MULTILINGUAL FRAMEWORK

    Word Sense Disambiguation (WSD) is the process of resolving the meaning of a word unambiguously in a given natural language context. Within the scope of this thesis, it is the process of marking text with explicit sense labels. What constitutes a sense is a subject of great debate. An appealing perspective aims to define senses in terms of their multilingual correspondences, an idea explored by several researchers (Dyvik (1998), Ide (1999), Resnik & Yarowsky (1999), and Chugur, Gonzalo & Verdejo (2002)), but to date it has not been given any practical demonstration. This thesis is an empirical validation of these ideas of characterizing word meaning using cross-linguistic correspondences. The idea is that word meaning or word sense is quantifiable as much as it is uniquely translated in some language or set of languages. Consequently, we address the problem of WSD from a multilingual perspective; we expand the notion of context to encompass multilingual evidence. We devise a new approach to resolve word sense ambiguity in natural language, using a source of information that was never exploited on a large scale for WSD before. The core of the work presented builds on exploiting word correspondences across languages for sense distinction. In essence, it is a practical and functional implementation of a basic idea common to research interest in defining word meanings in cross-linguistic terms. We devise an algorithm, SALAAM (Sense Assignment Leveraging Alignment And Multilinguality), that empirically investigates the feasibility and the validity of utilizing translations for WSD. SALAAM is an unsupervised approach for word sense tagging of large amounts of text given a parallel corpus (texts in translation) and a sense inventory for one of the languages in the corpus. Using SALAAM, we obtain large amounts of sense annotated data in both languages of the parallel corpus, simultaneously. 
The quality of the tagging is rigorously evaluated for both languages of the corpora. The automatic unsupervised tagged data produced by SALAAM is further utilized to bootstrap a supervised learning WSD system, in essence, combining supervised and unsupervised approaches in an intelligent way to alleviate the resources acquisition bottleneck for supervised methods. Essentially, SALAAM is extended as an unsupervised approach for WSD within a learning framework; in many of the cases of the words disambiguated, SALAAM coupled with the machine learning system rivals the performance of a canonical supervised WSD system that relies on human tagged data for training. Realizing the fundamental role of similarity for SALAAM, we investigate different dimensions of semantic similarity as it applies to verbs since they are relatively more complex than nouns, which are the focus of the previous evaluations. We design a human judgment experiment to obtain human ratings on verbs’ semantic similarity. The obtained human ratings are cast as a reference point for comparing different automated similarity measures that crucially rely on various sources of information. Finally, a cognitively salient model integrating human judgments in SALAAM is proposed as a means of improving its performance on sense disambiguation for verbs in particular and other word types in general

    Combining Linguistic and Machine Learning Techniques for Word Alignment Improvement

    Alignment of words, i.e., detection of corresponding units between two sentences that are translations of each other, has been shown to be crucial for the success of many NLP applications such as statistical machine translation (MT), construction of bilingual lexicons, word-sense disambiguation, and projection of resources between languages. With the availability of large parallel texts, statistical word alignment systems have proven to be quite successful on many language pairs. However, these systems are still faced with several challenges due to the complexity of the word alignment problem, lack of enough training data, difficulty learning statistics correctly, translation divergences, and lack of a means for incremental incorporation of linguistic knowledge. This thesis presents two new frameworks to improve existing word alignments using supervised learning techniques. In the first framework, two rule-based approaches are introduced. The first approach, Divergence Unraveling for Statistical MT (DUSTer), specifically targets translation divergences and corrects the alignment links related to them using a set of manually-crafted, linguistically-motivated rules. In the second approach, Alignment Link Projection (ALP), the rules are generated automatically by adapting transformation-based error-driven learning to the word alignment problem. By conditioning the rules on initial alignment and linguistic properties of the words, ALP manages to categorize the errors of the initial system and correct them. The second framework, Multi-Align, is an alignment combination framework based on classifier ensembles. The thesis presents a neural-network based implementation of Multi-Align, called NeurAlign. By treating individual alignments as classifiers, NeurAlign builds an additional model to learn how to combine the input alignments effectively. 
The evaluations show that the proposed techniques yield significant improvements (up to 40% relative error reduction) over existing word alignment systems on four different language pairs, even with limited manually annotated data. Moreover, all three systems allow an easy integration of linguistic knowledge into statistical models without the need for large modifications to existing systems. Finally, the improvements are analyzed using various measures, including the impact of improved word alignments in an external application, phrase-based MT.