3,229 research outputs found

    Automatic Sense Disambiguation for Target Word Selection

    Noun Sense Disambiguation using Co-Occurrence Relation in Machine Translation

    Word sense disambiguation, the process of identifying the meaning of a word in a sentence when the word has multiple meanings, is a critical problem in machine translation. Selecting the correct meaning of a word in a sentence is generally very difficult, especially when the syntactic difference between the source and target languages is large, as in English-Korean machine translation. To achieve high accuracy in noun sense selection in machine translation, we introduced a statistical method based on the co-occurrence relations of words in sentences and applied it to the English-Korean machine translator RyongNamSan. ACM Computing Classification System (1998): I.2.7

    Typological parameters of genericity

    Different languages employ different morphosyntactic devices for expressing genericity. And, of course, they also make use of different morphosyntactic and semantic or pragmatic cues which may contribute to the interpretation of a sentence as generic rather than episodic. [...] We will advance the strong hypothesis that it is a fundamental property of lexical elements in natural language that they are neutral with respect to different modes of reference or non-reference. That is, we reject the idea that a certain use of a lexical element, e.g. a use which allows reference to particular spatio-temporally bounded objects in the world, should be linguistically prior to all other possible uses, e.g. to generic and non-specific uses. From this it follows that we do not consider generic uses as derived from non-generic uses, as is occasionally assumed in the literature. Rather, we regard these two possibilities of use as equivalent alternative uses of lexical elements. The typological differences to be noted therefore concern the formal and semantic relationship of generic and non-generic uses to each other; they do not pertain to the question of whether lexical elements are predetermined for one of these two uses. Even supposing we found a language where generic uses are always zero-marked and identical to lexical stems, we would still not assume that lexical elements in this language primarily have a generic use from which the non-generic uses are derived. (Incidentally, none of the languages examined, not even Vietnamese, meets this criterion.)

    Investigating Frequency and Type of Lexical Collocations in Applied Linguistics Journal Articles Written in English by Iranian and Norwegian Scholars

    Master's thesis in Literacy Studies. In today’s academic world, research interest in corpus linguistics has shifted from single words towards word co-occurrence. Accordingly, a large body of literature has been devoted to investigating recurrent word combinations in academic prose using frequency and dispersion parameters. This has led to the analysis of corpora in different fields of study in order to compile comprehensive lists of academic collocations. Moreover, many contrastive studies have compared the collocations used by native and non-native speakers of English. However, to the author’s knowledge, few studies have compared the most frequent collocations in two corpora of research articles written by non-native speakers of English and published in international journals in the field of applied linguistics. To fill this gap in the literature, the current study used a frequency-based approach to investigate the most frequent collocations used by Iranian and Norwegian scholars in a corpus of 17 articles published in the Journal of Pragmatics. Nine of the 17 articles were written by Iranian scholars and comprised 67,673 words; the other eight were written by Norwegian scholars and comprised 64,682 words. The data were collected using the Collocation Extract software. The results were presented in three phases. In the first phase, the 15 most frequent lexical collocations in both corpora were identified and classified under three types of lexical collocations. The Adj+N collocation type accounted for the largest proportion in the corpora, while the Adv+Adj type accounted for the smallest. In the second phase, the lexical collocations of the Iranian corpus were presented, a total of 818 collocations classified under five types. According to the results, Adj+N was the most frequent type while N+V was the least frequent.
The lexical collocations of the Norwegian corpus were identified in the same way. They totalled 462 and were classified under four types, among which Adj+N was the most frequent while Adv+Adj was the least frequent. In the third phase, the frequencies of lexical collocations in the two corpora were compared. According to the results, the two corpora did not differ significantly in the use of any type of lexical collocation except the Adj+N type.

    Korean Grammar Using TAGs

    This paper addresses various issues related to representing the Korean language using Tree Adjoining Grammars. Topics covered include Korean grammar using TAGs, Machine Translation between Korean and English using Synchronous Tree Adjoining Grammars (STAGs), handling scrambling using Multi-Component TAGs (MC-TAGs), and recovering empty arguments. The data used for parsing come from US military communication messages.