12 research outputs found

    A tree-based approach for English-to-Turkish translation

    In this paper, we present our English-to-Turkish translation methodology, which adopts a tree-based approach. Our approach relies on tree analysis and the application of structural modification rules to derive target-side (Turkish) trees from source-side (English) trees. We also use morphological analysis to obtain candidate root words and apply tree-based rules to produce the agglutinated target words. Compared to earlier work on English-to-Turkish translation using phrase-based models, we obtain higher BLEU scores in the current study. Our syntactic subtree permutation strategy, combined with a word replacement algorithm, provides a 67% relative improvement, from a baseline of 12.8 to 21.4 BLEU, averaged over 10-fold cross-validation. As future work, improvements in choosing the correct senses and structural rules are needed. This work was supported by TUBITAK project 116E104.
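    The 67% figure quoted above follows from the two BLEU scores alone. A minimal sketch of that arithmetic, assuming the usual definition of relative improvement (gain divided by baseline); the function name `relative_improvement` is hypothetical and is not taken from the paper:

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Return the gain of `improved` over `baseline` as a fraction of `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (improved - baseline) / baseline


# BLEU scores as reported in the abstract: baseline 12.8, proposed system 21.4.
gain = relative_improvement(12.8, 21.4)
print(f"{gain:.1%}")
```

    The absolute gain is 8.6 BLEU points; dividing by the 12.8 baseline gives roughly 0.67, consistent with the reported 67%.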

    Investigating syntactic effects in NPI illusions in Turkish


    Comparing sense categorization between English PropBank and English WordNet

    Given that verbs play a crucial role in language comprehension, this paper presents a study that compares the verb senses in English PropBank with those in English WordNet through manual tagging. After analyzing 1554 senses of 1453 distinct verbs, we found that while the majority of the senses in PropBank have one-to-one correspondents in WordNet, a substantial number of them are differentiated. Furthermore, by analyzing the differences between our manually tagged resource and an automatically tagged one, we claim that manual tagging can help provide better results in sense annotation.

    Comparison of Turkish proposition banks by frame matching

    By indicating semantic relations between a predicate and its associated participants in a sentence and identifying the role-bearing constituents, semantic role labeling (SRL) provides an extensive dataset for understanding natural languages and enhancing several NLP applications such as information retrieval, machine translation, information extraction, and question answering. The availability of large resources and the development of statistical machine learning methods have increased the number of studies in the field of SRL. One of the widely used semantic resources applied to multiple languages is PropBank. In this paper, PropBanks built for Turkish are compared by checking the semantic roles in the frame files of matched verb senses. As this integrated lexical resource for Turkish is intended to be used in a multilingual resource along with English, the creation of an inclusive lexical resource for Turkish is of great importance.

    Problems caused by semantic drift in WordNet synset construction

    In this study, we summarize the semantic drift problem that occurs in specific synsets of KeNet, a Turkish WordNet, and that is caused by the mis-merging of semantically related lexical items, morphological markings, and false part-of-speech (POS) matchings. We present our approach to eliminating this semantic drift. We have re-analyzed the dictionary definitions of the items, placed those that possess different verbal markings into separate synsets, and divided synsets based on the POS of the items in them.

    Integrating Turkish Wordnet KeNet to Princeton WordNet: The case of one-to-many correspondences

    In this paper, we introduce a novel approach to forming interlingual relations between multilingual wordnets. We have mapped Turkish senses in KeNet to their corresponding senses in Princeton WordNet by drawing one-to-many correspondences. As a result of language-specific properties, one synset in one language is in some cases matched with multiple synsets in the other language. Our method of integrating KeNet into a multilingual network also included mapping the 5000 most frequent senses in English to their equivalent senses in Turkish. We demonstrate that one-to-many interlingual correspondences need to be included in both Turkish-to-English and English-to-Turkish mappings. Furthermore, one-to-many mappings give us insights into the semantic relations to be constructed in Turkish, such as hypernymy.

    (Long-distance) licensing of NPIs in Turkish as L1 and L2

    This project examines how negative polarity items (NPIs) are processed in Turkish as L1 and L2. The specific questions address whether memory-based accounts or expectation-based parsing models better explain long-distance licensing of Turkish NPIs. The study also examines the role of syntactic and semantic information in this process.

    English-Turkish parallel semantic annotation of Penn-Treebank

    This paper reports our efforts in constructing a sense-labeled English-Turkish parallel corpus using the traditional method of manual tagging. We tagged a pre-built parallel treebank that was translated from the Penn Treebank corpus. This approach allowed us to generate a resource combining syntactic and semantic information. We provide statistics about the corpus itself as well as information regarding its development process.