14 research outputs found

    Parsing Turkish using the lexical functional grammar formalism

    This paper describes our work on parsing Turkish using the lexical-functional grammar formalism [11]. This work represents the first effort at wide-coverage syntactic parsing of Turkish. Our implementation is based on Tomita's parser, developed at the Center for Machine Translation at Carnegie Mellon University. The grammar covers a substantial subset of Turkish, including structurally simple and complex sentences, and deals with a reasonable amount of word order freeness. The complex agglutinative morphology of Turkish lexical structures is handled by a separate two-level morphological analyzer, which has been incorporated into the syntactic parser. After discussing the key relevant issues in Turkish grammar, we describe aspects of our system and present results from our implementation. Our initial results suggest that our system can parse about 82% of the sentences directly and almost all of the remaining sentences with very minor pre-editing. © 1995 Kluwer Academic Publishers

    Visual Affect Around the World: A Large-scale Multilingual Visual Sentiment Ontology

    Every culture and language is unique. Our work expressly focuses on the uniqueness of culture and language in relation to human affect, specifically sentiment and emotion semantics, and how they manifest in social multimedia. We develop sets of sentiment- and emotion-polarized visual concepts by adapting semantic structures called adjective-noun pairs, originally introduced by Borth et al. (2013), to a multilingual context. We propose a new language-dependent method for automatic discovery of these adjective-noun constructs. We show how this pipeline can be applied to a social multimedia platform to create a large-scale multilingual visual sentiment concept ontology (MVSO). Unlike the flat structure in Borth et al. (2013), our unified ontology is organized hierarchically by multilingual clusters of visually detectable nouns and subclusters of emotionally biased versions of these nouns. In addition, we present an image-based prediction task to show how generalizable language-specific models are in a multilingual context. We also release a new, publicly available dataset of >15.6K sentiment-biased visual concepts across 12 languages, with language-specific detector banks and >7.36M images and their metadata. Comment: 11 pages, to appear at ACM MM'1

    Using Multiple Sources of Information for Constraint-Based Morphological Disambiguation

    This thesis presents a constraint-based morphological disambiguation approach that is applicable to languages with complex morphology, specifically agglutinative languages with productive inflectional and derivational morphological phenomena. For morphologically complex languages like Turkish, automatic morphological disambiguation involves selecting, for each token, the morphological parse(s) with the right set of inflectional and derivational markers. Our system combines corpus-independent hand-crafted constraint rules, constraint rules that are learned via unsupervised learning from a training corpus, and additional statistical information obtained from the corpus to be morphologically disambiguated. The hand-crafted rules are linguistically motivated and tuned to improve precision without sacrificing recall. In certain respects, our approach has been motivated by Brill's recent work, but with the observation that his transformational approach is not directly applicable to languages like Turkish. Our approach also handles unknown words in a novel way, employing a secondary morphological processor that recovers any relevant inflectional and derivational information from a lexical item whose root is unknown. With this approach, well below 1% of the tokens remain unknown in the texts we have experimented with. Our results indicate that by combining these hand-crafted, statistical and learned information sources, we can attain a recall of 96 to 97% with a corresponding precision of 93 to 94% and an ambiguity of 1.02 to 1.03 parses per token. Comment: M.Sc. Thesis submitted to the Department of Computer Engineering and Information Science, Bilkent University, Ankara, Turkey. Also available as: ftp://ftp.cs.bilkent.edu.tr/pub/tech-reports/1996/BU-CEIS-9615ps.

    Design and implementation of a verb lexicon and verb sense disambiguator for Turkish

    Ankara: Department of Computer Engineering and Information Science and Institute of Engineering and Science, Bilkent University, 1994. Thesis (Master's) -- Bilkent University, 1994. Includes bibliographical references. The lexicon has a crucial role in all natural language processing systems and has special importance in machine translation systems. This thesis presents the design and implementation of a verb lexicon and a verb sense disambiguator for Turkish. The lexicon contains only verbs because verbs encode events in sentences and play the most important role in natural language processing systems, especially in parsing (syntactic analysis) and machine translation. The verb sense disambiguator uses the information stored in the verb lexicon that we developed. The main purpose of this tool is to disambiguate the senses of verbs that have several meanings, some of which are idiomatic. We also present a tool, implemented in Lucid Common Lisp under X-Windows, for adding, accessing, modifying, and removing lexicon entries, and a semantic concept ontology containing semantic features of commonly used Turkish nouns. Yılmaz, Okan. M.S

    Tagging and morphological disambiguation of Turkish text

    Ankara: Department of Computer Engineering and Information Science and Institute of Engineering and Science, Bilkent University, 1994. Thesis (Master's) -- Bilkent University, 1994. Includes bibliographical references. A part-of-speech (POS) tagger is a system that uses various sources of information to assign a possibly unique POS to each word. Automatic text tagging is an important component in higher-level analysis of text corpora. Its output can also be used in many natural language processing applications. In languages with agglutinative morphology, like Turkish or Finnish, morphological disambiguation is a crucial process in tagging, as the structures of many lexical forms are morphologically ambiguous. This thesis presents a POS tagger for Turkish text based on a full-scale two-level specification of Turkish morphology. The tagger is augmented with a multi-word and idiomatic construct recognizer and, most importantly, a morphological disambiguator based on local lexical neighborhood constraints, heuristics and a limited amount of statistical information. The tagger also has additional functionality for statistics compilation and fine-tuning of the morphological analyzer, such as logging erroneous morphological parses, commonly used roots, etc. Test results indicate that the tagger can tag about 97% to 99% of the texts accurately with very minimal user intervention. Furthermore, for sentences morphologically disambiguated with the tagger, an LFG parser developed for Turkish generates, on average, 50% fewer ambiguous parses almost 2.5 times faster. Kuruöz, İlker. M.S

    Statistical modeling of agglutinative languages

    Ankara: Department of Computer Engineering and the Institute of Engineering and Science of Bilkent Univ., 2000. Thesis (Ph.D.) -- Bilkent University, 2000. Includes bibliographical references (leaves 107-116). Hakkani-Tür, Dilek Z. Ph.D

    LFG for Turkish Point-in-Time Expressions

    Thesis (M.Sc.) -- İstanbul Technical University, Institute of Science and Technology, 2007. This work presents the analysis and implementation of a date-time grammar for Turkish Lexical Functional Grammar (LFG), intended as a contribution to the development of the large-scale Turkish LFG grammar in the context of the Parallel Grammar Project (ParGram). The scope of the date-time grammar is restricted to expressions that answer “when” questions, i.e. point-in-time expressions, in particular clock times, days of the week, calendar dates and seasons. Some general phrases are also addressed. When analysing Turkish sentences, there are cases where the type of the adjunct (e.g. temporal, locational etc.) has to be distinguished for syntactic reasons. In addition, some applications (e.g. question answering) need to extract from a given text the time at which an event or state occurs. Both require such information to be marked as “temporal”. The goal of this study is to produce the syntactic analysis of Turkish point-in-time expressions in LFG, marked with finer semantic distinctions. Through a detailed corpus examination, a large number of sample phrases were collected, and a separate set of phrase structure rules was written in which point-in-time expressions are marked. M.Sc