13 research outputs found

    © 2012 The Association for Computational Linguistics

    The front-page picture is licensed by xkcd.com under the terms of the Creative Common

    © 2010 The Association for Computational Linguistics

    • Endorsed by SIGPARSE, the ACL Special Interest Group on Natura

    Using POS Information for Statistical Machine Translation into

    When translating from languages with hardly any inflectional morphology, such as English, into morphologically rich languages, the English word forms often do not contain enough information to produce the correct full form in the target language. We investigate methods for improving the quality of such translations by making use of part-of-speech information and maximum entropy modeling. Results for translations from English into Spanish and Catalan are presented on the LC-STAR corpus, which consists of spontaneously spoken dialogues in the domain of appointment scheduling and travel planning.

    Estimating Lexical Priors for

    this paper are of potential importance in various applications that require lexical disambiguation and where an estimate of lexical priors is required. For high-frequency words, one can obtain fairly reliable estimates of the lexical priors by tagging a corpus that gives a good coverage to words of various ranges. For predicting the lexical priors for the much larger mass of very low-frequency types, most of which would not occur in any such corpus, the results we have presented suggest that one should concentrate on tagging a good representative sample of the hapaxes, rather than extensively tagging words of all frequency ranges. Acknowledgments The authors wish to thank four anonymous reviewers for Computational Linguistics for useful comments on this pape

    Verbs are where all the Action Lies: Experiences of Shallow Parsing of a Morphologically Rich Language

    Verb suffixes and verb complexes of morphologically rich languages carry a lot of information. We show that this information, if harnessed for the task of shallow parsing, can lead to dramatic improvements in accuracy for a morphologically rich language, Marathi. The crux of the approach is to use a powerful morphological analyzer backed by a high-coverage lexicon to generate rich features for a CRF-based sequence classifier. Accuracy figures of 94% for part-of-speech tagging and 97% for chunking using a modestly sized corpus (20K words) vindicate our claim that, for morphologically rich languages, linguistic insight can obviate the need for large amounts of annotated corpora.

    Automatic Morphological Enrichment

    In this paper, we study the problem of automatic enrichment of a morphologically underspecified treebank for Arabic, a morphologically rich language. We show that we can map from a tagset of size six to one with 485 tags at an accuracy rate of 94%-95%. We can also identify the unspecified lemmas in the treebank with an accuracy over 97%. Furthermore, we demonstrate that using our automatic annotations improves the performance of a state-of-the-art Arabic morphological tagger. Our approach combines a variety of techniques from corpus-based statistical models to linguistic rules that target specific phenomena. These results suggest that the cost of treebanking can be reduced by designing underspecified treebanks that can be subsequently enriched automatically.

    Diagnostic Pathology BioMed Central Review

    biologically distinctive putative precursor lesions of Type II endometrial cancer

    WordNet 2 - A Morphologically and Semantically Enhanced

    This paper presents an on-going project intended to enhance WordNet morphologically and semantically. The motivation for this work stems from the current limitations of WordNet when used as a linguistic knowledge base. We envision a software tool that automatically parses the conceptual defining glosses, attributing part-of-speech tags and phrasal brackets. The nouns, verbs, adjectives and adverbs from every definition are then disambiguated and linked to the corresponding synsets. This increases the connectivity between synsets, allowing the retrieval of topically related concepts. Furthermore, the tool transforms the glosses, first into logical forms and then into semantic forms. Using derivational morphology, new links are added between the synsets. 1 Motivation. WordNet has already been recognized as a valuable resource in the human language technology and knowledge processing communities. Its applicability has been cited in more than 200 papers, and systems have been implemented using WordNet. A WordNet bibliography is maintained at the University of Pennsylvania ( htp //www cis upenn edu/~osepht/wnb bho him 0 In Europe, WordNet is being used to develop a multilingual database with basic semantic relations between words for several European languages (the EuroWordNet project). Capabilities. WordNet was conceived as a machine-readable dictionary, following psycholinguistic principles. Unlike standard alphabetical dictionaries, which organize vocabularies using morphological similarities, WordNet structures lexical information in terms of word meanings. WordNet maps word forms in word senses using the syntactic category as a parameter. Although it covers only four parts of speech (nouns, verbs, adjectives and adverbs), it encompasses a large majority of English words ( http //www cogsc, princet..