
    Two Approaches for Building an Unsupervised Dependency Parser and their Other Applications

    Much work has been done on building parsers for natural languages, but most of it has concentrated on supervised parsing. Unsupervised parsing is a less explored area, and unsupervised dependency parsing has hardly been tried. In this paper we present two approaches for building an unsupervised dependency parser: one based on learning dependency relations and the other on learning subtrees. We also propose some other applications of these approaches.
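    To make the "learning dependency relations" idea concrete, here is a minimal sketch, not the paper's method: it scores candidate head-dependent pairs by co-occurrence counts in raw text and attaches each word to its best-scoring head. All function names and the toy corpus are illustrative assumptions.

    ```python
    # Illustrative sketch only: co-occurrence-based unsupervised dependency
    # attachment, not the approach described in the paper.
    from collections import Counter

    def learn_relation_scores(sentences, window=3):
        """Count how often a candidate head co-occurs near a dependent word."""
        scores = Counter()
        for sent in sentences:
            for i, dep in enumerate(sent):
                for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                    if i != j:
                        scores[(sent[j], dep)] += 1  # (head, dependent) pair
        return scores

    def parse(sentence, scores):
        """Attach every word to the in-sentence head with the highest score."""
        arcs = []
        for i, dep in enumerate(sentence):
            best = max(
                (j for j in range(len(sentence)) if j != i),
                key=lambda j: scores.get((sentence[j], dep), 0),
            )
            arcs.append((best, i))  # (head index, dependent index)
        return arcs

    corpus = [["the", "dog", "barks"], ["the", "cat", "sleeps"]]
    model = learn_relation_scores(corpus)
    print(parse(["the", "dog", "barks"], model))
    ```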

    Identification of Languages and Encodings in a Multilingual Document

    Text on the Web is available in numerous languages and encodings, often not conforming to any standard, and the number of multilingual documents on the Web is increasing. The problem of identifying the languages and encodings in a multilingual document and marking portions of the document with them has not been addressed so far. We present an exploration of this problem, the implied or required assumptions, and a solution. The problem can be divided into three parts: monolingual identification, enumeration of languages, and identification of the language of every portion. For enumeration, we obtained a precision of 96.20%. We also experimented with language identification of each word. Given correct enumeration, we obtained a type precision of 90.91% and a token precision of 86.80%. Finally, we show how precision is affected by language distance.
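    As a toy illustration of the word-level identification step, the sketch below classifies each word with smoothed character n-gram profiles. The languages, training words, and scoring function are assumptions for illustration; the paper's actual features and distance measure are not reproduced here.

    ```python
    # Hypothetical per-word language identification via character n-gram
    # profiles with add-one smoothing; illustrative only.
    from collections import Counter
    import math

    def char_ngrams(word, n=3):
        padded = f"^{word}$"
        return [padded[i:i + n] for i in range(len(padded) - n + 1)]

    def train_profiles(labelled_words):
        """labelled_words: iterable of (word, language) pairs."""
        profiles = {}
        for word, lang in labelled_words:
            profiles.setdefault(lang, Counter()).update(char_ngrams(word))
        return profiles

    def identify(word, profiles):
        """Pick the language whose profile gives the word the highest likelihood."""
        def log_score(lang):
            counts = profiles[lang]
            total = sum(counts.values())
            vocab = len(counts) + 1
            return sum(
                math.log((counts[g] + 1) / (total + vocab))  # add-one smoothing
                for g in char_ngrams(word)
            )
        return max(profiles, key=log_score)

    profiles = train_profiles([("language", "en"), ("encoding", "en"),
                               ("bhasha", "hi-translit"), ("shabd", "hi-translit")])
    print(identify("encodings", profiles))
    ```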

    Disambiguating Tense, Aspect and Modality Markers for Correcting Machine Translation Errors

    All languages mark tense, aspect and modality (TAM) in some way, but the markers do not have a one-to-one mapping across languages. Many errors in machine translation (MT) are due to wrong translation of TAM markers, and reducing them can improve the performance of an MT system. We used about 9000 sentence pairs from an English-Hindi parallel corpus, manually annotated with TAM markers and their mappings. Based on this corpus, we identify the factors responsible for ambiguity in translation. We present results for learning TAM marker translation using a CRF, achieving an improvement of 17.88% over the baseline.
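    The sketch below shows how CRF-based sequence labelling of TAM markers could be set up with the sklearn-crfsuite package. The feature set, the toy training pair, and the Hindi TAM labels are placeholders of my own, not the annotation scheme or features used in the paper.

    ```python
    # Hedged sketch: CRF sequence labelling of English TAM markers with their
    # (placeholder) Hindi TAM mappings, using sklearn-crfsuite.
    import sklearn_crfsuite

    def word_features(sent, i):
        word = sent[i]
        return {
            "word": word.lower(),
            "suffix3": word[-3:],  # crude morphological cue
            "prev": sent[i - 1].lower() if i > 0 else "<BOS>",
            "next": sent[i + 1].lower() if i < len(sent) - 1 else "<EOS>",
        }

    def featurize(sent):
        return [word_features(sent, i) for i in range(len(sent))]

    # Tiny hypothetical training pair: "O" means not a TAM marker; the
    # "TAM:*" labels stand in for target-side TAM constructions.
    train_sents = [["she", "has", "been", "reading"]]
    train_tags = [["O", "TAM:hai", "TAM:rahaa", "O"]]

    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
    crf.fit([featurize(s) for s in train_sents], train_tags)
    print(crf.predict([featurize(["he", "has", "been", "sleeping"])]))
    ```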