
    Morphological Disambiguation by Voting Constraints

    We present a constraint-based morphological disambiguation system in which individual constraints vote on matching morphological parses, and all the tokens in a sentence are disambiguated at the end by selecting the parses that receive the most votes. This constraint application paradigm makes the outcome of the disambiguation independent of the rule sequence, and hence relieves the rule developer from worrying about potentially conflicting rule sequencing. Our results for disambiguating Turkish indicate that, using about 500 constraint rules and some additional simple statistics, we can attain a recall of 95-96% and a precision of 94-95% with about 1.01 parses per token. Our system is implemented in Prolog, and we are currently investigating an efficient implementation based on finite state transducers.
    Comment: 8 pages, LaTeX source. To appear in Proceedings of ACL/EACL'97. Compressed PostScript also available as ftp://ftp.cs.bilkent.edu.tr/pub/ko/acl97.ps.
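    The voting scheme described in this abstract can be illustrated with a minimal Python sketch. This is not the authors' Prolog system: the function `disambiguate`, the two toy constraints, and the tags `Noun` and `Verb` are hypothetical, chosen only to show why summing votes first and selecting afterwards makes the result independent of rule order.

```python
from collections import defaultdict

def disambiguate(sentence, constraints):
    """sentence: list of (token, [parse, ...]) pairs.
    constraints: callables constraint(sentence, i, parse) -> int vote.
    Returns the sentence with only the top-voted parses kept per token."""
    votes = [defaultdict(int) for _ in sentence]
    # Phase 1: every constraint votes on every parse of every token.
    # Constraints only read the original, unmodified sentence, and votes
    # are summed, so the order of the constraints cannot matter.
    for constraint in constraints:
        for i, (_, parses) in enumerate(sentence):
            for parse in parses:
                votes[i][parse] += constraint(sentence, i, parse)
    # Phase 2: only now are parses selected, for all tokens at once.
    selected = []
    for i, (token, parses) in enumerate(sentence):
        if not parses:
            selected.append((token, []))
            continue
        best = max(votes[i][p] for p in parses)
        selected.append((token, [p for p in parses if votes[i][p] == best]))
    return selected

# Hypothetical toy constraints (assumptions, not rules from the paper).
def prefer_final_verb(sentence, i, parse):
    return 1 if i == len(sentence) - 1 and parse == "Verb" else 0

def noun_before_verb(sentence, i, parse):
    has_next = i + 1 < len(sentence)
    return 1 if parse == "Noun" and has_next and "Verb" in sentence[i + 1][1] else 0

example = [("w1", ["Noun", "Verb"]), ("w2", ["Noun", "Verb"])]
result = disambiguate(example, [prefer_final_verb, noun_before_verb])
```

    Passing the same constraints in the opposite order yields the same `result`, which is the property the abstract emphasizes. Ties are kept rather than broken, so a token may retain more than one parse.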

    Using Multiple Sources of Information for Constraint-Based Morphological Disambiguation

    This thesis presents a constraint-based morphological disambiguation approach that is applicable to languages with complex morphology, specifically agglutinative languages with productive inflectional and derivational morphological phenomena. For morphologically complex languages like Turkish, automatic morphological disambiguation involves selecting, for each token, the morphological parse(s) with the right set of inflectional and derivational markers. Our system combines corpus-independent hand-crafted constraint rules, constraint rules that are learned via unsupervised learning from a training corpus, and additional statistical information obtained from the corpus to be morphologically disambiguated. The hand-crafted rules are linguistically motivated and tuned to improve precision without sacrificing recall. In certain respects, our approach has been motivated by Brill's recent work, but with the observation that his transformational approach is not directly applicable to languages like Turkish. Our system also handles unknown words in a novel way, employing a secondary morphological processor that recovers any relevant inflectional and derivational information from a lexical item whose root is unknown. With this approach, well below 1% of the tokens remain unknown in the texts we have experimented with. Our results indicate that by combining these hand-crafted, statistical and learned information sources, we can attain a recall of 96 to 97% with a corresponding precision of 93 to 94%, and an ambiguity of 1.02 to 1.03 parses per token.
    Comment: M.Sc. Thesis submitted to the Department of Computer Engineering and Information Science, Bilkent University, Ankara, Turkey. Also available as: ftp://ftp.cs.bilkent.edu.tr/pub/tech-reports/1996/BU-CEIS-9615ps.
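    The three figures reported above (recall, precision, and parses per token) can be made concrete with a short Python sketch. The abstract does not define the metrics, so the definitions here are an assumption: recall is the share of tokens whose correct parse survives, precision is the share of surviving parses that are correct, and ambiguity is the average number of surviving parses per token. The function name `evaluate` and the sample data are hypothetical.

```python
def evaluate(output, gold):
    """output: one list of remaining parses per token.
    gold: the single correct parse for each token (same length)."""
    tokens = len(gold)
    total_parses = sum(len(parses) for parses in output)
    # A token counts as correct if its gold parse is among those kept.
    correct = sum(1 for parses, g in zip(output, gold) if g in parses)
    return {
        "recall": correct / tokens,            # assumed: correct tokens / tokens
        "precision": correct / total_parses,   # assumed: correct parses / parses kept
        "ambiguity": total_parses / tokens,    # parses kept per token
    }

# Hypothetical data: four tokens, one left ambiguous, one wrongly resolved.
scores = evaluate(
    output=[["N"], ["N", "V"], ["V"], ["A"]],
    gold=["N", "V", "V", "N"],
)
```

    Under these definitions, leaving a token ambiguous protects recall but lowers precision, which matches the trade-off the abstract describes when it speaks of improving precision without sacrificing recall.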

    Sabra, a Syriac TeX System

    In this paper we present "Sabra", a typesetting system for the Syriac script, based on TeX and METAFONT. We cover Serto, Estrangello, and East Syriac, both for the Syriac language and for Garshuni. Ligatures as well as stretching connections (keshideh) are handled automatically. The straight or curved form of the isolated Serto olaph is chosen according to a set of rules described in the paper. This work is part of the ScholarTeX project.

    A decision support system for curricula design

    A curriculum is a set of related courses that constitutes the basis of a degree program. The required courses of a curriculum generally build student knowledge and skills particular to the field. In most cases, these are cumulative: as students progress through their studies, they build new knowledge on top of what they learned earlier, which leads to the notion of prerequisite courses that must precede a given course. As accreditation practices gain widespread acceptance, and as uniformity among peer institutions is promoted to facilitate mobility, each course is assigned a set of learning outcomes. The learning outcomes of a prerequisite course are seen to encapsulate the skills necessary to take the downstream course. This study follows our efforts regarding the substantial revision of engineering courses throughout our college. As the task is quite involved, we developed a flexible linear programming-based tool to help the decision-making process by quickly evaluating alternative curricula. This study aims to provide an effective decision-making tool that accommodates many "what if" scenarios, giving decision-makers options and helping them detect inconsistencies and oversights. This paper describes our approach and our experiences.

    CHULA TTS: A Modularized Text-To-Speech Framework


    The Traditional Arabic Typecase, Unicode, TeX and METAFONT
