Morphological Disambiguation by Voting Constraints
We present a constraint-based morphological disambiguation system in which
individual constraints vote on matching morphological parses, and
disambiguation of all the tokens in a sentence is performed at the end by
selecting parses that receive the highest votes. This constraint application
paradigm makes the outcome of the disambiguation independent of the rule
sequence, and hence relieves the rule developer from worrying about potentially
conflicting rule sequencing. Our results for disambiguating Turkish indicate
that using about 500 constraint rules and some additional simple statistics, we
can attain a recall of 95-96% and a precision of 94-95% with about 1.01 parses
per token. Our system is implemented in Prolog and we are currently
investigating an efficient implementation based on finite state transducers.
Comment: 8 pages, LaTeX source. To appear in Proceedings of ACL/EACL'97.
Compressed postscript also available as
ftp://ftp.cs.bilkent.edu.tr/pub/ko/acl97.ps.
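The voting scheme described in this abstract can be illustrated with a minimal sketch. This is not the authors' Prolog system: the constraint signature, the tag names, and the two toy constraints are all hypothetical, chosen only to show why the result does not depend on rule order. Every constraint reads the original, unmodified parse lists and only adds votes; selection happens once at the end.

```python
from typing import Callable, Dict, List

Parse = str                    # a morphological parse, reduced here to a tag string
Sentence = List[List[Parse]]   # one list of candidate parses per token

# Hypothetical constraint signature: (sentence, token index, parse) -> vote.
# A constraint that does not match returns 0.
Constraint = Callable[[Sentence, int, Parse], int]


def disambiguate(sentence: Sentence, constraints: List[Constraint]) -> Sentence:
    """Let every constraint vote, then keep the top-voted parses per token."""
    votes: List[Dict[Parse, int]] = [{p: 0 for p in parses} for parses in sentence]

    # Voting phase: constraints never delete parses, so each one sees the
    # same input regardless of the order in which constraints are applied.
    for constraint in constraints:
        for i, parses in enumerate(sentence):
            for p in parses:
                votes[i][p] += constraint(sentence, i, p)

    # Selection phase: performed once, after all votes are in.
    result: Sentence = []
    for i, parses in enumerate(sentence):
        best = max(votes[i].values(), default=0)
        result.append([p for p in parses if votes[i][p] == best])
    return result


# Two toy constraints (assumptions for illustration only).
def noun_after_det(sentence: Sentence, i: int, parse: Parse) -> int:
    return 1 if parse == "Noun" and i > 0 and "Det" in sentence[i - 1] else 0


def verb_sentence_initial(sentence: Sentence, i: int, parse: Parse) -> int:
    return 1 if parse == "Verb" and i == 0 else 0


if __name__ == "__main__":
    toy = [["Det"], ["Noun", "Verb"]]
    print(disambiguate(toy, [noun_after_det, verb_sentence_initial]))
```

Because votes are summed and the input is never rewritten during voting, reversing the constraint list gives the same output. Tied parses are all kept, which is one way an average slightly above one parse per token could arise.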
Using Multiple Sources of Information for Constraint-Based Morphological Disambiguation
This thesis presents a constraint-based morphological disambiguation approach
that is applicable to languages with complex morphology--specifically
agglutinative languages with productive inflectional and derivational
morphological phenomena. For morphologically complex languages like Turkish,
automatic morphological disambiguation involves selecting, for each token, the
morphological parse(s) with the right set of inflectional and derivational
markers. Our system combines corpus independent hand-crafted constraint rules,
constraint rules that are learned via unsupervised learning from a training
corpus, and additional statistical information obtained from the corpus to be
morphologically disambiguated. The hand-crafted rules are linguistically
motivated and tuned to improve precision without sacrificing recall. In certain
respects, our approach has been motivated by Brill's recent work, but with the
observation that his transformational approach is not directly applicable to
languages like Turkish. Our system also takes a novel approach to unknown word
processing by employing a secondary morphological processor which recovers any
relevant inflectional and derivational information from a lexical item whose
root is unknown. With this approach, well below 1% of the tokens remain
unknown in the texts we have experimented with. Our results indicate that by
combining these hand-crafted, statistical and learned information sources, we
can attain a recall of 96 to 97% with a corresponding precision of 93 to 94%,
and ambiguity of 1.02 to 1.03 parses per token.
Comment: M.Sc. Thesis submitted to the Department of Computer Engineering and
Information Science, Bilkent University, Ankara, Turkey. Also available as:
ftp://ftp.cs.bilkent.edu.tr/pub/tech-reports/1996/BU-CEIS-9615ps.
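The abstract reports recall, precision, and ambiguity without defining them. The sketch below assumes the definitions commonly used when a disambiguator may leave more than one parse per token: recall is the share of tokens whose correct parse survives, precision is the share of retained parses that are correct, and ambiguity is the average number of retained parses per token. The function name and data layout are hypothetical.

```python
from typing import Dict, List


def evaluate(output: List[List[str]], gold: List[str]) -> Dict[str, float]:
    """Score multi-parse output against one gold parse per token.

    Assumed definitions (not stated in the abstract):
      recall    = tokens whose gold parse was retained / number of tokens
      precision = tokens whose gold parse was retained / retained parses
      ambiguity = retained parses / number of tokens
    """
    if not gold or len(output) != len(gold):
        raise ValueError("output and gold must be non-empty and equally long")

    tokens = len(gold)
    retained = sum(len(set(parses)) for parses in output)
    if retained == 0:
        raise ValueError("output retains no parses")
    correct = sum(1 for parses, g in zip(output, gold) if g in parses)

    return {
        "recall": correct / tokens,
        "precision": correct / retained,
        "ambiguity": retained / tokens,
    }


if __name__ == "__main__":
    scores = evaluate([["A"], ["B", "C"], ["D"]], ["A", "C", "X"])
    print(scores)
```

Under these assumed definitions, precision equals recall divided by ambiguity. The reported ranges are roughly consistent with that relation: 0.96 / 1.02 and 0.97 / 1.03 are both about 0.94.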
Sabra, a Syriac TeX System
In this paper we present "Sabra", a typesetting system for the Syriac script, based on TeX and METAFONT. We cover Serto, Estrangello, and East Syriac, both for the Syriac language and for Garshuni. Ligatures and stretching connections (keshideh) are produced automatically. Straight or curved isolated Serto olaph are chosen according to a set of rules described in the paper. This work is part of the ScholarTeX project.
A decision support system for curricula design
A curriculum is a set of related courses that constitutes the basis of a degree program. The required courses of a curriculum generally build student knowledge and skills particular to the field. In most cases, these are cumulative: as students progress through their studies, they build new knowledge on top of what they learned earlier, which leads to the notion of prerequisite courses that must precede a given course. As accreditation practices gain widespread acceptance, and as uniformity among peer institutions is promoted to facilitate mobility, each course is assigned a set of learning outcomes. The learning outcomes of a prerequisite course are seen to encapsulate the skills necessary to take the downstream course. This study follows our efforts regarding the substantial revision of engineering courses throughout our college. As the task is quite involved, we developed a flexible linear programming-based tool to help the decision-making process by quickly evaluating alternative curricula. This study aims to provide an effective decision-making tool that accommodates many “what if” scenarios, giving decision-makers options and helping them detect inconsistencies and oversights. This paper describes our approach and our experiences.
The Traditional Arabic Typecase, Unicode, TeX and METAFONT