Search CORE

2 research outputs found

Morphological Analysis as Classification: an Inductive-Learning Approach

Author: Bosch Antal van den
Daelemans Walter
Weijters Ton
Publication venue
Publication date: 01/01/1996
Field of study

Morphological analysis is an important subtask in text-to-speech conversion, hyphenation, and other language engineering tasks. The traditional approach to performing morphological analysis is to combine a morpheme lexicon, sets of (linguistic) rules, and heuristics to find a most probable analysis. In contrast we present an inductive learning approach in which morphological analysis is reformulated as a segmentation task. We report on a number of experiments in which five inductive learning algorithms are applied to three variations of the task of morphological analysis. Results show (i) that the generalisation performance of the algorithms is good, and (ii) that the lazy learning algorithm IB1-IG performs best on all three tasks. We conclude that lazy learning of morphological analysis as a classification task is indeed a viable approach; moreover, it has the strong advantages over the traditional approach of avoiding the knowledge-acquisition bottleneck, being fast and deterministic in learning and processing, and being language-independent.Comment: 11 pages, 5 encapsulated postscript figures, uses non-standard NeMLaP proceedings style nemlap.sty; inputs ipamacs (international phonetic alphabet) and epsf macro

arXiv.org e-Print Archive

CiteSeerX

Institutional Repository Universiteit Antwerpen

Tilburg University Repository

A Language-Independent, Data-Oriented Architecture for Grapheme-to-Phoneme Conversion

Author: Antal van den Bosch
Walter Daelemans
Publication venue
Publication date: 01/01/1994
Field of study

We report on an implemented grapheme-to-phoneme conversion architecture. Given a set of examples (spelling words with their associated phonetic representation) in a language, a grapheme-to-phoneme conversion system is automatically produced for that language which takes as its input the spelling of words, and produces as its output the phonetic transcription according to the rules implicit in the training data. This paper describes the architecture and focuses on our solution to the alignment problem: given the spelling and the phonetic trancription of a word (often differing in length), these two representations have to be aligned in such a way that grapheme symbols or strings of grapheme symbols are consistently associated with the same phonetic symbol. If this alignment has to be done by hand, it is extremely labour-intensive. 1 Introduction Grapheme-to-phoneme conversion is an essential module in any text-to-speech system. Various language-specific sources of linguistic knowledge ..

CiteSeerX

Institutional Repository Universiteit Antwerpen

Tilburg University Repository