
    Do not forget: Full memory in memory-based learning of word pronunciation

    Memory-based learning, which keeps full memory of the learning material, appears to be a viable approach to learning NLP tasks, and is often superior in generalisation accuracy to eager learning approaches that abstract from the learning material. Here we investigate three partial memory-based learning approaches which remove from memory specific task instance types estimated to be exceptional. The three approaches each implement one heuristic function for estimating the exceptionality of instance types: (i) typicality, (ii) class prediction strength, and (iii) friendly-neighbourhood size. Experiments are performed with the memory-based learning algorithm IB1-IG trained on English word pronunciation. We find that removing instance types with low class prediction strength (ii) is the only tested method which does not seriously harm generalisation accuracy. We conclude that keeping full memory of types rather than tokens, and excluding minority ambiguities, appear to be the only performance-preserving optimisations of memory-based learning.
    Comment: uses conll98, epsf, and ipamacs (WSU IPA
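    As an illustration of the editing idea, the sketch below estimates class prediction strength for stored instances with a leave-one-out nearest-neighbour pass and removes instances that fall below a cutoff. It is a minimal unweighted sketch: IB1-IG additionally weights features by information gain, and the threshold value here is an arbitrary assumption, not a setting from the paper.

        from collections import defaultdict

        def edit_by_cps(instances, threshold=0.5):
            """Estimate class prediction strength (CPS) of each stored instance
            by a leave-one-out nearest-neighbour pass, then drop weak predictors.
            instances: list of (feature_tuple, label) pairs."""
            correct = defaultdict(int)
            total = defaultdict(int)
            for i, (x, y) in enumerate(instances):
                best_j, best_d = None, None
                for j, (xj, _) in enumerate(instances):
                    if j == i:
                        continue
                    d = sum(a != b for a, b in zip(x, xj))  # unweighted overlap distance
                    if best_d is None or d < best_d:
                        best_j, best_d = j, d
                total[best_j] += 1                  # best_j was asked to predict y
                if instances[best_j][1] == y:
                    correct[best_j] += 1            # ...and predicted it correctly
            return [inst for j, inst in enumerate(instances)
                    if total[j] == 0 or correct[j] / total[j] >= threshold]

    Instances that are never anyone's nearest neighbour are kept here by default; how to treat them is a design choice the sketch does not take from the paper.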

    Morphological Analysis as Classification: an Inductive-Learning Approach

    Morphological analysis is an important subtask in text-to-speech conversion, hyphenation, and other language-engineering tasks. The traditional approach to morphological analysis combines a morpheme lexicon, sets of (linguistic) rules, and heuristics to find the most probable analysis. In contrast, we present an inductive-learning approach in which morphological analysis is reformulated as a segmentation task. We report on a number of experiments in which five inductive learning algorithms are applied to three variations of the task of morphological analysis. Results show (i) that the generalisation performance of the algorithms is good, and (ii) that the lazy learning algorithm IB1-IG performs best on all three tasks. We conclude that lazy learning of morphological analysis as a classification task is indeed a viable approach; moreover, it has the strong advantages over the traditional approach of avoiding the knowledge-acquisition bottleneck, being fast and deterministic in learning and processing, and being language-independent.
    Comment: 11 pages, 5 encapsulated postscript figures; uses non-standard NeMLaP proceedings style nemlap.sty; inputs ipamacs (international phonetic alphabet) and epsf macros
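    A minimal sketch of the reformulation: each character of a word becomes a classification instance whose features are a fixed-width window around it, and whose class says whether a morpheme boundary follows. The window width, padding symbol, and binary boundary class are illustrative assumptions; the paper's actual instance encoding may differ.

        def make_instances(word, boundaries, width=3):
            """Encode a word as one instance per character: features are a
            fixed-width character window centred on the position, the class
            marks whether a morpheme boundary follows that position."""
            padded = '_' * width + word + '_' * width
            instances = []
            for i in range(len(word)):
                window = tuple(padded[i : i + 2 * width + 1])
                label = (i + 1) in boundaries
                instances.append((window, label))
            return instances

        # e.g. the segmentation un|happi|ness has boundaries after positions 2 and 7:
        # make_instances('unhappiness', {2, 7})

    A memory-based classifier trained on such instances then segments unseen words by predicting a boundary decision at every position.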

    Forgetting Exceptions is Harmful in Language Learning

    We show that in language learning, contrary to received wisdom, keeping exceptional training instances in memory can be beneficial for generalization accuracy. We investigate this phenomenon empirically on a selection of benchmark natural language processing tasks: grapheme-to-phoneme conversion, part-of-speech tagging, prepositional-phrase attachment, and base noun phrase chunking. In a first series of experiments we combine memory-based learning with training-set editing techniques, in which instances are edited based on their typicality and class prediction strength. Results show that editing exceptional instances (with low typicality or low class prediction strength) tends to harm generalization accuracy. In a second series of experiments we compare memory-based learning and decision-tree learning methods on the same selection of tasks, and find that decision-tree learning often performs worse than memory-based learning. Moreover, the decrease in performance can be linked to the degree of abstraction from exceptions (i.e., pruning or eagerness). We provide explanations for both results in terms of the properties of the natural language processing tasks and the learning algorithms.
    Comment: 31 pages, 7 figures, 10 tables; uses 11pt, fullname, a4wide TeX styles. Pre-print version of an article to appear in Machine Learning 11:1-3, Special Issue on Natural Language Learning. Figures on page 22 slightly compressed to avoid page overload.
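    For concreteness, the sketch below computes a typicality score per training instance in the spirit of the editing criterion above: the ratio of mean intra-class similarity to mean inter-class similarity, so that exceptional (atypical) instances score low. The plain overlap similarity is an assumption of this sketch; the published experiments use the specific definitions cited in the paper.

        def typicality(instances):
            """Typicality of each instance: mean similarity to same-class
            instances divided by mean similarity to other-class instances.
            Exceptions score low; removing them is what the paper shows
            tends to harm generalization accuracy."""
            def sim(a, b):
                return sum(u == v for u, v in zip(a, b)) / len(a)
            scores = []
            for i, (x, y) in enumerate(instances):
                same = [sim(x, xj) for j, (xj, yj) in enumerate(instances)
                        if j != i and yj == y]
                diff = [sim(x, xj) for (xj, yj) in instances if yj != y]
                intra = sum(same) / len(same) if same else 0.0
                inter = sum(diff) / len(diff) if diff else 0.0
                scores.append(intra / inter if inter > 0 else float('inf'))
            return scores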

    Non-archimedean canonical measures on abelian varieties

    For a closed d-dimensional subvariety X of an abelian variety A and a canonically metrized line bundle L on A, Chambert-Loir has introduced measures $c_1(L|_X)^{\wedge d}$ on the Berkovich analytic space associated to A with respect to the discrete valuation of the ground field. In this paper, we give an explicit description of these canonical measures in terms of convex geometry. We use a generalization of the tropicalization related to the Raynaud extension of A and Mumford's construction. The results have applications to the equidistribution of small points.
    Comment: Thorough revision according to the comments of the referee. To appear in Compositio Mathematica
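    As background (a standard property of Chambert-Loir measures, not a claim specific to this paper's results): the measure lives on the analytification $X^{\mathrm{an}}$ and has total mass equal to the degree of X with respect to L,

        \int_{X^{\mathrm{an}}} c_1(L|_X)^{\wedge d} \;=\; \deg_L(X),

    so equidistribution of small points concerns weak convergence of Galois-orbit measures to the normalized measure $c_1(L|_X)^{\wedge d} / \deg_L(X)$.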

    GAMBL, genetic algorithm optimization of memory-based WSD

    GAMBL is a word-expert approach to WSD in which each word expert is trained using memory-based learning. Joint feature selection and algorithm parameter optimization are achieved with a genetic algorithm (GA). We use a cascaded-classifier approach in which the GA optimizes local context features and the output of a separate keyword classifier (rather than also optimizing the keyword features together with the local context features). A further innovation on earlier versions of memory-based WSD is the use of grammatical relation and chunk features. This paper briefly presents the architecture of the system and discusses its performance on the English lexical sample and all-words tasks in SENSEVAL-3.
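    The sketch below shows the joint-optimization idea in miniature: a GA chromosome pairs a feature bitmask with a value of k, and the fitness function is supplied by the caller (for example, cross-validated accuracy of the resulting memory-based word expert). The encoding, operators, and parameter values here are illustrative assumptions, not GAMBL's actual configuration.

        import random

        def evolve(n_features, fitness, pop_size=20, generations=30,
                   k_choices=(1, 3, 5, 7), mutation_rate=0.05):
            """Toy GA over chromosomes (feature_mask, k): truncation selection,
            one-point crossover on the mask, bit-flip mutation.
            fitness: callable scoring a chromosome, higher is better."""
            def random_chrom():
                return ([random.random() < 0.5 for _ in range(n_features)],
                        random.choice(k_choices))
            pop = [random_chrom() for _ in range(pop_size)]
            for _ in range(generations):
                pop.sort(key=fitness, reverse=True)
                parents = pop[: pop_size // 2]       # truncation selection
                children = []
                while len(children) < pop_size - len(parents):
                    (m1, k1), (m2, k2) = random.sample(parents, 2)
                    cut = random.randrange(1, n_features)   # one-point crossover
                    mask = [b != (random.random() < mutation_rate)  # bit-flip
                            for b in m1[:cut] + m2[cut:]]
                    children.append((mask, random.choice((k1, k2))))
                pop = parents + children
            return max(pop, key=fitness)

    In a real setting the fitness call would train and evaluate the word expert on held-out data, which is where nearly all the computation goes; the GA loop itself is cheap.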