The identification of definitions in natural language texts is useful in learning environments, for glossary creation, and in question answering systems. Extracting such definitions manually is tedious, and several techniques, including rule-based and statistical methods, have been proposed for automatic definition identification in these domains. These techniques usually rely on linguistic expertise to identify the grammatical and word patterns that characterize definitions. In this paper, we look at the use of machine learning techniques, in particular genetic algorithms, to enable the automatic extraction of definitions. Genetic algorithms are used to learn a set of numerical weights that measure the relative importance of linguistic features, each of which can be present or absent in a definitional sentence. We report the results of various experiments and evaluate them on an eLearning corpus. We also propose a way forward for discovering such features automatically through genetic programming, and suggest how these two techniques can be used together for definition extraction.
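The abstract does not specify the encoding, fitness function, or genetic operators, so the following is only a minimal sketch of the general idea under stated assumptions: each sentence is a hypothetical binary vector marking which linguistic features are present, a sentence is labelled a definition when its weighted score exceeds an assumed threshold, and a genetic algorithm (tournament selection, uniform crossover, Gaussian mutation, elitism) evolves the weight vector to maximize F-measure on labelled data. The toy data, `THRESHOLD`, and all function names are illustrative, not the authors' method.

```python
import random

# Hypothetical toy data: each sentence is a binary vector marking which
# linguistic features (e.g. "contains 'is a'", "starts with a noun phrase")
# are present; label 1 = definition, 0 = non-definition.
DATA = [
    ([1, 1, 0, 1], 1),
    ([1, 0, 0, 1], 1),
    ([0, 1, 1, 0], 0),
    ([0, 0, 1, 0], 0),
    ([1, 1, 1, 1], 1),
    ([0, 1, 0, 0], 0),
]
N_FEATURES = 4
THRESHOLD = 1.0  # assumed decision threshold on the weighted score


def score(weights, features):
    """Weighted sum of the features present in a sentence."""
    return sum(w * f for w, f in zip(weights, features))


def f_measure(weights, data):
    """Fitness: F1 of the rule 'definition if score > THRESHOLD'."""
    tp = fp = fn = 0
    for features, label in data:
        predicted = 1 if score(weights, features) > THRESHOLD else 0
        if predicted and label:
            tp += 1
        elif predicted and not label:
            fp += 1
        elif label and not predicted:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def tournament(population, fitnesses, k, rng):
    """Return the fittest of k randomly chosen individuals."""
    contenders = rng.sample(range(len(population)), k)
    best = max(contenders, key=lambda i: fitnesses[i])
    return population[best]


def evolve(data, pop_size=30, generations=40, mutation_rate=0.2, seed=0):
    """Evolve a weight vector; returns (best_weights, best_fitness)."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 2) for _ in range(N_FEATURES)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        fitnesses = [f_measure(ind, data) for ind in population]
        # Elitism: carry the best individual over unchanged.
        elite = population[max(range(pop_size), key=lambda i: fitnesses[i])]
        next_gen = [elite[:]]
        while len(next_gen) < pop_size:
            a = tournament(population, fitnesses, 3, rng)
            b = tournament(population, fitnesses, 3, rng)
            # Uniform crossover: each weight comes from either parent.
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            # Gaussian mutation perturbs a random subset of weights.
            child = [w + rng.gauss(0, 0.3) if rng.random() < mutation_rate else w
                     for w in child]
            next_gen.append(child)
        population = next_gen
    fitnesses = [f_measure(ind, data) for ind in population]
    best = max(range(pop_size), key=lambda i: fitnesses[i])
    return population[best], fitnesses[best]


if __name__ == "__main__":
    weights, fitness = evolve(DATA)
    print("weights:", [round(w, 2) for w in weights])
    print("F-measure:", round(fitness, 2))
```

The learned weights play the role described above: a large positive weight marks a feature as strongly indicative of a definition, while a weight near zero or below marks it as unimportant or counter-indicative.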