Automatic grammar rule extraction and ranking for definitions
Learning texts contain much implicit knowledge which is ideally presented to the learner in a structured manner - a
typical example being definitions of terms in the text, which would ideally be presented separately as a glossary for
easy access. The problem is that manual extraction of such information can be tedious and time consuming. In this
paper we describe two experiments carried out to enable the automated extraction of definitions from non-technical
learning texts using evolutionary algorithms. A genetic programming approach is used to learn grammatical rules
helpful in discriminating between definitions and non-definitions, after which a genetic algorithm is used to learn the
relative importance of these features, thus enabling the ranking of candidate sentences in order of confidence. The
results achieved are promising: we show that a genetic program can automatically learn rules similar to those
derived by a human linguistic expert, and that a genetic algorithm can then assign weighted scores to those rules so
as to rank extracted definitions effectively in order of confidence.
Language technologies for an eLearning scenario
One of the problems eLearning platforms face when collating documents from different sources is the retrieval of those documents and their accessibility. Enriching documents with additional metadata using Language Technologies enables users to access information more effectively. In this paper we present an overview of the objectives and results of the LT4eL Project, which aims to provide Language Technologies to eLearning platforms and to integrate semantic knowledge to facilitate the management, distribution and retrieval of learning material.
Definition characterisation through genetic algorithms
The identification of definitions from natural language texts is useful in learning environments, for glossary creation and question answering systems. It is a tedious task to extract such definitions manually, and several techniques have been proposed for automatic definition identification in these domains, including rule-based and statistical methods. These techniques usually rely on linguistic expertise to identify grammatical and word patterns which characterize definitions. In this paper, we look at the use of machine learning techniques, in particular genetic algorithms, to enable the automatic extraction of definitions. Genetic algorithms are used to determine, as a set of numerical weights, the relative importance of a set of linguistic features which can be present or absent in definitional sentences. In this work we report on the results of various experiments carried out and evaluate them on an eLearning corpus. We also propose a way forward for discovering such features automatically through genetic programming and suggest how these two techniques can be used together for definition extraction.
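The idea of learning feature weights with a genetic algorithm can be illustrated with a minimal sketch. It is not the authors' implementation: the feature names, the toy data, the pairwise-ranking fitness function and the selection, crossover and mutation settings are all assumptions chosen for illustration.

```python
import random

# Hypothetical binary features (1 = present in the sentence); names are illustrative only.
FEATURES = ["has_is_a", "has_defined_as", "starts_with_pronoun"]

def score(weights, feats):
    """Weighted sum of the binary features present in a sentence."""
    return sum(w * f for w, f in zip(weights, feats))

def fitness(weights, data):
    """Fraction of (definition, non-definition) pairs ranked in the right order."""
    pos = [f for f, label in data if label]
    neg = [f for f, label in data if not label]
    good = sum(score(weights, p) > score(weights, n) for p in pos for n in neg)
    return good / (len(pos) * len(neg))

def rank(weights, candidates):
    """Order candidate feature vectors from most to least definition-like."""
    return sorted(candidates, key=lambda f: score(weights, f), reverse=True)

def evolve(data, pop_size=20, generations=30, seed=0):
    """Tiny genetic algorithm: truncation selection, one-point crossover, Gaussian mutation."""
    rng = random.Random(seed)
    n = len(data[0][0])
    pop = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda w: fitness(w, data), reverse=True)
        parents = pop[: pop_size // 2]        # the fitter half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)         # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(n)              # mutate a single weight
            child[i] += rng.gauss(0, 0.2)
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda w: fitness(w, data))

# Toy labelled data (assumed): feature vector, is_definition.
DATA = [
    ([1, 0, 0], True), ([0, 1, 0], True), ([1, 1, 0], True),
    ([0, 0, 1], False), ([0, 0, 0], False), ([1, 0, 1], False),
]

best = evolve(DATA)
print(dict(zip(FEATURES, best)), fitness(best, DATA))
```

Because the fitter half of the population is carried over unchanged, the best fitness never decreases from one generation to the next. Once weights are learned, `rank(best, candidates)` orders unseen candidate sentences by their weighted score.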
Evolutionary algorithms for definition extraction
Books and other text-based learning material contain implicit information which can aid the learner but which usually can only be accessed through a semantic analysis of the text. Definitions of new concepts appearing in the text are one such instance. If extracted and presented to the learner in the form of a glossary, they can provide an excellent reference for the study of the main text. One way of extracting definitions is by reading through the text and annotating definitions manually, a tedious and boring job. In this paper, we explore the use of machine learning to extract definitions from non-technical texts, reducing human expert input to a minimum. We report on experiments we have conducted on the use of genetic programming to learn the typical linguistic forms of definitions and a genetic algorithm to learn the relative importance of these forms. Results are very positive, showing the feasibility of exploring further the use of these techniques in definition extraction. The genetic program is able to learn rules similar to those derived by a human linguistic expert, and the genetic algorithm is able to rank candidate definitions in order of confidence.
Phrase extraction for machine translation
Statistical Machine Translation (SMT) developed in the late 1980s, based initially upon a word-to-word translation process. However, such processes have difficulties when good quality translation is not strictly word-to-word. Easy cases can be handled by allowing insertion and deletion of single words, but more general word reordering phenomena require a more general translation process. There is currently much interest in phrase-to-phrase models, which can overcome this problem but require that candidate phrases, together with their translations, be identified in the training corpora. Since phrase delimiters are not explicit, this gives rise to a new problem: that of phrase pair extraction. The current project proposes a phrase extraction algorithm which uses a window of n words around source and target words to extract equivalent phrases. The extracted phrases, together with their probabilities, are used as input to an existing Machine Translation system for the purpose of evaluating the phrase extraction algorithm.
Towards automatic extraction of definitions
Definition extraction can be useful for the creation of glossaries and in question answering systems. It is a tedious task to extract such sentences manually, and thus an automatic system is desirable. In this work we review various attempts at rule-based approaches reported in the literature and discuss their results. We also propose a novel experiment involving the use of genetic programming and genetic algorithms, aimed at assisting the discovery of grammar rules which can be used for the task of definition extraction.
Incorporating an error corpus into a spellchecker for Maltese
This paper discusses the ongoing development of a new Maltese spell checker, highlighting the methodologies which would best
suit such a language. We thus discuss several previous attempts, highlighting what we believe to be their weakest point: a lack of
attention to context. Two developments are of particular interest, both of which concern the availability of language resources relevant
to spellchecking: (i) the Maltese Language Resource Server (MLRS) which now includes a representative corpus of c. 100M words
extracted from diverse documents including the Maltese Legislation, press releases and extracts from Maltese web-pages and (ii) an
extensive and detailed corpus of spelling errors that was collected whilst part of the MLRS texts were being prepared. We describe
the structure of these resources as well as the experimental approaches focused on context that we are now in a position to adopt. We
describe the framework within which a variety of different approaches to spellchecking and evaluation will be carried out, and briefly
discuss the first baseline system we have implemented. We conclude the paper with a roadmap for future improvements.
The strategic impact of META-NET on the regional, national and international level
This article provides an overview of the dissemination work carried out in META-NET from 2010 until 2015; we describe its impact on the regional, national and international level, mainly with regard to politics and the funding situation for LT topics. The article documents the initiative's work throughout Europe in order to boost progress and innovation in our field.
Sustainability strategy and plans beyond the end of the project
The central objective of the Metanet4u project is to contribute to the establishment of a pan-European digital platform that makes available language resources and services, encompassing both datasets and software tools, for speech and language processing, and supports a new generation of exchange facilities for them.