Search CORE

84 research outputs found

Automatic grammar rule extraction and ranking for definitions

Authors: Borg Claudia; Pace Gordon J.; Rosner Mike
Event: LREC 2010
Publication venue: University of Malta. Faculty of Information and Communication Technology
Publication date: 01/01/2010

Learning texts contain much implicit knowledge which is ideally presented to the learner in a structured manner. A typical example is the definitions of terms in the text, which would ideally be presented separately as a glossary for easy access. The problem is that extracting such information manually can be tedious and time-consuming. In this paper we describe two experiments carried out to enable the automated extraction of definitions from non-technical learning texts using evolutionary algorithms. A genetic programming approach is used to learn grammatical rules that help discriminate between definitions and non-definitions, after which a genetic algorithm is used to learn the relative importance of these features, enabling candidate sentences to be ranked in order of confidence. The results achieved are promising: we show that a Genetic Program can automatically learn rules similar to those derived by a human linguistic expert, and that a Genetic Algorithm can then assign weighted scores to those rules so as to rank extracted definitions effectively in order of confidence.

Status: peer-reviewed

OAR@UM

Language technologies for an eLearning scenario

Authors: Borg Claudia; Rosner Mike
Event: 5th Computer Science Annual Workshop (CSAW’07)
Publication venue: University of Malta. Faculty of ICT
Publication date: 01/01/2007

When eLearning platforms collate documents from different sources, retrieving those documents and keeping them accessible becomes a problem. Enriching documents with additional metadata using Language Technologies enables users to access information more effectively. In this paper we present an overview of the objectives and results of the LT4eL Project, which aims to provide Language Technologies to eLearning platforms and to integrate semantic knowledge to facilitate the management, distribution and retrieval of learning material.

Status: peer-reviewed

OAR@UM

Definition characterisation through genetic algorithms

Authors: Borg Claudia; Pace Gordon J.; Rosner Mike
Event: First National ICT Conference
Publication venue: University of Malta. Faculty of Information and Communication Technology
Publication date: 01/01/2008

The identification of definitions in natural language texts is useful in learning environments, for glossary creation and for question answering systems. Extracting such definitions manually is tedious, and several techniques have been proposed for automatic definition identification in these domains, including rule-based and statistical methods. These techniques usually rely on linguistic expertise to identify the grammatical and word patterns that characterize definitions. In this paper we look at the use of machine learning techniques, in particular genetic algorithms, to enable the automatic extraction of definitions. Genetic algorithms are used to learn a set of numerical weights that measure the relative importance of linguistic features which may be present or absent in definitional sentences. We report the results of various experiments and evaluate them on an eLearning corpus. We also propose a way of discovering such features automatically through genetic programming and suggest how the two techniques can be combined for definition extraction.

Status: peer-reviewed
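The abstract describes genetic algorithms learning numerical weights for linguistic features so that candidate sentences can be ranked. What follows is a minimal Python sketch of that idea under stated assumptions, not the authors' implementation. The four binary features and the toy labelled sentences are hypothetical. The same goes for using average precision as the fitness function and for all GA parameters (population size, truncation selection, one-point crossover, Gaussian mutation).

```python
import random

# Hypothetical binary features per candidate sentence (1 = present), e.g.
# [contains "is a", contains "means", starts with a noun phrase, contains "which"].
# Label True = definition. Toy data for illustration only, not from the paper.
SENTENCES = [
    ([0, 0, 1, 1], False),
    ([1, 0, 1, 0], True),
    ([0, 0, 0, 1], False),
    ([0, 1, 1, 0], True),
    ([1, 0, 0, 1], False),
    ([1, 1, 1, 1], True),
]

def score(weights, features):
    """Weighted sum of the present features: the confidence score used for ranking."""
    return sum(w * f for w, f in zip(weights, features))

def rank(weights, data):
    """Labelled sentences sorted by descending score (ties keep input order)."""
    return sorted(data, key=lambda item: score(weights, item[0]), reverse=True)

def average_precision(weights, data):
    """Fitness (an assumption): average precision of the ranking, 1.0 = all definitions first."""
    hits, total = 0, 0.0
    for position, (_, is_definition) in enumerate(rank(weights, data), start=1):
        if is_definition:
            hits += 1
            total += hits / position
    return total / max(1, sum(1 for _, d in data if d))

def evolve(data, n_features, pop_size=20, generations=50, mutation_rate=0.2, seed=0):
    """Evolve a weight vector; hypothetical parameters, seeded for reproducibility."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: keep the fitter half unchanged, so the best
        # fitness in the population never decreases between generations.
        population.sort(key=lambda w: average_precision(w, data), reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Gaussian mutation applied to each weight with probability mutation_rate.
            child = [w + rng.gauss(0, 0.3) if rng.random() < mutation_rate else w
                     for w in child]
            children.append(child)
        population = parents + children
    return max(population, key=lambda w: average_precision(w, data))

best = evolve(SENTENCES, n_features=4)
print("best weights:", [round(w, 2) for w in best])
print("average precision:", round(average_precision(best, SENTENCES), 2))
```

Because the fitter half survives each generation unchanged, this elitist scheme trades some diversity for monotone progress. On real data one would evaluate the learned weights on held-out sentences rather than on the training set.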

OAR@UM

Evolutionary algorithms for definition extraction

Authors: Borg Claudia; Pace Gordon J.; Rosner Mike
Event: 1st Workshop on Definition Extraction
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2009

Books and other text-based learning material contain implicit information which can aid the learner but which usually can only be accessed through a semantic analysis of the text. Definitions of new concepts appearing in the text are one such instance. If extracted and presented to the learner in the form of a glossary, they can provide an excellent reference for the study of the main text. One way of extracting definitions is to read through the text and annotate them manually, a tedious and boring job. In this paper we explore the use of machine learning to extract definitions from non-technical texts, reducing human expert input to a minimum. We report on experiments using genetic programming to learn the typical linguistic forms of definitions and a genetic algorithm to learn the relative importance of these forms. The results are very positive and show that these techniques are worth exploring further for definition extraction: the genetic program is able to learn rules similar to those derived by a human linguistic expert, and the genetic algorithm is able to rank candidate definitions in order of confidence.

Status: peer-reviewed

OAR@UM

Phrase extraction for machine translation

Authors: Bajada Jo-Ann; Rosner Mike
Event: 5th Computer Science Annual Workshop (CSAW’07)
Publication venue: University of Malta. Faculty of ICT
Publication date: 01/01/2007

Statistical Machine Translation (SMT) emerged in the late 1980s and was initially based on a word-to-word translation process. Such processes, however, have difficulty when a good-quality translation does not map strictly word to word. Easy cases can be handled by allowing the insertion and deletion of single words, but more general word-reordering phenomena require a more general translation process. There is currently much interest in phrase-to-phrase models, which can overcome this problem but require candidate phrases, together with their translations, to be identified in the training corpora. Since phrase delimiters are not explicit, this gives rise to a new problem: phrase-pair extraction. The current project proposes a phrase extraction algorithm which uses a window of n words around source and target words to extract equivalent phrases. The extracted phrases, together with their probabilities, are used as input to an existing Machine Translation system in order to evaluate the phrase extraction algorithm.

Status: peer-reviewed
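The abstract does not spell out how the window is applied, so the Python sketch below shows one possible reading. It assumes a word alignment for each sentence pair is already available. It takes a window of n words on each side of every aligned source and target word and estimates phrase probabilities by relative frequency. The alignment input, the symmetric window, the function names `extract_window_phrases` and `phrase_table`, the relative-frequency estimate, and the toy corpus are all assumptions, not details from the project.

```python
from collections import Counter, defaultdict

def extract_window_phrases(src, tgt, alignment, n=1):
    """Yield (source phrase, target phrase) pairs from a +/- n word window around
    each aligned (source index, target index) pair. The alignment is assumed given."""
    for i, j in alignment:
        source_phrase = " ".join(src[max(0, i - n): i + n + 1])
        target_phrase = " ".join(tgt[max(0, j - n): j + n + 1])
        yield source_phrase, target_phrase

def phrase_table(corpus, n=1):
    """Relative-frequency estimate of p(target phrase | source phrase), an assumed
    scoring scheme, over (source tokens, target tokens, alignment) triples."""
    pair_counts = Counter()
    for src, tgt, alignment in corpus:
        pair_counts.update(extract_window_phrases(src, tgt, alignment, n))
    source_totals = Counter()
    for (source_phrase, _), count in pair_counts.items():
        source_totals[source_phrase] += count
    table = defaultdict(dict)
    for (source_phrase, target_phrase), count in pair_counts.items():
        table[source_phrase][target_phrase] = count / source_totals[source_phrase]
    return table

# Toy English-French corpus with hand-written alignments (illustrative only).
corpus = [
    ("the red house".split(), "la maison rouge".split(), [(0, 0), (1, 2), (2, 1)]),
    ("the house".split(), "la maison".split(), [(0, 0), (1, 1)]),
    ("the house".split(), "la demeure".split(), [(0, 0), (1, 1)]),
]
table = phrase_table(corpus, n=1)
for source_phrase, targets in table.items():
    print(source_phrase, "->", targets)
```

In the toy corpus "the house" is paired with two different target phrases equally often, so each receives probability 0.5. A real pipeline would typically also prune unreliable pairs before passing the table to the translation system.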

OAR@UM

Towards automatic extraction of definitions

Authors: Borg Claudia; Pace Gordon J.; Rosner Mike
Event: 5th Computer Science Annual Workshop (CSAW’07)
Publication venue: University of Malta. Faculty of ICT
Publication date: 01/01/2007

Definition extraction can be useful for creating glossaries and in question answering systems. Extracting definitional sentences manually is tedious, so an automatic system is desirable. In this work we review various rule-based approaches reported in the literature and discuss their results. We also propose a novel experiment using genetic programming and genetic algorithms to assist the discovery of grammar rules that can be used for definition extraction.

Status: peer-reviewed

OAR@UM

Incorporating an error corpus into a spellchecker for Maltese

Authors: Attard Andrew; Gatt Albert; Joachimsen Jan; Rosner Mike
Event: 8th International Conference on Language Resources and Evaluation (LREC)
Publication venue: European Language Resources Association (ELRA)
Publication date: 01/01/2012

This paper discusses the ongoing development of a new Maltese spellchecker, highlighting the methodologies best suited to the language. We discuss several previous attempts, highlighting what we believe to be their weakest point: a lack of attention to context. Two developments are of particular interest, both concerning the availability of language resources relevant to spellchecking: (i) the Maltese Language Resource Server (MLRS), which now includes a representative corpus of c. 100M words extracted from diverse documents, including Maltese legislation, press releases and extracts from Maltese web pages; and (ii) an extensive and detailed corpus of spelling errors collected while part of the MLRS texts were being prepared. We describe the structure of these resources as well as the context-focused experimental approaches we are now in a position to adopt. We also describe the framework within which a variety of approaches to spellchecking and evaluation will be carried out, and briefly discuss the first baseline system we have implemented. We conclude with a roadmap for future improvements.

Status: peer-reviewed

OAR@UM

The strategic impact of META-NET on the regional, national and international level

Authors: Ananiadou Sophia; Branco Antonio; Hajic Jan; Hernáez Inma; Mariani Joseph; McNaught John; Melero Maite; Monachini Monica; Moreno Bilbao M. Asunción; Odijk Jan; Piperidis Stelios; Rosner Mike; Skadina Inguna; Tadic Marko; Thompson Paul; Tufis Dan
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

This article provides an overview of the dissemination work carried out in META-NET from 2010 to 2015 and describes its impact at the regional, national and international levels, mainly with regard to politics and the funding situation for language technology (LT) topics. The article documents the initiative's work throughout Europe to boost progress and innovation in our field.

Status: peer-reviewed; postprint (author's final draft)

LAReferencia - Federated Network of Institutional Repositories of Latin American Scientific Publications

UPCommons. Open knowledge portal of the UPC

Sustainability strategy and plans beyond the end of the project

Authors: Ananiadou Sophia; Bel Nuria; Branco Antonio; Cristea Dan; Gilmenau Georgiana; Mendes Amalia; Moreno Bilbao M. Asunción; Pellegrini Thomas; Rosner Mike; Thompson Paul; Trandabăț Diana; Tufis Dan
Publication date: 01/01/2013

The central objective of the METANET4U project is to contribute to the establishment of a pan-European digital platform that makes language resources and services available for speech and language processing, encompassing both datasets and software tools, and that supports a new generation of exchange facilities for them.

Status: preprint

LAReferencia - Federated Network of Institutional Repositories of Latin American Scientific Publications

UPCommons. Open knowledge portal of the UPC