34 research outputs found

    Automatic grammar rule extraction and ranking for definitions

    Learning texts contain much implicit knowledge that is ideally presented to the learner in a structured manner - a typical example being definitions of terms in the text, which would ideally be presented separately as a glossary for easy access. The problem is that manual extraction of such information can be tedious and time-consuming. In this paper we describe two experiments carried out to enable the automated extraction of definitions from non-technical learning texts using evolutionary algorithms. A genetic programming approach is used to learn grammatical rules that help discriminate between definitions and non-definitions, after which a genetic algorithm is used to learn the relative importance of these features, enabling candidate sentences to be ranked in order of confidence. The results achieved are promising, and we show that it is possible for a genetic program to automatically learn rules similar to those derived by a human linguistic expert, and for a genetic algorithm then to assign weighted scores to those rules so as to rank extracted definitions effectively in order of confidence.
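    The rule-weighting idea described in this abstract can be illustrated with a minimal sketch: candidate sentences are scored by a weighted sum of boolean grammar-rule matches and then ranked. The rule functions and weight values below are invented for illustration; in the paper the rules are learned by genetic programming and the weights by a genetic algorithm.

```python
# Minimal sketch: rank candidate definition sentences by a weighted sum
# of grammar-rule matches (rules and weights are illustrative stand-ins).
def rank_candidates(sentences, rules, weights):
    """Score each sentence as sum(w_i for each matching rule_i), sort descending."""
    scored = []
    for s in sentences:
        score = sum(w for rule, w in zip(rules, weights) if rule(s))
        scored.append((score, s))
    return [s for score, s in sorted(scored, key=lambda x: -x[0])]

# Toy rules approximating linguistic cues for definitional sentences
rules = [
    lambda s: " is a " in s,            # copula pattern
    lambda s: s.strip().endswith("."),  # complete sentence
    lambda s: " called " in s,          # naming pattern
]
weights = [0.6, 0.1, 0.3]  # relative importance (GA-learned in the paper)

ranked = rank_candidates(
    ["A glossary is a list of terms.", "We ran two experiments."],
    rules, weights)
```

    The ranking itself is independent of how the weights were obtained, which is what allows the GP (rule discovery) and GA (weight learning) stages to be developed separately.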

    Using dialogue corpora to extend information extraction patterns for natural language understanding of dialogue

    This work was funded by the Companions project (www.companions-project.org) sponsored by the European Commission as part of the Information Society Technologies (IST) programme under EC grant number IST-FP6-034434. This paper examines how Natural Language Processing (NLP) resources and online dialogue corpora can be used to extend the coverage of Information Extraction (IE) templates in a spoken dialogue system. IE templates are used as part of a Natural Language Understanding module for identifying meaning in a user utterance. The use of NLP tools in dialogue systems is a difficult task given that 1) spoken dialogue is often not well-formed and 2) there is a serious lack of dialogue data. In spite of this, we have devised a method for extending IE patterns using standard NLP tools and available dialogue corpora found on the web. In this paper, we explain our method, which includes using a set of NLP modules developed with GATE (a General Architecture for Text Engineering), as well as a general-purpose editing tool that we built to facilitate the IE rule creation process. Lastly, we present directions for future work in this area.
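    An IE template of the kind this abstract describes can be sketched as a pattern with named slots applied to a user utterance. The intent name, pattern, and example utterance below are hypothetical stand-ins, not taken from the Companions system (whose rules are written in GATE):

```python
import re

# Hypothetical sketch: an IE template as a regex with named slots,
# producing a shallow semantic interpretation for a dialogue NLU module.
PATTERNS = [
    ("introduce_person",
     re.compile(r"this is my (?P<relation>\w+),? (?P<name>[A-Z]\w+)")),
]

def apply_templates(utterance):
    """Return the first matching template's intent plus its filled slots."""
    for intent, pat in PATTERNS:
        m = pat.search(utterance)
        if m:
            return {"intent": intent, **m.groupdict()}
    return {"intent": "unknown"}

result = apply_templates("this is my sister, Anna")
```

    "Extending coverage" then amounts to adding pattern variants to the list, which is exactly the step a rule-editing tool and mined dialogue corpora can accelerate.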

    Towards player’s affective and behavioral visual cues as drives to game adaptation

    Recent advances in emotion and affect recognition can play a crucial role in game technology. Moving from typical game controls to controls generated from free gestures is already in the market. Higher-level controls, however, can also be driven by the player’s own affective and cognitive behaviour during gameplay. In this paper, we explore the player’s behaviour, as captured by computer vision techniques, together with details of the player’s own experience and profile. The objective of the current research is game adaptation aimed at maximizing player enjoyment. To this end, we explore the ability to infer player engagement and frustration, along with the degree of challenge imposed by the game. The estimated levels of the induced metrics can feed a game engine’s artificial intelligence, allowing for game adaptation. This research was supported by the FP7 ICT project SIREN (project no: 258453).

    Cross-Lingual Zero Pronoun Resolution

    In languages like Arabic, Chinese, Italian, Japanese, Korean, Portuguese, Spanish, and many others, predicate arguments in certain syntactic positions are not realized instead of being realized as overt pronouns, and are thus called zero- or null-pronouns. Identifying and resolving such omitted arguments is crucial to machine translation, information extraction and other NLP tasks, but depends heavily on semantic coherence and lexical relationships. We propose a BERT-based cross-lingual model for zero pronoun resolution, and evaluate it on the Arabic and Chinese portions of OntoNotes 5.0. As far as we know, ours is the first neural model of zero-pronoun resolution for Arabic; and our model also outperforms the state-of-the-art for Chinese. In the paper we also evaluate BERT feature-extraction and fine-tuning models on the task, and compare them with our model. We also report on an investigation of BERT layers indicating which layer encodes the most suitable representation for the task. Our code is available at https://github.com/amaloraini/cross-lingual-Z

    A Cluster Ranking Model for Full Anaphora Resolution

    Anaphora resolution (coreference) systems designed for the CONLL 2012 dataset typically cannot handle key aspects of the full anaphora resolution task such as the identification of singletons and of certain types of non-referring expressions (e.g., expletives), as these aspects are not annotated in that corpus. However, the recently released CRAC 2018 Shared Task and Phrase Detectives (PD) datasets can now be used for that purpose. In this paper, we introduce an architecture to simultaneously identify non-referring expressions (including expletives, predicative NPs, and other types) and build coreference chains, including singletons. Our cluster-ranking system uses an attention mechanism to determine the relative importance of the mentions in the same cluster. Additional classifiers are used to identify singletons and non-referring markables. Our contributions are as follows. First of all, we report the first result on the CRAC data using system mentions; our result is 5.8% better than the shared task baseline system, which used gold mentions. Our system also outperforms the best-reported system on PD by up to 5.3%. Second, we demonstrate that the availability of singleton clusters and non-referring expressions can lead to substantially improved performance on non-singleton clusters as well. Third, we show that despite our model not being designed specifically for the CONLL data, it achieves a very competitive result.
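    The attention mechanism over mentions in a cluster can be sketched in a few lines: each mention vector gets a scalar score, the scores are softmax-normalised, and the cluster representation is the resulting weighted sum. This is a generic attention sketch under our own assumptions, not the paper's actual architecture:

```python
import numpy as np

# Generic sketch: softmax attention over the mentions of one cluster,
# yielding a weighted cluster representation (shapes and weights invented).
def cluster_representation(mention_vecs, attn_w):
    """mention_vecs: (n_mentions, d); attn_w: (d,) learned attention vector."""
    scores = mention_vecs @ attn_w              # one scalar score per mention
    weights = np.exp(scores - scores.max())     # numerically stable softmax
    weights /= weights.sum()
    return weights @ mention_vecs, weights      # (d,) representation, (n,) weights

rep, w = cluster_representation(np.eye(3), np.ones(3))
```

    The softmax weights make the "relative importance of the mentions in the same cluster" explicit and inspectable, which is one practical appeal of attention in a cluster-ranking design.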

    Information extraction tools and methods for understanding dialogue in a companion

    The authors' research was sponsored by the European Commission under EC grant IST-FP6-034434 (Companions). This paper discusses how Information Extraction is used to understand and manage dialogue in the EU-funded Companions project, with respect to the Senior Companion, one of two applications under development in the project. Over the last few years, research in human-computer dialogue systems has increased, and much attention has focused on applying learning methods to improve a key part of any dialogue system, namely the dialogue manager. Since the dialogue manager in all dialogue systems relies heavily on the quality of the semantic interpretation of the user’s utterance, our research in the Companions project focuses on how to improve the semantic interpretation and combine it with knowledge from the Knowledge Base to increase the performance of the Dialogue Manager. Traditionally, the semantic interpretation of a user utterance is handled by a natural language understanding (NLU) module which embodies a variety of natural language processing techniques, from sentence splitting to full parsing. In this paper we discuss the use of a variety of NLU processes, and in particular Information Extraction, as a key part of the NLU module in order to improve the performance of the dialogue manager and hence the overall dialogue system.

    Incorporating an error corpus into a spellchecker for Maltese

    This paper discusses the ongoing development of a new Maltese spell checker, highlighting the methodologies that would best suit such a language. We thus discuss several previous attempts, highlighting what we believe to be their weakest point: a lack of attention to context. Two developments are of particular interest, both of which concern the availability of language resources relevant to spellchecking: (i) the Maltese Language Resource Server (MLRS), which now includes a representative corpus of c. 100M words extracted from diverse documents including the Maltese Legislation, press releases and extracts from Maltese web-pages, and (ii) an extensive and detailed corpus of spelling errors that was collected whilst part of the MLRS texts were being prepared. We describe the structure of these resources as well as the experimental approaches focused on context that we are now in a position to adopt. We describe the framework within which a variety of different approaches to spellchecking and evaluation will be carried out, and briefly discuss the first baseline system we have implemented. We conclude the paper with a roadmap for future improvements.
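    One way an error corpus and a background corpus can be combined, in the spirit of the context-focused approach this abstract describes, is to score each candidate correction by how often that error-correction pair was observed, weighted by how plausible the correction is in the local context. The data structures and scoring formula below are our own illustrative assumptions, not the paper's baseline system:

```python
from collections import Counter

# Illustrative sketch: error-corpus frequencies combined with a toy
# bigram context model (counts here are invented, not MLRS data).
error_corpus = {"teh": Counter({"the": 40, "ten": 2})}
context_counts = Counter({("on", "the"): 120, ("on", "ten"): 3})

def correct(word, prev_word):
    """Pick the correction maximising error frequency * context plausibility."""
    cands = error_corpus.get(word)
    if not cands:
        return word  # no evidence in the error corpus; leave unchanged
    return max(cands,
               key=lambda c: cands[c] * (1 + context_counts[(prev_word, c)]))

print(correct("teh", "on"))  # -> the
```

    The multiplicative combination means a frequent error-correction pair can still be overridden when the context strongly favours a rarer alternative, which is the behaviour a context-blind spell checker cannot provide.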

    Neural Mention Detection

    Mention detection is an important preprocessing step for annotation and interpretation in applications such as NER and coreference resolution, but few stand-alone neural models have been proposed that are able to handle the full range of mentions. In this work, we propose and compare three neural network-based approaches to mention detection. The first approach is based on the mention detection part of a state-of-the-art coreference resolution system; the second uses ELMO embeddings together with a bidirectional LSTM and a biaffine classifier; the third approach uses the recently introduced BERT model. Our best model (using a biaffine classifier) achieves gains of up to 1.8 percentage points on mention recall when compared with a strong baseline in a high-recall coreference annotation setting. The same model achieves improvements of up to 5.3 and 6.2 p.p. when compared with the best-reported mention detection F1 on the CONLL and CRAC coreference data sets respectively in a high-F1 annotation setting. We then evaluate our models for coreference resolution by using mentions predicted by our best model in state-of-the-art coreference systems. The enhanced model achieved absolute improvements of up to 1.7 and 0.7 p.p. when compared with our strong baseline systems (pipeline system and end-to-end system) respectively. For nested NER, the evaluation of our model on the GENIA corpus shows that our model matches or outperforms state-of-the-art models despite not being specifically designed for this task.
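    The biaffine span-scoring idea behind the second approach can be sketched with plain NumPy: every candidate span (i, j) is scored as h_i^T W h_j plus a bias, and high-scoring spans are proposed as mentions. The vectors and weights below are random stand-ins for the contextual encoder and trained parameters, so the thresholded output is illustrative only:

```python
import numpy as np

# Sketch of biaffine mention scoring (random stand-in weights, not a
# trained model): score(i, j) = h_start[i]^T W h_end[j] + b.
rng = np.random.default_rng(0)
n_tokens, d = 5, 8
H = rng.normal(size=(n_tokens, d))   # contextual token vectors (e.g. LSTM/BERT)
W = rng.normal(size=(d, d)) * 0.1    # biaffine weight matrix
b = 0.0

def span_scores(H, W, b, max_width=3):
    """Score every span up to max_width tokens wide."""
    scores = {}
    for i in range(len(H)):
        for j in range(i, min(i + max_width, len(H))):
            scores[(i, j)] = float(H[i] @ W @ H[j] + b)
    return scores

scores = span_scores(H, W, b)
mentions = [span for span, s in scores.items() if s > 0.5]  # threshold is arbitrary
```

    Because every span gets its own score, the recall/precision trade-off (the high-recall vs high-F1 settings above) reduces to choosing the threshold.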

    Classifying the informative behaviour of emoji in microblogs

    Emoji are pictographs commonly used in microblogs as emotion markers, but they can also represent a much wider range of concepts. Additionally, they may occur in different positions within a message (e.g. a tweet), appear in sequences, or act as word substitutes. Emoji must be considered necessary elements in the analysis and processing of user-generated content, since they can either provide fundamental syntactic information, emphasize what is already expressed in the text, or carry meaning that cannot be inferred from the words alone. We collected and annotated a corpus of 2,475 tweet pairs with the aim of analyzing and then classifying emoji use with respect to redundancy. The best classification model achieved an F-score of 0.7. In this paper we briefly present the corpus, describe the classification experiments, explain the predictive features adopted, discuss the problematic aspects of our approach, and suggest future improvements.
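    Positional features of the kind this abstract mentions (where an emoji occurs, whether it is part of a sequence) are straightforward to extract. The feature set and the Unicode ranges below are our own simplified assumptions, not the paper's actual predictive features:

```python
import re

# Simplified sketch: positional emoji features for a redundancy classifier.
# The character ranges cover common emoji blocks only (an approximation).
EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")

def emoji_features(tweet):
    tokens = tweet.split()
    return {
        "n_emoji": len(EMOJI.findall(tweet)),
        "in_sequence": bool(re.search(EMOJI.pattern + "{2,}", tweet)),
        "final_position": bool(tokens) and bool(EMOJI.search(tokens[-1])),
    }

feats = emoji_features("Great day 😀😀")
```

    Features like these capture the message-position and sequence behaviour described above; semantic redundancy (does the emoji repeat what the words already say?) additionally requires comparing emoji meaning against the text.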

    Creating expert knowledge by relying on language learners : a generic approach for mass-producing language resources by combining implicit crowdsourcing and language learning

    We introduce in this paper a generic approach to combining implicit crowdsourcing and language learning in order to mass-produce language resources (LRs) for any language for which a crowd of language learners can be involved. We present the approach by explaining its core paradigm, which consists in pairing specific types of LRs with specific exercises, by detailing both its strengths and challenges, and by discussing how far these challenges have been addressed at present. Accordingly, we also report on ongoing proof-of-concept efforts aiming at developing the first prototypical implementation of the approach in order to correct and extend an LR called ConceptNet based on input crowdsourced from language learners. We then present an international network called the European Network for Combining Language Learning with Crowdsourcing Techniques (enetCollect) that provides the context to accelerate the implementation of the generic approach. Finally, we exemplify how it can be used in several language learning scenarios to produce a multitude of NLP resources and how it can therefore alleviate the long-standing NLP issue of the lack of LRs.