
16 research outputs found

Linguistic Resources and Technologies for Romanian Language

Author: Corina Forascu
Dan Cristea
Publication venue: Vladimir Andrunachievici Institute of Mathematics and Computer Science
Publication date: 01/05/2006

This paper reviews notions related to Language Resources and Technologies (LRT), including a brief overview of some resources developed worldwide, with a special focus on the Romanian language. It then describes a joint Romanian, Moldavian, and English initiative aimed at developing electronically coded resources for the Romanian language, tools for their maintenance and usage, as well as for the creation of applications based on these resources.

Directory of Open Access Journals

Question Answering over Linked Data (QALD-4)

Author: Cabrio Elena
Cimiano Philipp
Forascu Corina
Lopez Vanessa
Ngonga Ngomo Axel-Cyrille
Unger Christina
Walter Sebastian
Publication venue: HAL CCSD
Publication date: 15/09/2014

With the increasing amount of semantic data available on the web, there is a strong need for systems that allow common web users to access this body of knowledge. Question answering systems in particular have received wide attention, as they allow users to express arbitrarily complex information needs in an easy and intuitive fashion (for an overview see [4]). The key challenge lies in translating the users' information needs into a form such that they can be evaluated using standard Semantic Web query processing and inferencing techniques. Over the past years, a range of approaches have been developed to address this challenge, showing significant advances towards answering natural language questions with respect to large, heterogeneous sets of structured data. However, only a few systems so far address the fact that the structured data available nowadays is distributed among a large collection of interconnected datasets, and that answers to questions can often only be provided if information from several sources is combined. In addition, a lot of information is still available only in textual form, both on the web and in the form of labels and abstracts in linked data sources. Therefore, approaches are needed that can not only deal with the specific character of structured data but also find information in several sources, process both structured and unstructured information, and combine the gathered information into one answer. The main objective of the open challenge on question answering over linked data (QALD) is to provide up-to-date, demanding benchmarks that establish a standard against which question answering systems over structured data can be evaluated and compared. QALD-4 is the fourth instalment of the QALD open challenge, comprising three tasks: multilingual question answering, biomedical question answering over interlinked data, and hybrid question answering.

HAL-UNICE

INRIA a CCSD electronic archive server

Overview of the CLEF 2008 Multilingual Question Answering Track

Author: Alegria Iñaki
Forascu Corina
Forner Pamela
Moreau Nicolas
Osenova Petya
Peñas Anselmo
Prokopidis Prokopis
Rocha Paulo
Sacaleanu Bogdan
Sang Erik Tjong Kim
Sutcliffe Richard
Publication venue: Springer Fachmedien Wiesbaden GmbH
Publication date: 01/01/2009

Crossref

Repositório Comum

Creating expert knowledge by relying on language learners : a generic approach for mass-producing language resources by combining implicit crowdsourcing and language learning

Author: Aparaschivei Lavina
Barreiro Anabela
Borg Claudia
Cibej Jaka
Forascu Corina
Fort Karen
HaCohen-Kerner Yaakov
Hassan Umair ul
Holdt Spela Arhar
Katinskaia Anisia
Konig Alexander
Kosem Iztok
Lyding Verena
Millour Alice
Nicholas Lionel
Rodosthenous Christos
Sangati Federico
Zdravkova Katerina
Publication venue: 12th edition of the Language Resources and Evaluation Conference (LREC'20)
Publication date: 01/05/2020

In this paper, we introduce a generic approach that combines implicit crowdsourcing and language learning in order to mass-produce language resources (LRs) for any language for which a crowd of language learners can be involved. We present the approach by explaining its core paradigm, which consists in pairing specific types of LRs with specific exercises, by detailing both its strengths and challenges, and by discussing how far these challenges have been addressed at present. Accordingly, we also report on ongoing proof-of-concept efforts aimed at developing the first prototypical implementation of the approach in order to correct and extend an LR called ConceptNet based on the input crowdsourced from language learners. We then present an international network called the European Network for Combining Language Learning with Crowdsourcing Techniques (enetCollect) that provides the context to accelerate the implementation of the generic approach. Finally, we exemplify how it can be used in several language learning scenarios to produce a multitude of NLP resources and how it can therefore alleviate the long-standing NLP issue of the lack of LRs.

OAR@UM

Toward a truly multilingual GlobalWordNet

Author: Bond Francis
Fellbaum Christiane
Forascu Corina
McCrae John P.
Mititelu Verginica Barbu
Vossen Piek
Publication date: 01/01/2016

In this paper, we describe a new and improved GlobalWordnet Grid that takes advantage of the Collaborative InterLingual Index (CILI). Currently, the Open Multilingual Wordnet has made many wordnets accessible as a single linked wordnet, but because it uses the Princeton Wordnet of English (PWN) as a pivot, it loses concepts that are not part of PWN. The technical solution to this, a central registry of concepts, as proposed in the EuroWordnet project through the InterLingual Index, has been known for many years. However, the practical issues of how to host this index and who decides what goes in remained unsolved. Inspired by current practice in the Semantic Web and the Linked Open Data community, we propose a way to solve this issue. In this paper we define the principles and protocols for contributing to the Grid. To validate the current setup, we tested them on two use cases: adding version 3.1 of the Princeton WordNet to a CILI based on 3.0, and adding the Open Dutch Wordnet. This paper aims to be a call for action that we hope will be further discussed and ultimately taken up by the whole wordnet community.

Multi-document multilingual summarization corpus preparation, Part 1: Arabic, English, Greek, Chinese, Romanian

Author: Corina Forascu
George Giannakopoulos
Lei Li
Mahmoud El-haj
Publication date: 01/08/2013

This document gives an overview of the strategy, effort and aftermath of the MultiLing 2013 multilingual summarization data collection. We describe how the Data Contributors of MultiLing collected and generated a multilingual multi-document summarization corpus in 10 different languages.

CiteSeerX

Lancaster E-Prints

Analysing the Human Processing of Verbal Humour through Eye-Tracking Experiments

Author: Corina Forascu
Dan Tufis
Kim Plunkett
Rada Mihalcea
Stephen Pulman
Vanya Kovic
Publication date: 01/01/2010

Overview of the CLEF 2007 Multilingual Question Answering Track

Author: Anselmo Peñas
Bogdan Sacaleanu
Christelle Ayache
Corina Forascu
Danilo Giampiccolo
Jesús Herrera
Pamela Forner
Paulo Rocha
Petya Osenova
Richard Sutcliffe
Valentin Jijkoun
Publication date: 01/01/2007

The fifth QA campaign at CLEF [1], which had its first edition in 2003, offered not only a main task but also an Answer Validation Exercise (AVE) [2], which continued last year’s pilot, and a new pilot: Question Answering on Speech Transcripts (QAST) [3, 15]. The main task was characterized by its focus on cross-linguality, while covering as many European languages as possible. As a novelty, some QA pairs were grouped in clusters. Every cluster was characterized by a topic (not given to participants). The questions in a cluster may contain co-references to one another. Finally, the need for searching answers in web formats was satisfied by introducing Wikipedia as a document corpus. The results and the analyses reported by the participants suggest that the introduction of Wikipedia and the topic-related questions led to a drop in systems’ performance.

CiteSeerX

Repositório Comum