625 research outputs found
Classification of dual language audio-visual content: Introduction to the VideoCLEF 2008 pilot benchmark evaluation task
VideoCLEF is a new track for the CLEF 2008 campaign. This
track aims to develop and evaluate tasks in analyzing multilingual video content. A pilot of a Vid2RSS task involving assigning thematic class labels to video kicks off the VideoCLEF track in 2008. Task participants deliver classification results in the form of a series of feeds, one for each thematic class. The data for the task are dual language television documentaries. Dutch is the dominant language and English-language content (mostly interviews) is embedded. Participants are provided with speech recognition transcripts of the data in both Dutch and English, and also with metadata generated by archivists. In addition to the classification task, participants can choose to participate in a translation task (translating the
feed into a language of their choice) and a keyframe selection task (choosing a semantically appropriate keyframe for depiction of the videos in the feed)
Cross-Language Question Re-Ranking
We study how to find relevant questions in community forums when the language
of the new questions is different from that of the existing questions in the
forum. In particular, we explore the Arabic-English language pair. We compare a
kernel-based system with a feed-forward neural network in a scenario where a
large parallel corpus is available for training a machine translation system,
bilingual dictionaries, and cross-language word embeddings. We observe that
both approaches degrade the performance of the system when working on the
translated text, especially the kernel-based system, which depends heavily on a
syntactic kernel. We address this issue using a cross-language tree kernel,
which compares the original Arabic tree to the English trees of the related
questions. We show that this kernel almost closes the performance gap with
respect to the monolingual system. On the neural network side, we use the
parallel corpus to train cross-language embeddings, which we then use to
represent the Arabic input and the English related questions in the same space.
The results also improve to close to those of the monolingual neural network.
Overall, the kernel system shows a better performance compared to the neural
network in all cases.Comment: SIGIR-2017; Community Question Answering; Cross-language Approaches;
Question Retrieval; Kernel-based Methods; Neural Networks; Distributed
Representation
Dublin City University at QA@CLEF 2008
We describe our participation in Multilingual Question Answering at CLEF 2008 using German and English as our source and target languages respectively. The system was built using UIMA (Unstructured Information Management Architecture) as underlying framework
Cross-language Information Retrieval
Two key assumptions shape the usual view of ranked retrieval: (1) that the
searcher can choose words for their query that might appear in the documents
that they wish to see, and (2) that ranking retrieved documents will suffice
because the searcher will be able to recognize those which they wished to find.
When the documents to be searched are in a language not known by the searcher,
neither assumption is true. In such cases, Cross-Language Information Retrieval
(CLIR) is needed. This chapter reviews the state of the art for CLIR and
outlines some open research questions.Comment: 49 pages, 0 figure
Web 2.0, language resources and standards to automatically build a multilingual named entity lexicon
This paper proposes to advance in the current state-of-the-art of automatic Language Resource (LR) building by taking into consideration three elements: (i) the knowledge available in existing LRs, (ii) the vast amount of information available from the collaborative paradigm that has emerged from the Web 2.0 and (iii) the use of standards to improve interoperability. We present a case study in which a set of LRs for diļ¬erent languages (WordNet for English and Spanish and Parole-Simple-Clips for Italian) are
extended with Named Entities (NE) by exploiting Wikipedia and the aforementioned LRs. The practical result is a multilingual NE lexicon connected to these LRs and to two ontologies: SUMO and SIMPLE. Furthermore, the paper addresses an important problem which aļ¬ects the Computational Linguistics area in the present, interoperability, by making use of the ISO LMF standard to encode this lexicon. The diļ¬erent steps of the procedure (mapping, disambiguation, extraction, NE identiļ¬cation and postprocessing) are comprehensively explained and evaluated. The resulting resource contains 974,567, 137,583 and 125,806 NEs for English, Spanish and Italian respectively. Finally, in order to check the usefulness of the constructed resource, we apply it into a state-of-the-art Question Answering system and evaluate its impact; the NE lexicon improves the systemās accuracy by 28.1%. Compared to previous approaches to build NE repositories, the current proposal represents a step forward in terms of automation, language independence, amount of NEs acquired and richness of the information represented
An evaluation resource for geographic information retrieval
In this paper we present an evaluation resource for geographic information retrieval developed within the Cross Language Evaluation
Forum (CLEF). The GeoCLEF track is dedicated to the evaluation of geographic information retrieval systems. The resource
encompasses more than 600,000 documents, 75 topics so far, and more than 100,000 relevance judgments for these topics. Geographic
information retrieval requires an evaluation resource which represents realistic information needs and which is geographically
challenging. Some experimental results and analysis are reported
Spanish question answering evaluation
This paper reports the most significant issues related to the launching of a Monolingual Spanish Question Answering evaluation track at the Cross Language Evaluation Forum (CLEF 2003). It introduces some questions about multilingualism and describes the methodology for test suite production, task, judgment of answers as well as the results obtained by the participant systems
- ā¦