    The Verbmobil semantic database

    When the modules of a large natural language processing system are developed at different sites, interface definitions become a vital issue. The issue is even more urgent when several modules with the same intended functionality are developed in parallel and must be indistinguishable with respect to their input/output behaviour. Another important issue is the acquisition and maintenance of lexical information, which should be stored independently of any single application so that it can be (re)used for different purposes. This paper describes the design and use of the Verbmobil Semantic Database, which we developed to address these issues in the area of lexical semantics in Verbmobil.

    Detecting the ambiguity potential of a query: the case of news searches

    The aim of the work presented here is to examine the notion of ambiguity by studying the queries submitted to an information retrieval (IR) system, Orange's 2424actu.fr site, which was in operation from 1 October 2009 to 1 September 2011. The site processed a collection of documents about French current affairs, a particularly fast-changing domain and therefore well suited to studying ambiguity. We seek to determine the nature of query ambiguity by examining the available query logs and comparing them with various contextual cues that enrich our perception of the semantic variability of query terms.

    MaskParse@Deskin at SemEval-2019 Task 1: Cross-lingual UCCA Semantic Parsing using Recursive Masked Sequence Tagging

    This paper describes our recursive system for SemEval-2019 Task 1: Cross-lingual Semantic Parsing with UCCA. Each recursive step consists of two parts. First, we perform semantic parsing with a sequence tagger that estimates the probabilities of the UCCA categories for the tokens of the sentence. Then, we apply a decoding policy that interprets these probabilities and builds the graph nodes. Parsing is recursive: a first inference on the sentence extracts the main scenes and links, and the model is then applied recursively to the sentence with a masking feature that reflects the decisions made in previous steps. The process continues until the terminal nodes are reached. We chose a standard neural tagger and focused on the recursive parsing strategy and on the cross-lingual transfer problem, in order to develop a robust model for French using only a few training samples.
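    The recursive loop described in this abstract can be illustrated with a minimal Python sketch, under stated assumptions: the tagger is a stand-in callable (the system's neural tagger is not reproduced), the decoding policy is assumed to be a greedy argmax that merges contiguous tokens sharing a label into one node, and the masking feature is assumed to be a binary vector marking the span being expanded. The names `parse`, `decode`, `Node` and `toy_tagger` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Assumed tagger interface: (tokens, mask) -> one {label: probability} dict per token.
Tagger = Callable[[List[str], List[int]], List[Dict[str, float]]]


@dataclass
class Node:
    label: str
    start: int  # first token index
    end: int    # exclusive end index
    children: List["Node"] = field(default_factory=list)


def decode(probs: List[Dict[str, float]], start: int, end: int) -> List[Tuple[str, int, int]]:
    """Hypothetical greedy policy: argmax label per token, contiguous equal labels form one span."""
    best = [max(p, key=p.get) for p in probs]
    spans, i = [], start
    while i < end:
        j = i + 1
        while j < end and best[j] == best[i]:
            j += 1
        spans.append((best[i], i, j))
        i = j
    return spans


def parse(tokens: List[str], tagger: Tagger, start: int = 0,
          end: Optional[int] = None, depth: int = 0, max_depth: int = 10) -> List[Node]:
    if end is None:
        end = len(tokens)
    # Assumed mask: 1 inside the span chosen at the previous step, 0 elsewhere.
    mask = [1 if start <= k < end else 0 for k in range(len(tokens))]
    nodes = []
    for label, s, e in decode(tagger(tokens, mask), start, end):
        node = Node(label, s, e)
        # Recurse into multi-token spans; stop at terminals, at max_depth,
        # or when a span decodes to itself (avoids infinite recursion).
        if e - s > 1 and depth < max_depth and (s, e) != (start, end):
            node.children = parse(tokens, tagger, s, e, depth + 1, max_depth)
        nodes.append(node)
    return nodes


def toy_tagger(tokens: List[str], mask: List[int]) -> List[Dict[str, float]]:
    """Toy stand-in: scene level (full mask) vs. inside a unit (partial mask)."""
    out = []
    for tok in tokens:
        if all(mask):
            label = "P" if tok == "ate" else "A"         # Process vs. Participant
        else:
            label = "E" if tok in {"the", "a"} else "C"  # Elaborator vs. Center
        out.append({label: 0.9, "other": 0.1})
    return out


tree = parse("John ate the apple".split(), toy_tagger)
print([(n.label, n.start, n.end) for n in tree])
# [('A', 0, 1), ('P', 1, 2), ('A', 2, 4)]
print([(c.label, c.start, c.end) for c in tree[2].children])
# [('E', 2, 3), ('C', 3, 4)]
```

    Stopping when a span decodes to itself is a guard against infinite recursion that the abstract does not specify; a real system would more likely rely on the tagger learning to emit terminal structure.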

    Knowledge-based semantic annotation and retrieval of multimedia content

    aceMedia is a four-year FP6 Integrated Project, part-funded by the EC, ending in December 2007. The project has developed tools that enable users to manage and share both personal and purchased content across PC, set-top box (STB) and mobile platforms. Knowledge-based analysis and ontologies have been successfully exploited in an end-to-end system to enable automated semantic annotation and retrieval of multimedia content. The paper briefly describes the objectives of aceMedia and the application of knowledge-based analysis in the project.

    CALOR-Frame: a corpus of encyclopedic texts annotated with semantic frames

    CALOR-Frame is a corpus of encyclopedic history texts annotated with semantic frames, produced jointly by Aix-Marseille University and Orange Labs. The resource was built in the broader context of information retrieval, with the aim of improving access to knowledge content. Structuring texts with semantic frames enables advanced search functionality that goes beyond keyword search. This article presents the semantic frame annotation process that was set up, which relies on a tool for validating automatically generated annotations in order to optimize the process. The choice of texts and semantic frames is also motivated. Keywords: semantic frame, corpus, active learning, sequence labelling.

    CALOR-QUEST: a training and evaluation corpus for machine reading comprehension

    Machine reading comprehension is a task in the question-answering family in which questions are not general in scope but relate to a particular document. Very large corpora (SQuAD, MS MARCO) containing (document, question, answer) triplets have recently been made available to the research community, enabling supervised methods based on deep neural networks that achieve promising results. These methods, however, require large amounts of training data, which currently exist only for English. The purpose of this study is to enable such resources to be developed for other languages at lower cost, by proposing a method that semi-automatically generates questions from a semantic frame analysis. The collection of natural questions is thereby limited to a validation/test set. Applying this method to the French CALOR-Frame corpus yielded the CALOR-QUEST resource presented in this paper. Keywords: machine reading comprehension, question answering, semantic frame analysis, question generation.
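    One way to picture question generation from a semantic frame analysis is a template per frame element: each annotated element becomes the answer, and the rest of the frame instance fills a question template. The Python sketch below assumes this template-based design; the "Attack" frame, its role names, the `TEMPLATES` table and the `generate_questions` function are hypothetical illustrations, not the CALOR-QUEST generation rules or the CALOR-Frame frame inventory.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class FrameInstance:
    frame: str                # frame name
    target: str               # surface form of the frame-evoking word
    lemma: str                # its base form, used after "did"
    elements: Dict[str, str]  # frame element (role) -> annotated text span


# Hypothetical (frame, role) -> question template; placeholders name the
# trigger ("target", "lemma") or other frame elements of the same instance.
TEMPLATES: Dict[Tuple[str, str], str] = {
    ("Attack", "Assailant"): "Who {target} {Victim}?",
    ("Attack", "Victim"): "What did {Assailant} {lemma}?",
    ("Attack", "Time"): "When did {Assailant} {lemma} {Victim}?",
}


def generate_questions(fi: FrameInstance) -> List[Tuple[str, str]]:
    """Return (question, answer) pairs; the answer is the questioned element's span."""
    slots = {"target": fi.target, "lemma": fi.lemma, **fi.elements}
    pairs = []
    for role, answer in fi.elements.items():
        template = TEMPLATES.get((fi.frame, role))
        if template is None:
            continue  # no template for this role: skipped
        try:
            pairs.append((template.format(**slots), answer))
        except KeyError:
            continue  # a placeholder refers to an element missing from this instance
    return pairs


fi = FrameInstance(
    frame="Attack", target="besieged", lemma="besiege",
    elements={"Assailant": "the Ottomans", "Victim": "Constantinople", "Time": "in 1453"},
)
for question, answer in generate_questions(fi):
    print(question, "->", answer)
# Who besieged Constantinople? -> the Ottomans
# What did the Ottomans besiege? -> Constantinople
# When did the Ottomans besiege Constantinople? -> in 1453
```

    Keying templates by (frame, role) means coverage grows frame by frame; instances whose templates cannot be filled are simply skipped here, which is one plausible point where manual validation would enter a semi-automatic pipeline.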

    Climate protection in financially weak municipalities

