
1,176 research outputs found

Establishing a New State-of-the-Art for French Named Entity Recognition

Author: Dupont Yoann
Muller Benjamin
Romary Laurent
Sagot Benoît
Suárez Pedro Javier Ortiz
Publication date: 11/05/2020

The French TreeBank developed at the University Paris 7 is the main source of morphosyntactic and syntactic annotations for French. However, it does not include explicit information about named entities, which are among the most useful information for several natural language processing tasks and applications. Moreover, no large-scale French corpus with named entity annotations contains referential information, which complements the type and span of each mention with an indication of the entity it refers to. We have manually annotated the French TreeBank with such information, after an automatic pre-annotation step. We sketch the underlying annotation guidelines and provide a few figures about the resulting annotations.

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes

HAL-Rennes 1

Construction et enrichissement automatique d'ontologie à partir de ressources externes

Author: Aussenac-Gilles Nathalie
Bessagnet Marie-Noelle
Gaio Mauro
Kamel Mouna
Kergosien Eric
Sallaberry Christian
Publication venue: HAL CCSD
Publication date: 03/12/2009

Automatic construction of ontologies from text is generally based on retrieving text content. To obtain a richer ontology, we extend these approaches by taking into account the document structure and some external resources (such as a thesaurus of indexing terms from a related domain). In this paper we describe how these external resources are first analyzed and then exploited. The method has been applied to a geographical domain and its benefit has been evaluated.

HAL - Lille 3

Scientific Publications of the University of Toulouse II Le Mirail

INRIA a CCSD electronic archive server

HAL Descartes

Une expérience d'annotation à large échelle : le projet OTIM

Author: Bertrand Roxane
Bigi Brigitte
Blache Philippe
Espesser Robert
Guardiola Mathilde
Rauzy Stéphane
Publication venue: HAL CCSD
Publication date: 01/04/2011

This presentation takes stock of a large-scale annotation effort carried out within the OTIM project. As part of this project, we built a large audio-visual corpus of spontaneous speech comprising 8 hours of dialogue (102,457 words corresponding to 6,611 distinct forms), fully transcribed, aligned, and richly annotated across all domains and modalities. We therefore faced the main problems raised by annotating this type of resource. This presentation describes the recommendations and techniques we used to achieve our goals.

HAL AMU

Contribuer au progrès solidaire des recherches et de la documentation : la Collection Pangloss et la Collection AuCo

Author: Deo Matthew
Guillaume Séverine
Jacobson Michel
Jacques Guillaume
Michaud Alexis
Mạc Đăng-Khoa
Phạm Thu-Hà
Publication venue: HAL CCSD
Publication date: 04/07/2016

This talk sets out the scientific goals and achievements of two collections hosted by the Cocoon Open Archive of oral resources: the Pangloss Collection, which mainly focuses on unwritten languages from all areas of the world; and the AuCo Collection, which is dedicated to languages of Vietnam and neighbouring countries. The aim is to contribute to joint progress in language documentation and in research. Emphasis is placed on the perspectives for phonetic/phonological research that are opened by some recent achievements in the framework of these two Collections.

Hal - Université Grenoble Alpes

HAL Université de Tours

Construction automatique d'ontologies à partir de spécifications de bases de données

Author: Aussenac-Gilles Nathalie
Kamel Mouna
Publication venue: HAL CCSD
Publication date: 26/05/2009

Classical methods for automatically building ontologies from text exploit the text itself. We extend these approaches by taking into account the structure of the text, which itself carries information. To do so, we rely on database specification documents in XML format, in which the structural division of the text corresponds to a semantic characterization of its content. The idea is to take advantage of both the structure of the text and the written text. The proposed method uses the semantics of the tags and characterizes their relations in order to define rules for creating concepts and semantic relations. An ontology kernel was built automatically using these rules, and then enriched by exploiting the natural-language text with predefined lexico-syntactic patterns. Rules and patterns were implemented in GATE.

Scientific Publications of the University of Toulouse II Le Mirail

HAL Descartes

Construction automatique d'un large corpus libre annoté morpho-syntaxiquement en français

Author: Boudin Florian
Hernandez Nicolas
Publication venue: HAL CCSD
Publication date: 17/06/2013

This article studies the possibility of creating a new written French corpus with morphosyntactic annotations from an existing annotated corpus. Our goals are to free ourselves from the restrictive license of the original corpus and to allow the texts to be continually updated. We show that an automatically pre-annotated corpus can be used to train a tagger that achieves state-of-the-art performance, provided the corpus is large enough.

Annotation fonctionnelle de corpus arborés avec des Champs Aléatoires Conditionnels

Author: Balvet Antonio
Laurence Grégoire
Moreau Erwan
Poibeau Thierry
Rozenknop Antoine
Tellier Isabelle
Publication venue: HAL CCSD
Publication date: 24/06/2009

The aim of this article is to evaluate to what extent the "syntactic functions" that appear in part of the Paris 7 treebank can be learned from examples. The machine learning technique used relies on Conditional Random Fields (CRF), in a variant adapted to tree annotation. The experiments conducted are described in detail and analyzed. With proper parameter tuning, they reach an F1-measure of over 80%.

HAL - Lille 3

INRIA a CCSD electronic archive server

HAL-Paris 13

Ensuring Query Compatibility with Evolving XML Schemas

Author: Genevès Pierre
Layaïda Nabil
Quint Vincent
Publication date: 01/01/2008

During the life cycle of an XML application, both schemas and queries may change from one version to another. Schema evolutions may affect query results and potentially the validity of produced data. A current challenge is to assess and accommodate the impact of these changes in rapidly evolving XML applications. This article proposes a logical framework and tool for verifying forward/backward compatibility issues involving schemas and queries. First, it allows analyzing relations between schemas. Second, it allows XML designers to identify queries that must be reformulated in order to produce the expected results across successive schema versions. Third, it allows examining more precisely the impact of schema changes on queries, thereby facilitating their reformulation.

arXiv.org e-Print Archive

CiteSeerX

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

HAL-Rennes 1