37 research outputs found

    Population of a Knowledge Base for News Metadata from Unstructured Text and Web Data

    We present a practical use case of knowledge base (KB) population at the French news agency AFP. The target KB instances are entities relevant for news production and content enrichment. In order to acquire uniquely identified entities over news wires, i.e. textual data, and integrate the resulting KB in the Linked Data framework, a series of data models need to be aligned: Web data resources are harvested for creating a wide coverage entity database, which is in turn used to link entities to their mentions in French news wires. Finally, the extracted entities are selected for instantiation in the target KB. We describe our methodology along with the resources created and used for the target KB population.

    Establishing a New State-of-the-Art for French Named Entity Recognition

    Due to the COVID-19 pandemic, the 12th edition of LREC was cancelled. The LREC 2020 Proceedings are available at http://www.lrec-conf.org/proceedings/lrec2020/index.html
    The French TreeBank developed at the University Paris 7 is the main source of morphosyntactic and syntactic annotations for French. However, it does not include explicit information related to named entities, which are among the most useful pieces of information for several natural language processing tasks and applications. Moreover, no large-scale French corpus with named entity annotations contains referential information, which complements the type and the span of each mention with an indication of the entity it refers to. We have manually annotated the French TreeBank with such information, after an automatic pre-annotation step. We sketch the underlying annotation guidelines and provide a few figures about the resulting annotations.

    Annotation référentielle du Corpus Arboré de Paris 7 en entités nommées

    The French TreeBank (Corpus Arboré de Paris 7) developed at the University Paris 7 is the main source of morphosyntactic and syntactic annotations for French. However, it does not include explicit information related to named entities, which are among the most useful pieces of information for several natural language processing tasks and applications. Moreover, no large-scale French corpus with named entity annotations contains referential information, which complements the type and the span of each mention with an indication of the entity it refers to. We have manually annotated the French TreeBank with such information, after an automatic pre-annotation step. We sketch the underlying annotation guidelines and provide a few figures about the resulting annotations.

    Extension dynamique de lexiques morphologiques pour le français à partir d'un flux textuel

    Lexical incompleteness is a recurring problem when dealing with natural language and its variability. It now seems necessary to regularly validate and extend the lexica used by tools that process large amounts of textual data. This is even more true when processing real-time text flows. In this context, our paper introduces techniques aimed at addressing words unknown to a lexicon. We first study neologisms (from a theoretical and corpus-based point of view) and describe the modules we have developed for detecting them and inferring information about them (lemma, category, inflectional class). We show that, using various modules for analyzing derived and compound neologisms, we are able to generate candidate lexical entries in real time and with good precision.

    Essays on Attractiveness of Multinational Corporations

    This dissertation analyzes selected policies designed to attract foreign direct investment (FDI) as a means of economic growth. The focus is on multinational corporations (MNCs) because most foreign direct investment is done by MNCs. The dissertation first shows the effects that the presence of MNCs has on economic growth before examining tradeoffs between direct costs (i.e., transportation and production costs) and policy factors in attracting MNC FDI. Essays 1, 'Multinational Corporations and Their Effect on Gross Domestic Product', and 2, 'Competing for Innovation: The Economics of Knowledge Acquisition', examine how FDI, in combination with socioeconomic, economic, and policy factors, affects the growth of gross domestic product (GDP). The collective results suggest that policies of regionalization drive GDP growth and influence FDI location. Nations that are corporate homes of the largest and most internationalized MNCs benefit from policies of regionalization as they aid the global expansion of their corporations. Importantly, these two essays provide empirical evidence of the value transfer of MNC internationalization back home and of the importance of MNC concentration at the national level. The presence of MNC networks provides knowledge and aids the innovative capacity of both developed and developing countries. Both essays find that GDP growth driven by MNC activity has been stronger in the developing world since 2000. The two essays contribute to the globalization literature by providing empirical evidence of the increasing importance of emerging markets in the new economy, the role of MNCs in that increasing importance, the political and diplomatic implications of these related developments, and the policies nations currently employ to stay competitive in a turbulent environment.
    Essay 3, 'Fleeing Regulation: Pollution Havens in Textile Manufacturing', provides an example of the importance of regulatory policy by examining the effect of a policy change on FDI flows in the context of the garment sector. The results indicate that the removal of the quota system in the international trade of garments increased FDI in nations with permissive environmental policies, which in turn has contributed significantly to toxins and pollutants in local ecosystems. The extant literature argues that globalization is a product of two sets of factors: (1) reductions in 'spatial friction' (i.e., decreasing transportation, information, and organization-of-production costs), and (2) reductions in trade barriers, both in terms of border restrictions and in terms of domestic policies affecting foreign and domestic direct investment. The major contribution of the dissertation is in providing empirical evidence that under globalization nations compete for FDI by creating attractive regulatory environments for MNCs. There are social costs to be borne in the competition for FDI, and this dissertation shows that the nations that are corporate homes to the world's largest MNCs are often better positioned to absorb costs associated with knowledge sourcing as well as to export pollution costs to their more lenient trading partners.

    Service orientations of manufacturing companies : impact on new product success
