183 research outputs found

    Morphonette: a morphological network of French

    This paper describes in detail the first version of Morphonette, a new French morphological resource and a new radically lexeme-based method of morphological analysis. This research is grounded in a paradigmatic conception of derivational morphology where the morphological structure is a structure of the entire lexicon and not one of the individual words it contains. The discovery of this structure relies on a measure of morphological similarity between words, on formal analogy and on the properties of two morphological paradigms

    Constraint Logic Programming for Natural Language Processing

    This paper evaluates the adequacy of the constraint logic programming (CLP) paradigm for natural language processing. Theoretical aspects of this question have been discussed in several works. We adopt here a pragmatic point of view, and our argumentation relies on concrete solutions. Using actual constraints (in the CLP sense) is neither easy nor direct. However, CLP can improve parsing techniques in several respects, such as concision, control, efficiency, and direct representation of linguistic formalism. This discussion is illustrated by several examples and the presentation of an HPSG parser. Comment: 15 pages, uuencoded and compressed postscript, to appear in Proceedings of the 5th Int. Workshop on Natural Language Understanding and Logic Programming. Lisbon, Portugal. 199

    Acquisition morphologique à partir d'un dictionnaire informatisé

    10 pages. The paper presents a linguistic and computational model that aims to make the morphological structure of the lexicon emerge from the formal and semantic regularities of the words it contains. The model is word-based. The proposed morphological structure consists of (1) binary relations that connect each headword with words that are morphologically related, and especially with the members of its morphological family and its derivational series, and of (2) the analogies that hold between the words. The model has been tested on the lexicon of French using the TLFi machine-readable dictionary. The article proposes a linguistic and computational model that makes the derivational morphological structure of the lexicon emerge from the semantic and formal regularities of the words it contains. This model is radically lexeme-based. The morphological structure consists of the relations that each word has with the other units of the lexicon, notably with the words of its morphological family and its derivational series. These relations form analogical paradigms. The model was tested on the lexicon of French using the TLFi machine-readable dictionary

    Acquisition of morphological families and derivational series from a machine readable dictionary

    The paper presents a linguistic and computational model that aims to make the morphological structure of the lexicon emerge from the formal and semantic regularities of the words it contains. The model is word-based. The proposed morphological structure consists of (1) binary relations that connect each headword with words that are morphologically related, and especially with the members of its morphological family and its derivational series, and of (2) the analogies that hold between the words. The model has been tested on the lexicon of French using the TLFi machine-readable dictionary. Comment: proceedings of the 6th Décembrette

    Webaffix : un outil d'acquisition morphologique dérivationnelle à partir du Web

    International audience. This paper presents Webaffix, a tool for finding pairs of morphologically related words on the Web. The method used is inductive and language-independent. Using the WWW as a corpus, the Webaffix tool detects the occurrences of new derived lexemes based on a given graphemic suffix, proposes a base lexeme, and then performs a compatibility test on the word pairs produced, using the Web again, but as a source of co-occurrences. The resulting pairs of words are used to enrich the Verbaction lexical database, which contains French verbs and their related nominals. The results are described and evaluated. The article presents Webaffix, a tool for acquiring pairs of morphologically related lexemes from the Web. The method used is inductive and independent of any particular language. Webaffix (1) uses a search engine to collect candidate forms that contain a given graphemic suffix, (2) predicts the potential bases of these candidates and (3) searches the Web for co-occurrences of the candidates and their predicted bases. The tool was used to enrich Verbaction, a lexicon of links between verbs and the corresponding action or event nouns. The article includes an evaluation of the acquired morphological links

    Wiktionnaire's Wikicode GLAWIfied: a Workable French Machine-Readable Dictionary

    International audience. GLAWI is a free, large-scale and versatile Machine-Readable Dictionary (MRD) that has been extracted from the French language edition of Wiktionary, called Wiktionnaire. In (Sajous and Hathout, 2015), we introduced GLAWI, gave the rationale behind the creation of this lexicographic resource and described the extraction process, focusing on the conversion and standardization of the heterogeneous data provided by this collaborative dictionary. In the current article, we describe the content of GLAWI and illustrate how it is structured. We also suggest various applications, ranging from linguistic studies and NLP applications to psycholinguistic experimentation. They can all take advantage of the diversity of the lexical knowledge available in GLAWI. Besides this diversity and extensive lexical coverage, GLAWI is also remarkable because it is the only free lexical resource of contemporary French that contains definitions. This unique material opens the way to the renewal of MRD-based methods, notably the automated extraction and acquisition of semantic relations

    Webaffix: Discovering Morphological Links on the WWW

    International audience. This paper presents a new language-independent method for finding morphological links between newly appeared words (i.e. absent from reference word lists). Using the WWW as a corpus, the Webaffix tool detects the occurrences of new derived lexemes based on a given suffix, proposes a base lexeme following a standard scheme (such as noun-verb), and then performs a compatibility test on the word pairs produced, using the Web again, but as a source of co-occurrences. The resulting pairs of words are used to build generic morphological databases useful for a number of NLP tasks. We develop and comment on an example use of Webaffix to find new noun/verb pairs in French

    GLAWI, a free XML-encoded Machine-Readable Dictionary built from the French Wiktionary

    International audience. This article introduces GLAWI, a large XML-encoded machine-readable dictionary automatically extracted from Wiktionnaire, the French edition of Wiktionary. GLAWI contains 1,341,410 articles and is released under a free license. Besides the size of its headword list, GLAWI inherits from Wiktionnaire its original macrostructure and the richness of its lexicographic descriptions: articles contain etymologies, definitions, usage examples, inflectional paradigms, lexical relations and phonemic transcriptions. The paper first gives some insights into the nature and content of Wiktionnaire, with a particular focus on its encoding format, before presenting our approach, the standardization of its microstructure and the conversion into XML. First intended to meet NLP needs, GLAWI has been used to create a number of customized lexicons dedicated to specific uses including linguistic description and psycholinguistics. The main one is GLÀFF, a large inflectional and phonological lexicon of French. We show that many more specific on-demand lexicons can be easily derived from the large body of lexical knowledge encoded in GLAWI

    Webaffix : une boîte à outils d’acquisition lexicale à partir du Web

    We present Webaffix, a tool for semi-automatically building and enriching lexical data using the Web as a corpus. It detects and morphologically analyzes new lexical units (i.e. those absent from reference lists such as dictionaries) formed by suffixation or prefixation. We present the techniques used by Webaffix, detailing the different modes of use that we have considered and put into practice, along with examples of results produced by various collection campaigns. The data collected in this way constitute lexical resources for various natural language processing applications, but also for the large-scale study of derivational morphology. This paper deals with the design and use of Webaffix, a tool for semi-automatically detecting new word forms from the World Wide Web. We focus mainly on new derived words, i.e. coined from other lexemes through suffixation and/or prefixation processes. We develop the techniques and methods used in Webaffix, along with a sample of results obtained via several studies on French. Resources such as the ones created through the use of Webaffix are useful not only for natural language processing and information retrieval tasks, but also for the linguistic study of word creation

    WEBAFFIX : une boîte à outils d'acquisition lexicale à partir du Web

    International audience. This paper deals with the design and use of Webaffix, a tool for semi-automatically detecting new word forms from the World Wide Web. We focus mainly on new derived words, i.e. coined from other lexemes through suffixation and/or prefixation processes. We develop the techniques and methods used in Webaffix, along with a sample of results obtained via several studies on French. Resources such as the ones created through the use of Webaffix are useful not only for natural language processing and information retrieval tasks, but also for the linguistic study of word creation. We present Webaffix, a tool and a methodology for semi-automatically enriching and building lexical data using the Web as a corpus. Our approach deals more specifically with the detection and analysis of lexical units formed by suffixation or prefixation. We present the methods and techniques used by Webaffix, detailing the different modes of use that we have considered and put into practice, along with examples of results produced by various usage campaigns. The data collected in this way are useful as resources for various natural language processing applications, and also make it possible to study lexical creation phenomena on a large scale