
    Lexicon-Grammar and the syntactic analysis of French

    A lexicon-grammar is constituted by the elementary sentences of a language. Instead of considering words as basic syntactic units to which grammatical information is attached, we use simple sentences (subject-verb-objects) as dictionary entries. Hence, a full dictionary item is a simple sentence with a description of the corresponding distributional and transformational properties. The systematic study of French has led to an organization of its lexicon-grammar based on three main components:
    - the lexicon-grammar of free sentences, that is, of sentences whose verb imposes selectional restrictions on its subject and complements (e.g. 'to fall', 'to eat', 'to watch'),
    - the lexicon-grammar of frozen or idiomatic expressions (e.g. 'N takes N into account', 'N raises a question'),
    - the lexicon-grammar of support verbs; these verbs do not impose the common selectional restrictions, but more complex dependencies between subject and complement (e.g. 'to have', 'to make' in 'N has an impact on N', 'N makes a certain impression on N').
    These three components interact in specific ways. We present the structure of the lexicon-grammar built for French and discuss its algorithmic implications for parsing.
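The three-component organization described above can be illustrated with a minimal sketch. The entry fields, frame notation, and property names below are invented for illustration and are not the actual table schema used for French:

```python
# Minimal sketch of a lexicon-grammar: entries are simple sentences
# (subject-verb-objects) carrying distributional/transformational
# properties. All field and property names here are illustrative.

entries = [
    {"verb": "eat", "frame": "N0 V N1",                 # free sentence
     "component": "free",
     "properties": {"passive": True, "N1_human": False}},
    {"verb": "take", "frame": "N0 V N1 into account",   # frozen expression
     "component": "frozen",
     "properties": {"passive": True, "N1_free": True}},
    {"verb": "have", "frame": "N0 V an impact on N1",   # support verb
     "component": "support",
     "properties": {"passive": False, "noun_predicative": True}},
]

def lookup(verb, component=None):
    """Return the entries anchored on a verb, optionally filtered by component."""
    return [e for e in entries
            if e["verb"] == verb
            and (component is None or e["component"] == component)]

print(lookup("have")[0]["frame"])
```

The point of the sketch is that the dictionary unit is the whole sentence frame with its properties, not the bare verb.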

    Lexicalization and Grammar Development

    In this paper we present a fully lexicalized grammar formalism as a particularly attractive framework for the specification of natural language grammars. We discuss in detail Feature-based, Lexicalized Tree Adjoining Grammars (FB-LTAGs), a representative of the class of lexicalized grammars. We illustrate the advantages of lexicalized grammars in various contexts of natural language processing, ranging from wide-coverage grammar development to parsing and machine translation. We also present a method for compact and efficient representation of lexicalized trees.

    ON MONITORING LANGUAGE CHANGE WITH THE SUPPORT OF CORPUS PROCESSING

    One of the fundamental characteristics of language is that it changes over time. One method of monitoring this change is to observe its corpora: structured documentation of the language. Recent developments in technology, especially in the field of Natural Language Processing, allow robust linguistic processing, which supports the description of diverse historical changes in corpora. The intervention of a human linguist remains inevitable, as it determines the gold standard, but computer assistance provides considerable support by incorporating computational approaches in exploring corpora, especially historical corpora. This paper proposes a model for corpus development in which corpora are annotated to support further computational operations such as lexicogrammatical pattern matching, automatic retrieval, and extraction. The corpus processing operations are performed by local-grammar-based corpus processing software on a contemporary Indonesian corpus. This paper concludes that data collection and data processing in a corpus are equally crucial for monitoring language change, and neither can be set aside.
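Lexicogrammatical pattern matching over an annotated corpus can be sketched very simply. The token_POS annotation scheme and the example sentence below are hypothetical, not the annotation actually used in the paper:

```python
import re

# Sketch of lexicogrammatical pattern matching over a POS-annotated
# corpus. The token_POS scheme and the Indonesian example sentence
# are invented for illustration.
corpus = "dia_PRON sudah_ADV makan_VERB nasi_NOUN"

# Local-grammar-style pattern: an optional adverb followed by a verb;
# capture the verb's surface form.
pattern = re.compile(r"(?:\w+_ADV\s+)?(\w+)_VERB")

matches = pattern.findall(corpus)
print(matches)
```

Queries of this kind, run over corpora from different periods, are what makes frequency-based monitoring of change mechanical rather than manual.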

    Compiling Linguistic Constraints into Finite State Automata

    This paper deals with linguistic constraints encoded in the form of (binary) tables, generally called lexicon-grammar tables. We describe a unified method to compile sets of tables of linguistic constraints into Finite State Automata. This method has been practically implemented in the linguistic platform Unitex.
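The idea of compiling binary constraint tables into an acceptor can be sketched as follows. The table content is invented, and the automaton is a toy trie-style acceptor; Unitex's actual compilation works on real lexicon-grammar tables combined with graph templates:

```python
# Hedged sketch: compile a binary table (verb x sentence pattern,
# '+' = allowed) into a finite-state acceptor represented as nested
# dicts. The table rows here are invented for illustration.

table = {
    "raise":  {"N0 V N1": "+", "N1 be Vpp by N0": "+"},
    "matter": {"N0 V N1": "-", "N0 V": "+"},
}

def compile_table(table):
    """Build an acceptor for 'verb pattern-token ...' sequences."""
    start = {}
    for verb, row in table.items():
        for pattern, mark in row.items():
            if mark != "+":
                continue  # '-' cells contribute no path
            state = start
            for token in [verb] + pattern.split():
                state = state.setdefault(token, {})
            state["<final>"] = True
    return start

def accepts(automaton, tokens):
    state = automaton
    for t in tokens:
        if t not in state:
            return False
        state = state[t]
    return state.get("<final>", False)

automaton = compile_table(table)
print(accepts(automaton, ["raise", "N0", "V", "N1"]))   # '+' cell
print(accepts(automaton, ["matter", "N0", "V", "N1"]))  # '-' cell
```

Merging the per-row paths into one automaton is what lets a single pass over the input check all table constraints at once.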

    Dictionaries for language processing. Readability and organization of information

    What makes a dictionary exploitable in Natural Language Processing (NLP)? We examine two requirements: readability of information and general architecture, and we focus on the human tasks involving NLP dictionaries: construction, update, checking, and correction. We exemplify our points with real cases from projects of morpho-syntactic or syntactic-semantic dictionaries.

    Syntactic variation of support verb constructions

    We report experiments on the syntactic variation of support verb constructions, a special type of multiword expression (MWE) containing predicative nouns. In these expressions, the noun can occur with or without the verb, with no clear-cut semantic difference. We extracted from a large French corpus a set of examples of the two situations and derived statistical results from these data. The extraction involved large-coverage language resources and finite-state techniques. The results show that, most frequently, predicative nouns occur without a support verb. This fact has consequences for methods of extracting or recognising MWEs.
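The with/without-verb count at the heart of the experiment can be sketched on a toy example. The miniature corpus and the support-verb list below are invented; the study itself used large-coverage resources and finite-state extraction, not naive token matching:

```python
# Sketch of counting a predicative noun's occurrences with and
# without a support verb. Corpus and verb list are illustrative only.
corpus = [
    "The committee takes a decision tomorrow",
    "The decision surprised everyone",
    "A decision of this kind is rare",
]
support_verbs = {"take", "takes", "took", "make", "makes", "made"}
noun = "decision"

with_verb = without_verb = 0
for sentence in corpus:
    tokens = set(sentence.lower().split())
    if noun in tokens:
        if tokens & support_verbs:   # noun co-occurs with a support verb
            with_verb += 1
        else:                        # noun occurs on its own
            without_verb += 1

print(with_verb, without_verb)
```

Even on this toy sample the verbless occurrences dominate, which mirrors the paper's finding and explains why MWE recognisers anchored on the verb would miss most instances.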

    Multilingual collocation extraction with a syntactic parser

    An impressive amount of work has been devoted over the past few decades to collocation extraction. The state of the art shows sustained interest in the morphosyntactic preprocessing of texts in order to better identify candidate expressions; however, the treatment performed is, in most cases, limited (lemmatization, POS-tagging, or shallow parsing). This article presents a collocation extraction system based on the full parsing of source corpora, which supports four languages: English, French, Spanish, and Italian. The performance of the system is compared against that of the standard mobile-window method. The evaluation experiment investigates several levels of the significance lists, uses a fine-grained annotation schema, and covers all the languages supported. Consistent results were obtained for these languages: parsing, even if imperfect, leads to a significant improvement in the quality of results, in terms of collocational precision (between 16.4 and 29.7%, depending on the language; 20.1% overall), MWE precision (between 19.9 and 35.8%; 26.1% overall), and grammatical precision (between 47.3 and 67.4%; 55.6% overall). This positive result is highly important, especially in the perspective of the subsequent integration of extraction results into other NLP applications.
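The mobile-window baseline that the system is compared against can be sketched as follows: collect word pairs co-occurring within a fixed window and rank them by an association score (pointwise mutual information here; the corpus, window size, and scoring choice are illustrative):

```python
import math
from collections import Counter

# Sketch of the mobile-window baseline for collocation candidates:
# count pairs within a fixed window, then rank by PMI. The tiny
# corpus and the window size are invented for illustration.
corpus = ("heavy rain fell and heavy rain returned while "
          "light rain fell on heavy traffic").split()

window = 3
pair_counts, word_counts = Counter(), Counter(corpus)
for i, w in enumerate(corpus):
    for j in range(i + 1, min(i + window, len(corpus))):
        pair_counts[(w, corpus[j])] += 1

total = len(corpus)
def pmi(pair):
    """Pointwise mutual information of a co-occurring pair."""
    a, b = pair
    p_ab = pair_counts[pair] / total
    return math.log2(p_ab / ((word_counts[a] / total) * (word_counts[b] / total)))

best = max(pair_counts, key=lambda p: (pair_counts[p], pmi(p)))
print(best, pair_counts[best])
```

The weakness the article targets is visible even here: the window pairs words by mere adjacency, so without parsing, candidates need not stand in any grammatical relation.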

    Parsing With Lexicalized Tree Adjoining Grammar

    Most current linguistic theories give lexical accounts of several phenomena that used to be considered purely syntactic. The information put in the lexicon is thereby increased in both amount and complexity: see, for example, lexical rules in LFG (Kaplan and Bresnan, 1983), GPSG (Gazdar, Klein, Pullum and Sag, 1985), HPSG (Pollard and Sag, 1987), Combinatory Categorial Grammars (Steedman, 1987), Karttunen's version of Categorial Grammar (Karttunen 1986, 1988), some versions of GB theory (Chomsky 1981), and Lexicon-Grammars (Gross 1984). We would like to take this fact into account while defining a formalism. We therefore explore the view that syntactic rules are not separated from lexical items. We say that a grammar is lexicalized (Schabes, Abeillé and Joshi, 1988) if it consists of: (1) a finite set of structures each associated with lexical items; each lexical item will be called the anchor of the corresponding structure; the structures define the domain of locality over which constraints are specified; (2) an operation or operations for composing the structures. The notion of anchor is closely related to the word associated with a functor-argument category in Categorial Grammars. Categorial Grammars (as used for example by Steedman, 1987) are 'lexicalized' according to our definition, since each basic category has a lexical item associated with it.

    Parsing Strategies With 'Lexicalized' Grammars: Application to Tree Adjoining Grammars

    In this paper, we present a parsing strategy that arose from the development of an Earley-type parsing algorithm for TAGs (Schabes and Joshi 1988) and from some recent linguistic work in TAGs (Abeillé 1988a). In our approach, each elementary structure is systematically associated with a lexical head. These structures specify extended domains of locality (as compared to a context-free grammar) over which constraints can be stated. These constraints either hold within the elementary structure itself or specify what other structures can be composed with a given elementary structure. The 'grammar' consists of a lexicon where each lexical item is associated with a finite number of structures for which that item is the head. There are no separate grammar rules. There are, of course, 'rules' which tell us how these structures are composed. A grammar of this form will be said to be 'lexicalized'. We show that in general context-free grammars cannot be 'lexicalized'. We then show how a 'lexicalized' grammar naturally follows from the extended domain of locality of TAGs and examine briefly some of the linguistic implications of our approach. A general parsing strategy for 'lexicalized' grammars is discussed. In the first stage, the parser selects a set of elementary structures associated with the lexical items in the input sentence, and in the second stage the sentence is parsed with respect to this set. The strategy is independent of the nature of the elementary structures in the underlying grammar. However, we focus our attention on TAGs. Since the set of trees selected at the end of the first stage is finite, the parser can in principle use any search strategy. Thus, in particular, a top-down strategy can be used, since problems due to recursive structures are eliminated. We then explain how the Earley-type parser for TAGs can be modified to take advantage of this approach.
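The first stage of the two-stage strategy can be sketched as a lexicon lookup. The toy lexicon and the flat tree labels below are invented; real LTAG elementary trees are full phrase-structure trees with substitution and adjunction sites:

```python
# Sketch of stage 1 of the two-stage strategy: select the elementary
# structures anchored by each input word. Lexicon and tree labels
# are invented for illustration.
lexicon = {
    "John":  ["NP(John)"],
    "eats":  ["S(NP, VP(eats, NP))", "S(NP, VP(eats))"],  # transitive / intransitive
    "pasta": ["NP(pasta)"],
}

def select(sentence):
    """Stage 1: collect the structures anchored by the words of the input."""
    return {w: lexicon.get(w, []) for w in sentence.split()}

selected = select("John eats pasta")
# Stage 2 would compose only these structures (e.g. with an
# Earley-type TAG parser); here we just show the search space is finite.
n_trees = sum(len(trees) for trees in selected.values())
print(n_trees)
```

Because stage 2 works over this finite tree set rather than the whole grammar, even a top-down search terminates, which is the point made in the abstract.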