9 research outputs found

    What conceptual graph workbenches need for natural language processing

    An important capability of the conceptual graph knowledge engineering tools now under development will be the transformation of natural language texts into graphs (conceptual parsing) and its reverse, the production of text from graphs (conceptual generation). Are the existing basic designs adequate for these tasks? Experience developing the BEELINE system's natural language capabilities suggests that good entry/editing tools, a generous but not unlimited storage capacity, and efficient, bidirectional lexical access techniques are needed to support the supply of data structures at both the linguistic and conceptual knowledge levels. An active formalism capable of supporting declarative and procedural programs containing both linguistic and knowledge level terms is also important. If these requirements are satisfied, future text-readers can be included as part of a conceptual knowledge workbench without unexpected problems.

    The Syntactic Regularity Of English Noun Phrases

    Lexicrunch : an expert system for word morphology

    Natural language programs typically store words like pig and pigs as independent entries in their dictionaries, thus neglecting the obvious morphological relationship between them. Lexicrunch tries to induce such relationships from examples of root forms of words and the corresponding inflected forms. The program collates the examples into classes according to the difference between the inflected form and its root, e.g. the classes for the plural noun inflection in English might include "root forms to which an -s is added" (pig, apple, etc.) and "root forms which take -es" (fox, box, etc.). It then characterizes each class using a modified version of Quinlan's ID3 procedure. The resulting rule will be along the lines of, "If a noun ends in -x, form its plural by adding -es; otherwise, add -s." The program then needs to store only root forms in its dictionary; it can reconstruct plurals on demand by applying its rule. It thereby eliminates redundancy and compacts the lexicon. Lexicrunch's formalism for representing morphological rules was influenced by the Two-level model of Koskenniemi. The program was tested on the past tense inflection in English, the first person singular present indicative of Finnish, and the past participle in French. It appeared to pick up most of the regularities in the data successfully. However, a meta-level extension to the program is indicated to enable it to capture regularities across its rules.
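
    The collate-then-characterize idea in this abstract can be illustrated with a minimal Python sketch. It is not Lexicrunch's implementation: the function names (`collate`, `induce_rule`, `inflect`) and the toy data are hypothetical, and a single split on the root's final letter stands in for the modified ID3 procedure that the abstract mentions.

```python
from collections import defaultdict

def suffix_difference(root, inflected):
    """Return the material added to the root, or None if the root is not a prefix."""
    if inflected.startswith(root):
        return inflected[len(root):]
    return None

def collate(examples):
    """Group roots into classes keyed by the suffix that yields the inflected form."""
    classes = defaultdict(list)
    for root, inflected in examples:
        diff = suffix_difference(root, inflected)
        if diff is not None:
            classes[diff].append(root)
    return dict(classes)

def induce_rule(classes):
    """Characterize classes by final letter (a simplification, not ID3).

    The largest class supplies the default suffix; a final letter becomes an
    exception only if it occurs in exactly one non-default class.
    """
    default = max(classes, key=lambda s: len(classes[s]))
    endings = {s: {r[-1] for r in roots} for s, roots in classes.items()}
    exceptions = {}
    for suffix, letters in endings.items():
        if suffix == default:
            continue
        for letter in letters:
            unique = all(letter not in other
                         for s, other in endings.items() if s != suffix)
            if unique:
                exceptions[letter] = suffix
    return exceptions, default

def inflect(root, rule):
    """Rebuild the inflected form from a root stored in the dictionary."""
    exceptions, default = rule
    return root + exceptions.get(root[-1], default)

# Toy training data (assumed for illustration only).
EXAMPLES = [("pig", "pigs"), ("apple", "apples"), ("dog", "dogs"),
            ("fox", "foxes"), ("box", "boxes")]
RULE = induce_rule(collate(EXAMPLES))
```

    With this toy data, the induced rule amounts to "if the root ends in -x, add -es; otherwise, add -s", so only root forms need to be stored and `inflect` reconstructs plurals on demand.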

    Open-source resources and standards for Arabic word structure analysis: Fine grained morphological analysis of Arabic text corpora

    Morphological analyzers are preprocessors for text analysis. Many Text Analytics applications need them to perform their tasks. The aim of this thesis is to develop standards, tools and resources that widen the scope of Arabic word structure analysis, particularly morphological analysis, to process Arabic text corpora of different domains, formats and genres, of both vowelized and non-vowelized text. We want to morphologically tag our Arabic Corpus, but evaluation of existing morphological analyzers has highlighted shortcomings and shown that more research is required. Tag-assignment is significantly more complex for Arabic than for many languages. The morphological analyzer should add the appropriate linguistic information to each part or morpheme of the word (proclitic, prefix, stem, suffix and enclitic); in effect, instead of a tag for a word, we need a subtag for each part. Very fine-grained distinctions may cause problems for automatic morphosyntactic analysis, particularly for probabilistic taggers which require training data, if some words can change grammatical tag depending on function and context; on the other hand, fine-grained distinctions may actually help to disambiguate other words in the local context. The SALMA – Tagger is a fine-grained morphological analyzer which depends mainly on linguistic information extracted from traditional Arabic grammar books and on a prior-knowledge broad-coverage lexical resource, the SALMA – ABCLexicon. More fine-grained tag sets may be more appropriate for some tasks. The SALMA – Tag Set is a theory standard for encoding, which captures long-established traditional fine-grained morphological features of Arabic, in a notation format intended to be compact yet transparent. The SALMA – Tagger has been used to lemmatize the 176-million-word Arabic Internet Corpus. It has been proposed as a language-engineering toolkit for Arabic lexicography and for phonetically annotating the Qur’an with syllable and primary stress information, as well as for fine-grained morphological tagging.

    A dictionary and morphological analyser for English
