4 research outputs found

    On-Line Error Detection of Annotated Corpus Using Modular Neural Networks

    Abstract. This paper proposes an on-line error detection method for a manually annotated corpus using min-max modular (M3) neural networks. The basic idea of the method is to use the guaranteed convergence of the M3 network to detect errors in the learning data. To confirm the effectiveness of the method, a preliminary computer experiment was performed on a small Japanese corpus containing 217 sentences. The results show that the method can not only detect errors within a corpus but may also uncover certain kinds of knowledge or rules useful for natural language processing.
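    The min-max idea behind this kind of error detection can be sketched under strong simplifying assumptions: a single two-class problem (an M3 network splits a K-class task into K(K-1)/2 such problems), every training subset reduced to one sample, and each sub-module replaced by a nearest-sample comparison instead of a trained neural network. The function names are hypothetical. Because this toy network reproduces every consistent training sample, any sample it fails to reproduce, such as identical inputs carrying different labels, is flagged as a possible annotation error. This illustrates the general principle only and is not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, label in {0, 1})


def module_output(x: Sequence[float], pos: Sequence[float], neg: Sequence[float]) -> float:
    """Hypothetical sub-module 'trained' on one positive and one negative sample.

    Returns 1.0 if x is closer to the positive sample, 0.0 if it is closer
    to the negative one, and 0.5 when the two distances tie.
    """
    dp, dn = math.dist(x, pos), math.dist(x, neg)
    if dp < dn:
        return 1.0
    if dp > dn:
        return 0.0
    return 0.5


def m3_output(x: Sequence[float], positives: List[Sequence[float]],
              negatives: List[Sequence[float]]) -> float:
    """Min-max combination: MIN over the negative subsets, then MAX over the positive ones."""
    return max(min(module_output(x, p, n) for n in negatives) for p in positives)


def flag_suspect_samples(samples: List[Sample]) -> List[int]:
    """Return indices of training samples the network does not reproduce."""
    positives = [x for x, y in samples if y == 1]
    negatives = [x for x, y in samples if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    return [i for i, (x, y) in enumerate(samples)
            if m3_output(x, positives, negatives) != float(y)]


if __name__ == "__main__":
    data = [((0.0, 0.0), 1), ((1.0, 1.0), 0), ((2.0, 2.0), 1), ((1.0, 1.0), 1)]
    print(flag_suspect_samples(data))  # [1, 3]: same input, conflicting labels
```

    Consistent data always converges here, so the flagged indices point only at conflicting annotations; a real tagging corpus would use context features as inputs and one module per pair of sample subsets.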

    Larger-first partial parsing

    Larger-first partial parsing is a primarily top-down approach to partial parsing that runs opposite to current easy-first, or primarily bottom-up, strategies. A rich partial tree structure is captured by an algorithm that assigns a hierarchy of structural tags to each of the input tokens in a sentence. Part-of-speech tags are first assigned to the words in a sentence by a part-of-speech tagger. A cascade of Deterministic Finite State Automata then uses this part-of-speech information to identify syntactic relations, primarily in descending order of their size. The cascade is divided into four specialized sections: (1) a Comma Network, which identifies syntactic relations associated with commas; (2) a Conjunction Network, which partially disambiguates phrasal conjunctions and fully disambiguates clausal conjunctions; (3) a Clause Network, which identifies non-comma-delimited clauses; and (4) a Phrase Network, which identifies the remaining base phrases in the sentence. Each automaton is capable of adding one or more levels of structural tags to the tokens in a sentence. The larger-first approach is compared against a well-known easy-first approach. The results indicate that the larger-first approach is capable of (1) producing a more detailed partial parse than the easy-first approach; (2) providing better containment of attachment ambiguity; (3) handling overlapping syntactic relations; and (4) achieving higher accuracy than the easy-first approach. The automata of each network were developed through an empirical analysis of several sources and are presented here in detail.
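    A cascade of this kind can be sketched with a small hand-coded automaton class. The sketch assumes a toy Penn-style tag set, two invented automata (a clause pattern and a noun-phrase pattern, not the paper's Comma, Conjunction, Clause, or Phrase Networks), and longest-match scanning from left to right; the class, function, and automaton names are hypothetical. Running the larger clause automaton before the phrase automaton means each token accumulates tag levels from the largest unit down, which mirrors the larger-first ordering described above.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple


@dataclass
class TagDFA:
    """Deterministic automaton over POS tags that assigns one structural tag."""
    label: str                          # structural tag added to matched tokens
    start: int
    accept: FrozenSet[int]
    delta: Dict[Tuple[int, str], int]   # (state, POS tag) -> next state

    def longest_match(self, tags: List[str], i: int) -> Optional[int]:
        """End index (exclusive) of the longest accepted run starting at i, or None."""
        state, best = self.start, None
        for j in range(i, len(tags)):
            nxt = self.delta.get((state, tags[j]))
            if nxt is None:
                break
            state = nxt
            if state in self.accept:
                best = j + 1
        return best


def apply_cascade(tags: List[str], cascade: List[TagDFA]) -> List[List[str]]:
    """Run each automaton in order; every match adds one tag level to its tokens."""
    levels: List[List[str]] = [[] for _ in tags]
    for dfa in cascade:
        i = 0
        while i < len(tags):
            end = dfa.longest_match(tags, i)
            if end is None:
                i += 1
                continue
            for k in range(i, end):
                levels[k].append(dfa.label)
            i = end
    return levels


# Hypothetical toy automata: CLAUSE = DT? NN VBZ DT? NN, NP = DT? JJ* NN.
CLAUSE = TagDFA("CLAUSE", 0, frozenset({5}), {
    (0, "DT"): 1, (0, "NN"): 2, (1, "NN"): 2, (2, "VBZ"): 3,
    (3, "DT"): 4, (3, "NN"): 5, (4, "NN"): 5})
NP = TagDFA("NP", 0, frozenset({3}), {
    (0, "DT"): 1, (0, "JJ"): 2, (0, "NN"): 3,
    (1, "JJ"): 2, (1, "NN"): 3, (2, "JJ"): 2, (2, "NN"): 3})

if __name__ == "__main__":
    pos = ["DT", "NN", "VBZ", "DT", "NN"]  # "the dog chases a cat"
    print(apply_cascade(pos, [CLAUSE, NP]))
    # [['CLAUSE', 'NP'], ['CLAUSE', 'NP'], ['CLAUSE'], ['CLAUSE', 'NP'], ['CLAUSE', 'NP']]
```

    Reversing the cascade order to [NP, CLAUSE] would give an easy-first flavour instead; with these toy automata the final tag sets happen to coincide, but the order in which levels are stacked on each token differs.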

    Hybrid Neuro and Rule-Based Part of Speech Taggers

    A hybrid system for part-of-speech tagging is described that consists of a neuro tagger and a rule-based corrector. The neuro tagger is an initial-state annotator that uses contexts of different lengths based on longest-context priority. Its inputs are weighted by information gains obtained through information maximization. The rule-based corrector is constructed from a set of transformation rules that make up for the shortcomings of the neuro tagger. Computer experiments show that almost 20% of the errors made by the neuro tagger are corrected by these transformation rules, so that the hybrid system can reach an accuracy of … counting only the ambiguous words and 99.1% counting all words when a small Thai corpus with 22,311 ambiguous words is used for training. This accuracy is far higher than that of an HMM and also higher than that of a rule-based model.
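    The division of labour between an initial annotator and a rule-based corrector can be sketched as follows. The initial tagger is a stand-in callable (any tagger, such as a neuro tagger, could fill that slot), and the only rule template is the familiar transformation "change tag A to tag B when the previous tag is C". The template, the English example, and all names here are assumptions for illustration, not the rules learned for Thai in this work.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Rule:
    """Hypothetical transformation: change from_tag to to_tag after a token tagged prev_tag."""
    from_tag: str
    to_tag: str
    prev_tag: str


def correct(tags: List[str], rules: List[Rule]) -> List[str]:
    """Apply each rule, in order, across the whole sentence."""
    out = list(tags)
    for rule in rules:
        # Test conditions against the tags as they stood before this rule,
        # so a rule's own rewrites cannot re-trigger it within one pass.
        before = list(out)
        for i in range(1, len(out)):
            if before[i] == rule.from_tag and before[i - 1] == rule.prev_tag:
                out[i] = rule.to_tag
    return out


def hybrid_tag(words: List[str],
               initial_tagger: Callable[[List[str]], List[str]],
               rules: List[Rule]) -> List[str]:
    """Initial-state annotation followed by rule-based correction."""
    return correct(initial_tagger(words), rules)


if __name__ == "__main__":
    # Stand-in initial tagger: a fixed most-frequent-tag lexicon (hypothetical).
    lexicon = {"Secretariat": "NNP", "is": "VBZ", "expected": "VBN",
               "to": "TO", "race": "NN", "tomorrow": "NN"}
    words = ["Secretariat", "is", "expected", "to", "race", "tomorrow"]
    print(hybrid_tag(words, lambda ws: [lexicon[w] for w in ws],
                     [Rule("NN", "VB", "TO")]))
    # ['NNP', 'VBZ', 'VBN', 'TO', 'VB', 'NN']
```

    Keeping the corrector separate from the initial tagger means the rules only need to target that tagger's characteristic mistakes, which is the motivation the abstract gives for the hybrid design.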