
4 research outputs found

On-Line Error Detection of Annotated Corpus Using Modular Neural Networks

Author: Bao-liang Lu
Hitoshi Isahara
Masaki Murata
Michinori Ichikawa
Qing Ma
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2001

Abstract. This paper proposes an on-line error detection method for a manually annotated corpus using min-max modular (M3) neural networks. The basic idea of the method is to use the guaranteed convergence of the M3 network to detect errors in the learning data. To confirm the effectiveness of the method, a preliminary computer experiment was performed on a small Japanese corpus containing 217 sentences. The results show that the method can not only detect errors within a corpus but may also discover some kinds of knowledge or rules useful for natural language processing.
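The min-max combination rule that defines an M3 network can be sketched as follows. In the standard M3 scheme, each module is trained to separate one subset of the positive class from one subset of the negative class; modules that share a positive subset are combined with MIN, and the MIN outputs are combined with MAX. This is a minimal sketch, not the authors' implementation: the threshold "modules" are hypothetical stand-ins for trained two-class networks, and `conflicting_inputs` is an assumed, simplified proxy for the error-detection step that flags inputs annotated with both labels, since no module could learn to separate them.

```python
from typing import Callable, Sequence

# A module maps an input vector to a score in [0, 1] (1 = positive class).
Module = Callable[[Sequence[float]], float]


def m3_output(modules: list[list[Module]], x: Sequence[float]) -> float:
    """Min-max combination of two-class modules.

    modules[i][j] separates the i-th positive subset from the j-th negative
    subset. Each row (same positive subset) is combined with MIN; the row
    results are combined with MAX.
    """
    return max(min(module(x) for module in row) for row in modules)


def conflicting_inputs(
    pairs: list[tuple[tuple[float, ...], int]]
) -> set[tuple[float, ...]]:
    """Hypothetical proxy check: inputs that occur with more than one label.

    A module trained on such samples cannot converge, so they are natural
    candidates for annotation errors.
    """
    labels: dict[tuple[float, ...], set[int]] = {}
    for x, y in pairs:
        labels.setdefault(x, set()).add(y)
    return {x for x, ys in labels.items() if len(ys) > 1}


def threshold(dim: int, t: float) -> Module:
    """Hypothetical toy module: 1.0 if feature `dim` exceeds `t`, else 0.0."""
    return lambda x: 1.0 if x[dim] > t else 0.0


grid = [
    [threshold(0, 0.5), threshold(1, 0.2)],
    [threshold(0, 0.9), threshold(1, 0.8)],
]
print(m3_output(grid, [0.6, 0.3]))  # 1.0
print(m3_output(grid, [0.6, 0.1]))  # 0.0
```

Because every module only sees a small pairwise subproblem, a failure to converge can be localized to the particular subsets involved, which is what makes this decomposition attractive for pointing at suspicious training samples.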

CiteSeerX

Crossref

Larger-first partial parsing

Author: Van Delden Sebastian Alexander
Publication venue: University of Central Florida
Publication date: 01/01/2003

Larger-first partial parsing is a primarily top-down approach to partial parsing that is the opposite of current easy-first, or primarily bottom-up, strategies. A rich partial tree structure is captured by an algorithm that assigns a hierarchy of structural tags to each of the input tokens in a sentence. Part-of-speech tags are first assigned to the words in a sentence by a part-of-speech tagger. A cascade of deterministic finite state automata then uses this part-of-speech information to identify syntactic relations, primarily in descending order of their size. The cascade is divided into four specialized sections: (1) a Comma Network, which identifies syntactic relations associated with commas; (2) a Conjunction Network, which partially disambiguates phrasal conjunctions and fully disambiguates clausal conjunctions; (3) a Clause Network, which identifies non-comma-delimited clauses; and (4) a Phrase Network, which identifies the remaining base phrases in the sentence. Each automaton is capable of adding one or more levels of structural tags to the tokens in a sentence. The larger-first approach is compared against a well-known easy-first approach. The results indicate that the larger-first approach is capable of (1) producing a more detailed partial parse than an easy-first approach; (2) providing better containment of attachment ambiguity; (3) handling overlapping syntactic relations; and (4) achieving a higher accuracy than the easy-first approach. The automata of each network were developed through an empirical analysis of several sources and are presented here in detail.
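A cascade of the kind described above can be sketched with pattern matching over the part-of-speech sequence. This is a minimal sketch under stated assumptions: Python's `re` module stands in for the deterministic finite state automata, and the two stages and their patterns (`CL`, `NP`) are hypothetical, not the Comma, Conjunction, Clause, or Phrase Networks presented in the thesis. Running the larger (clause) stage before the smaller (phrase) stage gives each token a hierarchy of structural tags, outermost first.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Token:
    word: str
    pos: str
    # Structural tags added by the cascade, outermost (largest) first.
    tags: list[str] = field(default_factory=list)


def apply_stage(tokens: list[Token], label: str, pattern: str) -> None:
    """Add `label` to every token covered by a match of `pattern`.

    `pattern` is a regular expression over the POS sequence written as
    '<TAG>' units; if it is built from whole units, a match can only
    begin at a token boundary.
    """
    text = "".join(f"<{t.pos}>" for t in tokens)
    starts = []  # character offset at which each token's '<TAG>' unit begins
    offset = 0
    for t in tokens:
        starts.append(offset)
        offset += len(t.pos) + 2
    for m in re.finditer(pattern, text):
        for k, s in enumerate(starts):
            if m.start() <= s < m.end():
                tokens[k].tags.append(label)


# Hypothetical cascade: larger relations (clauses) before smaller ones (phrases).
CASCADE = [
    ("CL", r"<DT>(?:<JJ>)*<NN><VBD><DT>(?:<JJ>)*<NN>"),
    ("NP", r"<DT>(?:<JJ>)*<NN>"),
]


def parse(tagged: list[tuple[str, str]]) -> list[Token]:
    """Run every stage of the cascade in order over POS-tagged input."""
    tokens = [Token(w, p) for w, p in tagged]
    for label, pattern in CASCADE:
        apply_stage(tokens, label, pattern)
    return tokens


sentence = [("the", "DT"), ("old", "JJ"), ("man", "NN"),
            ("saw", "VBD"), ("a", "DT"), ("dog", "NN")]
print([t.tags for t in parse(sentence)])
# [['CL', 'NP'], ['CL', 'NP'], ['CL', 'NP'], ['CL'], ['CL', 'NP'], ['CL', 'NP']]
```

The verb "saw" carries only the clause tag, while the words of each noun phrase carry two levels, which is the kind of layered partial structure the abstract describes.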

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

Hybrid Neuro and Rule-Based Part of Speech Taggers

Author: Hitoshi Isahara
Kiyotaka Uchimoto
Masaki Murata
Qing Ma
Publication date: 01/01/2000

A hybrid system for part-of-speech tagging is described that consists of a neuro tagger and a rule-based corrector. The neuro tagger is an initial-state annotator that uses different lengths of context based on longest-context priority. Its inputs are weighted by information gains obtained by information maximization. The rule-based corrector is constructed from a set of transformation rules that make up for the shortcomings of the neuro tagger. Computer experiments show that almost 20% of the errors made by the neuro tagger are corrected by these transformation rules, so that the hybrid system can reach an accuracy of … counting only the ambiguous words and 99.1% counting all words when a small Thai corpus with 22,311 ambiguous words is used for training. This accuracy is far higher than that of an HMM and is also higher than that of a rule-based model.
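The two-stage design (an initial tagger whose output is then patched by transformation rules) and the two accuracy figures can be illustrated with a small sketch. The rule template (`from_tag`, `to_tag`, `prev_tag`), the English example, and its tags are hypothetical assumptions, and a fixed tag list stands in for the neuro tagger's output; this is not the paper's rule set or its Thai tag set. `accuracy` shows why the same output scores differently when only ambiguous words are counted: unambiguous words are almost always tagged correctly, which raises the all-words figure.

```python
from typing import NamedTuple, Optional, Sequence


class Rule(NamedTuple):
    """Hypothetical transformation: change from_tag to to_tag after prev_tag."""
    from_tag: str
    to_tag: str
    prev_tag: str


def correct(tags: Sequence[str], rules: Sequence[Rule]) -> list[str]:
    """Apply each rule, in order, to the initial tagger's output."""
    out = list(tags)
    for rule in rules:
        snapshot = list(out)  # conditions are checked against tags before this rule
        for i in range(1, len(out)):
            if snapshot[i] == rule.from_tag and snapshot[i - 1] == rule.prev_tag:
                out[i] = rule.to_tag
    return out


def accuracy(pred: Sequence[str], gold: Sequence[str],
             ambiguous: Optional[Sequence[bool]] = None) -> float:
    """Fraction of correct tags, optionally counting only ambiguous words."""
    idx = [i for i in range(len(gold)) if ambiguous is None or ambiguous[i]]
    return sum(pred[i] == gold[i] for i in idx) / len(idx)


initial = ["PRP", "MD", "MD"]    # hypothetical initial output for "we can fish"
gold = ["PRP", "MD", "VB"]
ambiguous = [False, True, True]  # "can" and "fish" have more than one possible tag
fixed = correct(initial, [Rule("MD", "VB", "MD")])
print(fixed)                               # ['PRP', 'MD', 'VB']
print(accuracy(initial, gold, ambiguous))  # 0.5
print(accuracy(initial, gold))             # 0.6666666666666666
print(accuracy(fixed, gold))               # 1.0
```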

CiteSeerX

Crossref