A Robust Parsing Algorithm For Link Grammars
In this paper we present a robust parsing algorithm based on the link grammar
formalism for parsing natural languages. Our algorithm is a natural extension
of the original dynamic programming recognition algorithm which recursively
counts the number of linkages between two words in the input sentence. The
modified algorithm uses the notion of a null link in order to allow a
connection between any pair of adjacent words, regardless of their dictionary
definitions. The algorithm proceeds by making three dynamic programming passes.
In the first pass, the input is parsed using the original algorithm which
enforces the constraints on links to ensure grammaticality. In the second pass,
the total cost of each substring of words is computed, where cost is determined
by the number of null links necessary to parse the substring. The final pass
counts the total number of parses with minimal cost. All of the original
pruning techniques have natural counterparts in the robust algorithm. When used
together with memoization, these techniques enable the algorithm to run
efficiently with cubic worst-case complexity. We have implemented these ideas
and tested them by parsing the Switchboard corpus of conversational English.
This corpus comprises approximately three million words of text,
corresponding to more than 150 hours of transcribed speech collected from
telephone conversations restricted to 70 different topics. Although only a
small fraction of the sentences in this corpus are "grammatical" by standard
criteria, the robust link grammar parser is able to extract relevant structure
for a large portion of the sentences. We present the results of our experiments
using this system, including the analyses of selected and random sentences from
the corpus. Comment: 17 pages, compressed postscript
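The cost-then-count idea in the abstract above can be illustrated with a deliberately simplified sketch. It is not the paper's algorithm: it assumes a hypothetical black-box predicate `is_grammatical` that stands in for the first pass, treats every single word as parseable on its own, and only allows null links between adjacent grammatical "islands". The function name `robust_parse_summary` and the toy dictionary are likewise assumptions made for illustration.

```python
def robust_parse_summary(words, is_grammatical):
    """Toy model: return (min_null_links, number_of_min_cost_splits).

    Assumption: the sentence is cut into contiguous islands, each either a
    single word or a span accepted by is_grammatical; adjacent islands are
    joined by one null link, so cost = number of islands - 1.
    """
    n = len(words)
    if n == 0:
        return (0, 1)

    # best[i] = (fewest islands covering words[i:], number of ways to do so)
    best = [None] * (n + 1)
    best[n] = (0, 1)

    for i in range(n - 1, -1, -1):
        min_islands, ways = None, 0
        for j in range(i + 1, n + 1):
            span = tuple(words[i:j])
            # A single word is always allowed as its own island (assumption).
            if j == i + 1 or is_grammatical(span):
                islands, count = best[j]
                islands += 1
                if min_islands is None or islands < min_islands:
                    min_islands, ways = islands, count
                elif islands == min_islands:
                    ways += count
        best[i] = (min_islands, ways)

    islands, ways = best[0]
    return (islands - 1, ways)


# Hypothetical "dictionary" of spans the strict parser would accept.
known = {("the", "cat"), ("sat", "down")}
result = robust_parse_summary(
    ["the", "cat", "um", "sat", "down"], lambda span: span in known
)
```

In this toy the cost and the count are tracked together as a pair, whereas the abstract describes separate passes for cost and for counting; the sketch also examines only a quadratic number of spans and does not reproduce the cubic link-grammar recurrence or its pruning.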
Structural parsing
Parsing is an essential part of natural language processing. In this paper, structural parsing, which is based on the theory of knowledge graphs, is introduced. Taking into account the semantic and syntactic features of natural language, both semantic and syntactic word graphs are formed. Grammar rules are derived from the syntactic word graphs. Because of the differences between Chinese and English, the grammar rules are given separately for the Chinese and the English versions of the syntactic word graphs. Traditional parsing then yields a parse tree for a sentence, which can be used to map the sentence onto a sentence graph. This is called structural parsing. The relationship with utterance paths is discussed. As a result, chunk indicators are proposed to guide structural parsing.
Adapting a general parser to a sublanguage
In this paper, we propose a method to adapt a general parser (Link Parser) to
sublanguages, focusing on the parsing of texts in biology. Our main proposal is
the use of terminology (identification and analysis of terms) in order to reduce
the complexity of the text to be parsed. Several other strategies are explored
and finally combined, among them text normalization, lexicon and
morpho-guessing module extensions, and grammar rule adaptation. We compare the
parsing results before and after these adaptations.
Learning Language from a Large (Unannotated) Corpus
A novel approach to the fully automated, unsupervised extraction of
dependency grammars and associated syntax-to-semantic-relationship mappings
from large text corpora is described. The suggested approach builds on the
authors' prior work with the Link Grammar, RelEx and OpenCog systems, as well
as on a number of prior papers and approaches from the statistical language
learning literature. If successful, this approach would enable the mining of
all the information needed to power a natural language comprehension and
generation system, directly from a large, unannotated corpus. Comment: 29 pages, 5 figures, research proposal
Lexical Adaptation of Link Grammar to the Biomedical Sublanguage: a Comparative Evaluation of Three Approaches
We study the adaptation of Link Grammar Parser to the biomedical sublanguage
with a focus on domain terms not found in a general parser lexicon. Using two
biomedical corpora, we implement and evaluate three approaches to addressing
unknown words: automatic lexicon expansion, the use of morphological clues, and
disambiguation using a part-of-speech tagger. We evaluate each approach
separately for its effect on parsing performance and consider combinations of
these approaches. In addition to a 45% increase in parsing efficiency, we find
that the best approach, incorporating information from a domain part-of-speech
tagger, offers a statistically significant 10% relative decrease in error. The
adapted parser is available under an open-source license at
http://www.it.utu.fi/biolg
On Parsing CHILDES
Research on child language acquisition would benefit from the availability of a large body of syntactically parsed utterances between parents and children. We consider the problem of generating such a "treebank" from the CHILDES corpus, which currently contains primarily orthographically transcribed speech tagged for lexical category.
- …