A Robust Parsing Algorithm For Link Grammars
In this paper we present a robust parsing algorithm based on the link grammar
formalism for parsing natural languages. Our algorithm is a natural extension
of the original dynamic programming recognition algorithm which recursively
counts the number of linkages between two words in the input sentence. The
modified algorithm uses the notion of a null link in order to allow a
connection between any pair of adjacent words, regardless of their dictionary
definitions. The algorithm proceeds by making three dynamic programming passes.
In the first pass, the input is parsed using the original algorithm which
enforces the constraints on links to ensure grammaticality. In the second pass,
the total cost of each substring of words is computed, where cost is determined
by the number of null links necessary to parse the substring. The final pass
counts the total number of parses with minimal cost. All of the original
pruning techniques have natural counterparts in the robust algorithm. When used
together with memoization, these techniques enable the algorithm to run
efficiently with cubic worst-case complexity. We have implemented these ideas
and tested them by parsing the Switchboard corpus of conversational English.
This corpus comprises approximately three million words of text,
corresponding to more than 150 hours of transcribed speech collected from
telephone conversations restricted to 70 different topics. Although only a
small fraction of the sentences in this corpus are "grammatical" by standard
criteria, the robust link grammar parser is able to extract relevant structure
for a large portion of the sentences. We present the results of our experiments
using this system, including the analyses of selected and random sentences from
the corpus. Comment: 17 pages, compressed postscript
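The cost pass and the counting pass described in this abstract can be illustrated with a deliberately simplified sketch. It is not the paper's link grammar algorithm: it assumes a hypothetical oracle `is_grammatical` that says whether a run of words parses without null links, treats every single word as a valid chunk, and counts one null link per boundary between chunks. The function name `robust_segment` and the toy oracle are illustrative assumptions only.

```python
from functools import lru_cache
from typing import Callable, Sequence, Tuple


def robust_segment(
    words: Sequence[str],
    is_grammatical: Callable[[Tuple[str, ...]], bool],
) -> Tuple[int, int]:
    """Return (minimal number of null links, number of minimal-cost covers).

    Simplified stand-in: the sentence is covered by contiguous chunks, each
    either a single word or a span accepted by the hypothetical oracle
    `is_grammatical`. Adjacent chunks are joined by one null link.
    """
    n = len(words)
    if n == 0:
        return 0, 1

    def ok(i: int, j: int) -> bool:
        # A lone word is always an acceptable chunk, so a cover always exists.
        return j == i + 1 or is_grammatical(tuple(words[i:j]))

    @lru_cache(maxsize=None)
    def chunks(i: int) -> int:
        # Cost pass: fewest chunks needed to cover words[i:].
        if i == n:
            return 0
        return min(1 + chunks(j) for j in range(i + 1, n + 1) if ok(i, j))

    @lru_cache(maxsize=None)
    def count(i: int) -> int:
        # Counting pass: number of covers of words[i:] that achieve chunks(i).
        if i == n:
            return 1
        return sum(
            count(j)
            for j in range(i + 1, n + 1)
            if ok(i, j) and 1 + chunks(j) == chunks(i)
        )

    return chunks(0) - 1, count(0)


if __name__ == "__main__":
    toy = {("i", "think"), ("we", "should", "go"), ("should", "go")}
    print(robust_segment("i think uh we should go".split(), lambda t: t in toy))
```

Memoization keeps each prefix from being recomputed, which is the same role it plays in the abstract's description. This toy makes a quadratic number of oracle calls and does not reproduce the cubic bound stated for the real algorithm.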
Robust Grammatical Analysis for Spoken Dialogue Systems
We argue that grammatical analysis is a viable alternative to concept
spotting for processing spoken input in a practical spoken dialogue system. We
discuss the structure of the grammar, and a model for robust parsing which
combines linguistic sources of information and statistical sources of
information. We discuss test results suggesting that grammatical processing
allows fast and accurate processing of spoken input. Comment: Accepted for JNL
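On Sentence Analysis Techniques in Speech Translation
The full-text data is a PDF converted from image files created in the National Diet Library's fiscal year 2010 (Heisei 22) digitization of doctoral dissertations. Kyoto University, 0048, new system, doctorate by dissertation, Doctor of Engineering, Otsu No. 8652, Ron-Ko-Haku No. 2893, 新制||工||968 (University Library), UT51-94-R411. (Chief examiner) Professor Makoto Nagao, Professor Shuji Doshita, Professor Katsuo Ikeda. Conferred under Article 4, Paragraph 2 of the Degree Regulations. Doctor of Engineering, Kyoto University. DFA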
PARSEC: A Constraint-Based Parser for Spoken Language Processing
PARSEC (1), a text-based and spoken language processing framework based on the Constraint Dependency Grammar (CDG) developed by Maruyama [26,27], is discussed. The scope of CDG is expanded to allow for the analysis of sentences containing lexically ambiguous words, to allow feature analysis in constraints, and to efficiently process multiple sentence candidates that are likely to arise in spoken language processing. The benefits of the CDG parsing approach are summarized. Additionally, the development of CDG grammars using PARSEC grammar-writing tools and the implementation of the PARSEC parser for word graphs are discussed. (1) Parallel ARchitecture Sentence Constraine
Parsing Inside-Out
The inside-outside probabilities are typically used for reestimating
Probabilistic Context Free Grammars (PCFGs), just as the forward-backward
probabilities are typically used for reestimating HMMs. I show several novel
uses, including improving parser accuracy by matching parsing algorithms to
evaluation criteria; speeding up DOP parsing by 500 times; and 30 times faster
PCFG thresholding at a given accuracy level. I also give an elegant,
state-of-the-art grammar formalism, which can be used to compute inside-outside
probabilities; and a parser description formalism, which makes it easy to
derive inside-outside formulas and many others. Comment: Ph.D. Thesis, 257 pages, 40 postscript figures
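For readers unfamiliar with the inside probabilities mentioned in this abstract, the following is a minimal sketch of the standard inside computation for a PCFG in Chomsky normal form. It is the textbook chart recurrence, not the thesis's own formalism. The function name `inside_probability`, the dictionary-based grammar encoding, and the toy grammar are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple


def inside_probability(
    words: Sequence[str],
    lexical: Dict[Tuple[str, str], float],
    binary: Dict[Tuple[str, str, str], float],
    start: str = "S",
) -> float:
    """Total probability that `start` derives `words` under a CNF PCFG.

    lexical maps (A, word) to P(A -> word);
    binary maps (A, B, C) to P(A -> B C).
    """
    n = len(words)
    # beta[(i, j)][A] = probability that nonterminal A derives words[i:j]
    beta = defaultdict(lambda: defaultdict(float))

    # Width-1 spans come from lexical rules.
    for i, w in enumerate(words):
        for (a, word), p in lexical.items():
            if word == w:
                beta[(i, i + 1)][a] += p

    # Wider spans sum over every split point and every binary rule.
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for (a, b, c), p in binary.items():
                    left = beta[(i, k)].get(b, 0.0)
                    right = beta[(k, j)].get(c, 0.0)
                    if left and right:
                        beta[(i, j)][a] += p * left * right

    return beta[(0, n)].get(start, 0.0)


if __name__ == "__main__":
    # Toy ambiguous grammar: S -> S S (0.4), S -> 'a' (0.6).
    lexical = {("S", "a"): 0.6}
    binary = {("S", "S", "S"): 0.4}
    print(inside_probability(["a", "a", "a"], lexical, binary))
```

With the toy grammar, the sentence "a a a" has two bracketings, each with probability 0.4^2 * 0.6^3, so the inside probability of the whole sentence is their sum. The outside probabilities, which reestimation also needs, are not computed here.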
CLiFF Notes: Research in the Language Information and Computation Laboratory of The University of Pennsylvania
This report takes its name from the Computational Linguistics Feedback Forum (CLiFF), an informal discussion group for students and faculty. However, the scope of the research covered in this report is broader than the title might suggest; this is the yearly report of the LINC Lab, the Language, Information and Computation Laboratory of the University of Pennsylvania. It may at first be hard to see the threads that bind together the work presented here, work by faculty, graduate students and postdocs in the Computer Science, Psychology, and Linguistics Departments, and the Institute for Research in Cognitive Science. It includes prototypical Natural Language fields such as: Combinatorial Categorial Grammars, Tree Adjoining Grammars, syntactic parsing and the syntax-semantics interface; but it extends to statistical methods, plan inference, instruction understanding, intonation, causal reasoning, free word order languages, geometric reasoning, medical informatics, connectionism, and language acquisition. With 48 individual contributors and six projects represented, this is the largest LINC Lab collection to date, and the most diverse.
Unsupervised Language Acquisition
This thesis presents a computational theory of unsupervised language
acquisition, precisely defining procedures for learning language from ordinary
spoken or written utterances, with no explicit help from a teacher. The theory
is based heavily on concepts borrowed from machine learning and statistical
estimation. In particular, learning takes place by fitting a stochastic,
generative model of language to the evidence. Much of the thesis is devoted to
explaining conditions that must hold for this general learning strategy to
arrive at linguistically desirable grammars. The thesis introduces a variety of
technical innovations, among them a common representation for evidence and
grammars, and a learning strategy that separates the "content" of linguistic
parameters from their representation. Algorithms based on it suffer from few of
the search problems that have plagued other computational approaches to
language acquisition.
The theory has been tested on problems of learning vocabularies and grammars
from unsegmented text and continuous speech, and mappings between sound and
representations of meaning. It performs extremely well on various objective
criteria, acquiring knowledge that causes it to assign almost exactly the same
structure to utterances as humans do. This work has application to data
compression, language modeling, speech recognition, machine translation,
information retrieval, and other tasks that rely on either structural or
stochastic descriptions of language. Comment: PhD thesis, 133 pages