Research in the Language, Information and Computation Laboratory of the University of Pennsylvania
This report takes its name from the Computational Linguistics Feedback Forum (CLiFF), an informal discussion group for students and faculty. However, the scope of the research covered in this report is broader than the title might suggest; this is the yearly report of the LINC Lab, the Language, Information and Computation Laboratory of the University of Pennsylvania.
It may at first be hard to see the threads that bind together the work presented here, work by faculty, graduate students and postdocs in the Computer Science and Linguistics Departments, and the Institute for Research in Cognitive Science. It includes prototypical Natural Language fields such as Combinatory Categorial Grammars, Tree Adjoining Grammars, syntactic parsing and the syntax-semantics interface; but it extends to statistical methods, plan inference, instruction understanding, intonation, causal reasoning, free word order languages, geometric reasoning, medical informatics, connectionism, and language acquisition.
Naturally, this introduction cannot spell out all the connections between these abstracts; we invite you to explore them on your own. In fact, with this issue it's easier than ever to do so: this document is accessible on the "information superhighway". Just call up http://www.cis.upenn.edu/~cliff-group/94/cliffnotes.html
In addition, you can find many of the papers referenced in the CLiFF Notes on the net. Most can be obtained by following links from the authors' abstracts in the web version of this report.
The abstracts describe the researchers' many areas of investigation, explain their shared concerns, and present some interesting work in Cognitive Science. We hope its new online format makes the CLiFF Notes a more useful and interesting guide to Computational Linguistics activity at Penn.
CLiFF Notes: Research In Natural Language Processing at the University of Pennsylvania
The Computational Linguistics Feedback Forum (CLIFF) is a group of students and faculty who gather once a week to discuss the members' current research. As the word feedback suggests, the group's purpose is the sharing of ideas. The group also promotes interdisciplinary contacts between researchers who share an interest in Cognitive Science.
There is no single theme describing the research in Natural Language Processing at Penn. There is work done in CCG, Tree Adjoining Grammars, intonation, statistical methods, plan inference, instruction understanding, incremental interpretation, language acquisition, syntactic parsing, causal reasoning, free word order languages, ... and many other areas. With this in mind, rather than trying to summarize the varied work currently underway here at Penn, we suggest reading the following abstracts to see how the students and faculty themselves describe their work. Their abstracts illustrate the diversity of interests among the researchers, explain the areas of common interest, and describe some very interesting work in Cognitive Science.
This report is a collection of abstracts from both faculty and graduate students in Computer Science, Psychology and Linguistics. We pride ourselves on the close working relations between these groups, as we believe that the communication among the different departments and the ongoing inter-departmental research not only improves the quality of our work, but makes much of that work possible.
Wide-coverage parsing for Turkish
Wide-coverage parsing is an area that attracts much attention in natural language processing
research, because it is the first step to many other applications
in natural language understanding, such as question answering.
Supervised learning using human-labelled data is currently the best performing
method. Therefore, there is great demand for annotated data. However, human annotation
is very expensive, and the amount of annotated data is always much less than
is needed to train well-performing parsers. This is the motivation behind making the
best use of the data available. Turkish presents a challenge both because the amount of syntactically
annotated Turkish data is relatively small and because Turkish is highly agglutinative, hence
unusually sparse at the whole word level.
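The sparsity point can be illustrated with a minimal sketch. The tokens below are a hand-segmented toy list built from the stems "ev" (house) and "el" (hand) with plural, possessive and locative suffixes; they are an assumption made for illustration and are not drawn from the treebank, and the function name `type_counts` is hypothetical.

```python
# Toy illustration of sparsity at the whole-word level in an agglutinative language.
# The segmentations are hand-written assumptions, not treebank data.
SEGMENTED_TOKENS = [
    ["ev"], ["ev", "ler"], ["ev", "ler", "im"], ["ev", "ler", "im", "de"],
    ["ev", "de"], ["ev", "im", "de"],
    ["el"], ["el", "ler"], ["el", "ler", "im"], ["el", "ler", "im", "de"],
    ["el", "de"], ["el", "im", "de"],
]


def type_counts(segmented_tokens):
    """Return (number of distinct whole words, number of distinct morphemes)."""
    word_types = {"".join(morphemes) for morphemes in segmented_tokens}
    morpheme_types = {m for morphemes in segmented_tokens for m in morphemes}
    return len(word_types), len(morpheme_types)


n_words, n_morphemes = type_counts(SEGMENTED_TOKENS)
print(n_words, n_morphemes)  # 12 distinct word forms, 5 distinct morphemes
```

In this toy list every word form occurs only once, while each morpheme recurs across several forms, which is the sense in which statistics gathered over whole words are sparser than statistics gathered over morphemes.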
The METU-Sabancı Treebank is a dependency treebank of 5620 sentences with surface
dependency relations and morphological analyses for words. We show that including
even the crudest forms of morphological information extracted from the data boosts
the performance of both generative and discriminative parsers, contrary to received
opinion concerning English.
We induce word-based and morpheme-based CCG grammars from the Turkish dependency
treebank. We use these grammars to train a state-of-the-art CCG parser that
predicts long-distance dependencies in addition to the ones that other parsers are capable
of predicting. We also use the correct CCG categories as simple features in a
graph-based dependency parser and show that this improves the parsing results.
We show that a morpheme-based CCG lexicon for Turkish helps with many
problems, such as resolving conflicts of semantic scope, recovering long-range dependencies,
and obtaining smoother statistics from the models. CCG handles linguistic phenomena,
i.e. local and long-range dependencies, more naturally and effectively than other linguistic
theories while potentially supporting semantic interpretation in parallel. Using
morphological information and a morpheme-cluster based lexicon improves performance
both quantitatively and qualitatively for Turkish.
We also provide an improved version of the treebank which will be released by
kind permission of METU and Sabancı.