CLiFF Notes: Research In Natural Language Processing at the University of Pennsylvania
CLIFF is the Computational Linguists' Feedback Forum. We are a group of students and faculty who gather once a week to hear a presentation and discuss work currently in progress. The 'feedback' in the group's name is important: we are interested in sharing ideas, in discussing ongoing research, and in bringing together work done by the students and faculty in Computer Science and other departments.
However, there are only so many presentations that we can have in a year. We felt that it would be beneficial to have a report which would have, in one place, short descriptions of the work in Natural Language Processing at the University of Pennsylvania. This report, then, is a collection of abstracts from both faculty and graduate students in Computer Science, Psychology and Linguistics. We want to stress the close ties between these groups, as one of the things that we pride ourselves on here at Penn is the communication among different departments and the inter-departmental work.
Rather than try to summarize the varied work currently underway at Penn, we suggest reading the abstracts to see how the students and faculty themselves describe their work. The report illustrates the diversity of interests among the researchers here, as well as explaining the areas of common interest. In addition, since it was our intent to put together a document that would be useful both inside and outside of the university, we hope that this report will explain to everyone some of what we are about
Anaphoric resolution of zero pronouns in Chinese in translation and reading comprehension
The primary aim of the thesis is to investigate some of the processes of
reading Chinese text by means of comparing and analysing approximately
100 parallel translations of four texts from Chinese to English. The
translations are answers to A Level examination questions. The focus of the
investigation is interpretation of the zero pronoun, a common phenomenon in
Chinese, which often requires explicitation when translated into English. The
secondary aim is to show how translation gives evidence of comprehension,
as shown by the variation in interpretation of zero pronouns. The thesis
reviews relevant psycholinguistic research into reading, particularly reading
of Chinese text. This is followed by reviews of relevant research into
translation as a reading activity, and a discussion of its role in language
teaching and testing.
The core of the thesis is the discussion of the zero pronoun in Chinese,
including discussion of anaphoric choice - the writer's decision on when to
use zero in preference to an explicit anaphoric form - and of anaphoric
resolution - how a reader decides what a zero pronoun refers to. Anaphoric
resolution may be problematic for less experienced readers of Chinese owing
to its lack of rich morphological inflection which, in other languages, provides
the reader with information. Some of the key ideas on anaphoric choice and
resolution are then applied to the analysis of the data in the parallel
translations. It would appear that factors in Chinese texts which have an effect
on comprehending zero pronouns are antecedent distance, topic persistence,
abstraction, multiplicity of arguments and the meaning of the verb.
Characteristics of the reader which may affect comprehension of the zero
pronoun include personal schemata which may lead to elaborative inferences.
On the basis of the data I suggest that mark schemes could be devised on a
scalar system encompassing optimal solution, proximal solution and non-solution,
which might help to solve the problem of variability in marking translation.
A by-product of the thesis, and an avenue for further research, is the apparent
close relationship between idea units, clause length, punctuation breaks and
antecedent distance in Chinese texts and saccade length and working memory
capacity in the reader of Chinese
CLiFF Notes: Research In Natural Language Processing at the University of Pennsylvania
The Computational Linguistics Feedback Forum (CLIFF) is a group of students and faculty who gather once a week to discuss the members' current research. As the word feedback suggests, the group's purpose is the sharing of ideas. The group also promotes interdisciplinary contacts between researchers who share an interest in Cognitive Science.
There is no single theme describing the research in Natural Language Processing at Penn. There is work done in CCG, Tree adjoining grammars, intonation, statistical methods, plan inference, instruction understanding, incremental interpretation, language acquisition, syntactic parsing, causal reasoning, free word order languages, ... and many other areas. With this in mind, rather than trying to summarize the varied work currently underway here at Penn, we suggest reading the following abstracts to see how the students and faculty themselves describe their work. Their abstracts illustrate the diversity of interests among the researchers, explain the areas of common interest, and describe some very interesting work in Cognitive Science.
This report is a collection of abstracts from both faculty and graduate students in Computer Science, Psychology and Linguistics. We pride ourselves on the close working relations between these groups, as we believe that the communication among the different departments and the ongoing inter-departmental research not only improves the quality of our work, but makes much of that work possible
Proceedings of the Sixth International Conference Formal Approaches to South Slavic and Balkan languages
Proceedings of the Sixth International Conference Formal Approaches to South Slavic and Balkan Languages publishes 22 papers that were presented at the conference organised in Dubrovnik, Croatia, 25-28 September 2008.
A constraint-based hypergraph partitioning approach to coreference resolution
The objectives of this thesis are focused on research in machine learning for
coreference resolution. Coreference resolution is a natural language processing
task that consists of determining the expressions in a discourse that mention or
refer to the same entity.
The main contributions of this thesis are (i) a new approach to coreference
resolution based on constraint satisfaction, using a hypergraph to represent
the problem and solving it by relaxation labeling; and (ii) research towards
improving coreference resolution performance using world knowledge extracted
from Wikipedia.
The developed approach is able to use an entity-mention classification model,
which is more expressive than pair-based ones, and to overcome weaknesses
of previous state-of-the-art approaches, such as linking contradictions,
classifications without context, and lack of information when evaluating pairs.
Furthermore, the approach allows the incorporation of new information by adding
constraints, and research has been done on using world knowledge to
improve performance.
RelaxCor, the implementation of the approach, achieved results in the
state of the art, and participated in international competitions: SemEval-2010
and CoNLL-2011. RelaxCor achieved second position in CoNLL-2011.
Coreference resolution is a natural language processing task that consists of determining the expressions in a discourse that refer to the same real-world entity. The task has a direct effect on text mining as well as on many natural language tasks that require discourse interpretation, such as summarization, question answering, or machine translation. Resolving coreferences is essential in order to "understand" a text or a discourse.
The objectives of this thesis focus on research in coreference resolution with machine learning. Specifically, the research objectives focus on the following areas:
+ Classification models: The most common classification models in the state of the art are based on the independent classification of pairs of mentions. More recently, models that classify groups of mentions have appeared. One of the objectives of the thesis is to incorporate the entity-mention model into the developed approach.
+ Problem representation: There is not yet a definitive representation of the problem. This thesis presents a hypergraph representation.
+ Resolution algorithms: Depending on the problem representation and the classification model, resolution algorithms can vary widely. One of the objectives of this thesis is to find a resolution algorithm capable of using the classification models in the hypergraph representation.
+ Knowledge representation: To manage knowledge from various sources, a symbolic and expressive representation of that knowledge is needed. This thesis proposes the use of constraints.
+ Incorporation of world knowledge: Some coreferences cannot be resolved with linguistic information alone. Common sense and world knowledge are often needed to resolve coreferences. This thesis proposes a method for extracting world knowledge from Wikipedia and incorporating it into the resolution system.
The main contributions of this thesis are (i) a new approach to the coreference resolution problem based on constraint satisfaction, using a hypergraph to represent the problem and solving it with the relaxation labeling algorithm; and (ii) research to improve results by adding world information extracted from Wikipedia.
The approach presented can use the mention-pair and entity-mention models in combination, thus avoiding the problems found in many other state-of-the-art approaches, such as contradictions between independent classifications, lack of context, and lack of information. In addition, the approach allows information to be incorporated by adding constraints, and research has been done on adding world information that improves results.
RelaxCor, the system implemented during the thesis to experiment with the proposed approach, has achieved results comparable to the best in the state of the art. It participated in the international competitions SemEval-2010 and CoNLL-2011. RelaxCor obtained second position in CoNLL-2011.
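The abstract above names relaxation labeling as the solving step. As an illustration only, the following is a minimal sketch of the general relaxation labeling update applied to a toy coreference problem. It is not RelaxCor's implementation: it uses plain pairwise weights instead of hypergraph constraints, and the function name, the weight dictionary, and the scaling by the number of other mentions are all assumptions made for this example.

```python
def relaxation_labeling(weights, n, iterations=200, eps=1e-6):
    """Assign a partition label to each of n mentions.

    weights maps a pair (i, j) with i < j to a score in [-1, 1]:
    positive favours putting i and j in the same partition, negative
    favours separating them. These pairwise scores are hypothetical
    stand-ins for the weighted constraints a real system would learn.
    """
    # Mention i may join the partition opened by any mention 0..i.
    probs = [[1.0 / (i + 1)] * (i + 1) for i in range(n)]
    scale = max(1, n - 1)  # keeps every support value inside [-1, 1]

    for _ in range(iterations):
        new_probs = []
        max_change = 0.0
        for i in range(n):
            updated = []
            for label in range(i + 1):
                support = 0.0
                for j in range(n):
                    if j == i or label >= len(probs[j]):
                        continue  # mention j cannot carry this label
                    w = weights.get((min(i, j), max(i, j)), 0.0)
                    support += w * probs[j][label]
                # Labels with positive support gain probability mass.
                updated.append(probs[i][label] * (1.0 + support / scale))
            total = sum(updated)
            if total > 0.0:
                updated = [u / total for u in updated]
            else:
                updated = list(probs[i])  # degenerate case: keep old values
            max_change = max(
                max_change,
                max(abs(a - b) for a, b in zip(updated, probs[i])),
            )
            new_probs.append(updated)
        probs = new_probs  # synchronous update of all mentions
        if max_change < eps:
            break

    # Pick the most probable label for each mention.
    return [max(range(len(p)), key=p.__getitem__) for p in probs]


if __name__ == "__main__":
    # Toy input: mentions 0 and 1 attract each other, mention 2 repels both.
    toy_weights = {(0, 1): 0.9, (0, 2): -0.8, (1, 2): -0.7}
    print(relaxation_labeling(toy_weights, 3))
```

On the toy input, mentions 0 and 1 end up sharing a label while mention 2 keeps its own, so each distinct label corresponds to one entity. New information could be added in this sketch simply by adding or adjusting entries in the weight dictionary, which loosely mirrors the abstract's point that the approach is extended by adding constraints.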
A neurolinguistic approach to pronominal resumption in Akan focus constructions
The current project explored the phonological and syntactic aspects of Akan pronominal resumption. The grammatical tone features of the resumptive pronoun and the clause determiner were assessed in Akan speakers with agrammatism. We found that the resumptive pronoun worsens wh-question comprehension in agrammatic speakers. However, the production of pronominal resumption was relatively spared. The ERP study investigated Akan native speakers’ sensitivity to the distribution of the resumptive pronoun by creating word-order and animacy violations. Our study represents a novel addition to the sentence processing field, as it looks into the interface between syntax, semantics, and phonology in Akan pronominal resumption
Proceedings of the Conference on Natural Language Processing 2010
This book contains state-of-the-art contributions to the 10th
conference on Natural Language Processing, KONVENS 2010
(Konferenz zur Verarbeitung natürlicher Sprache), with a focus
on semantic processing.
The KONVENS in general aims at offering a broad perspective
on current research and developments within the interdisciplinary
field of natural language processing. The central theme
draws specific attention towards addressing linguistic aspects
of meaning, covering deep as well as shallow approaches to semantic
processing. The contributions address both knowledge-based
and data-driven methods for modelling and acquiring
semantic information, and discuss the role of semantic information
in applications of language technology.
The articles demonstrate the importance of semantic processing,
and present novel and creative approaches to natural
language processing in general. Some contributions put their
focus on developing and improving NLP systems for tasks like
Named Entity Recognition or Word Sense Disambiguation, or
focus on semantic knowledge acquisition and exploitation with
respect to collaboratively built resources, or harvesting semantic
information in virtual games. Others are set within the
context of real-world applications, such as Authoring Aids, Text
Summarisation and Information Retrieval. The collection highlights
the importance of semantic processing for different areas
and applications in Natural Language Processing, and provides
the reader with an overview of current research in this field