Corpora for Computational Linguistics
Since the mid-1990s, corpora have become very important for computational linguistics. This paper surveys how they are currently used in different fields of the discipline, with particular emphasis on anaphora and coreference resolution, automatic summarisation and term extraction.
Their influence on other fields is also briefly discussed.
Towards Automatic Dialogue Understanding
In this paper we present work carried out to scale up GETARUNS, a system for text understanding, and to port it to dialogue understanding. The current goal is to automatically extract argumentative information in order to build argumentative structure. The long-term goal is to use argumentative structure to produce automatic summaries of spoken dialogues.
Very much like other deep linguistic processing systems (see Allen et al., 2007), our system is a generic text/dialogue understanding system that can be used in connection with an ontology – WordNet – and other similar repositories of commonsense knowledge. Word sense disambiguation takes place at the level of semantic interpretation and is represented in the Discourse Model. We present the adjustments we made in order to cope with transcribed spoken dialogues like those produced in the ICSI Berkeley project. The low-level component is organized according to LFG theory; at this level, the system performs pronominal binding, quantifier raising and temporal interpretation. The high-level component is where the Discourse Model is created from the Logical Form. For longer sentences the system switches from the top-down to the bottom-up system. In case of failure it backs off to the partial system, which produces a very lean and shallow semantics with no inference rules.
In a final section, we present a preliminary evaluation of the system on two tasks: automatic argumentative labelling and another frequently addressed task, referential vs. non-referential pronominal detection. The results obtained are much higher than those reported in similar experiments with machine learning approaches.
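The back-off behaviour described in this abstract (top-down parsing, a switch to bottom-up parsing for longer sentences, and a shallow partial analysis on failure) can be pictured with a minimal sketch. This is not GETARUNS code: the function name `parse_with_backoff`, the parser callables, and the length threshold of 25 tokens are all hypothetical, chosen only to illustrate the control flow.

```python
from typing import Callable, Optional, Sequence

# Hypothetical parser type: takes tokens, returns an analysis or None on failure.
Parser = Callable[[Sequence[str]], Optional[dict]]


def parse_with_backoff(
    tokens: Sequence[str],
    top_down: Parser,
    bottom_up: Parser,
    partial: Parser,
    long_sentence_threshold: int = 25,  # assumed value, not taken from the paper
) -> Optional[dict]:
    """Try a deep parser first and fall back to the shallow one on failure."""
    # Longer sentences are routed to the bottom-up parser instead of the top-down one.
    deep = bottom_up if len(tokens) > long_sentence_threshold else top_down
    analysis = deep(tokens)
    if analysis is not None:
        return analysis
    # The deep parser failed: back off to the partial (shallow) parser.
    return partial(tokens)


# Stub parsers standing in for the real components.
def deep_ok(tokens: Sequence[str]) -> Optional[dict]:
    return {"mode": "deep", "length": len(tokens)}


def deep_fail(tokens: Sequence[str]) -> Optional[dict]:
    return None


def shallow(tokens: Sequence[str]) -> Optional[dict]:
    return {"mode": "partial", "length": len(tokens)}
```

With these stubs, `parse_with_backoff(["it", "rains"], deep_fail, deep_ok, shallow)` falls through to the shallow parser, because a two-token sentence is routed to the failing top-down stub.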
An Annotated Corpus of Reference Resolution for Interpreting Common Grounding
Common grounding is the process of creating, repairing and updating mutual
understandings, which is a fundamental aspect of natural language conversation.
However, interpreting the process of common grounding is a challenging task,
especially under continuous and partially-observable context where complex
ambiguity, uncertainty, partial understandings and misunderstandings are
introduced. Interpretation becomes even more challenging when we deal with
dialogue systems which still have limited capability of natural language
understanding and generation. To address this problem, we consider reference
resolution as the central subtask of common grounding and propose a new
resource to study its intermediate process. Based on a simple and general
annotation schema, we collected a total of 40,172 referring expressions in
5,191 dialogues curated from an existing corpus, along with multiple judgements
of referent interpretations. We show that our annotation is highly reliable,
captures the complexity of common grounding through a natural degree of
reasonable disagreements, and allows for more detailed and quantitative
analyses of common grounding strategies. Finally, we demonstrate the advantages
of our annotation for interpreting, analyzing and improving common grounding in
baseline dialogue systems.
Comment: 9 pages, 7 figures, 6 tables, Accepted by AAAI 202
Deep Linguistic Processing with GETARUNS for Spoken Dialogue Understanding
In this paper we present work carried out to scale up GETARUNS, a system for text understanding, and to port it
to dialogue understanding. The current goal is to automatically extract argumentative information in
order to build argumentative structure. The long-term goal is to use argumentative structure to produce automatic
summaries of spoken dialogues. Very much like other deep linguistic processing systems, our system is a generic
text/dialogue understanding system that can be used in connection with an ontology – WordNet – and other similar
repositories of commonsense knowledge. We present the adjustments we made in order to cope with transcribed
spoken dialogues like those produced in the ICSI Berkeley project. In a final section we present a preliminary evaluation of
the system on two tasks: automatic argumentative labeling and another frequently addressed task, referential vs.
non-referential pronominal detection. The results obtained are much higher than those reported in similar experiments with
machine learning approaches.
Resolving pronominal anaphora using commonsense knowledge
Coreference resolution is the task of resolving all expressions in a text that refer to the same entity. Such expressions are often used in writing and speech as shortcuts to avoid repetition. The most frequent form of coreference is the anaphor. Resolving anaphora requires not only grammatical and syntactic strategies but also semantic approaches. This dissertation presents a framework for automatically resolving pronominal anaphora by integrating recent findings from the field of linguistics with new semantic features. Commonsense knowledge is the routine knowledge people have of the everyday world. Because such knowledge is widely shared, it is frequently omitted from social communications such as texts, and without it computers have difficulty making sense of textual information. In this dissertation a new set of computational and linguistic features is used in a supervised learning approach to resolve the pronominal anaphora in documents. Commonsense knowledge sources such as ConceptNet and WordNet are used, and similarity measures are extracted to uncover the elaborative information embedded in the words that can help in the process of anaphora resolution. The anaphora resolution system is tested on 350 Wall Street Journal articles from the BBN corpus. When compared with other available systems such as BART (Versley et al. 2008) and Charniak and Elsner (2009), our system performed better and also resolved a much wider range of anaphora. We were able to achieve a 92% F-measure on the BBN corpus and an average of 85% F-measure when tested on other genres of documents, such as children's stories and short stories selected from the web.
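For readers unfamiliar with the metric reported in this abstract, the F-measure combines precision and recall into a single score. The sketch below shows the standard weighted form; the abstract does not say which variant or scoring scheme the dissertation used, so the balanced default (`beta=1.0`) is an assumption, and the function name `f_measure` is illustrative.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0.0 and recall == 0.0:
        # Avoid division by zero when nothing was resolved correctly.
        return 0.0
    beta_sq = beta * beta
    return (1.0 + beta_sq) * precision * recall / (beta_sq * precision + recall)
```

Because the harmonic mean is dominated by the lower of its two inputs, a system cannot reach a high F-measure by resolving only a few easy pronouns with high precision.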
A Computational Model For Anaphora Resolution In Turkish Based On The Centering Theory
Thesis (M.Sc.) -- İstanbul Technical University, Institute of Science and Technology, 2005
In this study, a computational model that identifies the entities referred to by certain pronouns in Turkish sentences is designed, and most of the modules that make up the model are implemented in order to observe the results. The basic approach used in designing the model is Centering Theory. Since Centering Theory requires a salience hierarchy among the entity-denoting expressions in a sentence, the hierarchies proposed in earlier studies were tested in several experiments for their suitability for Turkish, but none was found to be fully suitable. For this reason, a new salience hierarchy is proposed in this study, and tests show that it produces satisfactory results. Using the proposed hierarchy in the implemented modules of the model, an application that performs anaphora resolution on Turkish texts is developed.
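The role of a salience hierarchy in Centering-style pronoun resolution can be illustrated with a small sketch. The abstract does not state the hierarchy proposed in the thesis, so the ranking below (subject before object before everything else) is only the familiar grammatical-role ordering used as a stand-in, and the `Entity` class, `SALIENCE` table and `resolve_pronoun` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed stand-in hierarchy (lower rank = more salient); NOT the thesis's hierarchy.
SALIENCE = {"subject": 0, "object": 1, "other": 2}


@dataclass
class Entity:
    name: str
    role: str    # grammatical role in the previous utterance
    number: str  # "sg" or "pl", used as a simple agreement check


def resolve_pronoun(pronoun_number: str, previous_entities: List[Entity]) -> Optional[Entity]:
    """Pick the most salient entity of the previous utterance that agrees in number."""
    candidates = [e for e in previous_entities if e.number == pronoun_number]
    if not candidates:
        return None
    # Unknown roles are ranked below every role listed in the hierarchy.
    return min(candidates, key=lambda e: SALIENCE.get(e.role, len(SALIENCE)))


previous = [
    Entity("book", "object", "sg"),
    Entity("Ali", "subject", "sg"),
    Entity("children", "other", "pl"),
]
```

Here a singular pronoun in the next utterance resolves to `Ali`, the subject, while a plural pronoun resolves to `children`, the only candidate that agrees in number. Swapping in a different `SALIENCE` table is how an alternative hierarchy would be tested in this sketch.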