
246 research outputs found

Corpora for Computational Linguistics

Author: Evans Richard
Ha Le An
Hasler Laura
Mitkov Ruslan
Orăsan Constantin
Publication venue: 'Universidade Federal de Santa Catarina (UFSC)'
Publication date: 01/01/2007

Since the mid-1990s, corpora have become very important for computational linguistics. This paper surveys how they are currently used in different fields of the discipline, with particular emphasis on anaphora and coreference resolution, automatic summarisation and term extraction. Their influence on other fields is also briefly discussed.

Directory of Open Access Journals

Wolverhampton Intellectual Repository and E-theses

Combination of 3 Types of Speech Recognizers for Anaphora Resolution

Author: Endo Tsutomu
Shimada Kazutaka
Tanamachi Noriko
Publication venue: Institute of Digital Enhancement of Cognitive Processing, Waseda University
Publication date: 01/01/2011

Waseda University Repository

Annotating a broad range of anaphoric phenomena, in a variety of genres: the ARRAU Corpus

Author: Artstein R
Bristot A
Cavicchio F
Delogu F
Poesio M
Rodriguez KJ
Uryupina O
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 01/01/2020

Queen Mary Research Online

Towards Automatic Dialogue Understanding

Author: Antonella Bristot
Rodolfo Delmonte
Vincenzo Pallotta
Publication venue: OPAR - Open Archive dell’Università degli Studi di Napoli L’Orientale
Publication date: 01/01/2010

In this paper we present work carried out to scale up GETARUNS, a system for text understanding, and to port it to dialogue understanding. The current goal is to automatically extract argumentative information in order to build argumentative structure. The long-term goal is to use argumentative structure to produce automatic summaries of spoken dialogues. Very much like other deep linguistic processing systems (see Allen et al., 2007), our system is a generic text/dialogue understanding system that can be used in connection with an ontology (WordNet) and other similar repositories of commonsense knowledge. Word sense disambiguation takes place at the level of semantic interpretation and is represented in the Discourse Model. We present the adjustments we made in order to cope with transcribed spoken dialogues like those produced in the ICSI Berkeley project. The low-level component is organized according to LFG theory; at this level, the system performs pronominal binding, quantifier raising and temporal interpretation. The high-level component is where the Discourse Model is created from the Logical Form. For longer sentences the system switches from the top-down to the bottom-up system. In case of failure it backs off to the partial system, which produces a very lean and shallow semantics with no inference rules. In a final section, we present a preliminary evaluation of the system on two tasks: automatic argumentative labelling and another frequently addressed task, referential vs. non-referential pronominal detection. The results obtained are much higher than those reported in similar experiments with machine learning approaches.

Archivio Ricerca Ca'Foscari

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

An Annotated Corpus of Reference Resolution for Interpreting Common Grounding

Author: Aizawa Akiko
Udagawa Takuma
Publication date: 18/11/2019

Common grounding is the process of creating, repairing and updating mutual understandings, which is a fundamental aspect of natural language conversation. However, interpreting the process of common grounding is a challenging task, especially under continuous and partially-observable context where complex ambiguity, uncertainty, partial understandings and misunderstandings are introduced. Interpretation becomes even more challenging when we deal with dialogue systems which still have limited capability of natural language understanding and generation. To address this problem, we consider reference resolution as the central subtask of common grounding and propose a new resource to study its intermediate process. Based on a simple and general annotation schema, we collected a total of 40,172 referring expressions in 5,191 dialogues curated from an existing corpus, along with multiple judgements of referent interpretations. We show that our annotation is highly reliable, captures the complexity of common grounding through a natural degree of reasonable disagreements, and allows for more detailed and quantitative analyses of common grounding strategies. Finally, we demonstrate the advantages of our annotation for interpreting, analyzing and improving common grounding in baseline dialogue systems.
Comment: 9 pages, 7 figures, 6 tables, Accepted by AAAI 202

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Deep Linguistic Processing with GETARUNS for Spoken Dialogue Understanding

Author: Antonella Bristot
Rodolfo Delmonte
Vincenzo Pallotta
Publication venue: European Language Resources Association (ELRA)
Publication date: 01/01/2010

Archivio Ricerca Ca'Foscari

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

Resolving pronominal anaphora using commonsense knowledge

Author: Javadpour Seyedeh Leili
Publication venue: LSU Digital Commons
Publication date: 01/01/2013

Coreference resolution is the task of resolving all expressions in a text that refer to the same entity. Such expressions are often used in writing and speech as shortcuts to avoid repetition. The most frequent form of coreference is the anaphor. Resolving anaphora requires not only grammatical and syntactic strategies but also semantic approaches. This dissertation presents a framework for automatically resolving pronominal anaphora by integrating recent findings from the field of linguistics with new semantic features. Commonsense knowledge is the routine knowledge people have of the everyday world. Because such knowledge is so widely shared, it is frequently omitted from social communications such as texts. Without this knowledge, computers understandably have difficulty making sense of textual information. In this dissertation a new set of computational and linguistic features is used in a supervised learning approach to resolve the pronominal anaphora in a document. Commonsense knowledge sources such as ConceptNet and WordNet are used, and similarity measures are extracted to uncover the elaborative information embedded in the words that can help in the process of anaphora resolution. The anaphora resolution system is tested on 350 Wall Street Journal articles from the BBN corpus. Compared with other available systems, such as BART (Versley et al. 2008) and Charniak and Elsner (2009), our system performed better and also resolved a much wider range of anaphora. We achieved a 92% F-measure on the BBN corpus and an average F-measure of 85% when tested on other genres of documents, such as children's stories and short stories selected from the web.

Louisiana State University

A Computational Model For Anaphora Resolution In Turkish Based On The Centering Theory

Author: Aykaç Ramiz Erman
Publication venue: 'Nara Institute of Science and Technology'
Publication date: 12/11/2015

Thesis (M.Sc.) -- İstanbul Technical University, Institute of Science and Technology, 2005
In this study, a computational model that identifies the entities referred to by certain pronouns in Turkish sentences is designed, and most of the modules that make up the model are implemented in order to observe the results. The basic approach underlying the model is Centering Theory. Because Centering Theory requires a hierarchy that defines the salience of the entities in a sentence, the hierarchies proposed in earlier studies were tested for their suitability to Turkish in several experiments, but none proved fully suitable. For this reason, a new salience hierarchy is proposed in this study, and tests show that it produces satisfactory results. Using the proposed hierarchy in the implemented modules of the model, an application that performs anaphora resolution on Turkish texts is developed.
M.Sc.

Ulusal Üniversitelerarası Açık Erişim Sistemi - İstanbul Teknik Üniversitesi