
    Opinion Piece: Can we Fix the Scope for Coreference? Problems and Solutions for Benchmarks beyond OntoNotes

    Current work on automatic coreference resolution has focused on the OntoNotes benchmark dataset, due to both its size and consistency. However, many aspects of the OntoNotes annotation scheme are not well understood by NLP practitioners, including the treatment of generic NPs, noun modifiers, indefinite anaphora, predication and more. These often lead to counterintuitive claims, results and system behaviors. This opinion piece aims to highlight some of the problems with the OntoNotes rendition of coreference, and to propose a way forward relying on three principles: 1. a focus on semantics, not morphosyntax; 2. cross-linguistic generalizability; and 3. a separation of identity and scope, which can resolve old problems involving temporal and modal domain consistency.

    CaTeRS: Causal and Temporal Relation Scheme for Semantic Annotation of Event Structures

    Learning commonsense causal and temporal relations between events is one of the major steps towards deeper language understanding. This is even more crucial for understanding stories and for script learning. A prerequisite for learning scripts is a semantic framework that enables capturing rich event structures. In this paper we introduce a novel semantic annotation framework, called Causal and Temporal Relation Scheme (CaTeRS), which is unique in simultaneously capturing a comprehensive set of temporal and causal relations between events. By annotating a total of 1,600 sentences in the context of 320 five-sentence short stories sampled from the ROCStories corpus, we demonstrate that these stories are indeed full of causal and temporal relations. Furthermore, we show that the CaTeRS annotation scheme enables high inter-annotator agreement for broad-coverage event entity annotation and moderate agreement on semantic link annotation.
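The abstract reports "high" and "moderate" inter-annotator agreement without saying which measure was used. One common chance-corrected measure for two annotators labeling the same items is Cohen's kappa, sketched below purely as an illustration of how such agreement figures are typically computed; this is an assumption, not necessarily the CaTeRS authors' metric, and the relation labels in the example are hypothetical placeholders rather than the CaTeRS label inventory.

```python
from collections import Counter


def cohens_kappa(a: list[str], b: list[str]) -> float:
    """Cohen's kappa for two annotators who labeled the same items in the same order."""
    if len(a) != len(b) or not a:
        raise ValueError("annotations must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Expected chance agreement from each annotator's label distribution.
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[label] * cb[label] for label in ca) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout; kappa is
        # undefined here, so this sketch returns 1.0 by convention.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical link labels from two annotators over the same four event pairs.
annotator_1 = ["BEFORE", "BEFORE", "CAUSE", "OVERLAP"]
annotator_2 = ["BEFORE", "CAUSE", "CAUSE", "OVERLAP"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.636
```

Here observed agreement is 0.75 and chance agreement is 0.3125, giving a kappa of 7/11. A raw agreement rate would overstate consistency when a few labels dominate, which is why chance-corrected scores are the usual choice for link annotation.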

    Processing temporal information in unstructured documents

    PhD thesis, Informatics (Computer Science), Universidade de Lisboa, Faculdade de Ciências, 2013. Temporal information processing has received substantial attention in the last few years, due to the appearance of evaluation challenges focused on extracting temporal information from texts written in natural language. This research area belongs to the broader field of information extraction, which aims to automatically find specific pieces of information in texts and produce structured representations of that information that other computer applications can then easily use. It has the potential to be useful in many applications that deal with natural language, given that many languages, Portuguese among them, refer extensively to time. Despite this, temporal processing is still incipient for many languages, Portuguese being one of them. The present dissertation has several goals. On the one hand, it addresses this gap by developing and making available resources that support the development of temporal processing tools for Portuguese, and by developing such tools. On the other hand, it reports on important research results in this area of temporal processing. This work shows how temporal processing requires, and benefits from, modeling different kinds of knowledge: grammatical knowledge, logical knowledge, world knowledge, etc. Additionally, both machine learning methods and rule-based approaches are explored and combined in hybrid systems that can take advantage of the strengths of each type of approach. Funded by Fundação para a Ciência e a Tecnologia (FCT, SFRH/BD/40140/2007).

    Increasing Informativeness in Temporal Annotation

    In this paper, we discuss some of the challenges of adequately applying a specification language to an annotation task, as embodied in a specific guideline. In particular, we discuss some issues with TimeML motivated by error analysis on annotated TLINKs in TimeBank. We introduce a document-level information structure we call a narrative container (NC), designed to increase the informativeness and accuracy of temporal relation identification. The narrative container is the default interval containing the events being discussed in the text when no explicit temporal anchor is given. By exploiting this notion in the creation of a new temporal annotation over TimeBank, we were able to reduce inconsistencies and increase informativeness compared to the existing TLINKs in TimeBank.
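The narrative-container idea can be made concrete with a small data-structure sketch: each event is placed in its explicit temporal anchor when one exists, and otherwise falls back to a document-level default interval. The `Event` class, the `nc0` identifier and the `contains_links` helper below are hypothetical illustrations, not part of TimeML or the authors' annotation tooling, and real TLINK annotation involves far more relation types than simple containment.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical identifier for the document-level narrative container.
NARRATIVE_CONTAINER = "nc0"


@dataclass
class Event:
    eid: str                      # event id, e.g. "e1"
    text: str                     # event trigger word
    anchor: Optional[str] = None  # id of an explicit temporal expression, if any


def contains_links(events: list[Event]) -> list[tuple[str, str]]:
    """Return (container, event) containment pairs.

    Events with an explicit anchor are contained in that anchor; all other
    events default to the narrative container.
    """
    links = []
    for ev in events:
        container = ev.anchor if ev.anchor is not None else NARRATIVE_CONTAINER
        links.append((container, ev.eid))
    return links


events = [
    Event("e1", "arrived", anchor="t1"),  # explicitly anchored, e.g. "on Monday"
    Event("e2", "met"),                   # no explicit anchor
    Event("e3", "left"),                  # no explicit anchor
]
print(contains_links(events))  # [('t1', 'e1'), ('nc0', 'e2'), ('nc0', 'e3')]
```

The point of the default is that unanchored events are still related to some interval, so a relation such as "e2 occurs within the narrative's time frame" is recorded instead of being left unstated.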