12 research outputs found

    On the contribution of word embeddings to temporal relation classification

    Get PDF
    Temporal relation classification is a challenging task, especially when there are no explicit markers to characterise the relation between temporal entities. This occurs frequently in ntersentential relations, whose entities are not connected via direct syntactic relations making classification even more difficult. In these cases, resorting to features that focus on the semantic content of the event words may be very beneficial for inferring implicit relations. Specifically, while morpho-syntactic and context features are considered sufficient for classifying event-timex pairs, we believe that exploiting distributional semantic information about event words can benefit supervised classification of other types of pairs. In this work, we assess the impact of using word embeddings as features for event words in classifying temporal relations of event-event pairs and event-DCT (document creation time) pairs

    Extracting Temporal and Causal Relations between Events

    Full text link
    Structured information resulting from temporal information processing is crucial for a variety of natural language processing tasks, for instance to generate timeline summarization of events from news documents, or to answer temporal/causal-related questions about some events. In this thesis we present a framework for an integrated temporal and causal relation extraction system. We first develop a robust extraction component for each type of relations, i.e. temporal order and causality. We then combine the two extraction components into an integrated relation extraction system, CATENA---CAusal and Temporal relation Extraction from NAtural language texts---, by utilizing the presumption about event precedence in causality, that causing events must happened BEFORE resulting events. Several resources and techniques to improve our relation extraction systems are also discussed, including word embeddings and training data expansion. Finally, we report our adaptation efforts of temporal information processing for languages other than English, namely Italian and Indonesian.Comment: PhD Thesi

    Grounding event references in news

    Get PDF
    Events are frequently discussed in natural language, and their accurate identification is central to language understanding. Yet they are diverse and complex in ontology and reference; computational processing hence proves challenging. News provides a shared basis for communication by reporting events. We perform several studies into news event reference. One annotation study characterises each news report in terms of its update and topic events, but finds that topic is better consider through explicit references to background events. In this context, we propose the event linking task which—analogous to named entity linking or disambiguation—models the grounding of references to notable events. It defines the disambiguation of an event reference as a link to the archival article that first reports it. When two references are linked to the same article, they need not be references to the same event. Event linking hopes to provide an intuitive approximation to coreference, erring on the side of over-generation in contrast with the literature. The task is also distinguished in considering event references from multiple perspectives over time. We diagnostically evaluate the task by first linking references to past, newsworthy events in news and opinion pieces to an archive of the Sydney Morning Herald. The intensive annotation results in only a small corpus of 229 distinct links. However, we observe that a number of hyperlinks targeting online news correspond to event links. We thus acquire two large corpora of hyperlinks at very low cost. From these we learn weights for temporal and term overlap features in a retrieval system. These noisy data lead to significant performance gains over a bag-of-words baseline. While our initial system can accurately predict many event links, most will require deep linguistic processing for their disambiguation

    Processing temporal information in unstructured documents

    Get PDF
    Tese de doutoramento, Informática (Ciência da Computação), Universidade de Lisboa, Faculdade de Ciências, 2013Temporal information processing has received substantial attention in the last few years, due to the appearance of evaluation challenges focused on the extraction of temporal information from texts written in natural language. This research area belongs to the broader field of information extraction, which aims to automatically find specific pieces of information in texts, producing structured representations of that information, which can then be easily used by other computer applications. It has the potential to be useful in several applications that deal with natural language, given that many languages, among which we find Portuguese, extensively refer to time. Despite that, temporal processing is still incipient for many language, Portuguese being one of them. The present dissertation has various goals. On one hand, it addresses this current gap, by developing and making available resources that support the development of tools for this task, employing this language, and also by developing precisely this kind of tools. On the other hand, its purpose is also to report on important results of the research on this area of temporal processing. This work shows how temporal processing requires and benefits from modeling different kinds of knowledge: grammatical knowledge, logical knowledge, knowledge about the world, etc. Additionally, both machine learning methods and rule-based approaches are explored and used in the development of hybrid systems that are capable of taking advantage of the strengths of each of these two types of approach.O processamento de informação temporal tem recebido bastante atenção nos últimos anos, devido ao surgimento de desafios de avaliação focados na extração de informação temporal de textos escritos em linguagem natural. Esta área de investigação enquadra-se no campo mais lato da extração de informação, que visa encontrar automaticamente informação específica presente em textos, produzindo representações estruturadas da mesma, que podem depois ser facilmente utilizadas por outras aplicações computacionais. Tem o potencial de ser útil em diversas aplicações que lidam com linguagem natural, dado o caráter quase ubíquo da referência ao tempo cronólogico em muitas línguas, entre as quais o Português. Apesar de tudo, o processamento temporal encontra-se ainda incipiente para bastantes línguas, sendo o Português uma delas. A presente dissertação tem vários objetivos. Por um lado vem colmatar esta lacuna existente, desenvolvendo e disponibilizando recursos que suportam o desenvolvimento de ferramentas para esta tarefa, utilizando esta língua, e desenvolvendo também precisamente este tipo de ferramentas. Por outro serve também para relatar resultados importantes da pesquisa nesta área do processamento temporal. Neste trabalho, mostra- -se como o processamento temporal requer e beneficia da modelação de conhecimento de diversos níveis: gramatical, lógico, acerca do mundo, etc. Adicionalmente, são explorados tanto métodos de aprendizagem automática como abordagens baseadas em regras, desenvolvendo-se sistemas híbridos capazes de tirar partido das vantagens de cada um destes dois tipos de abordagem.Fundação para a Ciência e a Tecnologia (FCT, SFRH/BD/40140/2007
    corecore