6 research outputs found

    Grounding event references in news

    Get PDF
    Events are frequently discussed in natural language, and their accurate identification is central to language understanding. Yet they are diverse and complex in ontology and reference; computational processing hence proves challenging. News provides a shared basis for communication by reporting events. We perform several studies into news event reference. One annotation study characterises each news report in terms of its update and topic events, but finds that topic is better consider through explicit references to background events. In this context, we propose the event linking task which—analogous to named entity linking or disambiguation—models the grounding of references to notable events. It defines the disambiguation of an event reference as a link to the archival article that first reports it. When two references are linked to the same article, they need not be references to the same event. Event linking hopes to provide an intuitive approximation to coreference, erring on the side of over-generation in contrast with the literature. The task is also distinguished in considering event references from multiple perspectives over time. We diagnostically evaluate the task by first linking references to past, newsworthy events in news and opinion pieces to an archive of the Sydney Morning Herald. The intensive annotation results in only a small corpus of 229 distinct links. However, we observe that a number of hyperlinks targeting online news correspond to event links. We thus acquire two large corpora of hyperlinks at very low cost. From these we learn weights for temporal and term overlap features in a retrieval system. These noisy data lead to significant performance gains over a bag-of-words baseline. While our initial system can accurately predict many event links, most will require deep linguistic processing for their disambiguation

    Semantic approaches to domain template construction and opinion mining from natural language

    Get PDF
    Most of the text mining algorithms in use today are based on lexical representation of input texts, for example bag of words. A possible alternative is to first convert text into a semantic representation, one that captures the text content in a structured way and using only a set of pre-agreed labels. This thesis explores the feasibility of such an approach to two tasks on collections of documents: identifying common structure in input documents (»domain template construction«), and helping users find differing opinions in input documents (»opinion mining«). We first discuss ways of converting natural text to a semantic representation. We propose and compare two new methods with varying degrees of target representation complexity. The first method, showing more promise, is based on dependency parser output which it converts to lightweight semantic frames, with role fillers aligned to WordNet. The second method structures text using Semantic Role Labeling techniques and aligns the output to the Cyc ontology.\ud Based on the first of the above representations, we next propose and evaluate two methods for constructing frame-based templates for documents from a given domain (e.g. bombing attack news reports). A template is the set of all salient attributes (e.g. attacker, number of casualties, \ldots). The idea of both methods is to construct abstract frames for which more specific instances (according to the WordNet hierarchy) can be found in the input documents. Fragments of these abstract frames represent the sought-for attributes. We achieve state of the art performance and additionally provide detailed type constraints for the attributes, something not possible with competing methods. Finally, we propose a software system for exposing differing opinions in the news. For any given event, we present the user with all known articles on the topic and let them navigate them by three semantic properties simultaneously: sentiment, topical focus and geography of origin. The result is a dynamically reranked set of relevant articles and a near real time focused summary of those articles. The summary, too, is computed from the semantic text representation discussed above. We conducted a user study of the whole system with very positive results

    Complex question answering : minimizing the gaps and beyond

    Get PDF
    xi, 192 leaves : ill. ; 29 cmCurrent Question Answering (QA) systems have been significantly advanced in demonstrating finer abilities to answer simple factoid and list questions. Such questions are easier to process as they require small snippets of texts as the answers. However, there is a category of questions that represents a more complex information need, which cannot be satisfied easily by simply extracting a single entity or a single sentence. For example, the question: “How was Japan affected by the earthquake?” suggests that the inquirer is looking for information in the context of a wider perspective. We call these “complex questions” and focus on the task of answering them with the intention to minimize the existing gaps in the literature. The major limitation of the available search and QA systems is that they lack a way of measuring whether a user is satisfied with the information provided. This was our motivation to propose a reinforcement learning formulation to the complex question answering problem. Next, we presented an integer linear programming formulation where sentence compression models were applied for the query-focused multi-document summarization task in order to investigate if sentence compression improves the overall performance. Both compression and summarization were considered as global optimization problems. We also investigated the impact of syntactic and semantic information in a graph-based random walk method for answering complex questions. Decomposing a complex question into a series of simple questions and then reusing the techniques developed for answering simple questions is an effective means of answering complex questions. We proposed a supervised approach for automatically learning good decompositions of complex questions in this work. A complex question often asks about a topic of user’s interest. Therefore, the problem of complex question decomposition closely relates to the problem of topic to question generation. We addressed this challenge and proposed a topic to question generation approach to enhance the scope of our problem domain

    Automatic creation of domain templates

    Get PDF
    Recently, many Natural Language Processing (NLP) applications have improved the quality of their output by using various machine learning techniques to mine Information Extraction (IE) patterns for capturing information from the input text. Currently, to mine IE patterns one should know in advance the type of the information that should be captured by these patterns. In this work we propose a novel methodology for corpus analysis based on cross-examination of several document collections representing different instances of the same domain. We show that this methodology can be used for automatic domain template creation. As the problem of automatic domain template creation is rather new, there is no well-defined procedure for the evaluation of the domain template quality. Thus, we propose a methodology for identifying what information should be present in the template. Using this information we evaluate the automatically created domain templates through the text snippets retrieved according to the created templates.
    corecore