3,941 research outputs found

    Abstract Meaning Representation for Multi-Document Summarization

    Full text link
    Generating an abstract from a collection of documents is a desirable capability for many real-world applications. However, abstractive approaches to multi-document summarization have not been thoroughly investigated. This paper studies the feasibility of using Abstract Meaning Representation (AMR), a semantic representation of natural language grounded in linguistic theory, as a form of content representation. Our approach condenses source documents to a set of summary graphs following the AMR formalism. The summary graphs are then transformed to a set of summary sentences in a surface realization step. The framework is fully data-driven and flexible. Each component can be optimized independently using small-scale, in-domain training data. We perform experiments on benchmark summarization datasets and report promising results. We also describe opportunities and challenges for advancing this line of research.Comment: 13 page

    Bringing Structure into Summaries: Crowdsourcing a Benchmark Corpus of Concept Maps

    Full text link
    Concept maps can be used to concisely represent important information and bring structure into large document collections. Therefore, we study a variant of multi-document summarization that produces summaries in the form of concept maps. However, suitable evaluation datasets for this task are currently missing. To close this gap, we present a newly created corpus of concept maps that summarize heterogeneous collections of web documents on educational topics. It was created using a novel crowdsourcing approach that allows us to efficiently determine important elements in large document collections. We release the corpus along with a baseline system and proposed evaluation protocol to enable further research on this variant of summarization.Comment: Published at EMNLP 201

    An approach to graph-based analysis of textual documents

    Get PDF
    In this paper a new graph-based model is proposed for the representation of textual documents. Graph-structures are obtained from textual documents by making use of the well-known Part-Of-Speech (POS) tagging technique. More specifically, a simple rule-based (re) classifier is used to map each tag onto graph vertices and edges. As a result, a decomposition of textual documents is obtained where tokens are automatically parsed and attached to either a vertex or an edge. It is shown how textual documents can be aggregated through their graph-structures and finally, it is shown how vertex-ranking methods can be used to find relevant tokens.(1)

    Intelligent multimedia indexing and retrieval through multi-source information extraction and merging

    Get PDF
    This paper reports work on automated meta-data\ud creation for multimedia content. The approach results\ud in the generation of a conceptual index of\ud the content which may then be searched via semantic\ud categories instead of keywords. The novelty\ud of the work is to exploit multiple sources of\ud information relating to video content (in this case\ud the rich range of sources covering important sports\ud events). News, commentaries and web reports covering\ud international football games in multiple languages\ud and multiple modalities is analysed and the\ud resultant data merged. This merging process leads\ud to increased accuracy relative to individual sources
    corecore