
    TGSum: Build Tweet Guided Multi-Document Summarization Dataset

    The development of summarization research has been significantly hampered by the costly acquisition of reference summaries. This paper proposes an effective way to automatically collect large-scale news-related multi-document summaries by drawing on reactions from social media. We utilize two types of social labels in tweets, i.e., hashtags and hyperlinks. Hashtags are used to cluster documents into topic sets, while a tweet with a hyperlink often highlights key points of the linked document. We synthesize a linked document cluster into a reference summary that covers most of these key points. To this end, we adopt the ROUGE metrics to measure the coverage ratio and develop an Integer Linear Programming (ILP) solution to discover the sentence set that reaches the upper bound of ROUGE. Since summary sentences may be selected from both documents and high-quality tweets, the generated reference summaries can be abstractive. Both the informativeness and readability of the collected summaries are verified by manual judgment. In addition, we train a Support Vector Regression summarizer on the DUC generic multi-document summarization benchmarks. With the collected data as an extra training resource, the summarizer's performance improves substantially on all test sets. We release this dataset for further research.
    Comment: 7 pages, 1 figure in AAAI 201
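    The abstract describes selecting a sentence set whose ROUGE coverage of tweet-derived key points is maximal. As a minimal sketch (not the paper's exact ILP formulation), the same objective can be approximated greedily: repeatedly add the sentence with the largest gain in ROUGE-1 recall against the reference key points. The sentences and reference text below are illustrative placeholders.

```python
# Sketch of ROUGE-guided sentence selection. The paper solves this with
# an ILP; this greedy variant is an assumed, simplified stand-in.

def rouge1_recall(summary_tokens, reference_tokens):
    """Unigram recall: fraction of distinct reference tokens covered."""
    ref = set(reference_tokens)
    if not ref:
        return 0.0
    return sum(1 for t in ref if t in summary_tokens) / len(ref)

def greedy_select(sentences, reference, k=2):
    """Pick up to k sentences whose union best covers the reference tokens."""
    ref = reference.lower().split()
    chosen, chosen_tokens = [], set()
    for _ in range(k):
        best, best_gain = None, 0.0
        for s in sentences:
            if s in chosen:
                continue
            cand = chosen_tokens | set(s.lower().split())
            gain = rouge1_recall(cand, ref) - rouge1_recall(chosen_tokens, ref)
            if gain > best_gain:
                best, best_gain = s, gain
        if best is None:  # no sentence adds coverage
            break
        chosen.append(best)
        chosen_tokens |= set(best.lower().split())
    return chosen
```

    The greedy gain criterion is submodular here, so this approximation comes within a constant factor of the ILP optimum while staying linear-time per step.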

    Multimodal video abstraction into a static document using deep learning

    Abstraction is a strategy that conveys the essential points of a document in a short form. The video abstraction approach proposed in this research is based on multimodal video data, comprising both audio and visual streams. The major steps are segmenting the input video into scenes and producing a textual and visual summary for each scene, so that the video's events are condensed into a static document. To detect shot and scene boundaries in a video sequence, a hybrid-features method was employed, which improves shot detection performance by selecting strong and flexible features. The most informative keyframes from each scene are then incorporated into the visual summary, and a hybrid deep learning model was used for abstractive text summarization. The testing videos came from the BBC archive (BBC Learning English and BBC News), and a news summary dataset was used to train the deep model. Textual summaries were assessed with ROUGE, achieving a score of 40.49%, while the visual summaries, evaluated with precision, recall, and F-score, reached 94.9%, outperforming the other methods in the experiments.
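    The abstract does not spell out the hybrid feature set, but one standard cue for the shot-boundary step it describes is the histogram difference between consecutive frames: a cut is declared where the difference exceeds a threshold. A minimal sketch under that assumption (frames as flat lists of pixel intensities; `threshold` is an illustrative parameter, not the paper's):

```python
# Assumed single-cue sketch of shot-boundary detection; the paper's
# hybrid method combines several features beyond this one.

def histogram(frame, bins=8, max_val=256):
    """Coarse intensity histogram of a frame (list of pixel values)."""
    h = [0] * bins
    for p in frame:
        h[p * bins // max_val] += 1
    return h

def shot_boundaries(frames, threshold=0.5):
    """Indices i where a cut occurs between frames[i-1] and frames[i]."""
    cuts = []
    for i in range(1, len(frames)):
        h1, h2 = histogram(frames[i - 1]), histogram(frames[i])
        # Normalized L1 distance: 0 for identical frames, 1 for disjoint.
        diff = sum(abs(a - b) for a, b in zip(h1, h2)) / (2 * len(frames[i]))
        if diff > threshold:
            cuts.append(i)
    return cuts
```

    A hybrid detector would combine such a histogram cue with, e.g., edge or motion features, which is what makes it robust to lighting changes that fool any single feature.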

    Natural language processing

    Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems, such as text summarization, information extraction, and information retrieval, including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of the WWW and digital libraries; and (iv) evaluation of NLP systems.

    Summarization of Scientific Paper through Reinforcement Ranking on Semantic Link Network

    The Semantic Link Network is a semantics-modeling method for effective information services. This paper proposes a new text summarization approach that extracts a Semantic Link Network from a scientific paper, with language units of different granularities as nodes and semantic links between them, and then ranks the nodes to select the Top-k sentences that compose the summary. A set of assumptions for reinforcing representative nodes is defined to reflect the core of the paper, and Semantic Link Networks with different types of nodes and links are constructed from different combinations of these assumptions. Finally, an iterative ranking algorithm calculates the weight vectors of the nodes in a converging iteration process: the iteration approaches a stable weight vector of sentence nodes, from which the Top-k highest-ranked nodes are selected for the summary. We designed six types of ranking models on Semantic Link Networks for evaluation. Both objective and intuitive assessments show that ranking the Semantic Link Network of language units significantly helps identify the representative sentences. This work not only provides a new approach to summarizing text based on the extraction of semantic links, but also verifies the effectiveness of the Semantic Link Network in rendering the core of a text. The proposed approach can be applied to other summarization tasks, such as generating an extended abstract, a mind map, or bullet points for the slides of a given paper, and it can easily be extended with more semantic links to improve text summarization and other information services.
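    The "converging iteration process" over node weights that the abstract describes resembles power iteration over a row-normalized link matrix, in the spirit of PageRank-style ranking. A minimal sketch under that assumption (the adjacency matrix and its link weights are illustrative, not the paper's actual semantic link types):

```python
# Assumed sketch: iterate weights over a row-stochastic link matrix
# until they stabilize, then rank nodes by the converged weights.

def iterative_rank(adj, iters=100, tol=1e-9):
    """adj[i][j]: link weight from node i to node j; returns node weights."""
    n = len(adj)
    # Row-normalize so each node distributes its weight across its out-links.
    norm = []
    for row in adj:
        s = sum(row)
        norm.append([w / s for w in row] if s else [1.0 / n] * n)
    weights = [1.0 / n] * n
    for _ in range(iters):
        new = [sum(weights[i] * norm[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, weights)) < tol:
            return new  # converged to a stable weight vector
        weights = new
    return weights
```

    Sentence nodes would then be sorted by their converged weight and the Top-k kept; the six ranking models in the paper presumably vary which node and link types enter `adj`.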