
    Coreference Graph Guidance for Mind-Map Generation

    Mind-map generation aims to process a document into a hierarchical structure that shows its central idea and branches. Such a structure is more conducive to understanding the logic and semantics of a document than plain text. Recently, a state-of-the-art method encodes the sentences of a document sequentially and converts them into a relation graph via sequence-to-graph modeling. Although this method can generate mind-maps efficiently and in parallel, its mechanism focuses on sequential features and hardly captures structural information. Moreover, it struggles to model long-range semantic relations. In this work, we propose the coreference-guided mind-map generation network (CMGN), which incorporates external structural knowledge. Specifically, we construct a coreference graph based on coreference semantic relationships to introduce graph-structure information. We then employ a coreference graph encoder to mine the potential governing relations between sentences. To exclude noise and better exploit the information in the coreference graph, we adopt a graph enhancement module trained in a contrastive learning manner. Experimental results demonstrate that our model outperforms all existing methods. A case study further shows that our model reveals the structure and semantics of a document more accurately and concisely. Code and data are available at https://github.com/Cyno2232/CMGN.
    Comment: 9 pages, 6 figures. Accepted by AAAI 202
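
    The paper's code lives at the repository linked above; purely as an illustration of the kind of input a coreference graph encoder consumes, the hypothetical sketch below builds a sentence-level adjacency matrix from precomputed coreference chains. The function name and input format are assumptions, not part of the CMGN codebase.

    ```python
    # Hypothetical sketch: derive a sentence-level coreference graph.
    # Each chain lists the sentence indices in which mentions of one
    # entity occur (assumed precomputed by any coreference resolver).
    import itertools
    import numpy as np

    def coreference_adjacency(num_sentences: int,
                              chains: list[list[int]]) -> np.ndarray:
        """Connect two sentences iff they share a coreferent mention."""
        adj = np.zeros((num_sentences, num_sentences), dtype=np.float32)
        for chain in chains:
            for i, j in itertools.combinations(sorted(set(chain)), 2):
                adj[i, j] = adj[j, i] = 1.0  # undirected edge
        np.fill_diagonal(adj, 1.0)  # self-loops, as graph encoders often expect
        return adj

    # One entity mentioned in sentences 0, 2, 5; another in sentences 1, 2.
    print(coreference_adjacency(6, [[0, 2, 5], [1, 2]]))
    ```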

    A Survey of Deep Learning Approaches for Natural Language Processing Tasks

    In recent years, deep learning has become the go-to method for solving difficult NLP problems. By combining very large neural architectures with massive volumes of training data, deep learning models have attained state-of-the-art performance across a wide range of natural language processing applications, including text summarization, sentiment analysis, named entity recognition, and machine translation. In this paper, we review the most important deep learning methods and how they have been applied to different natural language processing tasks. We cover the fundamental neural architectures, including CNNs, RNNs, and transformers, as well as more recent developments such as BERT and GPT-3. For each method, our discussion centers on its guiding principles, benefits, drawbacks, and significant NLP applications. To illustrate the relative merits of the various models, we also report their comparative performance on industry-standard benchmark datasets. Finally, we highlight current difficulties and promising future directions for deep learning applied to natural language processing. The purpose of this survey is to offer researchers and practitioners in natural language processing a high-level perspective on how to make good use of deep learning in their respective fields.
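
    Because the survey centers on applying pretrained transformer models to downstream tasks, a minimal usage sketch may be helpful; it assumes the Hugging Face transformers library and its default checkpoints, neither of which the survey prescribes.

    ```python
    # Minimal sketch: applying pretrained transformers to two NLP tasks
    # the survey covers. Assumes the Hugging Face `transformers` package;
    # the default model checkpoints are downloaded on first use.
    from transformers import pipeline

    sentiment = pipeline("sentiment-analysis")
    print(sentiment("Deep learning has transformed NLP."))

    ner = pipeline("ner", aggregation_strategy="simple")
    print(ner("BERT was released by Google in 2018."))
    ```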

    Transform, Contrast and Tell: Coherent Entity-Aware Multi-Image Captioning

    Coherent entity-aware multi-image captioning aims to generate coherent captions for neighboring images in a news document. Coherence relationships exist among neighboring images because they often describe the same entities or events. These relationships are important for entity-aware multi-image captioning but are neglected in entity-aware single-image captioning. Most existing work focuses on single-image captioning; multi-image captioning has not been explored before. This paper therefore proposes a coherent entity-aware multi-image captioning model that exploits these coherence relationships. The model consists of a Transformer-based caption generation model and two contrastive learning-based coherence mechanisms. The generation model produces a caption by attending to the image and its accompanying text. The caption-caption coherence mechanism encourages entities in the caption of an image to also appear in the captions of neighboring images. The caption-image-text coherence mechanism encourages entities in the caption of an image to also appear in the accompanying text. To evaluate coherence between captions, two coherence evaluation metrics are proposed. A new dataset, DM800K, is constructed; it contains more images per document than the two existing datasets GoodNews and NYT800K and is thus better suited to multi-image captioning. Experiments on the three datasets show that the proposed captioning model outperforms 7 baselines according to BLEU, ROUGE, METEOR, and entity precision and recall scores. Experiments also show that the generated captions are more coherent than those of the baselines according to caption entity scores, caption ROUGE scores, the two proposed coherence evaluation metrics, and human evaluations.
    Comment: 32 pages, 11 tables, 3 figures
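
    The abstract does not define the two proposed coherence metrics; as a loose illustration of the underlying idea, the sketch below scores caption-caption coherence as the fraction of an image caption's entities that recur in a neighboring caption. The scoring formula is an assumption, not one of the paper's metrics.

    ```python
    # Illustrative (not the paper's) caption-caption coherence score:
    # the fraction of entities in one caption that also appear in the
    # caption of a neighboring image.
    def entity_overlap(entities_a: set[str], entities_b: set[str]) -> float:
        if not entities_a:
            return 0.0
        return len(entities_a & entities_b) / len(entities_a)

    caption_1 = {"Angela Merkel", "Berlin"}      # entities in caption of image 1
    caption_2 = {"Angela Merkel", "Bundestag"}   # entities in caption of image 2
    print(entity_overlap(caption_1, caption_2))  # 0.5
    ```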

    Foucault, digital

    In the mid-1960s, Michel Foucault introduced the method of "discourse analysis" into the humanities and social sciences. Particularly in The Archaeology of Knowledge, he argued for making the history of knowledge and of the sciences the object of discourse-analytical investigation. More than half a century later, computer science is showing a growing interest in discourse analysis, though Foucault usually plays no role in it. Far removed from any archaeology, the digital humanities, too, increasingly turn to the analysis of historical and contemporary discourses. Given these developments, it is time to reread The Archaeology of Knowledge. As early as 1968, the French historian Emmanuel Le Roy Ladurie claimed that "the historian of the future will be a programmer, or he will not exist at all." A year later, Foucault's book offered an answer to precisely this challenge, an answer as informed as it is nuanced. The topicality and relevance of this answer have yet to be discovered.

    Extractive Text Summarization on Single Documents Using Deep Learning

    The task of summarization falls into two categories, extractive and abstractive. The extractive approach selects highly meaningful sentences to form a summary, while the abstractive approach interprets the original document and generates the summary in its own words. Generating a summary, whether extractive or abstractive, has been studied with statistical, graph-based, and deep learning-based approaches. Deep learning has achieved promising performance compared with the classical approaches, and with the evolution of attention-based neural networks, commonly known as the Transformer architecture, there is further potential for improving summarization. The introduction of Transformers and the BERT encoder model has advanced the performance of many downstream NLP tasks, including summarization. The objective of this thesis is to study the performance of deep learning-based models on text summarization through a series of experiments, and to propose "SqueezeBERTSum", a summarization model fine-tuned with the SqueezeBERT encoder, which achieves competitive ROUGE scores, retaining 98% of the original BERT model's performance with roughly 49% fewer trainable parameters.
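
    The thesis itself fine-tunes a SqueezeBERT encoder, which is not reproduced here; for contrast, the sketch below shows the kind of classical statistical baseline the abstract alludes to: ranking sentences by TF-IDF similarity to the document centroid. It assumes scikit-learn and is not SqueezeBERTSum.

    ```python
    # Classical extractive baseline: rank sentences by cosine similarity
    # of their TF-IDF vectors to the document's mean vector. A statistical
    # approach for comparison only, not the thesis's fine-tuned model.
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def extract_summary(sentences: list[str], k: int = 2) -> list[str]:
        vec = TfidfVectorizer()
        sent_vecs = vec.fit_transform(sentences)            # one row per sentence
        doc_vec = np.asarray(sent_vecs.mean(axis=0))        # document centroid
        scores = cosine_similarity(sent_vecs, doc_vec).ravel()
        top = sorted(np.argsort(scores)[-k:])               # keep original order
        return [sentences[i] for i in top]

    doc = [
        "The model encodes each sentence with a pretrained encoder.",
        "The weather was pleasant during the conference.",
        "Top-scoring sentences are selected to form the summary.",
    ]
    print(extract_summary(doc, k=2))
    ```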

    Machine Translation Testing via Syntactic Tree Pruning

    Machine translation systems have been widely adopted in our daily lives, making life easier and more convenient. Unfortunately, erroneous translations can have severe consequences, such as financial losses, which calls for improving the accuracy and reliability of machine translation systems. However, testing machine translation systems is challenging because of the complexity and intractability of the underlying neural models. To tackle these challenges, we propose a novel metamorphic testing approach based on syntactic tree pruning (STP) to validate machine translation systems. Our key insight is that a pruned sentence should preserve the crucial semantics of the original sentence. Specifically, STP (1) proposes a core semantics-preserving pruning strategy based on basic sentence structure and dependency relations at the level of the syntactic tree representation; (2) generates source sentence pairs based on this metamorphic relation; and (3) reports suspicious issues whose translations break the consistency property according to a bag-of-words model. We evaluate STP on two state-of-the-art machine translation systems (Google Translate and Bing Microsoft Translator) with 1,200 source sentences as inputs. The results show that STP accurately finds 5,073 unique erroneous translations in Google Translate and 5,100 in Bing Microsoft Translator (400% more than state-of-the-art techniques), with 64.5% and 65.4% precision, respectively. The reported erroneous translations vary in type, and more than 90% of them cannot be found by state-of-the-art techniques. 9,393 erroneous translations are unique to STP, 711.9% more than state-of-the-art techniques. Moreover, STP is effective at detecting translation errors in the original sentences, reaching a recall of 74.0% and improving on state-of-the-art techniques by 55.1% on average.
    Comment: Accepted to ACM Transactions on Software Engineering and Methodology 2024 (TOSEM'24)
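
    The abstract does not spell out the bag-of-words consistency check; the sketch below is one plausible reading, flagging a sentence pair as suspicious when the translation of the pruned sentence contains many words absent from the translation of the original. Tokenization and the threshold are assumptions, not STP's actual settings.

    ```python
    # Plausible sketch of a bag-of-words consistency check between the
    # translation of an original sentence and of its pruned variant.
    # Whitespace tokenization and the 0.2 threshold are assumptions.
    from collections import Counter

    def is_suspicious(orig_translation: str, pruned_translation: str,
                      threshold: float = 0.2) -> bool:
        orig = Counter(orig_translation.lower().split())
        pruned = Counter(pruned_translation.lower().split())
        # Words in the pruned translation that never occur in the original one.
        extra = pruned - orig
        violation = sum(extra.values()) / max(sum(pruned.values()), 1)
        return violation > threshold  # large divergence -> likely a bug

    print(is_suspicious("the bank approved the loan quickly",
                        "the bank approved the loan"))   # False: consistent
    print(is_suspicious("the bank approved the loan quickly",
                        "the river bank flooded"))       # True: suspicious
    ```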

    Optimal Transport in Summarisation: Towards Unsupervised Multimodal Summarisation

    Summarisation aims to condense a given piece of information into a short and succinct summary that best covers its semantics with the least redundancy. With the explosion of multimedia data, multimodal summarisation with multimodal output has emerged, extending the scope of the task. Summarising a video-document pair into a visual-textual summary helps users obtain a more informative and visual understanding. Although various methods have achieved promising performance, they have limitations, including expensive training, lack of interpretability, and insufficient brevity. This thesis therefore addresses this gap and examines the application of optimal transport (OT) to unsupervised summarisation. The major contributions are as follows: 1) an interpretable OT-based method is proposed for text summarisation that formulates summary sentence extraction as minimising the transportation cost between semantic distributions; 2) an efficient and interpretable unsupervised reinforcement learning method is proposed for text summarisation, in which multi-head attentional pointer-based networks learn representations and extract salient sentences and words, and the learning strategy mimics human judgment by optimising summary quality in terms of OT-based semantic coverage and fluency; 3) a new task, eXtreme Multimodal Summarisation with Multiple Output (XMSMO), is introduced, which summarises a video-document pair into an extremely short multimodal summary; an unsupervised Hierarchical Optimal Transport Network uses OT solvers to maximise multimodal semantic coverage, and a new large-scale dataset is constructed to facilitate future research; 4) a Topic-Guided Co-Attention Transformer method is proposed for XMSMO, which performs two-stage uni- and cross-modal modelling with topic guidance, and an OT-guided unsupervised training strategy optimises the similarity between semantic distributions of topics. Comprehensive experiments demonstrate the effectiveness of the proposed methods.
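
    The thesis's OT formulations are not reproduced in the abstract; as a generic illustration of measuring transport cost between two semantic distributions (say, of a summary and its source document), here is a plain entropic-regularised Sinkhorn sketch in NumPy. The cost matrix, regularisation strength, and iteration count are arbitrary assumptions.

    ```python
    # Generic entropic-regularised optimal transport (Sinkhorn) between
    # two discrete distributions, e.g. word/topic weights of a summary
    # and its source document. Purely illustrative, not the thesis's method.
    import numpy as np

    def sinkhorn_cost(a, b, M, reg=0.1, n_iters=200):
        """Approximate OT cost between histograms a, b under cost matrix M."""
        K = np.exp(-M / reg)               # Gibbs kernel
        u = np.ones_like(a)
        for _ in range(n_iters):
            v = b / (K.T @ u)              # alternating Sinkhorn scalings
            u = a / (K @ v)
        P = u[:, None] * K * v[None, :]    # transport plan
        return float((P * M).sum())

    rng = np.random.default_rng(0)
    a = np.full(4, 0.25)                   # summary distribution (4 tokens)
    b = np.full(6, 1 / 6)                  # document distribution (6 tokens)
    M = rng.random((4, 6))                 # assumed pairwise semantic costs
    print(sinkhorn_cost(a, b, M))
    ```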