Coreference Graph Guidance for Mind-Map Generation
Mind-map generation aims to process a document into a hierarchical structure
that shows its central idea and branches. Such a structure is more conducive to
understanding the logic and semantics of a document than plain text.
Recently, a state-of-the-art method encodes the sentences of a document
sequentially and converts them to a relation graph via sequence-to-graph.
Although this method generates mind-maps efficiently in parallel, its
mechanism focuses on sequential features and hardly captures structural
information. Moreover, it struggles to model long-range semantic relations.
In this work, we propose a coreference-guided mind-map generation network
(CMGN) to incorporate external structure knowledge. Specifically, we construct
a coreference graph based on the coreference semantic relationship to introduce
the graph structure information. Then we employ a coreference graph encoder to
mine the potential governing relations between sentences. To exclude noise and
better exploit the information in the coreference graph, we adopt a graph
enhancement module trained in a contrastive learning manner. Experimental results
demonstrate that our model outperforms all the existing methods. The case study
further proves that our model can more accurately and concisely reveal the
structure and semantics of a document. Code and data are available at
https://github.com/Cyno2232/CMGN.

Comment: 9 pages, 6 figures. Accepted by AAAI 202
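To make the coreference-graph idea concrete, here is a minimal sketch (not the paper's implementation) of turning coreference chains into a sentence-level graph; the `chains` input stands in for the output of a coreference resolver, which is stubbed out here:

```python
from collections import defaultdict

def build_coreference_graph(sentences, chains):
    """Build an undirected sentence graph: two sentences are connected
    if they mention the same entity, i.e. appear in the same
    coreference chain. `chains` maps an entity to the indices of the
    sentences that mention it."""
    edges = set()
    for sent_ids in chains.values():
        ids = sorted(set(sent_ids))
        for i in range(len(ids)):
            for j in range(i + 1, len(ids)):
                edges.add((ids[i], ids[j]))
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj

sentences = ["Alice founded a lab.",
             "She hired ten researchers.",
             "The lab moved in 2020."]
# Hypothetical resolver output: "Alice"/"She" corefer, "a lab"/"The lab" corefer.
chains = {"Alice": [0, 1], "lab": [0, 2]}
graph = build_coreference_graph(sentences, chains)
```

Such an adjacency structure is what a graph encoder would then consume to mine governing relations between sentences.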
A Survey of Deep Learning Approaches for Natural Language Processing Tasks
In recent years, deep learning has become the dominant method for solving difficult NLP problems. Deep learning models have attained state-of-the-art performance across a wide range of natural language processing applications, including text summarization, sentiment analysis, named entity recognition, and machine translation, by leveraging large neural network architectures and massive volumes of training data. In this paper, we review the most important deep learning methods and how they have been applied to different natural language processing tasks. We cover the fundamentals of neural network architectures, including CNNs, RNNs, and transformers, as well as more recent developments such as BERT and GPT-3. Our discussion of each method centers on its guiding principles, benefits, drawbacks, and significant NLP applications. To further illustrate the relative merits of various models, we also report their comparative performance on industry-standard benchmark datasets. Finally, we highlight present difficulties and potential future avenues of study in deep learning applied to natural language processing. The purpose of this survey is to offer researchers and practitioners in natural language processing a high-level perspective on how to make good use of deep learning in their respective fields.
Transform, Contrast and Tell: Coherent Entity-Aware Multi-Image Captioning
Coherent entity-aware multi-image captioning aims to generate coherent
captions for neighboring images in a news document. There are coherence
relationships among neighboring images because they often describe the same
entities or events. These relationships are important for entity-aware
multi-image captioning but are neglected in entity-aware single-image
captioning. Most existing work focuses on single-image captioning, while
multi-image captioning has not been explored before. Hence, this paper proposes
a coherent entity-aware multi-image captioning model by making use of coherence
relationships. The model consists of a Transformer-based caption generation
model and two types of contrastive learning-based coherence mechanisms. The
generation model generates the caption by paying attention to the image and the
accompanying text. The caption-caption coherence mechanism encourages entities
in the caption of an image to also appear in the captions of neighboring
images. The caption-image-text coherence mechanism encourages entities in the
caption of an image to also appear in the accompanying text. To evaluate coherence
between captions, two coherence evaluation metrics are proposed. A new
dataset, DM800K, is constructed; it has more images per document than the two
existing datasets GoodNews and NYT800K and is thus more suitable for
multi-image captioning. Experiments on three datasets show that the proposed
captioning model outperforms 7 baselines according to BLEU, ROUGE, METEOR, and
entity precision and recall scores. Experiments also show that the generated
captions are more coherent than those of the baselines according to caption
entity scores, caption ROUGE scores, the two proposed coherence evaluation
metrics, and human evaluations.

Comment: 32 pages, 11 tables, 3 figures
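To illustrate what entity-level coherence between neighboring captions could look like, here is a toy metric (an illustrative assumption, not the paper's proposed metrics): the fraction of a caption's entities that also appear in a neighboring caption, with a naive capitalization-based extractor standing in for a real NER system.

```python
def naive_entities(text):
    """Toy entity extractor: capitalized tokens with punctuation
    stripped. A real system would use NER; this is for illustration."""
    return {w.strip(".,") for w in text.split() if w[:1].isupper()}

def caption_coherence(captions, entities_of=naive_entities):
    """For each caption with entities, compute the fraction of its
    entities shared with a neighboring caption, then average."""
    scores = []
    for i, cap in enumerate(captions):
        ents = entities_of(cap)
        if not ents:
            continue
        neighbors = set()
        if i > 0:
            neighbors |= entities_of(captions[i - 1])
        if i + 1 < len(captions):
            neighbors |= entities_of(captions[i + 1])
        scores.append(len(ents & neighbors) / len(ents))
    return sum(scores) / len(scores) if scores else 0.0

captions = ["Obama met Merkel in Berlin.",
            "Merkel spoke in Berlin.",
            "The weather was mild."]
score = caption_coherence(captions)
```

A higher score means neighboring captions reuse more of the same entities, which is the property the paper's coherence mechanisms try to encourage.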
Foucault, digital
In the mid-1960s, Michel Foucault introduced the method of "discourse analysis" into the humanities and social sciences. Particularly in The Archaeology of Knowledge, he argued for making the history of knowledge and of the sciences the object of discourse-analytical investigation. More than half a century later, computer science has shown a growing interest in discourse analysis. As a rule, however, Foucault plays no role in it. Far removed from any archaeology, the digital humanities, too, increasingly rely on the analysis of historical and contemporary discourses. In light of these developments, it is time to reread The Archaeology of Knowledge. As early as 1968, the French historian Emmanuel Le Roy Ladurie claimed that "the historian of the future will be a programmer, or he will not exist at all." A year later, Foucault's book offered an answer to precisely this challenge, one as informed as it is nuanced. The timeliness and relevance of this answer have yet to be discovered.
Extractive Text Summarization on Single Documents Using Deep Learning
The task of summarization can be categorized into two methods: extractive and abstractive summarization. The extractive approach selects highly meaningful sentences to form a summary, while the abstractive approach interprets the original document and generates the summary in its own words. The task of generating a summary, whether extractive or abstractive, has been studied with different approaches, such as statistical-based, graph-based, and deep learning-based approaches. Deep learning has achieved promising performance in comparison with the classical approaches, and with the evolution of neural networks such as the attention-based architecture commonly known as the Transformer, there are potential areas for improving summarization. The introduction of transformers and the encoder model BERT has advanced the performance of many downstream tasks in NLP, including summarization. The objective of this thesis is to study the performance of deep learning-based models on text summarization through a series of experiments, and to propose "SqueezeBERTSum", a summarization model fine-tuned with the SqueezeBERT encoder, which achieved competitive ROUGE scores, retaining 98% of the original BERT model's performance with ~49% fewer trainable parameters.
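To illustrate the extractive paradigm described above (a classic frequency-based baseline, not the thesis's SqueezeBERTSum model), the sketch below scores each sentence by the average document frequency of its words and keeps the top-k sentences in document order:

```python
from collections import Counter

def extractive_summary(sentences, k=2):
    """Frequency-based extractive baseline: sentences whose words occur
    often across the document are assumed to be more central. Returns
    the k highest-scoring sentences in their original order."""
    freqs = Counter(w for s in sentences for w in s.lower().split())
    def score(s):
        words = s.lower().split()
        return sum(freqs[w] for w in words) / len(words)
    top = set(sorted(sentences, key=score, reverse=True)[:k])
    return [s for s in sentences if s in top]

sents = ["deep learning improves summarization",
         "summarization is useful",
         "bananas are yellow"]
summary = extractive_summary(sents, k=2)
```

Neural extractive models such as BERT-based summarizers replace this hand-crafted score with learned sentence representations, but the select-and-keep structure of the task is the same.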
Machine Translation Testing via Syntactic Tree Pruning
Machine translation systems have been widely adopted in our daily life,
making life easier and more convenient. Unfortunately, erroneous translations
may result in severe consequences, such as financial losses. This calls for
improving the accuracy and reliability of machine translation systems.
However, it is challenging to test machine translation systems because of the
complexity and intractability of the underlying neural models. To tackle these
challenges, we propose a novel metamorphic testing approach by syntactic tree
pruning (STP) to validate machine translation systems. Our key insight is that
a pruned sentence should have similar crucial semantics compared with the
original sentence. Specifically, STP (1) proposes a core semantics-preserving
pruning strategy by basic sentence structure and dependency relations on the
level of syntactic tree representation; (2) generates source sentence pairs
based on the metamorphic relation; (3) reports suspicious issues whose
translations break the consistency property by a bag-of-words model. We further
evaluate STP on two state-of-the-art machine translation systems (i.e., Google
Translate and Bing Microsoft Translator) with 1,200 source sentences as inputs.
The results show that STP can accurately find 5,073 unique erroneous
translations in Google Translate and 5,100 unique erroneous translations in
Bing Microsoft Translator (400% more than state-of-the-art techniques), with
64.5% and 65.4% precision, respectively. The reported erroneous translations
vary in types and more than 90% of them cannot be found by state-of-the-art
techniques. There are 9,393 erroneous translations unique to STP, which is
711.9% more than state-of-the-art techniques. Moreover, STP is quite effective
at detecting translation errors in the original sentences, with a recall
reaching 74.0%, improving on state-of-the-art techniques by 55.1% on average.

Comment: Accepted to ACM Transactions on Software Engineering and Methodology
2024 (TOSEM'24)
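The bag-of-words consistency check in step (3) can be sketched as follows (a simplification under stated assumptions: the 0.8 containment threshold and the whitespace tokenization are illustrative choices, not the paper's exact criterion):

```python
from collections import Counter

def bag_of_words(text):
    """Multiset of lowercase tokens."""
    return Counter(text.lower().split())

def is_suspicious(original_translation, pruned_translation, threshold=0.8):
    """Metamorphic check in the spirit of STP: a pruned source sentence
    keeps the core semantics of the original, so most words of the
    pruned sentence's translation should also occur in the original
    sentence's translation. A low containment ratio flags a suspicious
    (potentially erroneous) translation pair."""
    pruned = bag_of_words(pruned_translation)
    original = bag_of_words(original_translation)
    contained = sum((pruned & original).values())  # multiset intersection
    total = sum(pruned.values())
    return total > 0 and contained / total < threshold

# Consistent pair: every word of the pruned translation is in the original.
ok = is_suspicious("the cat sat on the mat", "the cat sat")
# Inconsistent pair: the pruned translation diverges from the original.
bad = is_suspicious("the cat sat on the mat", "the dog ran")
```

In the actual pipeline these two strings would be the system outputs for an original source sentence and its syntactically pruned variant.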
Optimal Transport in Summarisation: Towards Unsupervised Multimodal Summarisation
Summarisation aims to condense a given piece of information into a short and succinct summary that best covers its semantics with the least redundancy. With the explosion of multimedia data, multimodal summarisation with multimodal output has emerged, broadening the scope of the task. Summarising a video-document pair into a visual-textual summary helps users obtain a more informative and visual understanding. Although various methods have achieved promising performance, they have limitations, including expensive training, lack of interpretability, and insufficient brevity.
Therefore, this thesis addresses the gap and examines the application of optimal transport (OT) in unsupervised summarisation. The major contributions are as follows: 1) An interpretable OT-based method is proposed for text summarisation. It formulates summary sentence extraction as minimising the transportation cost between semantic distributions; 2) An efficient and interpretable unsupervised reinforcement learning method is proposed for text summarisation. Multi-head attentional pointer-based networks learn the representation and extract salient sentences and words. The learning strategy mimics human judgment by optimising summary quality in terms of OT-based semantic coverage and fluency; 3) A new task, eXtreme Multimodal Summarisation with Multiple Output (XMSMO), is introduced. It summarises a video-document pair into an extremely short multimodal summary. An unsupervised Hierarchical Optimal Transport Network learns and uses OT solvers to maximise multimodal semantic coverage. A new large-scale dataset is constructed to facilitate future research; 4) A Topic-Guided Co-Attention Transformer method is proposed for XMSMO. It constructs a two-stage uni- and cross-modal modelling pipeline with topic guidance. An OT-guided unsupervised training strategy optimises the similarity between the semantic distributions of topics. Comprehensive experiments demonstrate the effectiveness of the proposed methods.
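To illustrate the OT formulation of extraction in contribution 1), here is a minimal sketch (an assumption-laden simplification, not the thesis's method): sentences are greedily selected so that the summary's word distribution stays close to the document's, with total-variation distance standing in as a cheap proxy for the optimal-transport cost an OT solver would compute.

```python
from collections import Counter

def word_dist(text):
    """Normalized word-frequency distribution of a text."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def tv_distance(p, q):
    """Total-variation distance between two discrete distributions,
    used here as a stand-in for the OT transportation cost."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(w, 0.0) - q.get(w, 0.0)) for w in keys)

def extract_summary(sentences, k):
    """Greedily pick k sentence indices whose combined word
    distribution is closest to the whole document's."""
    doc = word_dist(" ".join(sentences))
    chosen, remaining = [], list(range(len(sentences)))
    for _ in range(k):
        best = min(remaining, key=lambda i: tv_distance(
            word_dist(" ".join(sentences[j] for j in chosen + [i])), doc))
        chosen.append(best)
        remaining.remove(best)
    return sorted(chosen)

docs = ["the model improves accuracy", "the model", "zebra"]
picked = extract_summary(docs, 2)
```

The thesis's interpretable appeal comes from exactly this kind of objective: the "cost" of a candidate summary is a distance between semantic distributions, so one can inspect which words the summary fails to cover.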