Unsupervised Summarization by Jointly Extracting Sentences and Keywords
We present RepRank, an unsupervised graph-based ranking model for extractive
multi-document summarization in which the similarities between words, between
sentences, and between words and sentences can be estimated by the distances
between their vector representations in a unified vector space. To obtain
desirable representations, we propose a self-attention-based learning method
that represents a sentence as the weighted sum of its word embeddings, with
the weights concentrated on those words that better reflect the content
of a document. We show that salient sentences and keywords can be extracted in
a joint and mutual reinforcement process using our learned representations, and
prove that this process always converges to a unique solution leading to
improvement in performance. A variant of absorbing random walk and the
corresponding sampling-based algorithm are also described to avoid redundancy
and increase diversity in the summaries. Experimental results on multiple
benchmark datasets show that RepRank achieves the best or comparable
performance in ROUGE.
Comment: 10 pages (including 2 pages of references), 1 figure
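The weighted-sum sentence representation described above can be sketched minimally as follows; the document-level query vector `doc_vec` and the softmax weighting are assumptions for illustration, not RepRank's exact attention formulation:

```python
import numpy as np

def sentence_representation(word_vecs, doc_vec):
    """Represent a sentence as a weighted sum of its word embeddings,
    concentrating weight on words most similar to a document-level
    vector. A minimal sketch, not the paper's exact mechanism."""
    scores = word_vecs @ doc_vec                 # per-word relevance scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                     # softmax attention weights
    return weights @ word_vecs                   # attention-weighted sum
```

Sentences represented this way live in the same space as word embeddings, which is what lets word-to-sentence similarity be measured by simple vector distance.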
SparseGAN: Sparse Generative Adversarial Network for Text Generation
It is still a challenging task to learn a neural text generation model under
the framework of generative adversarial networks (GANs) since the entire
training process is not differentiable. The existing training strategies either
suffer from unreliable gradient estimations or imprecise sentence
representations. Inspired by the principle of sparse coding, we propose
SparseGAN, which generates semantically interpretable but sparse sentence
representations as inputs to the discriminator. The key idea is to treat
an embedding matrix as an over-complete dictionary, and use a linear
combination of very few selected word embeddings to approximate the output
feature representation of the generator at each time step. With such
semantically rich representations, we not only reduce unnecessary noise for
efficient adversarial training, but also make the entire training process fully
differentiable. Experiments on multiple text generation datasets yield
performance improvements, especially in sequence-level metrics such as BLEU.
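The idea of approximating a generator output with a linear combination of very few dictionary atoms can be sketched with greedy matching pursuit; this is a simplified, non-differentiable stand-in for the paper's sparse-coding procedure, shown only to make the "over-complete dictionary" intuition concrete:

```python
import numpy as np

def sparse_approximation(feature, embeddings, k=3):
    """Approximate a feature vector as a linear combination of at most k
    word embeddings from an over-complete dictionary, via greedy matching
    pursuit. A sketch of the sparse-coding idea, not SparseGAN's actual
    (differentiable) procedure."""
    residual = feature.astype(float).copy()
    coeffs = np.zeros(len(embeddings))
    sq_norms = (embeddings ** 2).sum(axis=1)
    for _ in range(k):
        corr = embeddings @ residual              # correlation with residual
        i = int(np.argmax(np.abs(corr)))          # best-matching embedding
        c = corr[i] / sq_norms[i]                 # least-squares coefficient
        coeffs[i] += c
        residual -= c * embeddings[i]             # remove explained component
    return coeffs, embeddings.T @ coeffs          # sparse codes, reconstruction
```

Because the reconstruction is a combination of real word embeddings, the discriminator sees inputs that stay close to the embedding manifold rather than arbitrary continuous vectors.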
Improving Coreference Resolution by Leveraging Entity-Centric Features with Graph Neural Networks and Second-order Inference
One of the major challenges in coreference resolution is how to make use of
entity-level features defined over clusters of mentions rather than mention
pairs. However, coreferent mentions usually spread far apart in an entire text,
which makes it extremely difficult to incorporate entity-level features. We
propose a graph neural network-based coreference resolution method that can
capture the entity-centric information by encouraging the sharing of features
across all mentions that probably refer to the same real-world entity. Mentions
are linked to each other via edges that model how likely two linked mentions
point to the same entity. In such graphs, features can be shared between
mentions by message-passing operations in an entity-centric manner. A global
inference algorithm using up to second-order features is also
presented to optimally cluster mentions into consistent groups. Experimental
results show that our graph neural network-based method, combined with the
second-order decoding algorithm (named GNNCR), achieves close to
state-of-the-art performance on the English CoNLL-2012 Shared Task dataset.
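The entity-centric feature sharing can be sketched as link-weighted message passing over mention representations; the mixing coefficient `alpha` and the row normalization below are illustrative assumptions, not the paper's exact GNN update:

```python
import numpy as np

def entity_centric_sharing(mention_feats, link_probs, steps=2, alpha=0.5):
    """Mix each mention's features with a link-probability-weighted average
    of all mentions' features, so that likely-coreferent mentions become
    more similar. A minimal message-passing sketch."""
    A = link_probs / link_probs.sum(axis=1, keepdims=True)  # row-stochastic
    h = mention_feats.astype(float)
    for _ in range(steps):
        h = (1 - alpha) * h + alpha * (A @ h)    # one round of message passing
    return h
```

After a few rounds, mentions connected by high-probability edges carry near-identical representations, which is precisely the entity-level signal that mention-pair models lack.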
Self-tuning fuzzy controller for air-conditioning systems
Master's thesis (Master of Engineering)
SPARQL Query Mediation for Data Integration
The Semantic Web provides a set of promising technologies that make sophisticated data integration much easier, because data on the Semantic Web can be connected by links and complex queries can be executed against the resulting datasets of linked data. Although Semantic Web techniques offer RDF/OWL to support schematic mappings between diverse data sources, large-scale data integration is still severely hampered by various types of data-level semantic heterogeneity among the data sources. In this paper, we show that SPARQL queries intended to execute over multiple heterogeneous data sources can be mediated automatically.
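As a toy illustration of the kind of rewriting such a mediator performs, the sketch below substitutes mediated-schema predicate URIs with source-local ones before a query is dispatched. The prefixes and predicate names are invented for the example, and a real mediator would operate on the parsed query algebra and also handle data-level conversions (units, currencies, formats) rather than plain string replacement:

```python
def mediate_query(query, predicate_map):
    """Rewrite a SPARQL query for one source by replacing mediated-schema
    predicates with the source's local predicates. A string-level sketch
    only; not the paper's mediation algorithm."""
    for mediated, local in predicate_map.items():
        query = query.replace(mediated, local)
    return query

# Hypothetical mediated query and a per-source predicate mapping
query = "SELECT ?p WHERE { ?p med:price ?v . }"
mapping = {"med:price": "srcA:listCost"}
```

A separate mapping table per source lets one mediated query be specialized for each endpoint it is sent to.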