SUPERT: Towards New Frontiers in Unsupervised Evaluation Metrics for Multi-Document Summarization
We study unsupervised multi-document summarization evaluation metrics, which
require neither human-written reference summaries nor human annotations (e.g.
preferences, ratings, etc.). We propose SUPERT, which rates the quality of a
summary by measuring its semantic similarity with a pseudo reference summary,
i.e. selected salient sentences from the source documents, using contextualized
embeddings and soft token alignment techniques. Compared to the
state-of-the-art unsupervised evaluation metrics, SUPERT correlates better with
human ratings by 18-39%. Furthermore, we use SUPERT as rewards to guide a
neural-based reinforcement learning summarizer, yielding favorable performance
compared to the state-of-the-art unsupervised summarizers. All source code is
available at https://github.com/yg211/acl20-ref-free-eval.
Comment: ACL 202
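The idea of scoring a summary against a pseudo reference with soft token alignment can be sketched as follows. This is a minimal illustration, not the SUPERT implementation: it assumes token embeddings are already available as NumPy arrays (the toy vectors below are made up), and it uses greedy max-cosine matching combined into an F1 score as one common form of soft alignment. The function name `soft_alignment_f1` is hypothetical.

```python
import numpy as np


def soft_alignment_f1(summary_vecs, reference_vecs):
    """Greedy soft-alignment F1 between two sets of token embeddings.

    Each summary token is matched to its most similar reference token
    (precision side) and each reference token to its most similar
    summary token (recall side). Assumed scoring rule for illustration.
    """
    # Normalise rows so that dot products are cosine similarities.
    s = summary_vecs / np.linalg.norm(summary_vecs, axis=1, keepdims=True)
    r = reference_vecs / np.linalg.norm(reference_vecs, axis=1, keepdims=True)
    sim = s @ r.T  # shape: (n_summary_tokens, n_reference_tokens)

    precision = sim.max(axis=1).mean()
    recall = sim.max(axis=0).mean()
    if precision + recall <= 0:
        return 0.0
    return float(2 * precision * recall / (precision + recall))


# Toy 3-dimensional "embeddings" (hypothetical values, not real model output).
pseudo_reference = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
on_topic_summary = np.array([[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]])
off_topic_summary = np.array([[0.0, 0.0, 1.0], [0.0, 0.1, 1.0]])

on_topic_score = soft_alignment_f1(on_topic_summary, pseudo_reference)
off_topic_score = soft_alignment_f1(off_topic_summary, pseudo_reference)
```

In a real setting the arrays would come from a contextualized encoder applied to the summary and to the salient sentences selected from the source documents; the on-topic summary should then receive the higher score.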
QNRs: toward language for intelligent machines
Impoverished syntax and nondifferentiable vocabularies make natural language a poor medium for neural representation learning and applications. Learned, quasilinguistic neural representations (QNRs) can upgrade words to embeddings and syntax to graphs to provide a more expressive and computationally tractable medium. Graph-structured, embedding-based quasilinguistic representations can support formal and informal reasoning, human and inter-agent communication, and the development of scalable quasilinguistic corpora with characteristics of both literatures and associative memory.
To achieve human-like intellectual competence, machines must be fully literate, able not only to read and learn, but to write things worth retaining as contributions to collective knowledge. In support of this goal, QNR-based systems could translate and process natural language corpora to support the aggregation, refinement, integration, extension, and application of knowledge at scale. Incremental development of QNR-based models can build on current methods in neural machine learning, and as systems mature, could potentially complement or replace today’s opaque, error-prone “foundation models” with systems that are more capable, interpretable, and epistemically reliable. Potential applications and implications are broad.
Interactive Search and Exploration in Online Discussion Forums Using Multimodal Embeddings
In this paper we present a novel interactive multimodal learning system,
which facilitates search and exploration in large networks of social multimedia
users. It allows the analyst to identify and select users of interest, and to
find similar users in an interactive learning setting. Our approach is based on
novel multimodal representations of users, words and concepts, which we
simultaneously learn by deploying a general-purpose neural embedding model. We
show these representations to be useful not only for categorizing users, but
also for automatically generating user and community profiles. Inspired by
traditional summarization approaches, we create the profiles by selecting
diverse and representative content from all available modalities, i.e. the
text, image and user modality. The usefulness of the approach is evaluated
using artificial actors, which simulate user behavior in a relevance feedback
scenario. Multiple experiments were conducted in order to evaluate the quality
of our multimodal representations, to compare different embedding strategies,
and to determine the importance of different modalities. We demonstrate the
capabilities of the proposed approach on two different multimedia collections
originating from the violent online extremism forum Stormfront and the
microblogging platform Twitter, which are particularly interesting due to the
high semantic level of the discussions they feature.
SemPCA-Summarizer: Exploiting Semantic Principal Component Analysis for Automatic Summary Generation
Text summarization is the task of condensing a document while keeping the relevant information. Integrated into wider information systems, it can help users access key information without having to read everything, and thus work more efficiently. In this research work, we have developed and evaluated a single-document extractive summarization approach, named SemPCA-Summarizer, which reduces the dimension of a document using the Principal Component Analysis (PCA) technique enriched with semantic information. A concept-sentence matrix is built from the textual input document, and PCA is then used to identify and rank the relevant concepts, which are used to select the most important sentences through different heuristics, leading to various types of summaries. The results obtained show that the generated summaries are very competitive, from both a quantitative and a qualitative viewpoint, indicating that our proposed approach is appropriate for briefly providing key information, and thus for helping to cope with the huge amount of available information more quickly and efficiently.
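The pipeline described in this abstract (concept-sentence matrix, PCA to rank concepts, heuristic sentence selection) can be sketched roughly as below. This is an illustrative approximation under several assumptions, not the authors' system: plain lowercase words stand in for semantically enriched concepts, only the first principal component is used to rank concepts, and the selection heuristic (sum of top-concept counts per sentence) is just one possible choice. The function name `summarize` and its parameters are hypothetical.

```python
import numpy as np


def summarize(sentences, n_concepts=3, n_sentences=2):
    """Pick sentences that contain the concepts ranked highest by PCA."""
    # Assumption: lowercase words act as "concepts"; the paper enriches
    # concepts with semantic information, which is omitted here.
    tokenised = [s.lower().replace(".", "").split() for s in sentences]
    concepts = sorted({word for tokens in tokenised for word in tokens})

    # Concept-sentence matrix: one row per concept, one column per sentence.
    matrix = np.array(
        [[tokens.count(c) for tokens in tokenised] for c in concepts],
        dtype=float,
    )

    # PCA via SVD, treating sentences as observations and concepts as variables.
    observations = matrix.T
    centred = observations - observations.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)

    # Rank concepts by the magnitude of their loading on the first component.
    loadings = np.abs(vt[0])
    top_concepts = np.argsort(loadings)[::-1][:n_concepts]

    # Assumed heuristic: score a sentence by its counts of the top concepts.
    scores = matrix[top_concepts].sum(axis=0)
    chosen = sorted(np.argsort(scores)[::-1][:n_sentences])
    return [sentences[i] for i in chosen]  # keep original document order


document = [
    "Cats chase mice.",
    "Dogs chase cats.",
    "The weather is mild today.",
    "Cats and dogs chase mice.",
]
summary = summarize(document, n_concepts=3, n_sentences=2)
```

Swapping the scoring line for a different rule (for example, one sentence per top concept) would yield a different type of summary, which is in the spirit of the "different heuristics" the abstract mentions.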