Fuzzy Sets Across the Natural Language Generation Pipeline
We explore the implications of using fuzzy techniques (mainly those commonly
used in the linguistic description/summarization of data discipline) from a
natural language generation perspective. For this, we provide an extensive
discussion of some general convergence points and an exploration of the
relationship between the different tasks involved in the standard NLG system
pipeline architecture and the most common fuzzy approaches used in linguistic
summarization/description of data, such as fuzzy quantified statements,
evaluation criteria or aggregation operators. Each individual discussion is
illustrated with a related use case. Recent work on the cross-fertilization
of both research fields is also referenced. This paper
encompasses general ideas that emerged as part of the PhD thesis "Application
of fuzzy sets in data-to-text systems". It does not present a specific
application or a formal approach, but rather discusses current high-level
issues and potential usages of fuzzy sets (focused on linguistic summarization
of data) in natural language generation.

Comment: Paper features: 16 pages, 2 tables, 13 figures
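The fuzzy quantified statements mentioned above can be made concrete with a small sketch. The following is only an illustration of the general technique (evaluating a summary like "most values are high" via Zadeh's sigma-count); the membership functions and thresholds are invented for the example, not taken from the paper.

```python
# Illustrative evaluation of a fuzzy quantified statement of the kind
# used in linguistic summarization of data: "most temperatures are high".
# Both membership functions below are toy ramps chosen for illustration.

def mu_high(t):
    """Membership of a temperature in the fuzzy set 'high' (toy ramp)."""
    return max(0.0, min(1.0, (t - 20.0) / 10.0))

def mu_most(p):
    """Fuzzy quantifier 'most' applied to a proportion p (toy ramp)."""
    return max(0.0, min(1.0, (p - 0.3) / 0.5))

def truth_of_summary(values):
    """Truth degree of 'most values are high' via Zadeh's sigma-count."""
    proportion = sum(mu_high(v) for v in values) / len(values)
    return mu_most(proportion)

temps = [18.0, 25.0, 28.0, 31.0]
print(round(truth_of_summary(temps), 3))  # → 0.55
```

An NLG system can compare such truth degrees across candidate statements and verbalize only those exceeding a quality threshold.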
Model interpretation through lower-dimensional posterior summarization
Nonparametric regression models have recently surged in their power and
popularity, accompanying the trend of increasing dataset size and complexity.
While these models have proven their predictive ability in empirical settings,
they are often difficult to interpret and do not address the underlying
inferential goals of the analyst or decision maker. In this paper, we propose a
modular two-stage approach for creating parsimonious, interpretable summaries
of complex models which allow freedom in the choice of modeling technique and
the inferential target. In the first stage a flexible model is fit which is
believed to be as accurate as possible. In the second stage, lower-dimensional
summaries are constructed by projecting draws from the distribution onto
simpler structures. These summaries naturally come with valid Bayesian
uncertainty estimates. Further, since we use the data only once to move from
prior to posterior, these uncertainty estimates remain valid across multiple
summaries and after iteratively refining a summary. We apply our method and
demonstrate its strengths across a range of simulated and real datasets. Code
to reproduce the examples shown is available at github.com/spencerwoody/ghost

Comment: 40 pages, 16 figures
A Unified Multi-Faceted Video Summarization System
This paper addresses automatic summarization and search in visual data
comprising videos, live streams and image collections in a unified manner.
In particular, we propose a framework for multi-faceted summarization which
extracts key-frames (image summaries), skims (video summaries) and entity
summaries (summarization at the level of entities like objects, scenes, humans
and faces in the video). The user can view these either as extractive
summaries or as query-focused summaries. Our approach first pre-processes
the video or image collection once, to extract all important visual features,
following which we provide an interactive mechanism to the user to summarize
the video based on their choice. We investigate several diversity, coverage and
representation models for all these problems, and argue the utility of these
different models depending on the application. While most prior work on
submodular summarization has focused on combining several models and learning
weighted mixtures, we focus on the explainability of the different diversity,
coverage and representation models and on their scalability. Most
importantly, we also show that we can summarize hours of video data in a few
seconds, and our system allows the user to generate summaries of various
lengths and types interactively on the fly.

Comment: 18 pages, 11 figures
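The coverage models mentioned above are typically optimized with a greedy algorithm. The sketch below shows lazy-free greedy maximization of a facility-location function, a standard submodular coverage model for selecting representative frames; it is a minimal illustration, not the paper's actual system, and the similarity matrix is invented.

```python
# Illustrative greedy maximization of a facility-location coverage
# function F(S) = sum_i max_{j in S} sim[i][j], a standard submodular
# model for picking representative key-frames.

def greedy_facility_location(sim, k):
    """Pick k items greedily; sim[i][j] = similarity of item i to j."""
    n = len(sim)
    selected, best = [], [0.0] * n   # best[i]: coverage of item i so far
    for _ in range(k):
        gains = []
        for j in range(n):
            if j in selected:
                gains.append(-1.0)
                continue
            gains.append(sum(max(sim[i][j] - best[i], 0.0) for i in range(n)))
        j = max(range(n), key=lambda t: gains[t])
        selected.append(j)
        best = [max(best[i], sim[i][j]) for i in range(n)]
    return selected

# Toy similarity matrix over four frames (two near-duplicate pairs).
sim = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
]
print(greedy_facility_location(sim, 2))  # → [0, 2], one per pair
```

Greedy selection enjoys the classical (1 - 1/e) approximation guarantee for monotone submodular functions, which is what makes these models attractive for interactive, on-the-fly summarization.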
Vis-DSS: An Open-Source toolkit for Visual Data Selection and Summarization
With increasing amounts of visual data being created in the form of videos
and images, visual data selection and summarization are becoming increasingly
important problems. We present Vis-DSS, an open-source toolkit for Visual Data
Selection and Summarization. Vis-DSS implements a framework of models for
summarization and data subset selection using submodular functions, which are
becoming increasingly popular today for these problems. We present several
classes of models, capturing notions of diversity, coverage, representation and
importance, along with optimization/inference and learning algorithms. Vis-DSS
is the first open source toolkit for several Data selection and summarization
tasks including Image Collection Summarization, Video Summarization, Training
Data selection for Classification and Diversified Active Learning. We
demonstrate state-of-the-art performance on all these tasks, and also show how
we can scale to large problems. Vis-DSS allows applications to be easily built
on top of it, and can also serve as a general skeleton that can
be extended to several use cases, including video and image sharing platforms
for creating GIFs, image montage creation, or as a component to surveillance
systems and we demonstrate this by providing a graphical user-interface (GUI)
desktop app built over the Qt framework. Vis-DSS is available at
https://github.com/rishabhk108/vis-dss
Extending a Single-Document Summarizer to Multi-Document: a Hierarchical Approach
The increasing amount of online content motivated the development of
multi-document summarization methods. In this work, we explore straightforward
approaches to extend single-document summarization methods to multi-document
summarization. The proposed methods are based on the hierarchical combination
of single-document summaries, and achieves state of the art results.Comment: 6 pages, Please cite: Proceedings of *SEM: the 4th Joint Conference
on Lexical and Computational Semantics (bibtex:
http://aclweb.org/anthology/S/S15/S15-1020.bib
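The hierarchical combination can be sketched in a few lines. This is an illustration of the general scheme only; the stand-in scorer (sentence length) is a placeholder, where the paper would use a real single-document summarizer.

```python
# Sketch of hierarchical multi-document summarization: summarize each
# document separately, then run the same single-document summarizer over
# the pooled intermediate summaries. The scorer here is a toy stand-in.

def summarize_single(sentences, k):
    """Stand-in single-document summarizer: keep the k longest
    sentences, in original order (a real system scores informativeness)."""
    ranked = sorted(range(len(sentences)), key=lambda i: -len(sentences[i]))
    keep = sorted(ranked[:k])
    return [sentences[i] for i in keep]

def summarize_multi(documents, k_per_doc, k_final):
    """Hierarchical combination of single-document summaries."""
    pooled = []
    for doc in documents:
        pooled.extend(summarize_single(doc, k_per_doc))
    return summarize_single(pooled, k_final)

docs = [
    ["Short note.", "A much longer and more detailed first sentence here."],
    ["Another brief line.", "The second document's most informative sentence overall."],
]
print(summarize_multi(docs, k_per_doc=1, k_final=1))
```

The appeal of the scheme is that any existing single-document summarizer can be reused unchanged at both levels of the hierarchy.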
Network Modeling and Pathway Inference from Incomplete Data ("PathInf")
In this work, we developed a network inference method from incomplete data
("PathInf"), as massive and non-uniformly distributed missing values are a
common challenge in practical problems. PathInf is a two-stage inference
model. In the first stage, it applies a data summarization model based on
maximum likelihood to deal with the massive, distributed missing values by
transforming the observation-wise items in the data into a state matrix. In the
second stage, the transition pattern (i.e., pathway) among variables is inferred
as a graph inference problem solved by a greedy algorithm with constraints. The
proposed method was validated and compared with the state-of-the-art Bayesian
network method on simulation data, and showed consistently superior
performance. By applying the PathInf on the lymph vascular metastasis data, we
obtained the holistic pathways of the lymph node metastasis with novel
discoveries on the jumping metastasis among nodes that are physically apart.
The discovery indicates the possible presence of sentinel node groups in the
lung lymph nodes which have been previously speculated yet never found. The
pathway map can also improve the current dissection examination protocol for
better individualized treatment planning, higher diagnostic accuracy, and
reduced patient trauma.

Comment: Xiang Li, Qitian Che and Xing Wang contribute equally to this work
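The second-stage idea, greedy constrained graph inference, can be sketched as follows. This is not the authors' implementation: the transition scores are invented, and the only constraint enforced here is acyclicity, as a representative example of "greedy algorithm with constraints".

```python
# Sketch of greedy pathway inference: add the strongest observed
# transition patterns between variables one by one, skipping any edge
# that would violate the constraint (here: no directed cycles).

def greedy_pathway(weights):
    """weights: dict {(u, v): score}; returns an acyclic edge list."""
    def reaches(adj, src, dst):
        stack, seen = [src], set()
        while stack:
            u = stack.pop()
            if u == dst:
                return True
            if u in seen:
                continue
            seen.add(u)
            stack.extend(adj.get(u, []))
        return False

    adj, edges = {}, []
    for (u, v), w in sorted(weights.items(), key=lambda kv: -kv[1]):
        if reaches(adj, v, u):      # adding u->v would close a cycle
            continue
        adj.setdefault(u, []).append(v)
        edges.append((u, v))
    return edges

# Toy transition scores among three lymph-node groups 0, 1, 2.
scores = {(0, 1): 0.9, (1, 2): 0.8, (2, 0): 0.7, (0, 2): 0.4}
print(greedy_pathway(scores))  # → [(0, 1), (1, 2), (0, 2)]
```

Note how the high-scoring edge (2, 0) is rejected because it would close the cycle 0 → 1 → 2 → 0, while the weaker edge (0, 2) is still admitted.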
What comes next? Extractive summarization by next-sentence prediction
Existing approaches to automatic summarization assume that a length limit for
the summary is given, and view content selection as an optimization problem to
maximize informativeness and minimize redundancy within this budget. This
framework ignores the fact that human-written summaries have rich internal
structure which can be exploited to train a summarization system. We present
NEXTSUM, a novel approach to summarization based on a model that predicts the
next sentence to include in the summary using not only the source article, but
also the summary produced so far. We show that such a model successfully
captures summary-specific discourse moves, and leads to better content
selection performance, in addition to automatically predicting how long the
target summary should be. We perform experiments on the New York Times
Annotated Corpus of summaries, where NEXTSUM outperforms lead and content-model
summarization baselines by significant margins. We also show that the lengths
of summaries produced by our system correlate with the lengths of the
human-written gold standards.
Analyzing Evolving Stories in News Articles
There is an overwhelming number of news articles published every day around
the globe. Following the evolution of a news story is a difficult task, given
that no mechanism is available to track back in time and study the
diffusion of the relevant events in digital news feeds. The techniques
developed so far to extract meaningful information from a massive corpus rely
on similarity search, which results in a myopic loopback to the same topic
without providing the insights needed to hypothesize the origin of a story that
may be completely different from the news today. In this paper, we present an
algorithm that mines historical data to detect the origin of an event, segments
the timeline into disjoint groups of coherent news articles, and outlines the
most important documents in a timeline with a soft probability to provide a
better understanding of the evolution of a story. Qualitative and quantitative
approaches to evaluate our framework demonstrate that our algorithm discovers
statistically significant and meaningful stories in reasonable time.
Additionally, a relevant case study on a set of news articles demonstrates that
the generated output of the algorithm holds the promise to aid prediction of
future entities in a story.

Comment: This is a pre-print of an article published in the International
Journal of Data Science and Analytics. The final authenticated version is
available online at: https://doi.org/10.1007/s41060-017-0091-
Plan-Recognition-Driven Attention Modeling for Visual Recognition
Human visual recognition of activities or external agents involves an
interplay between high-level plan recognition and low-level perception. Given
that, a natural question to ask is: can low-level perception be improved by
high-level plan recognition? We formulate the problem of leveraging recognized
plans to generate better top-down attention maps
\cite{gazzaniga2009,baluch2011} to improve perception performance. We refer to
these top-down attention maps as plan-recognition-driven attention maps. To
address this problem, we introduce the Pixel Dynamics Network. The Pixel
Dynamics Network serves as an observation model, which predicts the next states
of object points at each pixel location given observations of pixels and
pixel-level action features; in effect, it internally learns a pixel-level
dynamics model. The Pixel Dynamics Network is a Convolutional Neural Network
(ConvNet) with a specially designed architecture, so it can exploit the
parallel computation of ConvNets while learning the pixel-level dynamics
model. We further prove the equivalence
between Pixel Dynamics Network as an observation model, and the belief update
in partially observable Markov decision process (POMDP) framework. We evaluate
our Pixel Dynamics Network in event recognition tasks. We build an event
recognition system, ER-PRN, which takes Pixel Dynamics Network as a subroutine,
to recognize events based on observations augmented by plan-recognition-driven
attention maps.
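The claimed equivalence rests on the standard POMDP belief update, which is easy to state concretely. The numbers below are made up purely to show the recursion b'(s') ∝ O(o | s') Σ_s T(s' | s, a) b(s); this is textbook POMDP filtering, not the paper's network.

```python
# The standard POMDP belief update the paper relates its network to:
# b'(s') ∝ O(o | s') * sum_s T(s' | s, a) * b(s),
# shown for a toy two-state problem with invented probabilities.

def belief_update(b, T, O, o):
    """b: prior belief over states; T[s][s2]: transition probability
    under the chosen action; O[s2][o]: observation probability."""
    n = len(b)
    unnorm = [O[s2][o] * sum(T[s][s2] * b[s] for s in range(n))
              for s2 in range(n)]
    z = sum(unnorm)                 # normalizing constant P(o)
    return [u / z for u in unnorm]

b = [0.5, 0.5]                      # uniform prior belief
T = [[0.9, 0.1], [0.2, 0.8]]        # T[s][s'] for the chosen action
O = [[0.8, 0.2], [0.3, 0.7]]        # O[s'][o]
print(belief_update(b, T, O, o=0))
```

Each network forward pass in the paper's framing corresponds to one such predict-then-correct step, with the convolution playing the role of the transition sum.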
Conceptual Text Summarizer: A new model in continuous vector space
Traditional manual summarization is neither cost-effective nor feasible today.
Extractive summarization automatically extracts the most important sentences
from a text and generates a short, informative
summary. In this work, we propose an unsupervised method to summarize Persian
texts. This method is a novel hybrid approach that clusters the concepts of the
text using deep learning and traditional statistical methods. First, we produce
word embeddings based on the Hamshahri2 corpus and a dictionary of word
frequencies. Then the proposed algorithm extracts the keywords of the document,
clusters its concepts, and finally ranks the sentences to produce the summary.
We evaluated the proposed method on Pasokh single-document corpus using the
ROUGE evaluation measure. Without using any hand-crafted features, our proposed
method achieves state-of-the-art results. We compared our unsupervised method
with the best supervised Persian methods and achieved an overall improvement
of 7.5% in ROUGE-2 recall.

Comment: The experimental results complete
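The sentence-ranking step can be sketched in simplified form. This illustration uses bag-of-words vectors and a document centroid instead of the paper's actual pipeline (embeddings trained on Hamshahri2 plus concept clustering); the example sentences are invented.

```python
# Simplified sketch of centroid-based sentence ranking: represent each
# sentence as a bag-of-words vector, compute the document centroid, and
# rank sentences by cosine similarity to the centroid. The real method
# clusters learned word embeddings instead of raw counts.

import math

def bow(sentence, vocab):
    return [sentence.split().count(w) for w in vocab]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_sentences(sentences):
    vocab = sorted({w for s in sentences for w in s.split()})
    vecs = [bow(s, vocab) for s in sentences]
    centroid = [sum(col) / len(vecs) for col in zip(*vecs)]
    order = sorted(range(len(sentences)),
                   key=lambda i: -cosine(vecs[i], centroid))
    return [sentences[i] for i in order]

sents = ["rain fell on the city", "markets rose", "rain soaked the city"]
print(rank_sentences(sents)[0])  # sentence closest to the centroid first
```

The top-ranked sentences then form the extractive summary, with length controlled by how many ranked sentences are kept.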