Hybrid Approach for Single Text Document Summarization using Statistical and Sentiment Features
Summarization is a way to represent the same information concisely while preserving its meaning. It can be categorized into two types, abstractive and extractive; our work focuses on extractive summarization. A generic approach to extractive summarization is to treat each sentence as an entity, score it on indicative features that reflect its suitability for inclusion in the summary, sort the sentences by score, and take the top n sentences as the summary. Mostly statistical features have been used for scoring sentences. We propose a hybrid model for single text document summarization. This hybrid model is an extraction-based approach that combines statistical and semantic techniques, relying on a linear combination of statistical measures (sentence position, TF-IDF, aggregate similarity, and centroid) and a semantic measure. Our idea of including sentiment analysis for salient sentence extraction derives from the observation that emotion plays an important role in conveying a message effectively, and hence can play a vital role in text document summarization. For comparison we generated five system summaries (our proposed work, the MEAD system, the Microsoft system, the OPINOSIS system, and a human-generated summary) and evaluated them using ROUGE scores.
Comment: 20 pages
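The linear combination of feature scores that this abstract describes can be sketched as follows. The feature functions, the toy sentiment lexicon, and the weights are illustrative assumptions, not the authors' exact formulation:

```python
import math
from collections import Counter

POSITIVE = {"good", "great", "important", "vital"}  # toy sentiment lexicon (assumption)

def score_sentences(sentences, weights=(0.3, 0.4, 0.3), top_n=2):
    """Rank sentences by a weighted (linear) combination of illustrative
    features: position, average TF-IDF, and a crude sentiment-cue density."""
    docs = [[w.strip(".,!?") for w in s.lower().split()] for s in sentences]
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))  # sentence-level document frequency

    scored = []
    for i, words in enumerate(docs):
        position = 1.0 - i / n                      # earlier sentences score higher
        tf = Counter(words)
        tfidf = sum(c * math.log(n / df[t]) for t, c in tf.items()) / max(len(words), 1)
        sentiment = sum(w in POSITIVE for w in words) / max(len(words), 1)
        total = weights[0] * position + weights[1] * tfidf + weights[2] * sentiment
        scored.append((total, i))

    keep = sorted(i for _, i in sorted(scored, reverse=True)[:top_n])
    return [sentences[i] for i in keep]             # preserve document order
```

Returning the selected sentences in document order keeps the extract readable, even though ranking is done by score.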
PerSum: Novel Systems for Document Summarization in Persian
In this paper we explore the problem of document summarization in the Persian
language from two distinct angles. In our first approach, we modify a popular
and widely cited Persian document summarization framework to see how it works
on a realistic corpus of news articles. Human evaluation on generated summaries
shows that graph-based methods perform better than the modified systems. We
carry this intuition forward in our second approach, and probe deeper into the
nature of graph-based systems by designing several summarizers based on
centrality measures. Ad hoc evaluation using ROUGE score on these summarizers
suggests that there is a small class of centrality measures that perform better
than three strong unsupervised baselines.
Comment: 42 pages, 9 figures
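A minimal sketch of a centrality-based summarizer of the kind probed above, using degree centrality over a word-overlap sentence-similarity graph. The similarity function and the choice of degree centrality are illustrative assumptions; the paper evaluates several centrality measures:

```python
import math
from itertools import combinations

def overlap_similarity(a, b):
    """TextRank-style normalized word overlap between two sentences."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if len(wa) < 2 or len(wb) < 2:
        return 0.0
    return len(wa & wb) / (math.log(len(wa)) + math.log(len(wb)))

def centrality_summary(sentences, k=2):
    """Score each sentence by degree centrality (the sum of incident edge
    weights) in the sentence-similarity graph; return the top k sentences
    in document order."""
    scores = [0.0] * len(sentences)
    for i, j in combinations(range(len(sentences)), 2):
        w = overlap_similarity(sentences[i], sentences[j])
        scores[i] += w
        scores[j] += w
    top = sorted(range(len(sentences)), key=lambda i: -scores[i])[:k]
    return [sentences[i] for i in sorted(top)]
```

Swapping in other centrality measures (closeness, eigenvector, PageRank-style) only changes how `scores` is computed; the selection step stays the same.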
Multi-Topic Multi-Document Summarizer
Current multi-document summarization systems can successfully extract summary sentences, but with many limitations, including low coverage, inaccurate extraction of important sentences, redundancy, and poor coherence among the selected sentences. The present study introduces a new concept of the centroid approach and reports new techniques for extracting summary sentences for multi-document input. In both techniques, keyphrases are used to weigh sentences and documents. The first summarization technique (Sen-Rich) prefers maximum-richness sentences, while the second (Doc-Rich) prefers sentences from the centroid document. To demonstrate the application of the new summarization system to Arabic documents, we performed two experiments. First, we applied the ROUGE measure to compare the new techniques with systems presented at TAC 2011. The results show that Sen-Rich outperformed all systems on ROUGE-S. Second, the system was applied to summarize multi-topic documents. Using human evaluators, the results show that Doc-Rich is superior, with summary sentences characterized by greater coverage and more cohesion.
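The keyphrase-based sentence weighting underlying Sen-Rich can be sketched as below. The keyphrase list is assumed to be given, and counting raw keyphrase occurrences is an illustrative simplification of the paper's richness weighting:

```python
def sentence_richness(sentence, keyphrases):
    """Richness of a sentence = number of keyphrase occurrences it contains
    (the authors' exact weighting is not specified here; this is a sketch)."""
    text = sentence.lower()
    return sum(text.count(kp.lower()) for kp in keyphrases)

def sen_rich_summary(sentences, keyphrases, k=1):
    """Select the k sentences with maximum keyphrase richness, kept in
    document order."""
    ranked = sorted(range(len(sentences)),
                    key=lambda i: -sentence_richness(sentences[i], keyphrases))[:k]
    return [sentences[i] for i in sorted(ranked)]
```

A Doc-Rich variant would first score whole documents the same way, pick the centroid document, and then rank only its sentences.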
Text Summarization Techniques: A Brief Survey
In recent years, there has been an explosion in the amount of text data from a
variety of sources. This volume of text is an invaluable source of information
and knowledge which needs to be effectively summarized to be useful. In this
review, the main approaches to automatic text summarization are described. We
review the different processes for summarization and describe the effectiveness
and shortcomings of the different methods.
Comment: Some of the reference formats have been updated
Extract with Order for Coherent Multi-Document Summarization
In this work, we aim at developing an extractive summarizer in the
multi-document setting. We implement a rank based sentence selection using
continuous vector representations along with key-phrases. Furthermore, we
propose a model to tackle summary coherence for increasing readability. We
conduct experiments on the Document Understanding Conference (DUC) 2004 datasets using the ROUGE toolkit. Our experiments demonstrate that the methods bring significant improvements over state-of-the-art methods in terms of informativeness and coherence.
Comment: TextGraphs-11 at ACL 201
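Rank-based selection over continuous sentence vectors, with a redundancy penalty and document-order output as a simple proxy for coherence, can be sketched in an MMR-like style. The vectors and relevance scores are assumed precomputed, and this greedy scheme is an illustration, not necessarily the authors' exact model:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

def greedy_select(vectors, relevance, k=2, lam=0.7):
    """MMR-style greedy selection: trade off each sentence's relevance
    against its maximum similarity to already selected sentences, then
    return indices in document order (a simple proxy for coherence)."""
    selected, candidates = [], list(range(len(vectors)))
    while candidates and len(selected) < k:
        def mmr(i):
            redundancy = max((cosine(vectors[i], vectors[j]) for j in selected),
                             default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return sorted(selected)
```

With `lam` close to 1 the selection is purely relevance-driven; lowering it trades relevance for diversity among the chosen sentences.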
Extractive Summarization using Deep Learning
This paper proposes a text summarization approach for factual reports using a
deep learning model. This approach consists of three phases: feature
extraction, feature enhancement, and summary generation, which work together to
assimilate core information and generate a coherent, understandable summary. We
are exploring various features to improve the set of sentences selected for the
summary, and are using a Restricted Boltzmann Machine to enhance and abstract
those features to improve resultant accuracy without losing any important
information. The sentences are scored based on those enhanced features and an
extractive summary is constructed. Experimentation carried out on several
articles demonstrates the effectiveness of the proposed approach. Source code
available at: https://github.com/vagisha-nidhi/TextSummarizer
Comment: Accepted to the 18th International Conference on Computational Linguistics and Intelligent Text Processing
Toward Selectivity Based Keyword Extraction for Croatian News
This preliminary report presents a network-based, unsupervised method for keyword extraction from Croatian text modeled as a complex network. We build our approach on a new network measure, node selectivity, motivated by research on graph-based centrality approaches. Node selectivity is defined as the average weight distribution over the links of a single node. We extract nodes (keyword candidates) based on their selectivity values. Furthermore, we expand the extracted nodes into word-tuples ranked by the highest in/out selectivity values. Selectivity-based extraction requires no linguistic knowledge; it is derived purely from the statistical and structural information encompassed in the source text, which is reflected in the structure of the network. The obtained sets are evaluated against manually annotated keywords: for the set of extracted keyword candidates the average F1 score is 24.63% and the average F2 score is 21.19%; for the extracted word-tuple candidates the average F1 score is 25.9% and the average F2 score is 24.47%.
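The selectivity measure as defined above (average link weight per node) is straightforward to compute on a weighted adjacency structure. The graph representation and the ranking helper here are illustrative assumptions:

```python
def node_selectivity(graph):
    """Node selectivity: the average weight over a node's incident links.
    `graph` maps each node to a dict of {neighbor: edge_weight}."""
    return {node: sum(nbrs.values()) / len(nbrs) if nbrs else 0.0
            for node, nbrs in graph.items()}

def keyword_candidates(graph, top_k=2):
    """Return the top_k nodes ranked by selectivity value."""
    sel = node_selectivity(graph)
    return sorted(sel, key=sel.get, reverse=True)[:top_k]
```

For a directed co-occurrence network, the same computation applied separately to in-links and out-links yields the in/out selectivity values used to rank word-tuples.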
Modeling Acoustic-Prosodic Cues for Word Importance Prediction in Spoken Dialogues
Prosodic cues in conversational speech aid listeners in discerning a message.
We investigate whether acoustic cues in spoken dialogue can be used to identify
the importance of individual words to the meaning of a conversation turn.
Individuals who are Deaf and Hard of Hearing often rely on real-time captions
in live meetings. Word error rate, a traditional metric for evaluating
automatic speech recognition, fails to capture that some words are more
important for a system to transcribe correctly than others. We present and
evaluate neural architectures that use acoustic features for 3-class word
importance prediction. Our model performs competitively against
state-of-the-art text-based word-importance prediction models, and it
demonstrates particular benefits when operating on imperfect ASR output.
Comment: 8 pages, 2 figures
Keyphrase Based Arabic Summarizer (KPAS)
This paper describes a computationally inexpensive and efficient generic summarization algorithm for Arabic texts. The algorithm belongs to the extractive summarization family, which reduces the problem to the sub-problems of identifying and extracting representative sentences. Important keyphrases of the document to be summarized are identified using combinations of statistical and linguistic features. The sentence extraction algorithm exploits keyphrases as the primary attributes for ranking a sentence. The present experimental work demonstrates different techniques for achieving various summarization goals, including informative richness, coverage of both main and auxiliary topics, and keeping redundancy to a minimum. A scoring scheme is then adopted that balances these summarization goals. To evaluate the resulting Arabic summaries against well-established systems, aligned English/Arabic texts are used throughout the experiments.
Comment: INFOS 2012, The 8th INFOS2012 International Conference on Informatics and Systems, 14-16 May, 201
An Enhanced Latent Semantic Analysis Approach for Arabic Document Summarization
The fast-growing amount of information on the Internet makes the research in
automatic document summarization very urgent. It is an effective solution for
information overload. Many approaches have been proposed based on different
strategies, such as latent semantic analysis (LSA). However, LSA, when applied
to document summarization, has some limitations which diminish its performance.
In this work, we try to overcome these limitations by applying statistical and
linear algebraic approaches combined with syntactic and semantic processing of
the text. First, a part-of-speech tagger is utilized to reduce the dimensionality of
LSA. Then, the weight of the term in four adjacent sentences is added to the
weighting schemes while calculating the input matrix to take into account the
word order and the syntactic relations. In addition, a new LSA-based sentence
selection algorithm is proposed, in which the term description is combined with
sentence description for each topic which in turn makes the generated summary
more informative and diverse. To ensure the effectiveness of the proposed LSA-based sentence selection algorithm, extensive experiments on Arabic and English are conducted. Four datasets are used to evaluate the new model: the Linguistic
Data Consortium (LDC) Arabic Newswire-a corpus, Essex Arabic Summaries Corpus
(EASC), DUC2002, and Multilingual MSS 2015 dataset. Experimental results on the
four datasets show the effectiveness of the proposed model on Arabic and
English datasets. It performs comprehensively better compared to the state-of-the-art methods.
Comment: This is a pre-print of an article published in the Arabian Journal for Science and Engineering. The final authenticated version is available online at: https://doi.org/10.1007/s13369-018-3286-
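The adjacent-sentence term weighting described in this abstract (adding a term's weight from the four neighboring sentences when building the LSA input matrix) can be sketched as below. The input representation (per-sentence term-count dicts) and the rule of only boosting terms already present in the sentence are illustrative assumptions:

```python
def adjacent_weighting(term_counts, window=2):
    """Augment each term's weight in sentence i with that term's counts in
    the `window` sentences on either side (four adjacent sentences when
    window=2), sketching the paper's adjusted LSA input-matrix weighting."""
    n = len(term_counts)
    weighted = []
    for i in range(n):
        w = dict(term_counts[i])
        for j in range(max(0, i - window), min(n, i + window + 1)):
            if j == i:
                continue
            for term, count in term_counts[j].items():
                if term in w:          # only boost terms present in sentence i
                    w[term] += count
        weighted.append(w)
    return weighted
```

The augmented weights would then populate the term-by-sentence matrix fed to the SVD step of LSA, letting the decomposition see local word order and context that a bag-of-sentences matrix misses.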