7 research outputs found

    Regression model focused on query for multi documents summarization based on significance of the sentence position

    Document summarization is needed to access information effectively and efficiently. One way to obtain a document summary is to apply machine learning techniques. This paper proposes the application of regression models to query-focused multi-document summarization based on the significance of sentence position. The method used is Support Vector Regression (SVR), which estimates the weight of each sentence in a set of documents, based on previously defined sentence features, to decide which sentences form the summary. A series of evaluations was performed on the DUC 2005 data set. The resulting summaries achieve average precision and recall of 0.0580 and 0.0590 when measured with ROUGE-2, and 0.0997 and 0.1019 when measured with ROUGE-SU4. The proposed regression model measures the significance of sentence position in a document well
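
    A minimal sketch of the kind of SVR-based sentence scoring the abstract describes, assuming illustrative features (position significance, length, query overlap) and a synthetic training target; the paper's exact feature set and DUC 2005 preprocessing are not reproduced here.

```python
# Hedged sketch: SVR estimates a weight per sentence from simple features.
# The feature choices and the training target are assumptions for
# illustration, not the paper's exact setup.
import numpy as np
from sklearn.svm import SVR

def sentence_features(sentences, query_terms):
    """One feature vector per sentence: position, length, query overlap."""
    n = len(sentences)
    feats = []
    for i, sent in enumerate(sentences):
        tokens = sent.lower().split()
        position = 1.0 - i / max(n - 1, 1)       # earlier sentences weigh more
        length = min(len(tokens) / 25.0, 1.0)    # rough length normalisation
        overlap = len(set(tokens) & query_terms) / max(len(query_terms), 1)
        feats.append([position, length, overlap])
    return np.array(feats)

def summarize(sentences, query_terms, model, k=5):
    """Score every sentence with the trained SVR and keep the top k."""
    scores = model.predict(sentence_features(sentences, query_terms))
    top = np.argsort(scores)[::-1][:k]
    return [sentences[i] for i in sorted(top)]   # preserve document order

# Training would pair feature vectors with reference weights, e.g. a
# ROUGE score of each sentence against human summaries (an assumption):
# model = SVR(kernel="rbf", C=1.0, epsilon=0.1).fit(X_train, y_train)
```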

    Personal Text Summarization in Mobile Device

    This paper presents a hybrid text summarization system for mobile devices that summarizes a selected text. The system can proceed using statistical or heuristic methods. With both, the summary is produced from combined statistical and heuristic features such as word frequency, sentence position, sentence length, and similarity with the document title. The results show that retrieving text with selected keywords takes less time with the proposed system than without it
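
    A sketch of the combined statistic/heuristic scoring the abstract lists: word frequency, sentence position, sentence length, and similarity with the document title. The weights in `w` are illustrative assumptions, not the paper's values.

```python
# Hedged sketch: score each sentence as a weighted sum of the four
# features named in the abstract, then extract the top-k sentences.
from collections import Counter

def score_sentences(sentences, title, w=(0.4, 0.2, 0.1, 0.3)):
    title_words = set(title.lower().split())
    freq = Counter(word for s in sentences for word in s.lower().split())
    max_freq = max(freq.values()) if freq else 1
    n = len(sentences)
    scores = []
    for i, sent in enumerate(sentences):
        words = sent.lower().split()
        f = sum(freq[t] for t in words) / (max(len(words), 1) * max_freq)
        p = 1.0 - i / max(n - 1, 1)              # earlier sentences score higher
        l = min(len(words) / 20.0, 1.0)          # length bonus, capped
        t = len(set(words) & title_words) / max(len(title_words), 1)
        scores.append(w[0] * f + w[1] * p + w[2] * l + w[3] * t)
    return scores

def summarize(sentences, title, k=3):
    scores = score_sentences(sentences, title)
    top = sorted(sorted(range(len(sentences)), key=lambda i: -scores[i])[:k])
    return [sentences[i] for i in top]           # keep original order
```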

    Automatic Text Summarization based on Word-Clusters and Ranking Algorithms

    This paper investigates a new approach to single-document summarization (SDS) based on a machine learning ranking algorithm. Using machine learning for this task allows summaries to be adapted to user needs and corpus characteristics, desirable properties that have motivated an increasing amount of work in this field over the last few years. Most approaches generate summaries by extracting text spans (sentences in our case) and adopt the classification framework, which consists of training a classifier to discriminate between relevant and irrelevant spans of a document. A set of features first produces a vector of scores for each sentence in a given document, and a classifier is trained to combine these scores globally. We believe the classification criterion used to train a classifier is not suited to SDS, and we propose an original ranking-based framework for this task. A ranking algorithm also combines the scores of different features, but its criterion tends to reduce the relative misordering of sentences within a document. The features we use are either drawn from the state of the art or built upon word clusters. These clusters are groups of words that often co-occur with each other and can serve to expand a query or to enrich the representation of a document's sentences. We analyze the performance of our ranking algorithm on two data sets: the Computation and Language (cmp_lg) collection of TIPSTER SUMMAC and the WIPO collection. We compare against several non-learning baseline systems and a reference trainable summarizer based on the classification framework. The experiments show that the learning algorithms perform better than the non-learning systems, while the ranking algorithm outperforms the classifier. The difference in performance between the two learning algorithms depends on the nature of the data sets; we explain this by the different separability hypotheses the two algorithms make about the data
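
    The ranking criterion argued for here can be sketched as pairwise learning: within each document, every (relevant, irrelevant) sentence pair yields one training example on the feature difference, so the learned scorer is penalised for relative misordering inside a document. This is a generic RankSVM-style sketch, not the authors' exact algorithm.

```python
# Hedged sketch of a pairwise ranking criterion for sentence extraction.
import numpy as np
from sklearn.linear_model import LogisticRegression

def pairwise_examples(doc_feats, doc_labels):
    """doc_feats: (n_sentences, n_features) array; doc_labels: 1 = relevant."""
    X, y = [], []
    pos = np.flatnonzero(doc_labels == 1)
    neg = np.flatnonzero(doc_labels == 0)
    for i in pos:
        for j in neg:
            X.append(doc_feats[i] - doc_feats[j]); y.append(1)
            X.append(doc_feats[j] - doc_feats[i]); y.append(0)
    return np.array(X), np.array(y)

# Pool pairs over all training documents, fit a linear model, then score
# new sentences directly with the learned weight vector:
# ranker = LogisticRegression().fit(X_pairs, y_pairs)
# scores = new_doc_feats @ ranker.coef_.ravel()   # higher = extract first
```

    Unlike a plain classifier, the loss here is defined on sentence pairs from the same document, which is what drives down within-document misordering.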

    Effective summarisation for search engines

    Users of information retrieval (IR) systems issue queries to find information in large collections of documents. Nearly all IR systems return answers in the form of a list of results, where each entry typically consists of the title of the underlying document, a link, and a short query-biased summary of the document's content called a snippet. As retrieval systems typically return a mixture of relevant and non-relevant answers, the role of the snippet is to guide users to identify those documents that are likely to be good answers and to ignore those that are less useful. This thesis focuses on techniques to improve the generation and evaluation of query-biased summaries for informational requests, where users typically need to inspect several documents to fulfil their information needs. We investigate the following issues: how users construct query-biased summaries, and how this compares with current automatic summarisation methods; how query expansion can be applied to sentence-level ranking to improve the quality of query-biased summaries; and how to evaluate these summarisation approaches using sentence-level relevance data.

    First, through an eye tracking study, we investigate the way in which users select information from documents when asked to construct a query-biased summary in response to a given search request. Our analysis indicates that user behaviour differs from the assumptions of current state-of-the-art query-biased summarisation approaches. A major cause of this difference was vocabulary mismatch, a common IR problem.

    The thesis then examines query expansion techniques to improve the selection of candidate relevant sentences and to reduce the vocabulary mismatch observed in the first study. We employ a Cranfield-based methodology to quantitatively assess sentence ranking methods based on sentence-level relevance assessments available in the TREC Novelty track, in line with previous work. We study two aspects of sentence-level evaluation in this track. First, whether sentences that have been judged on relevance, as in the TREC Novelty track, can also be considered indicative; that is, useful as part of a query-biased summary in guiding users to make correct document selections. Through a crowdsourcing experiment, we find that relevance and indicativeness agree around 73% of the time. Second, during our evaluations we discovered a bias whereby longer sentences were more likely to be judged as relevant. We therefore propose a novel evaluation of sentence ranking methods that aims to isolate this sentence length bias. Using the enhanced evaluation method, we find that query expansion can effectively assist in the selection of short sentences.

    We conclude with a second user study examining the effectiveness of query expansion in query-biased summarisation methods for end users. Our results indicate that participants prefer query-biased summaries aided by expansion techniques significantly more often, approximately 60% of the time, for summaries comprised of short and middle-length sentences. We suggest that these findings can inform the generation and display of query-biased summaries in IR systems such as search engines
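
    A sketch of query expansion applied to sentence-level ranking for query-biased snippets, in the spirit of the thesis: terms that co-occur with the query in top-ranked sentences are added to the query before a second ranking pass, reducing vocabulary mismatch. The overlap scoring and expansion heuristics are illustrative assumptions, not the thesis's exact models.

```python
# Hedged sketch: two-pass sentence ranking with pseudo-relevance expansion.
from collections import Counter

def rank(sentences, terms):
    """Order sentences by how many of the given terms they contain."""
    return sorted(sentences, key=lambda s: -len(set(s.lower().split()) & terms))

def expanded_snippet(sentences, query, n_expand=5, n_snippet=2):
    q = set(query.lower().split())
    pseudo_relevant = rank(sentences, q)[:10]     # first-pass top sentences
    pool = Counter(w for s in pseudo_relevant
                   for w in s.lower().split() if w not in q)
    expanded = q | {w for w, _ in pool.most_common(n_expand)}
    return rank(sentences, expanded)[:n_snippet]  # second pass with expansion
```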