28 research outputs found

Machine Learning of Generic and User-Focused Summarization

Author: Bloedorn Eric
Mani Inderjeet
Publication date: 01/01/1998

A key problem in text summarization is finding a salience function that determines what information in the source should be included in the summary. This paper describes the use of machine learning on a training corpus of documents and their abstracts to discover salience functions that describe which combination of features is optimal for a given summarization task. The method addresses both "generic" and user-focused summaries. Comment: In Proceedings of the Fifteenth National Conference on AI (AAAI-98), p. 821-82

arXiv.org e-Print Archive

CiteSeerX

Porting a summarizer to the French language

Author: Bois Remi
Goeuriot Lorraine
Jones Gareth J.F.
Kelly Liadh
Leveling Johannes
Publication date: 04/07/2014

We describe the porting of the English-language REZIME text summarizer to French. REZIME is a single-document summarizer focused particularly on medical documents. Summaries are created by extracting key sentences from the original document. Sentence selection employs machine learning techniques, using statistical, syntactic and lexical features computed from specialized language resources. The REZIME system was initially developed for English documents. In this paper we present the summarizer architecture and describe the steps required to adapt it to French. Summarizer performance is evaluated on English and French datasets. Results show that the French adaptation achieves performance comparable to the English system.

Irish Universities

DCU Online Research Access Service

Enumeration of Extractive Oracle Summaries

Author: Hirao Tsutomu
Nagata Masaaki
Nishino Masaaki
Suzuki Jun
Publication date: 01/01/2017

To analyze the limitations and future directions of the extractive summarization paradigm, this paper proposes an Integer Linear Programming (ILP) formulation to obtain extractive oracle summaries in terms of ROUGE-N. We also propose an algorithm that enumerates all of the oracle summaries for a set of reference summaries, in order to exploit F-measures that evaluate how many sentences of a system summary are also extracted as part of an oracle summary. Our experimental results on Document Understanding Conference (DUC) corpora demonstrated the following: (1) room still exists to improve the performance of extractive summarization; (2) the F-measures derived from the enumerated oracle summaries have significantly stronger correlations with human judgment than those derived from single oracle summaries. Comment: 12 page
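As a rough illustration of what an extractive oracle summary is, the sketch below enumerates every k-sentence subset of a document and keeps all subsets that tie for the best ROUGE-N recall. It is not the ILP formulation from the paper: it swaps in exhaustive search, which is only feasible for tiny inputs, and it assumes a single reference summary and a fixed sentence budget k. The function names and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def ngrams(tokens, n):
    """Return the multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def overlap(cand, ref):
    """Clipped n-gram matches: each reference n-gram counts at most as often as it occurs."""
    return sum(min(count, cand[gram]) for gram, count in ref.items())


def oracle_summaries(sentences, reference, k, n=1):
    """Brute-force search (assumption: NOT the paper's ILP) for all
    k-sentence subsets that maximize ROUGE-N recall against one reference."""
    ref = ngrams(reference, n)
    total = sum(ref.values())
    # n-grams are counted per sentence so none spans a sentence boundary.
    per_sentence = [ngrams(s, n) for s in sentences]
    best, best_sets = -1, []
    for idx in combinations(range(len(sentences)), k):
        cand = Counter()
        for i in idx:
            cand.update(per_sentence[i])
        hits = overlap(cand, ref)
        if hits > best:
            best, best_sets = hits, [idx]
        elif hits == best:
            best_sets.append(idx)
    if not best_sets or total == 0:
        return 0.0, best_sets
    return best / total, best_sets


# Toy data (hypothetical): tokenized sentences and one reference summary.
sentences = [["the", "cat", "sat"], ["a", "dog", "barked"], ["the", "cat", "slept"]]
reference = ["the", "cat", "sat", "and", "slept"]
print(oracle_summaries(sentences, reference, k=1))
print(oracle_summaries(sentences, reference, k=2))
```

With k=1 two different sentences tie for the best score, which is the situation that motivates enumerating all oracle summaries rather than reporting a single one.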

arXiv.org e-Print Archive

Crossref

Automatic Summarization in Chinese Product Reviews

Author: Du Wan di
Liu Li zhen
Song Wei
Wang Han shi
Publication venue: 'Universitas Ahmad Dahlan'
Publication date: 01/03/2017

As the number of online comments grows, buyers find it hard to locate useful information quickly, which motivates research on automatic summarization built on product review mining. Previous studies focused mainly on extracting explicit features and often ignored implicit features, which are not stated clearly but carry information necessary for analyzing comments. Mining features from web reviews quickly and accurately is therefore important for summarization technology. In this paper, explicit features and “feature-opinion” pairs in explicit sentences are extracted with a Conditional Random Field, and implicit product features are recognized by a bipartite graph model based on a random walk algorithm. The features and their corresponding opinions are then incorporated into a structured text, and the abstract is generated from the extraction results. The experimental results show that the proposed methods outperformed the baselines.

Journal of Education and Learning (EduLearn)

TELKOMNIKA (Telecommunication Computing Electronics and Control)

UAD Journal Management System

Use of Text Summarization for Supporting Event Detection

Author: Lee Yen-Hsien
Wei Chih-Ping
Wu Pao-Feng
Publication venue: AIS Electronic Library (AISeL)
Publication date: 31/12/2004

AIS Electronic Library (AISeL)

Japanese/English Cross-Language Information Retrieval: Exploration of Query Translation and Transliteration

Author: Fujii Atsushi
Ishikawa Tetsuya
Publication date: 01/01/2001

Cross-language information retrieval (CLIR), where queries and documents are in different languages, has of late become one of the major topics within the information retrieval community. This paper proposes a Japanese/English CLIR system in which we combine query translation and retrieval modules. We currently target the retrieval of technical documents, and therefore the performance of our system is highly dependent on the quality of the translation of technical terms. However, technical term translation is still problematic in that technical terms are often compound words, and thus new terms are progressively created by combining existing base words. In addition, Japanese often represents loanwords based on its special phonogram. Consequently, existing dictionaries find it difficult to achieve sufficient coverage. To counter the first problem, we produce a Japanese/English dictionary for base words, and translate compound words on a word-by-word basis. We also use a probabilistic method to resolve translation ambiguity. For the second problem, we use a transliteration method, which maps words unlisted in the base word dictionary to their phonetic equivalents in the target language. We evaluate our system using a test collection for CLIR, and show that both the compound word translation and transliteration methods improve the system performance.
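The word-by-word strategy described in this abstract can be pictured with a minimal sketch. Everything concrete here is an assumption made for illustration: the dictionary entries, the probability values, and the lookup table standing in for transliteration are toy data, and choosing the most probable candidate independently for each base word is a simplification, since the abstract does not specify the probabilistic model. The compound is assumed to be already segmented into base words.

```python
# Toy base-word dictionary (hypothetical entries and probabilities).
BASE_DICT = {
    "情報": [("information", 0.9), ("intelligence", 0.1)],
    "検索": [("retrieval", 0.6), ("search", 0.4)],
}

# Stand-in for a real phonetic transliteration model (hypothetical table).
TOY_TRANSLITERATION = {"データ": "data"}


def transliterate(word):
    """Map an unlisted word to a target-language form; unknown words pass through."""
    return TOY_TRANSLITERATION.get(word, word)


def translate_compound(base_words):
    """Translate a pre-segmented compound one base word at a time.

    Listed words take their highest-probability candidate (a simplification);
    unlisted words fall back to transliteration.
    """
    translated = []
    for word in base_words:
        candidates = BASE_DICT.get(word)
        if candidates:
            best, _ = max(candidates, key=lambda pair: pair[1])
            translated.append(best)
        else:
            translated.append(transliterate(word))
    return " ".join(translated)


print(translate_compound(["情報", "検索"]))
print(translate_compound(["データ", "検索"]))
```

A fuller treatment would score whole candidate combinations rather than each word in isolation, so that the choice for one base word can depend on its neighbours.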

arXiv.org e-Print Archive

CiteSeerX

Annotation of Scientific Summaries for Information Retrieval.

Author: Eric Charton
Ibekwe-Sanjuan Fidelia
Sanjuan Eric
Silvia Fernandez
Publication venue: HAL CCSD
Publication date: 30/03/2008

We present a methodology combining surface NLP and machine learning techniques for ranking abstracts and generating summaries based on annotated corpora. The corpora were annotated with meta-semantic tags indicating the category of information a sentence bears (objective, findings, newthing, hypothesis, conclusion, future work, related work). The annotated corpus is fed into an automatic summarizer for query-oriented abstract ranking and multi-abstract summarization. To adapt the summarizer to these two tasks, two novel weighting functions were devised to take into account the distribution of the tags in the corpus. The results, although still preliminary, encourage us to pursue this line of work and to find better ways of building IR systems that can take into account semantic annotations in a corpus.

HAL

HAL-Lyon 3

kNNSumm: um sumarizador automático de documentos utilizando aprendizado baseado em instâncias

Author: Junior Carlos N. Silla
Kaestner Celso A. A.
Publication date: 17/10/2012

This work presents the architecture of kNNSumm (k-NN Summarizer), an automatic document summarizer based on an instance-based machine learning approach. It also reports the results obtained by applying it to a collection of English documents extracted from the TIPSTER base, which is widely used in the literature. Additionally, a simple, didactic example illustrates in detail how the summarizer works and, more generally, how the summarization task is handled with a machine learning approach. Track: V - Workshop on Agents and Intelligent Systems. Red de Universidades con Carreras en Informática (RedUNCI
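To make the idea of instance-based summarization concrete, here is a minimal k-nearest-neighbour sketch that labels each sentence as "include" or "exclude" by majority vote among its closest training sentences. It is not the kNNSumm implementation: the two features (relative position in the document, mean term weight), the training examples, and the function names are all hypothetical, chosen only to show the mechanism.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Majority label among the k training vectors closest to `query`
    (Euclidean distance). `train` is a list of (feature_vector, label)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def summarize(sentences, features, train, k=3):
    """Keep the sentences whose feature vectors are classified as summary-worthy."""
    return [s for s, f in zip(sentences, features) if knn_predict(train, f, k)]


# Hypothetical training set: (relative position, mean term weight) -> in summary?
train = [
    ((0.0, 0.9), True), ((0.1, 0.8), True), ((0.2, 0.7), True),
    ((0.7, 0.3), False), ((0.8, 0.2), False), ((0.9, 0.1), False),
]

sentences = ["Opening sentence.", "Closing remark."]
features = [(0.15, 0.75), (0.85, 0.15)]
print(summarize(sentences, features, train))
```

Because the classifier stores the training instances and defers all work to query time, adding newly labelled sentences requires no retraining, which is the usual appeal of an instance-based approach.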

Servicio de Difusión de la Creación Intelectual

kNNSumm: um sumarizador automático de documentos utilizando aprendizado baseado em instâncias

Author: Junior Carlos N. Silla
Kaestner Celso A. A.
Publication date: 01/01/2004