Machine Learning of Generic and User-Focused Summarization
A key problem in text summarization is finding a salience function which
determines what information in the source should be included in the summary.
This paper describes the use of machine learning on a training corpus of
documents and their abstracts to discover salience functions which describe
what combination of features is optimal for a given summarization task. The
method addresses both "generic" and user-focused summaries.
Comment: In Proceedings of the Fifteenth National Conference on AI (AAAI-98),
p. 821-82
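To make the idea of a learned salience function concrete, here is a minimal sketch. It scores sentences with a linear combination of features and learns the weights with a perceptron from sentences labelled as matching the abstract (1) or not (0). The feature set (position, length, bias, plus a query-overlap feature for the user-focused case), the perceptron learner, and the names `features`, `train_perceptron` and `summarize` are all illustrative assumptions, not the paper's actual features or learning algorithm.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sentence:
    text: str
    position: int  # 0-based index of the sentence in its document


def features(s: Sentence, n_sentences: int, query: Optional[set] = None) -> list:
    """Hypothetical feature vector: position, length, optional query overlap, bias."""
    words = s.text.lower().split()
    feats = [
        1.0 - s.position / max(n_sentences, 1),  # earlier sentences get larger values
        min(len(words), 30) / 30.0,              # length, capped and normalised to [0, 1]
    ]
    if query is not None:
        # User-focused mode: fraction of query terms that occur in the sentence.
        feats.append(len(query & set(words)) / max(len(query), 1))
    feats.append(1.0)  # bias term
    return feats


def score(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_perceptron(examples, epochs=20, lr=0.1):
    """examples: (feature_vector, label) pairs, label 1 if the sentence matches the abstract."""
    w = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        for x, y in examples:
            pred = 1 if score(w, x) > 0 else 0
            if pred != y:  # perceptron update only on mistakes
                w = [wi + lr * (y - pred) * xi for wi, xi in zip(w, x)]
    return w


def summarize(sentences, w, k, query=None):
    """Pick the k highest-scoring sentences and return them in document order."""
    n = len(sentences)
    ranked = sorted(sentences, key=lambda s: score(w, features(s, n, query)), reverse=True)
    return sorted(ranked[:k], key=lambda s: s.position)
```

Passing a `query` set switches the same scorer from a generic to a user-focused salience function by adding one feature dimension, so the weight vector must be trained with matching features.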
Porting a summarizer to the French language
We describe the porting of the English language REZIME text summarizer to the French language. REZIME
is a single-document summarizer particularly focused on summarization of medical documents. Summaries are
created by extracting key sentences from the original document. The sentence selection employs machine learning techniques,
using statistical, syntactic and lexical features which are computed based on specialized language resources. The
REZIME system was initially developed for English documents. In this paper we present the summarizer architecture and
describe the steps required to adapt it to the French language. The summarizer's performance is evaluated on English and
French datasets. Results show that the French adaptation achieves performance comparable to that of the English system.
Enumeration of Extractive Oracle Summaries
To analyze the limitations and the future directions of the extractive
summarization paradigm, this paper proposes an Integer Linear Programming (ILP)
formulation to obtain extractive oracle summaries in terms of ROUGE-N. We also
propose an algorithm that enumerates all of the oracle summaries for a set of
reference summaries, which we use to compute F-measures that evaluate how many
sentences in a system summary are also extracted in an oracle summary. Our
experimental results obtained from Document Understanding Conference (DUC)
corpora demonstrated the following: (1) room still exists to improve the
performance of extractive summarization; (2) the F-measures derived from the
enumerated oracle summaries have significantly stronger correlations with human
judgment than those derived from single oracle summaries.
Comment: 12 pages
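The paper obtains oracle summaries with an ILP. For very small inputs, the same notion can be illustrated by brute force: try every sentence subset within a length budget, score it with ROUGE-N, and keep all subsets that tie for the best score. The sketch below assumes ROUGE-N recall with clipped counts summed over references, whitespace tokenization, a word-count budget, and n-grams counted within sentences only; it is exponential in the number of sentences and is a stand-in for, not a reproduction of, the paper's ILP and enumeration algorithm.

```python
from collections import Counter
from itertools import combinations


def ngrams(tokens, n):
    """Count the n-grams inside one token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(cand_counts, ref_counts_list):
    """ROUGE-N recall: clipped n-gram matches over all reference n-grams."""
    match = total = 0
    for ref in ref_counts_list:
        match += sum(min(c, cand_counts[g]) for g, c in ref.items())
        total += sum(ref.values())
    return match / total if total else 0.0


def enumerate_oracles(sentences, references, budget, n=2, eps=1e-12):
    """Brute force: return the best ROUGE-N score and every sentence subset reaching it."""
    toks = [s.split() for s in sentences]
    sent_ngrams = [ngrams(t, n) for t in toks]  # n-grams are not counted across sentence boundaries
    refs = [ngrams(r.split(), n) for r in references]
    best, oracles = -1.0, []
    for size in range(1, len(sentences) + 1):
        for idx in combinations(range(len(sentences)), size):
            if sum(len(toks[i]) for i in idx) > budget:  # word-length budget
                continue
            cand = sum((sent_ngrams[i] for i in idx), Counter())
            score = rouge_n_recall(cand, refs)
            if score > best + eps:
                best, oracles = score, [idx]
            elif abs(score - best) <= eps:
                oracles.append(idx)
    return best, oracles
```

When two sentences are interchangeable, both appear in the returned list, which is the situation that motivates enumerating all oracles rather than keeping a single one.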
Automatic Summarization in Chinese Product Reviews
With the increasing number of online comments, it is hard for buyers to find useful information in a short time, which motivates research on automatic summarization, whose fundamental work focuses on mining product reviews. Previous studies mainly focused on extracting explicit features and often ignored implicit features, which are not stated directly but contain information necessary for analyzing comments. Mining features from web reviews quickly and accurately is therefore important for summarization technology. In this paper, explicit features and “feature-opinion” pairs in explicit sentences were extracted with a Conditional Random Field, and implicit product features were recognized by a bipartite graph model based on a random walk algorithm. The features and their corresponding opinions were then incorporated into a structured text, and the abstract was generated from the extraction results. The experimental results demonstrated that the proposed methods outperformed the baselines.
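The abstract does not give the details of the bipartite graph model, so the following is only a minimal sketch of one plausible reading. It assumes a graph linking opinion words to product features, weighted by how often each explicit “feature-opinion” pair was extracted, and infers the implicit feature of a sentence that contains only an opinion word by running a random walk with restart from that word. The restart probability, iteration count, toy pairs, and function names are hypothetical.

```python
from collections import defaultdict


def build_graph(pairs):
    """Bipartite graph: feature nodes <-> opinion nodes, edge weight = co-occurrence count."""
    graph = defaultdict(lambda: defaultdict(float))
    for feature, opinion in pairs:
        graph[("F", feature)][("O", opinion)] += 1.0
        graph[("O", opinion)][("F", feature)] += 1.0
    return graph


def random_walk_with_restart(graph, start, restart=0.15, iters=50):
    """Power iteration for a random walk that jumps back to `start` with prob. `restart`."""
    prob = {v: 0.0 for v in graph}
    prob[start] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in graph}
        for v, p in prob.items():
            if p == 0.0:
                continue
            total = sum(graph[v].values())
            for u, w in graph[v].items():
                nxt[u] += (1 - restart) * p * w / total
        nxt[start] += restart
        prob = nxt
    return prob


def infer_implicit_feature(graph, opinion):
    """Guess the product feature an opinion word most likely refers to."""
    start = ("O", opinion)
    if start not in graph:
        return None
    prob = random_walk_with_restart(graph, start)
    feature_scores = {v[1]: p for v, p in prob.items() if v[0] == "F"}
    return max(feature_scores, key=feature_scores.get)
```

For example, if "durable" was only ever extracted together with "battery", a review sentence such as "very durable" is mapped to the implicit feature "battery".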
Japanese/English Cross-Language Information Retrieval: Exploration of Query Translation and Transliteration
Cross-language information retrieval (CLIR), where queries and documents are
in different languages, has of late become one of the major topics within the
information retrieval community. This paper proposes a Japanese/English CLIR
system that combines query translation and retrieval modules. We
currently target the retrieval of technical documents, and therefore the
performance of our system is highly dependent on the quality of the translation
of technical terms. However, technical term translation is still
problematic because technical terms are often compound words, and new
terms are continually created by combining existing base words. In addition,
Japanese often writes loanwords in a special phonetic script (katakana).
Consequently, existing dictionaries find it difficult to achieve sufficient
coverage. To counter the first problem, we produce a Japanese/English
dictionary for base words, and translate compound words on a word-by-word
basis. We also use a probabilistic method to resolve translation ambiguity. For
the second problem, we use a transliteration method, which maps words
not listed in the base word dictionary to their phonetic equivalents in the
target language. We evaluate our system using a test collection for CLIR and
show that both the compound word translation and transliteration methods
improve system performance.
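The word-by-word strategy can be sketched as follows, assuming the compound has already been segmented into base words (romanized here for readability; real input would be in Japanese script). Each base word is looked up in a base-word dictionary, words missing from it fall back to transliteration, and translation ambiguity is resolved by keeping the candidate combination with the highest target-language bigram log-probability. The dictionary entries, probabilities, and the `PHONETIC` table (a stand-in for a real transliteration model) are all hypothetical; the paper's own probabilistic model may differ.

```python
import math
from itertools import product

# Hypothetical base-word dictionary (romanized Japanese -> English candidates).
BASE_DICT = {
    "joho": ["information", "intelligence"],
    "kensaku": ["retrieval", "search"],
}

# Hypothetical target-language bigram log-probabilities used for disambiguation.
BIGRAM_LOGPROB = {
    ("information", "retrieval"): math.log(0.02),
    ("intelligence", "retrieval"): math.log(0.0001),
    ("information", "search"): math.log(0.005),
    ("intelligence", "search"): math.log(0.001),
}
UNSEEN_LOGPROB = math.log(1e-6)  # back-off score for unseen bigrams

# Stand-in for a real transliteration model: a tiny phonetic lookup table.
PHONETIC = {"shisutemu": "system"}


def transliterate(word):
    return PHONETIC.get(word, word)


def translate_compound(base_words):
    """Translate a segmented compound word by word, keeping the most probable combination."""
    candidates = [BASE_DICT.get(w) or [transliterate(w)] for w in base_words]
    best, best_score = None, -math.inf
    for combo in product(*candidates):
        score = sum(BIGRAM_LOGPROB.get(pair, UNSEEN_LOGPROB) for pair in zip(combo, combo[1:]))
        if score > best_score:
            best, best_score = combo, score
    return " ".join(best)
```

Exhaustive search over `product(*candidates)` is fine for short compounds; longer ones would call for dynamic programming over adjacent pairs.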
Annotation of Scientific Summaries for Information Retrieval
We present a methodology that combines surface NLP and machine learning techniques for ranking abstracts and generating summaries based on annotated corpora. The corpora were annotated with meta-semantic tags indicating the category of information a sentence conveys (objective, findings, newthing, hypothesis, conclusion, future work, related work). The annotated corpus is fed into an automatic summarizer for query-oriented abstract ranking and multi-abstract summarization. To adapt the summarizer to these two tasks, two novel weighting functions were devised that take into account the distribution of the tags in the corpus. The results, although still preliminary, encourage us to pursue this line of work and to find better ways of building IR systems that take semantic annotations in a corpus into account.
kNNSumm: an automatic document summarizer using instance-based learning
This work presents the architecture of kNNSumm (k-NN Summarizer), an automatic document summarizer based on instance-based machine learning. It also presents the results obtained by applying it to a collection of English documents extracted from the TIPSTER collection, which is widely used in the literature. Additionally, a simple didactic example illustrates in detail how the summarizer works and, more generally, how text summarization is handled as a machine learning task.
Track: V - Workshop on Agents and Intelligent Systems. Red de Universidades con Carreras en Informática (RedUNCI)
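Instance-based summarization of this kind can be sketched as sentence classification with k nearest neighbours: each training sentence is a feature vector labelled 1 if it belongs to the reference summary, and a new sentence is scored by the fraction of its k nearest training sentences (Euclidean distance) labelled 1. The sketch assumes precomputed numeric feature vectors; the feature definitions, `k`, and the function names are hypothetical and not taken from kNNSumm itself.

```python
import math


def knn_vote(train, x, k=3):
    """Fraction of the k nearest training sentences (Euclidean distance) labelled 1."""
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], x))[:k]
    return sum(label for _, label in nearest) / len(nearest)


def knn_summarize(train, doc_vectors, n_sentences, k=3):
    """Return indices of the n sentences with the highest neighbour vote, in document order."""
    votes = [(knn_vote(train, v, k), i) for i, v in enumerate(doc_vectors)]
    chosen = sorted(votes, key=lambda t: (-t[0], t[1]))[:n_sentences]
    return sorted(i for _, i in chosen)
```

Because no model is fitted in advance, all the work happens at summarization time, which is the defining trade-off of instance-based learning; `math.dist` requires Python 3.8 or later.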