28 research outputs found

    Machine Learning of Generic and User-Focused Summarization

    A key problem in text summarization is finding a salience function which determines what information in the source should be included in the summary. This paper describes the use of machine learning on a training corpus of documents and their abstracts to discover salience functions which describe what combination of features is optimal for a given summarization task. The method addresses both "generic" and user-focused summaries. Comment: In Proceedings of the Fifteenth National Conference on AI (AAAI-98), p. 821-82

    Porting a summarizer to the French language

    We describe the porting of the English-language REZIME text summarizer to French. REZIME is a single-document summarizer focused particularly on medical documents. Summaries are created by extracting key sentences from the original document. Sentence selection employs machine learning techniques, using statistical, syntactic and lexical features computed from specialized language resources. The REZIME system was initially developed for English documents. In this paper we present the summarizer architecture and describe the steps required to adapt it to French. The summarizer's performance is evaluated on English and French datasets. The results show that the French adaptation achieves performance comparable to the English system.

    Enumeration of Extractive Oracle Summaries

    To analyze the limitations and future directions of the extractive summarization paradigm, this paper proposes an Integer Linear Programming (ILP) formulation to obtain extractive oracle summaries in terms of ROUGE-N. We also propose an algorithm that enumerates all of the oracle summaries for a set of reference summaries, in order to exploit F-measures that evaluate how many sentences of a system summary are also extracted in an oracle summary. Our experimental results obtained from Document Understanding Conference (DUC) corpora demonstrated the following: (1) room still exists to improve the performance of extractive summarization; (2) the F-measures derived from the enumerated oracle summaries have significantly stronger correlations with human judgment than those derived from single oracle summaries. Comment: 12 page

    Automatic Summarization in Chinese Product Reviews

    With the growing number of online comments, buyers find it hard to locate useful information quickly, which motivates research on automatic summarization built on product review mining. Previous studies focused mainly on extracting explicit features and often ignored implicit features, which are not stated clearly but carry information necessary for analyzing comments. Mining features from web reviews quickly and accurately is therefore important for summarization technology. In this paper, explicit features and “feature-opinion” pairs in explicit sentences are extracted with a Conditional Random Field, and implicit product features are recognized by a bipartite graph model based on a random walk algorithm. The features and their corresponding opinions are then incorporated into a structured text, and the abstract is generated from the extraction results. The experimental results demonstrate that the proposed methods outperform the baselines.

    Use of Text Summarization for Supporting Event Detection

    Japanese/English Cross-Language Information Retrieval: Exploration of Query Translation and Transliteration

    Cross-language information retrieval (CLIR), where queries and documents are in different languages, has of late become one of the major topics within the information retrieval community. This paper proposes a Japanese/English CLIR system in which we combine query translation and retrieval modules. We currently target the retrieval of technical documents, and therefore the performance of our system is highly dependent on the quality of the translation of technical terms. However, technical term translation is still problematic in that technical terms are often compound words, and thus new terms are progressively created by combining existing base words. In addition, Japanese often represents loanwords using its special phonograms. Consequently, existing dictionaries find it difficult to achieve sufficient coverage. To counter the first problem, we produce a Japanese/English dictionary for base words, and translate compound words on a word-by-word basis. We also use a probabilistic method to resolve translation ambiguity. For the second problem, we use a transliteration method, which maps words unlisted in the base word dictionary to their phonetic equivalents in the target language. We evaluate our system using a test collection for CLIR, and show that both the compound word translation and transliteration methods improve the system performance.

    Annotation of Scientific Summaries for Information Retrieval.

    We present a methodology combining surface NLP and Machine Learning techniques for ranking abstracts and generating summaries based on annotated corpora. The corpora were annotated with meta-semantic tags indicating the category of information a sentence bears (objective, findings, newthing, hypothesis, conclusion, future work, related work). The annotated corpus is fed into an automatic summarizer for query-oriented abstract ranking and multi-abstract summarization. To adapt the summarizer to these two tasks, two novel weighting functions were devised to take into account the distribution of the tags in the corpus. The results, although still preliminary, encourage us to pursue this line of work and find better ways of building IR systems that can take into account semantic annotations in a corpus.

    kNNSumm: um sumarizador automático de documentos utilizando aprendizado baseado em instâncias

    This work presents the architecture of kNNSumm (k-NN Summarizer), an automatic document summarizer based on an instance-based machine learning approach. It also presents the results obtained by applying it to a collection of English documents extracted from the TIPSTER base, which is widely used in the literature. Additionally, we present a simple, didactic example of the procedures used by the summarizer and, more generally, of the text summarization task when approached with machine learning. Track: V - Workshop de agentes y sistemas inteligentes. Red de Universidades con Carreras en Informática (RedUNCI
