
    Key Phrase Extraction of Lightly Filtered Broadcast News

    This paper explores the impact of light filtering on automatic key phrase extraction (AKE) applied to Broadcast News (BN). Key phrases are words and expressions that best characterize the content of a document. They are often used to index the document or as features in further processing, which makes improvements in AKE accuracy particularly important. We hypothesized that filtering out marginally relevant sentences from a document would improve AKE accuracy. Our experiments confirmed this hypothesis: eliminating as little as 10% of the document sentences led to a 2% improvement in AKE precision and recall. The AKE system is built on the MAUI toolkit, which follows a supervised learning approach. We trained and tested our AKE method on a gold standard made of 8 BN programs containing 110 manually annotated news stories. The experiments were conducted within a Multimedia Monitoring Solution (MMS) system for TV and radio news/programs, running daily, and monitoring 12 TV and 4 radio channels. Comment: In 15th International Conference on Text, Speech and Dialogue (TSD 2012)
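The precision and recall figures quoted in this abstract can be illustrated with a minimal sketch. It assumes exact, case-insensitive matching between extracted and gold-standard key phrases; the abstract does not state the matching criterion, and the function name `precision_recall` and the sample phrases are hypothetical.

```python
from typing import Iterable, Tuple


def precision_recall(extracted: Iterable[str], gold: Iterable[str]) -> Tuple[float, float]:
    """Score extracted key phrases against a gold standard.

    Assumption: a phrase counts as correct only on an exact,
    case-insensitive match. The paper may use a different criterion.
    """
    extracted_set = {p.strip().lower() for p in extracted}
    gold_set = {p.strip().lower() for p in gold}
    if not extracted_set or not gold_set:
        return 0.0, 0.0
    hits = len(extracted_set & gold_set)
    precision = hits / len(extracted_set)  # share of extracted phrases that are correct
    recall = hits / len(gold_set)          # share of gold phrases that were found
    return precision, recall


# Hypothetical example data, not taken from the paper.
extracted = ["broadcast news", "key phrase", "weather"]
gold = ["Broadcast News", "key phrase", "filtering", "radio"]
p, r = precision_recall(extracted, gold)
```

Here two of the three extracted phrases appear in the gold list, so precision is 2/3, and two of the four gold phrases were found, so recall is 1/2.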

    Beyond Stemming and Lemmatization: Ultra-stemming to Improve Automatic Text Summarization

    In Automatic Text Summarization, preprocessing is an important phase that reduces the space of textual representation. Classically, stemming and lemmatization have been widely used for normalizing words. However, even with normalization, the curse of dimensionality can degrade the performance of summarizers on large texts. This paper describes a new method for normalizing words to further reduce the space of representation. We propose to reduce each word to its initial letters, as a form of Ultra-stemming. The results show that Ultra-stemming not only preserves the content of summaries produced with this representation, but can often dramatically improve system performance. Summaries on trilingual corpora were evaluated automatically with Fresa. Results confirm an increase in performance, regardless of the summarizer system used. Comment: 22 pages, 12 figures, 9 tables
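The idea of reducing each word to its initial letters can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the function names, the regex tokenizer, and the prefix length parameter `n` are assumptions, since the abstract does not specify them.

```python
import re
from typing import List


def ultra_stem(word: str, n: int = 1) -> str:
    """Reduce a word to its first n letters, lowercased (n is an assumed parameter)."""
    return word.lower()[:n]


def ultra_stem_text(text: str, n: int = 1) -> List[str]:
    """Tokenize on runs of letters (a simplifying assumption) and ultra-stem each token."""
    tokens = re.findall(r"[^\W\d_]+", text)
    return [ultra_stem(token, n) for token in tokens]


sentence = "Summarizers summarize summaries automatically"
stems = ultra_stem_text(sentence, n=2)
vocabulary_before = {t.lower() for t in re.findall(r"[^\W\d_]+", sentence)}
vocabulary_after = set(stems)
```

In this toy sentence, four distinct word forms collapse to two distinct prefixes (`su` and `au`), which shows how the representation space shrinks. It also shows the trade-off: unrelated words sharing a prefix become indistinguishable.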

    Automatic Summarization

    It has now been 50 years since the publication of Luhn’s seminal paper on automatic summarization. During these years the practical need for automatic summarization has become increasingly urgent and numerous papers have been published on the topic. As a result, it has become harder to find a single reference that gives an overview of past efforts or a complete view of summarization tasks and necessary system components. This article attempts to fill this void by providing a comprehensive overview of research in summarization, including the more traditional efforts in sentence extraction as well as the most recent approaches for determining important content, for domain- and genre-specific summarization, and for evaluation of summarization. We also discuss the challenges that remain open, in particular the need for language generation and deeper semantic understanding of language that would be necessary for future advances in the field.

    Analysis of Competitor Intelligence in the Era of Big Data: An Integrated System Using Text Summarization Based on Global Optimization

    Automatic text summarization can be applied to extract summaries from competitor intelligence (CI) corpora that organizations create by gathering textual data from the Internet. Such a representation of CI text is easier for managers to interpret and use for making decisions. This research investigates the design of an integrated system for CI analysis that comprises clustering and automatic text summarization, and evaluates the quality of extractive summaries generated automatically by various text-summarization techniques based on global optimization. The research is conducted through experimentation and empirical analysis of results. A survey of practicing managers is also carried out to understand the effectiveness of automatically generated summaries from a CI perspective. First, the research shows that global optimization-based techniques generate good-quality extractive summaries for CI analysis from topical clusters created by the clustering step of the integrated system. Second, it shows the usefulness of the generated summaries by having them evaluated by practicing managers from a CI perspective. Finally, the implications of this research for theory and practice are discussed.