2,643 research outputs found

    Beyond Stemming and Lemmatization: Ultra-stemming to Improve Automatic Text Summarization

    In Automatic Text Summarization, preprocessing is an important phase to reduce the space of textual representation. Classically, stemming and lemmatization have been widely used for normalizing words. However, even with normalization, the curse of dimensionality can degrade the performance of summarizers on large texts. This paper describes a new method for normalization of words to further reduce the space of representation. We propose to reduce each word to its initial letters, as a form of Ultra-stemming. The results show that Ultra-stemming not only preserves the content of the summaries produced with this representation, but can often dramatically improve the performance of the systems. Summaries on trilingual corpora were evaluated automatically with Fresa. Results confirm an increase in performance, regardless of the summarizer system used. Comment: 22 pages, 12 figures, 9 tables
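    The core idea of the abstract above, reducing each word to its initial letters, can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the function names `ultra_stem` and `ultra_stem_text`, the prefix-length parameter `n`, the lowercasing step, and the letters-only tokenizer are all assumptions made for the example.

```python
import re


def ultra_stem(word: str, n: int = 1) -> str:
    """Keep only the first n letters of a word (lowercased).

    The prefix length n and the lowercasing are illustrative assumptions.
    """
    return word.lower()[:n]


def ultra_stem_text(text: str, n: int = 1) -> list[str]:
    """Tokenize on runs of letters and ultra-stem every token."""
    # Hypothetical tokenizer: any maximal run of Unicode letters is a word.
    tokens = re.findall(r"[^\W\d_]+", text)
    return [ultra_stem(token, n) for token in tokens]


def vocabulary_size(tokens: list[str]) -> int:
    """Number of distinct terms, i.e. the dimension of a bag-of-words space."""
    return len(set(tokens))


if __name__ == "__main__":
    sample = "Stemming and stems reduce the space of textual representation"
    words = ultra_stem_text(sample, n=len(sample))  # effectively no truncation
    stems = ultra_stem_text(sample, n=1)
    print(vocabulary_size(words), "distinct words")
    print(vocabulary_size(stems), "distinct one-letter stems")
```

    Comparing `vocabulary_size` before and after truncation shows how sharply the representation space shrinks, which is the effect the abstract attributes to Ultra-stemming.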

    Enumeration of Extractive Oracle Summaries

    To analyze the limitations and the future directions of the extractive summarization paradigm, this paper proposes an Integer Linear Programming (ILP) formulation to obtain extractive oracle summaries in terms of ROUGE-N. We also propose an algorithm that enumerates all of the oracle summaries for a set of reference summaries. The enumeration makes it possible to compute F-measures that evaluate how many of the sentences in a system summary are also extracted in an oracle summary. Our experimental results obtained from Document Understanding Conference (DUC) corpora demonstrated the following: (1) room still exists to improve the performance of extractive summarization; (2) the F-measures derived from the enumerated oracle summaries have significantly stronger correlations with human judgment than those derived from single oracle summaries. Comment: 12 pages
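    To make the notion of an oracle summary concrete, the sketch below enumerates every sentence subset that maximizes ROUGE-N recall against a set of references. It is a brute-force stand-in for illustration only, not the ILP formulation or the enumeration algorithm proposed in the paper. The function names, the limit expressed as a maximum number of sentences (`max_sents`), and the choice to count n-grams within each sentence rather than across sentence boundaries are all assumptions.

```python
from collections import Counter
from itertools import combinations


def ngrams(tokens, n):
    """Multiset of n-grams in one token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(summary_sents, references, n=1):
    """ROUGE-N recall: clipped n-gram matches over total reference n-grams."""
    summary_counts = Counter()
    for sent in summary_sents:
        summary_counts += ngrams(sent, n)  # assumption: no cross-sentence n-grams
    matched = total = 0
    for ref in references:
        ref_counts = ngrams(ref, n)
        total += sum(ref_counts.values())
        matched += sum(min(c, summary_counts[g]) for g, c in ref_counts.items())
    return matched / total if total else 0.0


def enumerate_oracles(sentences, references, max_sents, n=1):
    """Return the best score and every index tuple that attains it.

    Exhaustive search, exponential in the number of sentences; only
    practical for small toy inputs.
    """
    best, oracles = -1.0, []
    for k in range(1, max_sents + 1):
        for idx in combinations(range(len(sentences)), k):
            score = rouge_n_recall([sentences[i] for i in idx], references, n)
            if score > best + 1e-12:
                best, oracles = score, [idx]
            elif abs(score - best) <= 1e-12:
                oracles.append(idx)
    return best, oracles


if __name__ == "__main__":
    doc = [["a", "b"], ["c", "d"], ["a", "b"]]
    refs = [["a", "b", "c", "d"]]
    print(enumerate_oracles(doc, refs, max_sents=2))
```

    In the toy input, two different sentence pairs reach the same top score, which illustrates why a single oracle summary can understate how many extracted sentences are equally good.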

    Access to recorded interviews: A research agenda

    Recorded interviews form a rich basis for scholarly inquiry. Examples include oral histories, community memory projects, and interviews conducted for broadcast media. Emerging technologies offer the potential to radically transform the way in which recorded interviews are made accessible, but this vision will demand substantial investments from a broad range of research communities. This article reviews the present state of practice for making recorded interviews available and the state of the art for key component technologies. A large number of important research issues are identified, and from that set of issues, a coherent research agenda is proposed.

    Text Summarization Techniques: A Brief Survey

    In recent years, there has been an explosion in the amount of text data from a variety of sources. This volume of text is an invaluable source of information and knowledge which needs to be effectively summarized to be useful. In this review, the main approaches to automatic text summarization are described. We review the different processes for summarization and describe the effectiveness and shortcomings of the different methods. Comment: Some of references format have update

    From media crossing to media mining

    This paper reviews how the concept of Media Crossing has contributed to the advancement of the application domain of information access and explores directions for a future research agenda. These directions include themes that could help to broaden the scope and to incorporate the concept of medium-crossing in a more general approach that not only uses combinations of medium-specific processing, but also exploits more abstract medium-independent representations, partly based on the foundational work on statistical language models for information retrieval. Three examples of successful applications of media crossing are presented, with a focus on the aspects that could be considered a first step towards a generalized form of media mining.

    Definition of Requirements for Accessing Multilingual Information and Opinions

    With the development of the Internet and satellite television, access to thousands of programs and messages in different languages has become widespread. Unfortunately, even well-educated people rarely speak more than two or three foreign languages well enough, and most know only one, which significantly limits access to this information. In this paper, we define requirements for an automated system for Accessing Multilingual Information and opinionS (AMIS) that will help users understand multimedia content transmitted in different languages, with simultaneous comparison to counterparts in the user's native language. The concept of understanding we use will provide access to any information, regardless of the language in which it is presented. We believe that the AMIS project can have an immense and positive impact on the integration and awareness of society in social and cultural terms.