44 research outputs found

    Summarization of Films and Documentaries Based on Subtitles and Scripts

    We assess the performance of generic text summarization algorithms applied to films and documentaries, using the well-known behavior of summarization of news articles as reference. We use three datasets: (i) news articles, (ii) film scripts and subtitles, and (iii) documentary subtitles. Standard ROUGE metrics are used for comparing generated summaries against news abstracts, plot summaries, and synopses. We show that the best performing algorithms are LSA, for news articles and documentaries, and LexRank and Support Sets, for films. Despite the different nature of films and documentaries, their relative behavior is in accordance with that obtained for news articles. Comment: 7 pages, 9 tables, 4 figures, submitted to Pattern Recognition Letters (Elsevier)

    Enriching very large ontologies using the WWW

    This paper explores the possibility of exploiting text on the world wide web in order to enrich the concepts in existing ontologies. First, a method to retrieve documents from the WWW related to a concept is described. These document collections are used 1) to construct topic signatures (lists of topically related words) for each concept in WordNet, and 2) to build hierarchical clusters of the concepts (the word senses) that lexicalize a given word. The overall goal is to overcome two shortcomings of WordNet: the lack of topical links among concepts, and the proliferation of senses. Topic signatures are validated on a word sense disambiguation task with good results, which are improved when the hierarchical clusters are used. Comment: 6 pages

    SemEval-2007 Task 16: evaluation of wide coverage knowledge resources

    This task tries to establish the relative quality of available semantic resources (derived by manual or automatic means). The quality of each large-scale knowledge resource is indirectly evaluated on a Word Sense Disambiguation task. In particular, we use the Senseval-3 and SemEval-2007 English Lexical Sample tasks as benchmarks to evaluate the relative quality of each resource. Furthermore, to be as neutral as possible with respect to the knowledge bases studied, we systematically apply the same disambiguation method to all the resources. Completely different behaviour is observed on the two lexical data sets (Senseval-3 and SemEval-2007). Peer Reviewed. Postprint (author’s final draft)

    Vers une étude comparative diachronique des mondes lexicaux du féminisme (Towards a diachronic comparative study of the lexical worlds of feminism)

    This article presents a lexical approach to the diachronic comparative analysis of two corpora dealing with feminism, covering two different periods. The lexical analysis relies on collecting the "lexical worlds" (simple and complex lexical units that are significantly frequent) associated with the two corpora and on a comparative analysis of these lexical worlds. The results show that the simple lexical units are very similar across the two corpora, which address the same topic, whereas the complex lexical units differ significantly, because they are more specific to a sub-topic and to a period.