44 research outputs found
Summarization of Films and Documentaries Based on Subtitles and Scripts
We assess the performance of generic text summarization algorithms applied to
films and documentaries, using the well-known behavior of summarization of news
articles as reference. We use three datasets: (i) news articles, (ii) film
scripts and subtitles, and (iii) documentary subtitles. Standard ROUGE metrics
are used for comparing generated summaries against news abstracts, plot
summaries, and synopses. We show that the best performing algorithms are LSA,
for news articles and documentaries, and LexRank and Support Sets, for films.
Despite the different nature of films and documentaries, their relative
behavior is in accordance with that obtained for news articles.Comment: 7 pages, 9 tables, 4 figures, submitted to Pattern Recognition
Letters (Elsevier
Enriching very large ontologies using the WWW
This paper explores the possibility to exploit text on the world wide web in
order to enrich the concepts in existing ontologies. First, a method to
retrieve documents from the WWW related to a concept is described. These
document collections are used 1) to construct topic signatures (lists of
topically related words) for each concept in WordNet, and 2) to build
hierarchical clusters of the concepts (the word senses) that lexicalize a given
word. The overall goal is to overcome two shortcomings of WordNet: the lack of
topical links among concepts, and the proliferation of senses. Topic signatures
are validated on a word sense disambiguation task with good results, which are
improved when the hierarchical clusters are used.Comment: 6 page
SemEval-2007 Task 16: evaluation of wide coverage knowledge resources
This task tries to establish the relative quality of available semantic resources (derived by manual or automatic means). The quality of each large-scale knowledge resource is indirectly evaluated on a Word Sense Disambiguation task. In particular, we use Senseval-3 and SemEval-2007 English Lexical Sample tasks as evaluation bechmarks
to evaluate the relative quality of each resource. Furthermore, trying to be as neutral as possible with respect the knowledge bases studied, we apply systematically the same disambiguation method to all the resources. A completely different behaviour is observed on both lexical data sets (Senseval-3 and SemEval-2007).Peer ReviewedPostprint (author’s final draft
Vers une étude comparative diachronique des mondes lexicaux du féminisme
Cet article présente une approche lexicale d'analyse comparative diachronique entre deux corpus traitant du féminisme, sur deux périodes différentes. L'analyse lexicale s'appuie sur la collecte des " mondes lexicaux " (unités lexicales simples et complexes qui sont significativement fréquentes) liés aux deux corpus et sur une analyse comparative de ces mondes lexicaux. Les résultats montrent que les unités lexicales simples sont très proches entre les deux corpus qui traitent de la même thématique, tandis que les unités lexicales complexes sont significativement différentes, car plus spécialisées à une sous-thématique et à une période