93 research outputs found

    Tweet Contextualization Based on Wikipedia and Dbpedia

    No full text
    National audienceBound to 140 characters, tweets are short and not written maintaining formal grammar and proper spelling. These spelling variations increase the likelihood of vocabulary mismatch and make them difficult to understand without context. This paper falls under the tweet contextualization task that aims at providing, automatically, a summary that explains a given tweet, allowing a reader to understand it. We propose different tweet expansion approaches based on Wikipeda and Dbpedia as external knowledge sources. These proposed approaches are divided into two steps. The first step consists in generating the candidate terms for a given tweet, while the second one consists in ranking and selecting these candidate terms using asimilarity measure. The effectiveness of our methods is proved through an experimental study conducted on the INEX 2014 collection

    Model Selection in Summary Evaluation

    Get PDF
    A difficulty in the design of automated text summarization algorithms is in the objective evaluation. Viewing summarization as a tradeoff between length and information content, we introduce a technique based on a hierarchy of classifiers to rank, through model selection, different summarization methods. This summary evaluation technique allows for broader comparison of summarization methods than the traditional techniques of summary evaluation. We present an empirical study of two simple, albeit widely used, summarization methods that shows the different usages of this automated task-based evaluation system and confirms the results obtained with human-based evaluation methods over smaller corpora

    Jointly Extracting and Compressing Documents with Summary State Representations

    Get PDF
    We present a new neural model for text summarization that first extracts sentences from a document and then compresses them. The proposed model offers a balance that sidesteps the difficulties in abstractive methods while generating more concise summaries than extractive methods. In addition, our model dynamically determines the length of the output summary based on the gold summaries it observes during training and does not require length constraints typical to extractive summarization. The model achieves state-of-the-art results on the CNN/DailyMail and Newsroom datasets, improving over current extractive and abstractive methods. Human evaluations demonstrate that our model generates concise and informative summaries. We also make available a new dataset of oracle compressive summaries derived automatically from the CNN/DailyMail reference summaries

    Automatic Generation of Text Summaries - Challenges, proposals and experiments

    Get PDF
    Los estudiantes e investigadores en el área de procesamiento deenguaje natural, inteligencia artificial, ciencias computacionales y lingüística computacional serán quizá los primeros interesados en este libro. No obstante, también se pretende introducir a público no especializado en esta prometedora área de investigación; por ello, hemos traducido al español algunos tecnicismos y anglicismos, propios de esta disciplina, pero sin dejar de mencionar, en todo momento, su término en inglés para evitar confusiones y lograr que aquellos lectores interesados puedan ampliar sus fuentes de conocimiento.Este libro presenta un método computacional novedoso, a nivel internacional, para la generación automática de resúmenes de texto, pues supera la calidad de los que actualmente se pueden crear. Es decir, es resultado de una investigación que buscó métodos y modelos computacionales lo menos dependientes del lenguaje y dominio

    Using Online Reference in Poetry Analysis (a Cultural Studies Approach to Teaching Poetry)

    Full text link
    Menerapkan pendekatan studi budaya (cultural studies) dalam mengajar Sastra adalah menggabungkan text sastra dengan text lainnya yang memiliki hubungan budaya. Tulisan ini membahas penelahaan puisi dengan menggunakan referensi online sebagai text pendamping. Strategi membaca intertext ini dapat menjadi alternatif baru untuk pengajaran puisi yang selama ini terjebak pada aspek bahasa dan stilistika saja.Pengayaan text dengan memakai text pendamping mempertajam penggalian makna dan analisa tema. Dalam tulisan ini Puisi Immigrant yang ditulis oleh penyair Amerika Pat Mora dianalisa dengan menggunakan referensi online yang diupload dari Washington Post web sebagai text pendamping. Isu dibawa dalam dua text ini membawa isu budaya yang sama yaitu Konflik budaya dalam proses assimilasi Immigrant di Amerika. Dengan menggandeng Referensi online serta mengaplikasikan strategi pembelajaran studi budaya, pembelajaran puisi dapat dilakukan dalam konteks yang lebih variatif dan interdisipliner

    Automatic text summarization in digital libraries

    Get PDF
    xiii, 142 leaves ; 28 cm.A digital library is a collection of services and information objects for storing, accessing, and retrieving digital objects. Automatic text summarization presents salient information in a condensed form suitable for user needs. This thesis amalgamates digital libraries and automatic text summarization by extending the Greenstone Digital Library software suite to include the University of Lethbridge Summarizer. The tool generates summaries, nouns, and non phrases for use as metadata for searching and browsing digital collections. Digital collections of newspapers, PDFs, and eBooks were created with summary metadata. PDF documents were processed the fastest at 1.8 MB/hr, followed by the newspapers at 1.3 MB/hr, with eBooks being the slowest at 0.9 MV/hr. Qualitative analysis on four genres: newspaper, M.Sc. thesis, novel, and poetry, revealed narrative newspapers were most suitable for automatically generated summarization. The other genres suffered from incoherence and information loss. Overall, summaries for digital collections are suitable when used with newspaper documents and unsuitable for other genres

    Report on first selection of resources

    Get PDF
    The central objective of the Metanet4u project is to contribute to the establishment of a pan-European digital platform that makes available language resources and services, encompassing both datasets and software tools, for speech and language processing, and supports a new generation of exchange facilities for them.Peer ReviewedPreprin
    • …
    corecore