93 research outputs found
Tweet Contextualization Based on Wikipedia and Dbpedia
National audience. Bound to 140 characters, tweets are short and are not written with formal grammar or proper spelling. These spelling variations increase the likelihood of vocabulary mismatch and make tweets difficult to understand without context. This paper addresses the tweet contextualization task, which aims to automatically provide a summary that explains a given tweet, allowing a reader to understand it. We propose different tweet expansion approaches based on Wikipedia and DBpedia as external knowledge sources. Each approach has two steps. The first step generates candidate terms for a given tweet; the second ranks and selects these candidate terms using a similarity measure. The effectiveness of our methods is demonstrated through an experimental study conducted on the INEX 2014 collection
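The two-step idea in this abstract (generate candidate terms, then rank them with a similarity measure) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration only: the `knowledge` dictionary is a hypothetical stand-in for Wikipedia/DBpedia entries, candidate generation by shared tokens is a simplification, and bag-of-words cosine similarity is one possible similarity measure, not necessarily the one the paper uses.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words vectors of two strings."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def expand_tweet(tweet: str, knowledge: dict, top_k: int = 2) -> list:
    """Return up to top_k expansion terms for a tweet.

    `knowledge` maps a term to a short description (hypothetical stand-in
    for an external knowledge source such as Wikipedia or DBpedia).
    """
    tweet_tokens = set(tweet.lower().split())

    # Step 1 (assumed simplification): a term is a candidate if its
    # description shares at least one token with the tweet.
    candidates = [
        term
        for term, description in knowledge.items()
        if tweet_tokens & set(description.lower().split())
    ]

    # Step 2: rank candidates by similarity to the tweet, highest first;
    # ties are broken alphabetically so the result is deterministic.
    ranked = sorted(
        candidates,
        key=lambda term: (-cosine_similarity(tweet, knowledge[term]), term),
    )
    return ranked[:top_k]


if __name__ == "__main__":
    toy_knowledge = {
        "Paris": "capital city of france on the seine",
        "Python": "programming language",
        "Eiffel Tower": "iron tower in paris france",
    }
    print(expand_tweet("visiting france to see the tower", toy_knowledge))
```

A real system would replace the toy dictionary with queries against the knowledge sources and would likely filter stop words before matching; the sketch only shows how the candidate-generation and ranking steps fit together.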
Model Selection in Summary Evaluation
A difficulty in the design of automated text summarization algorithms is in the objective evaluation. Viewing summarization as a tradeoff between length and information content, we introduce a technique based on a hierarchy of classifiers to rank, through model selection, different summarization methods. This summary evaluation technique allows for broader comparison of summarization methods than the traditional techniques of summary evaluation. We present an empirical study of two simple, albeit widely used, summarization methods that shows the different usages of this automated task-based evaluation system and confirms the results obtained with human-based evaluation methods over smaller corpora
Jointly Extracting and Compressing Documents with Summary State Representations
We present a new neural model for text summarization that first extracts
sentences from a document and then compresses them. The proposed model offers a
balance that sidesteps the difficulties in abstractive methods while generating
more concise summaries than extractive methods. In addition, our model
dynamically determines the length of the output summary based on the gold
summaries it observes during training and does not require length constraints
typical to extractive summarization. The model achieves state-of-the-art
results on the CNN/DailyMail and Newsroom datasets, improving over current
extractive and abstractive methods. Human evaluations demonstrate that our
model generates concise and informative summaries. We also make available a new
dataset of oracle compressive summaries derived automatically from the
CNN/DailyMail reference summaries
Automatic Generation of Text Summaries - Challenges, proposals and experiments
Students and researchers in natural language processing, artificial intelligence, computer science, and computational linguistics will perhaps be the first to take an interest in this book. However, it is also intended to introduce non-specialist readers to this promising area of research; for this reason, we have translated into Spanish some technical terms and anglicisms specific to the discipline, while always mentioning the English term to avoid confusion and to help interested readers broaden their sources of knowledge. This book presents a computational method, novel at the international level, for the automatic generation of text summaries, one that surpasses the quality of those that can currently be produced. That is, it is the result of research that sought computational methods and models that depend as little as possible on language and domain
Using Online Reference in Poetry Analysis (a Cultural Studies Approach to Teaching Poetry)
Applying a cultural studies approach to teaching literature means combining a literary text with other texts that share a cultural connection. This paper discusses the study of poetry using online references as companion texts. This intertextual reading strategy can offer a new alternative for poetry teaching, which has so far been confined to aspects of language and stylistics alone. Enriching the text with a companion text sharpens the exploration of meaning and the analysis of theme. In this paper, the poem Immigrant, written by the American poet Pat Mora, is analyzed using an online reference taken from the Washington Post website as the companion text. The two texts raise the same cultural issue, namely cultural conflict in the process of immigrant assimilation in America. By pairing online references with a cultural studies learning strategy, poetry teaching can be carried out in a more varied and interdisciplinary context
Automatic text summarization in digital libraries
xiii, 142 leaves; 28 cm. A digital library is a collection of services and information objects for storing, accessing, and retrieving digital objects. Automatic text summarization presents salient information in a condensed form suitable for user needs. This thesis amalgamates digital libraries and automatic text summarization by extending the Greenstone Digital Library software suite to include the University of Lethbridge Summarizer. The tool generates summaries, nouns, and noun phrases for use as metadata for searching and browsing digital collections. Digital collections of newspapers, PDFs, and eBooks were created with summary metadata. PDF documents were processed the fastest at 1.8 MB/hr, followed by the newspapers at 1.3 MB/hr, with eBooks being the slowest at 0.9 MB/hr. Qualitative analysis of four genres (newspaper, M.Sc. thesis, novel, and poetry) revealed that narrative newspapers were the most suitable for automatically generated summarization. The other genres suffered from incoherence and information loss. Overall, summaries for digital collections are suitable when used with newspaper documents and unsuitable for other genres
Report on first selection of resources
The central objective of the Metanet4u project is to contribute to the establishment of a pan-European digital platform that makes available language resources and services, encompassing both datasets and software tools, for speech and language processing, and supports a new generation of exchange facilities for them. Peer Reviewed. Preprint