Gold Standard Online Debates Summaries and First Experiments Towards Automatic Summarization of Online Debate Data

Author: A Januliene
A Nenkova
AH Morris
CD Manning
HP Edmundson
J. Richard Landis
Joel Larocca Neto
PB Baxendale
T Dunning
Publication date: 15/08/2017

Usage of online textual media is steadily increasing. Daily, more and more news stories, blog posts and scientific articles are added to the online volumes. These are all freely accessible and have been employed extensively in multiple research areas, e.g. automatic text summarization, information retrieval and information extraction. Meanwhile, online debate forums have recently become popular, but have remained largely unexplored. For this reason, there are insufficient resources of annotated debate data available for conducting research in this genre. In this paper, we collected and annotated debate data for an automatic summarization task. Similar to extractive gold standard summary generation, our data contains sentences worthy of inclusion in a summary. Five human annotators performed this task. Inter-annotator agreement, based on semantic similarity, is 36% for Cohen's kappa and 48% for Krippendorff's alpha. Moreover, we also implement an extractive summarization system for online debates and discuss prominent features for the task of summarizing online debate data automatically.

Comment: accepted and presented at CICLING 2017 - 18th International Conference on Intelligent Text Processing and Computational Linguistics

arXiv.org e-Print Archive

Crossref
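
The abstract above reports inter-annotator agreement as Cohen's kappa. As a rough illustration of what that statistic measures, here is a minimal sketch for two annotators who each mark sentences as summary-worthy (1) or not (0). The function name and the toy labels are assumptions made for this example. The paper's agreement is based on semantic similarity, which this sketch does not attempt to reproduce.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Hypothetical helper for illustration only; it uses exact label
    matches, not the semantic-similarity matching used in the paper.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label proportions.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy labels (invented): 1 = sentence selected for the summary.
annotator_1 = [1, 1, 0, 0]
annotator_2 = [1, 0, 0, 0]
kappa = cohens_kappa(annotator_1, annotator_2)
```

With five annotators, as in the paper, a pairwise statistic like this would have to be computed per annotator pair and aggregated; how the authors did that is not stated in the abstract.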

The integrated data mining tool MineKit and a case study of its application on video shop data

Author: Freitas Alex Alves
Kaestner Celso A.A.
Neto Joel Larocca
Santos Alexandre D.
Publication venue: ICSC Academic Press
Publication date: 01/07/2000

The second goal of this paper is to report the result of evaluating MineKit on a real-world data set. This case study is relevant for data mining mainly for two reasons. First, the original data set, like a typical real-world data set, was not previously prepared for data mining activities, so that we had to spend significant time preparing the data. Hence, we have actually gone through the most time-consuming phase of the knowledge discovery process. This issue is usually ignored in the data mining literature, which focuses on the data mining phase only.

Kent Academic Repository

A Trainable Algorithm for Summarizing News Stories

Author: Freitas Alex A.
Kaestner Celso A.A.
Neto Joel Larocca
Nievola Julio C.
Santos Alexandre D.
Publication date: 01/09/2000

This work proposes a trainable system for summarizing news and obtaining an approximate argumentative structure of the source text. To achieve these goals we use several techniques and heuristics, such as detecting the main concepts in the text, connectivity between sentences, occurrence of proper nouns, anaphors, discourse markers and a binary-tree representation (due to the use of an agglomerative clustering algorithm). The proposed system was evaluated on a set of 800 documents

Kent Academic Repository

Automatic text summarization using a machine learning approach

Author: Freitas Alex A.
Kaestner Celso A.A.
Neto Joel Larocca
Publication venue: Springer Science and Business Media LLC
Publication date: 02/08/2003

In this paper we address the automatic summarization task. Recent research works on extractive-summary generation employ some heuristics, but few works indicate how to select the relevant features. We will present a summarization procedure based on the application of trainable Machine Learning algorithms which employs a set of features extracted directly from the original text. These features are of two kinds: statistical, based on the frequency of some elements in the text; and linguistic, extracted from a simplified argumentative structure of the text. We also present some computational results obtained with the application of our summarizer to some well-known text databases, and we compare these results to some baseline summarization procedures.

Kent Academic Repository

Document Clustering and Text Summarization

Author: Freitas Alex A.
Kaestner Celso A.A.
Neto Joel Larocca
Santos Alexandre D.
Publication venue: The Practical Application Company
Publication date: 01/01/2000

This paper describes a text mining tool that performs two tasks, namely document clustering and text summarization. These tasks have, of course, their corresponding counterpart in “conventional” data mining. However, the textual, unstructured nature of documents makes these two text mining tasks considerably more difficult than their data mining counterparts. In our system document clustering is performed by using the Autoclass data mining algorithm. Our text summarization algorithm is based on computing the value of a TF-ISF (term frequency – inverse sentence frequency) measure for each word, which is an adaptation of the conventional TF-IDF (term frequency – inverse document frequency) measure of information retrieval. Sentences with high values of TF-ISF are selected to produce a summary of the source text. The system has been evaluated on real-world documents, and the results are satisfactory.

Kent Academic Repository
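
The TF-ISF idea described in the abstract above can be sketched roughly as follows. This is an illustrative reading, not the authors' implementation: the tokenizer, the naive sentence splitter, the logarithmic form of ISF, and the choice to score a sentence by its summed TF-ISF divided by sentence length are all assumptions made for this example, as are the function names.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? plus whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercased alphanumeric tokens; no stemming or stop-word removal.
    return re.findall(r"[a-z0-9]+", sentence.lower())


def tf_isf_scores(sentences):
    """Score each sentence by the length-normalized TF-ISF of its words."""
    tokenized = [tokenize(s) for s in sentences]
    n = len(tokenized)
    # Sentence frequency: number of sentences containing each word.
    sf = Counter()
    for words in tokenized:
        sf.update(set(words))
    scores = []
    for words in tokenized:
        if not words:
            scores.append(0.0)
            continue
        tf = Counter(words)
        # TF-ISF(w, s) = TF(w, s) * log(N / SF(w)), summed over distinct words.
        total = sum(tf[w] * math.log(n / sf[w]) for w in tf)
        scores.append(total / len(words))
    return scores


def summarize(text, k=2):
    """Return the k highest-scoring sentences in their original order."""
    sentences = split_sentences(text)
    scores = tf_isf_scores(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


summary = summarize("The cat sat. The dog sat. The zebra danced.", k=1)
```

In the toy input, "the" appears in every sentence and so contributes nothing, while the third sentence contains two words found nowhere else and therefore ranks highest.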

Generating Text Summaries through the Relative Importance of Topics

Author: Alex A. Freitas
Alexandre D. Santos
Celso A. A. Kaestner
Joel Larocca Neto
Re D. Santos
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2000

This work proposes a new extractive text-summarization algorithm based on the importance of the topics contained in a document. The basic ideas of the proposed algorithm are as follows. First, the document is partitioned by using the TextTiling algorithm, which identifies topics (coherent segments of text) based on the TF-IDF metric. Then, for each topic, the algorithm computes a measure of its relative relevance in the document. This measure is computed by using the notion of TF-ISF (Term Frequency - Inverse Sentence Frequency), which is our adaptation of the well-known TF-IDF (Term Frequency - Inverse Document Frequency) measure in information retrieval. Finally, the summary is generated by selecting from each topic a number of sentences proportional to the importance of that topic.

CiteSeerX

Crossref

Kent Academic Repository
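
The final step in the abstract above, choosing from each topic a number of sentences proportional to that topic's importance, might look something like the sketch below. It assumes the topic importance values have already been computed (the TextTiling segmentation and the TF-ISF relevance measure are not implemented here). The function name and the largest-remainder rounding rule are assumptions of this example, since the abstract does not say how fractional shares are rounded.

```python
def allocate_sentences(importances, budget):
    """Split a summary budget across topics in proportion to importance.

    importances: non-negative relevance score per topic (assumed given).
    budget: total number of sentences wanted in the summary.
    Uses largest-remainder rounding so the counts sum to the budget.
    """
    total = sum(importances)
    if total <= 0 or budget <= 0:
        return [0] * len(importances)
    # Ideal (fractional) share of the budget for each topic.
    quotas = [budget * w / total for w in importances]
    counts = [int(q) for q in quotas]
    leftover = budget - sum(counts)
    # Hand the remaining slots to topics with the largest fractional parts.
    by_remainder = sorted(
        range(len(quotas)), key=lambda i: quotas[i] - counts[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


# Three topics with invented importance scores and a 4-sentence summary.
counts = allocate_sentences([5, 3, 2], 4)
```

A fuller pipeline would then take the top-scoring sentences within each topic up to its allotted count; that ranking step is left out here.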

Document Clustering and Text Summarization

Author: Alex A. Freitas
Alexandre D. Santos
Catolica Parana
Celso A. A
Celso A.A. Kaestner
D. Santos
Joel Larocca Neto
Kaestner Alex
Neto Alexandre

This paper describes a text mining tool that performs two tasks, namely document clustering and text summarization. These tasks have, of course, their corresponding counterpart in "conventional" data mining. However, the textual, unstructured nature of documents makes these two text mining tasks considerably more difficult than their data mining counterparts. In our system document clustering is performed by using the Autoclass data mining algorithm. Our text summarization algorithm is based on computing the value of a TF-ISF (term frequency -- inverse sentence frequency) measure for each word, which is an adaptation of the conventional TF-IDF (term frequency -- inverse document frequency) measure of information retrieval. Sentences with high values of TF-ISF are selected to produce a summary of the source text. The system has been evaluated on real-world documents, and the results are satisfactory. 1. Introduction Text mining is an emerging field at the intersection of several resea..

CiteSeerX

Unsupervised Method for Text Summarization Using Content Based Approach

Author: Deshpande Anjali
Dr Fazal &
Joel Larocca Neto Alex
Josef Steinberger
K Deepali
M Fachrurrozi
Mehdi Allahyari
N Moratanch
Rekha Jain
Richa Sharma
S A Babar
S Chitrakala
Samrat Babar
Yogan Jaya Kumar
Publication venue: Elsevier BV
Publication date: 01/01/2019

Crossref