
    The integrated data mining tool MineKit and a case study of its application on video shop data

    The second goal of this paper is to report the results of evaluating MineKit on a real-world data set. This case study is relevant to data mining for two main reasons. First, the original data set, like a typical real-world data set, had not been prepared for data mining, so we had to spend significant time preparing the data. Hence, we actually went through the most time-consuming phase of the knowledge discovery process. This issue is usually ignored in the data mining literature, which focuses on the data mining phase only.

    A Trainable Algorithm for Summarizing News Stories

    This work proposes a trainable system for summarizing news and obtaining an approximate argumentative structure of the source text. To achieve these goals, we use several techniques and heuristics, such as detecting the main concepts in the text, connectivity between sentences, occurrence of proper nouns, anaphors, discourse markers, and a binary-tree representation (due to the use of an agglomerative clustering algorithm). The proposed system was evaluated on a set of 800 documents.
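    The abstract does not state which sentence representation or linkage criterion the clustering uses. The sketch below assumes bag-of-words vectors, cosine similarity, and average linkage, and only illustrates how repeatedly merging the two most similar clusters yields a binary tree over the sentences. The function names (`build_tree`, `cosine`, `vec`) are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations


def vec(sentence: str) -> Counter:
    # Bag-of-words vector (assumed representation; no stemming or stop words).
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_tree(sentences: list[str]):
    """Average-linkage agglomerative clustering over a non-empty sentence list.

    Returns a binary tree as nested 2-tuples whose leaves are sentence indices.
    """
    vecs = [vec(s) for s in sentences]
    sim = [[cosine(a, b) for b in vecs] for a in vecs]
    # Each active cluster is (subtree, member sentence indices).
    clusters = [(i, [i]) for i in range(len(sentences))]
    while len(clusters) > 1:
        def linkage(pair):
            (_, ma), (_, mb) = clusters[pair[0]], clusters[pair[1]]
            # Average pairwise similarity between the two clusters' members.
            return sum(sim[i][j] for i in ma for j in mb) / (len(ma) * len(mb))

        i, j = max(combinations(range(len(clusters)), 2), key=linkage)
        (ta, ma), (tb, mb) = clusters[i], clusters[j]
        merged = ((ta, tb), ma + mb)  # the merge becomes an internal tree node
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]
```

    For example, `build_tree(["the cat sat", "the cat sat down", "stocks fell sharply"])` first joins the two overlapping sentences and returns `(2, (0, 1))`.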

    Automatic text summarization using a machine learning approach

    In this paper we address the automatic summarization task. Recent work on extractive summary generation employs heuristics, but few studies indicate how to select the relevant features. We present a summarization procedure based on trainable machine learning algorithms that use a set of features extracted directly from the original text. These features are of two kinds: statistical, based on the frequency of certain elements in the text, and linguistic, extracted from a simplified argumentative structure of the text. We also report computational results obtained by applying our summarizer to some well-known text databases, and we compare these results with some baseline summarization procedures.
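    As an illustration of the general approach (statistical features fed to a trainable classifier), the sketch below computes a small, hypothetical set of binary features per sentence: leading position, above-average length, above-average mean TF-ISF, and a crude capitalization cue for proper nouns. The linguistic features derived from the argumentative structure are not reproduced. The classifier is a hand-rolled Bernoulli naive Bayes chosen only for illustration; it is not claimed to be the learner or feature set used in the paper.

```python
import math
import re
from collections import Counter


def features(sentences: list[str]) -> list[tuple[bool, ...]]:
    """Hypothetical binary statistical features for each sentence of a document."""
    toks = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    n = len(sentences)
    sf = Counter(w for t in toks for w in set(t))  # sentence frequency of each word
    tfisf = []
    for t in toks:
        tf = Counter(t)
        # Mean TF-ISF over distinct words, assuming TF * log(N / SF).
        tfisf.append(sum(c * math.log(n / sf[w]) for w, c in tf.items()) / len(tf) if tf else 0.0)
    mean_len = sum(len(t) for t in toks) / n
    mean_tfisf = sum(tfisf) / n
    rows = []
    for i, s in enumerate(sentences):
        words = s.split()
        rows.append((
            i == 0,                                   # position: leading sentence
            len(toks[i]) > mean_len,                  # longer than average
            tfisf[i] > mean_tfisf,                    # above-average mean TF-ISF
            any(w[:1].isupper() for w in words[1:]),  # crude proper-noun cue
        ))
    return rows


class BernoulliNB:
    """Minimal Bernoulli naive Bayes with Laplace smoothing."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.prior, self.p = {}, {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.prior[c] = len(rows) / len(X)
            # Smoothed estimate of P(feature j is True | class c).
            self.p[c] = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2) for j in range(len(X[0]))]
        return self

    def predict(self, x):
        def log_post(c):
            return math.log(self.prior[c]) + sum(
                math.log(p if xj else 1 - p) for xj, p in zip(x, self.p[c]))
        return max(self.classes, key=log_post)
```

    In use, `features` would be computed for sentences of training documents labeled 1 (in the reference summary) or 0, passed to `BernoulliNB().fit`, and `predict` would then flag candidate summary sentences in new documents.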

    Document Clustering and Text Summarization

    This paper describes a text mining tool that performs two tasks, namely document clustering and text summarization. These tasks have, of course, their counterparts in “conventional” data mining. However, the textual, unstructured nature of documents makes these two text mining tasks considerably more difficult than their data mining counterparts. In our system, document clustering is performed using the Autoclass data mining algorithm. Our text summarization algorithm is based on computing the value of a TF-ISF (term frequency – inverse sentence frequency) measure for each word, which is an adaptation of the conventional TF-IDF (term frequency – inverse document frequency) measure of information retrieval. Sentences with high values of TF-ISF are selected to produce a summary of the source text. The system has been evaluated on real-world documents, and the results are satisfactory.
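    A minimal sketch of TF-ISF sentence selection, assuming the usual form TF-ISF(w, s) = TF(w, s) × log(N / SF(w)), where N is the number of sentences and SF(w) is the number of sentences containing w, and scoring each sentence by the mean TF-ISF over its distinct words. Tokenization is deliberately simple (no stemming or stop-word removal), and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sentence: str) -> list[str]:
    # Lowercased alphabetic tokens; no stemming or stop-word removal (simplification).
    return re.findall(r"[a-z]+", sentence.lower())


def tf_isf_scores(sentences: list[str]) -> list[float]:
    """Score each sentence by the mean TF-ISF of its distinct words."""
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    # SF(w): number of sentences in which word w occurs.
    sf = Counter(w for toks in tokens for w in set(toks))
    scores = []
    for toks in tokens:
        tf = Counter(toks)  # TF(w, s): occurrences of w in this sentence
        if not tf:
            scores.append(0.0)
            continue
        total = sum(count * math.log(n / sf[w]) for w, count in tf.items())
        scores.append(total / len(tf))
    return scores


def summarize(sentences: list[str], k: int) -> list[str]:
    """Return the k highest-scoring sentences, kept in their original order."""
    scores = tf_isf_scores(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


if __name__ == "__main__":
    doc = ["The cat sat.", "The cat ran.", "A dog barked loudly at night."]
    print(summarize(doc, 1))
```

    Running the script prints `['A dog barked loudly at night.']`: words shared by several sentences ("the", "cat") get a low inverse sentence frequency, so the sentence made only of words unique to it scores highest. A word occurring in every sentence gets ISF = log(1) = 0 and contributes nothing.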

    Generating Text Summaries through the Relative Importance of Topics

    This work proposes a new extractive text-summarization algorithm based on the importance of the topics contained in a document. The basic ideas of the proposed algorithm are as follows. First, the document is partitioned using the TextTiling algorithm, which identifies topics (coherent segments of text) based on the TF-IDF metric. Then, for each topic, the algorithm computes a measure of its relative relevance in the document. This measure is computed using TF-ISF (Term Frequency - Inverse Sentence Frequency), our adaptation of the well-known TF-IDF (Term Frequency - Inverse Document Frequency) measure from information retrieval. Finally, the summary is generated by selecting from each topic a number of sentences proportional to the importance of that topic.
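    The final allocation step can be sketched as follows, assuming the document has already been segmented into topics (for example by a TextTiling implementation, which is not reproduced here). The abstract does not give the exact relevance formula, so this sketch assumes a topic's importance is the sum of the mean-TF-ISF scores of its sentences, and uses largest-remainder rounding so the per-topic counts add up to the sentence budget. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def sentence_scores(sentences: list[str]) -> list[float]:
    # Mean TF-ISF of each sentence's distinct words (assumed TF * log(N / SF) form).
    toks = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    n = len(sentences)
    sf = Counter(w for t in toks for w in set(t))
    out = []
    for t in toks:
        tf = Counter(t)
        out.append(sum(c * math.log(n / sf[w]) for w, c in tf.items()) / len(tf) if tf else 0.0)
    return out


def allocate(weights: list[float], budget: int) -> list[int]:
    """Split `budget` sentences across topics in proportion to `weights`,
    using largest-remainder rounding so the counts sum exactly to `budget`."""
    total = sum(weights)
    if total == 0:
        # Degenerate case: no informative words anywhere, so spread evenly.
        weights = [1.0] * len(weights)
        total = float(len(weights))
    quotas = [budget * w / total for w in weights]
    counts = [math.floor(q) for q in quotas]
    leftover = budget - sum(counts)
    order = sorted(range(len(weights)), key=lambda i: quotas[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


def summarize_by_topic(segments: list[list[str]], budget: int) -> list[str]:
    """segments: topics in document order, each a list of sentences."""
    flat = [s for seg in segments for s in seg]
    scores = sentence_scores(flat)  # ISF is computed over the whole document
    per_topic, start = [], 0
    for seg in segments:
        per_topic.append(scores[start:start + len(seg)])
        start += len(seg)
    importance = [sum(sc) for sc in per_topic]  # assumed topic-relevance measure
    summary = []
    for seg, sc, k in zip(segments, per_topic, allocate(importance, budget)):
        k = min(k, len(seg))  # a topic cannot contribute more sentences than it has
        top = sorted(range(len(seg)), key=lambda i: sc[i], reverse=True)[:k]
        summary.extend(seg[i] for i in sorted(top))  # keep original sentence order
    return summary
```

    Summing sentence scores favors longer topics as well as denser ones; a mean would favor only density. Which behavior matches the paper's measure is not stated in the abstract.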
