129 research outputs found

    Flow-based Influence Graph Visual Summarization

    Visually mining a large influence graph is appealing yet challenging. People are amazed by pictures of newscasting graphs on Twitter and engaged by hidden citation networks in academia, yet they are often troubled by the poor readability of the underlying visualization. Existing summarization methods enhance the graph visualization with blocked views, but have an adverse effect on the latent influence structure. How can we visually summarize a large graph to maximize influence flows? In particular, how can we illustrate the impact of an individual node through the summarization? Can we maintain the appealing graph metaphor while preserving both the overall influence pattern and fine readability? To answer these questions, we first formally define the influence graph summarization problem. Second, we propose an end-to-end framework to solve the new problem. Our method can not only highlight the flow-based influence patterns in the visual summarization, but also inherently support rich graph attributes. Last, we present a theoretical analysis and report our experimental results. Both lines of evidence demonstrate that our framework can effectively approximate the proposed influence graph summarization objective while outperforming previous methods in a typical scenario of visually mining academic citation networks. (Comment: to appear in IEEE International Conference on Data Mining (ICDM), Shenzhen, China, December 2014)
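    As a rough, hypothetical illustration of the flow-maximization objective this abstract describes (not the authors' actual algorithm), the sketch below scores a node clustering of a weighted influence graph by the total edge flow it retains between clusters. The toy graph, the clustering, and the function name are invented for the example.

```python
# Hypothetical sketch of a flow-based summarization objective: given a
# weighted influence graph and a node clustering, score the clustering
# by how much edge flow survives as inter-cluster "summary" flow.
from collections import defaultdict

def influence_flow_retained(edges, clustering):
    """edges: iterable of (src, dst, weight); clustering: node -> cluster id.
    Returns (total cross-cluster flow, aggregated summary edges)."""
    flow = defaultdict(float)
    for src, dst, w in edges:
        cs, cd = clustering[src], clustering[dst]
        if cs != cd:                      # intra-cluster flow is absorbed
            flow[(cs, cd)] += w           # aggregate into a summary edge
    return sum(flow.values()), dict(flow)

# Toy citation graph: paper A influences B and C, and B influences C.
edges = [("A", "B", 0.8), ("A", "C", 0.5), ("B", "C", 0.9)]
clustering = {"A": 0, "B": 1, "C": 1}     # summarize B and C together
total, summary_edges = influence_flow_retained(edges, clustering)
print(total, summary_edges)               # 1.3 {(0, 1): 1.3}
```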

    A new graph based text segmentation using Wikipedia for automatic text summarization

    Two models have been developed for simulating CO₂ emissions from wheat farms: (1) an artificial neural network (ANN) model; and (2) a multiple linear regression (MLR) model. Data were collected from 40 wheat farms in the Canterbury region of New Zealand. Investigation of more than 140 factors enabled the selection of eight factors to be employed as the independent variables for the final ANN model. The results showed that the final ANN model can forecast CO₂ emissions from wheat production areas under different farm conditions (proportion of wheat-cultivated land on the farm, number of irrigation applications, and number of cows), machinery conditions (tractor power index (hp/ha) and age of fertilizer spreader), and N, P, and insecticide inputs, with an accuracy of ±11% (±113 kg CO₂/ha). The total CO₂ emissions from farm inputs were estimated at 1032 kg CO₂/ha for wheat production. On average, fertilizer use (52%) and fuel use (around 20%) account for the largest shares of CO₂ emissions in wheat cultivation. The results confirmed that the ANN model forecasts CO₂ emissions much better than the MLR model.
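    A minimal sketch of the kind of ANN-versus-MLR comparison described above, assuming scikit-learn and synthetic stand-in data; the real farm dataset and the paper's exact network architecture are not available here, so every number below is illustrative only.

```python
# Minimal sketch (not the paper's model): compare a small neural network
# against multiple linear regression for forecasting CO2 emissions from
# eight farm-level inputs. The data here is synthetic.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 8))              # 40 farms, 8 selected factors
y = 1032 + 113 * rng.normal(size=40)      # synthetic kg CO2/ha targets

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
scaler = StandardScaler().fit(X_tr)       # neural nets want scaled inputs

ann = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000, random_state=0)
ann.fit(scaler.transform(X_tr), y_tr)
mlr = LinearRegression().fit(X_tr, y_tr)

print("ANN R^2:", ann.score(scaler.transform(X_te), y_te))
print("MLR R^2:", mlr.score(X_te, y_te))
```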

    HunSum-1: an abstractive summarization dataset for Hungarian

    We introduce HunSum-1: a dataset for Hungarian abstractive summarization consisting of 1.14M news articles. The dataset is built by collecting, cleaning, and deduplicating data from 9 major Hungarian news sites through CommonCrawl. Using this dataset, we build abstractive summarizer models based on huBERT and mT5. We demonstrate the value of the created dataset by performing a quantitative and qualitative analysis of the models' results. The HunSum-1 dataset, all models used in our experiments, and our code are openly available.
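    A hedged sketch of how an mT5-based summarizer like the one described above could be run with the Hugging Face transformers library; google/mt5-small is a public base checkpoint used here only as a placeholder, for which the HunSum-1 fine-tuned weights would be substituted in practice.

```python
# Sketch: abstractive summarization with an mT5-style seq2seq model via
# Hugging Face transformers. The checkpoint below is a placeholder base
# model, not the HunSum-1 fine-tuned summarizer.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "google/mt5-small"           # placeholder base checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

article = "..."                           # a Hungarian news article
inputs = tokenizer(article, truncation=True, max_length=512,
                   return_tensors="pt")
summary_ids = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```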

    Document Summarization and Classification using Concept and Context Similarity Analysis

    ABSTRACT: "Document summarization and classification using concept and context similarity analysis'' deals with an information retrieval task, which aims at extracting a condensed version of the original document. A document summary is useful since it can give an overview of the original document in a shorter period of time. The main goal of a summary is to present the main ideas in a document/set of documents in a short and readable paragraph. Classification is a data mining function that assigns items in a collection to target categories of the documents. Context sensitive document indexing model based on the Bernoulli model of randomness is used for document summarization process. The lexical association between terms is used to produce a context sensitive weight to the document terms. The context sensitive indexing weights are used to compute the sentence similarity matrix and as a result, the sentences are presented in such a way that the most informative sentences appear on the top of the summary, making a positive impact on the quality of the summar

    Extractive Automatic Text Summarization Based on Lexical-Semantic Keywords

    The automatic text summarization (ATS) task consists of automatically synthesizing a document to provide a condensed version of it. Creating a summary requires not only selecting the main topics of the sentences but also identifying the key relationships between these topics. Related works rank text units (mainly sentences) to select those that could form the summary. However, the resulting summaries may not include all the topics covered in the source text because important information may have been discarded. In addition, the semantic structure of documents has barely been explored in this field. Thus, this study proposes a new method for the ATS task that takes advantage of semantic information to improve keyword detection. The proposed method increases not only coverage, by clustering the sentences to identify the main topics in the source document, but also precision, by detecting the keywords in the clusters. The experimental results indicate that the proposed method outperformed previous methods on a standard collection.
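    A rough sketch of the topic-coverage idea (not the authors' exact method, which relies on lexical-semantic keyword detection): cluster sentence vectors to identify topics, then pick the sentence nearest each cluster centroid so every topic is represented in the extract.

```python
# Sketch of coverage-oriented extraction: cluster sentences into topics,
# then select one representative sentence per cluster.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

sentences = [
    "Automatic text summarization condenses a document.",
    "Summaries should cover every major topic.",
    "Keywords mark the key relationships between topics.",
    "Clustering groups sentences that share a topic.",
]
X = TfidfVectorizer().fit_transform(sentences)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

summary = []
for c in range(km.n_clusters):
    members = np.where(km.labels_ == c)[0]
    centroid = km.cluster_centers_[c]
    # sentence whose vector lies closest to its cluster centroid
    best = min(members,
               key=lambda i: np.linalg.norm(X[i].toarray() - centroid))
    summary.append(sentences[best])
print(" ".join(summary))
```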

    Audiovisual Media Annotation Using Qualitative Data Analysis Software: A Comparative Analysis

    A variety of specialized tools designed to facilitate analysis of audio-visual (AV) media are useful not only to media scholars and oral historians but to other researchers as well. Both Qualitative Data Analysis Software (QDAS) packages and dedicated systems created for specific disciplines, such as linguistics, can be used for this purpose. Software proliferation challenges researchers to make informed choices about which package will be most useful for their project. This paper presents an information science perspective on the scholarly use of tools in qualitative research of audio-visual sources. It provides a baseline of affordances based on functionalities, with the goal of making the types of research tasks they support more explicit (e.g., transcribing, segmenting, coding, linking, and commenting on data). We look closely at how these functionalities relate to each other and at how system design influences research tasks.