103,132 research outputs found

    Flow-based Influence Graph Visual Summarization

    Visually mining a large influence graph is appealing yet challenging. People are amazed by pictures of newscasting graphs on Twitter and engaged by hidden citation networks in academia, yet are often troubled by the poor readability of the underlying visualization. Existing summarization methods enhance the graph visualization with blocked views, but have an adverse effect on the latent influence structure. How can we visually summarize a large graph to maximize influence flows? In particular, how can we illustrate the impact of an individual node through the summarization? Can we maintain the appealing graph metaphor while preserving both the overall influence pattern and fine readability? To answer these questions, we first formally define the influence graph summarization problem. Second, we propose an end-to-end framework to solve the new problem. Our method can not only highlight the flow-based influence patterns in the visual summarization, but also inherently support rich graph attributes. Last, we present a theoretical analysis and report our experimental results. Both lines of evidence demonstrate that our framework can effectively approximate the proposed influence graph summarization objective while outperforming previous methods in a typical scenario of visually mining academic citation networks. Comment: to appear in IEEE International Conference on Data Mining (ICDM), Shen Zhen, China, December 201

    Unsupervised Detection of Emergent Patterns in Large Image Collections

    With the advent of modern image acquisition and sharing technologies, billions of images are added to the Internet every day. This huge repository contains useful information, but it is very hard to analyze. If labeled information is available for this data, then supervised learning techniques can be used to extract useful information. Visual pattern mining approaches provide a way to discover visual structures and patterns in an image collection without the need of any supervision. The Internet contains images of various objects, scenes, patterns, and shapes. The majority of approaches for visual pattern discovery, on the other hand, find patterns that are related to object or scene categories. Emergent pattern mining techniques provide a way to extract generic, complex and hidden structures in images. This thesis describes research, experiments, and analysis conducted to explore various approaches to mine emergent patterns from image collections in an unsupervised way. These approaches are based on itemset mining and graph theoretic strategies. The itemset mining strategy uses frequent itemset mining and rare itemset mining techniques to discover patterns. The mining is performed on a transactional dataset which is obtained from the BoW representation of images. The graph-based approach represents visual word co-occurrences obtained from images in a co-occurrence graph. Emergent patterns form dense clusters in this graph that are extracted using normalized cuts. The patterns that are discovered using itemset mining approaches are: stripes and parallel lines; dots and checks; bright dots; single lines; intersections; and frames. The graph based approach revealed various interesting patterns, including some patterns that are related to object categories

    Node re-ordering as a means of anomaly detection in time-evolving graphs

    © Springer International Publishing AG 2016. Anomaly detection is a vital task for maintaining and improving any dynamic system. In this paper, we address the problem of anomaly detection in time-evolving graphs, where graphs are a natural representation for data in many types of applications. A key challenge in this context is how to process large volumes of streaming graphs. We propose a pre-processing step before running any further analysis on the data, where we permute the rows and columns of the adjacency matrix. This pre-processing step expedites graph mining techniques such as anomaly detection, PageRank, or graph coloring. In this paper, we focus on detecting anomalies in a sequence of graphs based on rank correlations of the reordered nodes. The merits of our approach lie in its simplicity and resilience to challenges such as unsupervised input, large volumes and high velocities of data. We evaluate the scalability and accuracy of our method on real graphs, where our method facilitates graph processing while producing more deterministic orderings. We show that the proposed approach is capable of revealing anomalies in a more efficient manner based on node rankings. Furthermore, our method can produce visual representations of graphs that are useful for graph compression
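The rank-correlation idea in this abstract can be illustrated with a minimal sketch. The abstract does not say which permutation the authors use, so the degree-based ordering below is an assumption, as are the function names, the Spearman coefficient, and the 0.5 threshold; this is not the paper's algorithm.

```python
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[str, str]


def degree_ranks(edges: Sequence[Edge], nodes: Sequence[str]) -> Dict[str, int]:
    """Order nodes by descending degree (ties broken by name) and return each node's rank.

    Assumption: degree ordering stands in for the unspecified re-ordering step.
    """
    degree = {n: 0 for n in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    ordered = sorted(nodes, key=lambda n: (-degree[n], n))
    return {n: i for i, n in enumerate(ordered)}


def spearman(rank_a: Dict[str, int], rank_b: Dict[str, int]) -> float:
    """Spearman rank correlation for two tie-free rankings of the same node set."""
    n = len(rank_a)
    if n < 2:
        return 1.0
    d2 = sum((rank_a[k] - rank_b[k]) ** 2 for k in rank_a)
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def flag_anomalies(
    snapshots: Sequence[Sequence[Edge]],
    nodes: Sequence[str],
    threshold: float = 0.5,  # hypothetical cut-off, chosen for illustration only
) -> List[int]:
    """Return indices of snapshots whose node ordering correlates poorly with the previous one."""
    ranks = [degree_ranks(edges, nodes) for edges in snapshots]
    return [
        t for t in range(1, len(ranks))
        if spearman(ranks[t - 1], ranks[t]) < threshold
    ]


if __name__ == "__main__":
    nodes = ["a", "b", "c", "d"]
    hub_a = [("a", "b"), ("a", "c"), ("a", "d")]  # node "a" is the hub
    hub_d = [("d", "a"), ("d", "b"), ("d", "c")]  # the hub moves to node "d"
    print(flag_anomalies([hub_a, hub_a, hub_d], nodes))  # prints [2]
```

In this toy sequence only the third snapshot is flagged, because moving the hub changes the node ordering enough to push the correlation below the threshold.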

    GriMa: a Grid Mining Algorithm for Bag-of-Grid-Based Classification

    General-purpose exhaustive graph mining algorithms have seldom been used in real-life contexts due to the high complexity of the process, which is mostly based on costly isomorphism tests and countless expansion possibilities. In this paper, we explain how to exploit grid-based representations of problems to efficiently extract frequent grid subgraphs and create Bag-of-Grids which can be used as new features for classification purposes. We provide an efficient grid mining algorithm called GriMA which is designed to scale to large amounts of data. We apply our algorithm to image classification problems where typical Bag-of-Visual-Words-based techniques are used. However, those techniques make only limited use of the spatial information in the image, which could be beneficial for obtaining more discriminative features. Experiments on different datasets show that our algorithm is efficient and that adding the structure may greatly help the image classification process

    Data Driven Approach To Saltwater Disposal (SWD) Well Location Optimization In North Dakota

    The sharp increase in oil and gas production in the Williston Basin of North Dakota since 2006 has resulted in a significant increase in produced water volumes. The primary mechanism for disposal of produced water is injection into the underground Inyan Kara formation through Class-II Saltwater Disposal (SWD) wells. With the number of SWD wells anticipated to increase from 900 to over 1400 by 2035, and with localized pressurization and other potential issues that could affect the performance of future oil and SWD wells, there was a need for a reliable model to select locations of future SWD wells for optimum performance. Since it is uncommon to develop traditional geological and simulation models for SWD wells, this research focused on developing data-driven proxy models based on the CRISP-Data Mining pipeline for understanding SWD well performance and optimizing future well locations. NDIC's oil and gas division was identified as the primary data source. Significant effort went towards identifying other secondary data sources, extracting the required data from primary and secondary data sources using web scraping, integrating different data types including spatial data, and creating the final data set. The Orange visual programming application and the Python programming language were used to carry out the required data mining activities. Exploratory Data Analysis and clustering analysis were used to gain a good understanding of the features in the data set and their relationships. Graph Data Science techniques such as Knowledge Graphs and graph-based clustering were used to gain further insights. Machine Learning regression algorithms such as Multi-Linear Regression, k-Nearest Neighbors and Random Forest were used to train machine learning models to predict the average monthly barrels of saltwater disposed in a well. Model performance was optimized using the RMSE metric and the Random Forest model was selected as the final model for deployment to predict the performance of a planned SWD well. A multi-target regression model was trained using a deep neural network to predict water production in oil and gas wells drilled in McKenzie County, North Dakota

    Evolution of Business Intelligence: An Analysis from the Perspective of Social Network

    Using CiteSpace, Pajek and other software, this paper makes a visual analysis of the knowledge graph of the literature on Business Intelligence and explores the future development trend of business intelligence. Taking the core periodicals of CNKI as the data source, keywords are drawn and analyzed with the help of software. The total number of articles from 2006 to 2020 was 2938, and the number of articles published over these 15 years gradually leveled off. Among the 607 researchers, Yang Bingru is the representative; among the 424 journals, Journal of Information ranks first; and among the 787 keywords, data mining is the most frequently used. Our country still needs in-depth research in the field of business intelligence. The atlas directly shows that big data and machine learning are the frontier hot spots of future development, which provides research direction for researchers

    Novel graph analytics for enhancing data insight

    Graph analytics is a fast growing and significant field in the visualization and data mining community, which is applied to numerous high-impact applications such as network security, finance, and health care, providing users with adequate knowledge across various patterns within a given system. Although a series of methods have been developed in the past years for the analysis of unstructured collections of multi-dimensional points, graph analytics has only recently been explored. Despite the significant progress that has been achieved recently, there are still many open issues in the area, concerning not only the performance of the graph mining algorithms, but also producing effective graph visualizations in order to enhance human perception. The current thesis deals with the investigation of novel methods for graph analytics, in order to enhance data insight. Towards this direction, the current thesis proposes two methods so as to perform graph mining and visualization. Based on previous works related to graph mining, the current thesis suggests a set of novel graph features that are particularly efficient in identifying the behavioral patterns of the nodes on the graph. The specific features proposed are able to capture the interaction of the neighborhoods with other nodes on the graph. Moreover, unlike previous approaches, the graph features introduced herein include information from multiple node neighborhood sizes, thus capturing long-range correlations between the nodes, and are able to depict the behavioral aspects of each node with high accuracy. Experimental evaluation on multiple datasets shows that the use of the proposed graph features for the graph mining procedure provides better results than the use of other state-of-the-art graph features. Thereafter, the focus is laid on the improvement of graph visualization methods towards enhanced human insight. In order to achieve this, the current thesis uses non-linear deformations so as to reduce visual clutter. Non-linear deformations have been previously used to magnify significant/cluttered regions in data or images for reducing clutter and enhancing the perception of patterns. Extending previous approaches, this work introduces a hierarchical approach for non-linear deformation that aims to reduce visual clutter by magnifying significant regions, leading to enhanced visualizations of one/two/three-dimensional datasets. In this context, an energy function is utilized, which aims to determine the optimal deformation for every local region in the data, taking the information from multiple single-layer significance maps into consideration. The problem is subsequently transformed into an optimization problem for the minimization of the energy function under specific spatial constraints. Extended experimental evaluation provides evidence that the proposed hierarchical approach for the generation of the significance map surpasses current methods, and manages to effectively identify significant regions and deliver better results. The thesis is concluded with a discussion outlining the major achievements of the current work, as well as some possible drawbacks and other open issues of the proposed approaches that could be addressed in future works.

    Mining and analysis of real-world graphs

    Networked systems are everywhere, such as the Internet, social networks, biological networks, transportation networks, power grid networks, etc. They can be very large yet enormously complex. They can contain a lot of information, either open and transparent or under the cover and coded. Such real-world systems can be modeled using graphs and be mined and analyzed through the lens of network analysis. Network analysis can be applied to recognize frequent patterns among the connected components in a large graph, such as a social network, where visual analysis is almost impossible. Frequent patterns illuminate statistically important subgraphs that are usually small enough to analyze visually. Graph mining has different practical applications in fraud detection, outlier detection, chemical molecules, etc., based on the necessity of extracting and understanding the information yielded. Network analysis can also be used to quantitatively evaluate and improve the resilience of infrastructure networks such as the Internet or power grids. Infrastructure networks directly affect the quality of people's lives. However, a disastrous incident in these networks may lead to a cascading breakdown of the whole network and serious economic consequences. In essence, network analysis can help us gain actionable insights and make better data-driven decisions based on the networks. On that note, the objective of this dissertation is to improve upon existing tools for more accurate mining and analysis of real-world networks --Abstract, page iv

    SelfDocSeg: A Self-Supervised vision-based Approach towards Document Segmentation

    Document layout analysis is a well-known problem in the document research community and has been extensively explored, yielding a multitude of solutions ranging from text mining and recognition to graph-based representation, visual feature extraction, etc. However, most of the existing works have ignored the crucial fact regarding the scarcity of labeled data. With growing internet connectivity in personal life, an enormous amount of documents has become available in the public domain, thus making data annotation a tedious task. We address this challenge using self-supervision and, unlike the few existing self-supervised document segmentation approaches which use text mining and textual labels, we use a completely vision-based approach in pre-training without any ground-truth label or its derivative. Instead, we generate pseudo-layouts from the document images to pre-train an image encoder to learn the document object representation and localization in a self-supervised framework before fine-tuning it with an object detection model. We show that our pipeline sets a new benchmark in this context and performs on par with the existing methods and the supervised counterparts, if not outperforming them. The code is made publicly available at: https://github.com/MaitySubhajit/SelfDocSeg Comment: Accepted at The 17th International Conference on Document Analysis and Recognition (ICDAR 2023)