6 research outputs found

    Botnet Detection Using Graph Based Feature Clustering

    Get PDF
    Detecting botnets in a network is crucial because bot-activities impact numerous areas such as security, finance, health care, and law enforcement. Most existing rule and flow-based detection methods may not be capable of detecting bot-activities in an efficient manner. Hence, designing a robust botnet-detection method is of high significance. In this study, we propose a botnet-detection methodology based on graph-based features. Self-Organizing Map is applied to establish the clusters of nodes in the network based on these features. Our method is capable of isolating bots in small clusters while containing most normal nodes in the big-clusters. A filtering procedure is also developed to further enhance the algorithm efficiency by removing inactive nodes from bot detection. The methodology is verified using real-world CTU-13 and ISCX botnet datasets and benchmarked against classification-based detection methods. The results show that our proposed method can efficiently detect the bots despite their varying behaviors

    Graph based Anomaly Detection and Description: A Survey

    Get PDF
    Detecting anomalies in data is a vital task, with numerous high-impact applications in areas such as security, finance, health care, and law enforcement. While numerous techniques have been developed in past years for spotting outliers and anomalies in unstructured collections of multi-dimensional points, with graph data becoming ubiquitous, techniques for structured graph data have been of focus recently. As objects in graphs have long-range correlations, a suite of novel technology has been developed for anomaly detection in graph data. This survey aims to provide a general, comprehensive, and structured overview of the state-of-the-art methods for anomaly detection in data represented as graphs. As a key contribution, we give a general framework for the algorithms categorized under various settings: unsupervised vs. (semi-)supervised approaches, for static vs. dynamic graphs, for attributed vs. plain graphs. We highlight the effectiveness, scalability, generality, and robustness aspects of the methods. What is more, we stress the importance of anomaly attribution and highlight the major techniques that facilitate digging out the root cause, or the ‘why’, of the detected anomalies for further analysis and sense-making. Finally, we present several real-world applications of graph-based anomaly detection in diverse domains, including financial, auction, computer traffic, and social networks. We conclude our survey with a discussion on open theoretical and practical challenges in the field

    Novel graph analytics for enhancing data insight

    No full text
    Graph analytics is a fast growing and significant field in the visualization and data mining community, which is applied on numerous high-impact applications such as, network security, finance, and health care, providing users with adequate knowledge across various patterns within a given system. Although a series of methods have been developed in the past years for the analysis of unstructured collections of multi-dimensional points, graph analytics has only recently been explored. Despite the significant progress that has been achieved recently, there are still many open issues in the area, concerning not only the performance of the graph mining algorithms, but also producing effective graph visualizations in order to enhance human perception. The current thesis deals with the investigation of novel methods for graph analytics, in order to enhance data insight. Towards this direction, the current thesis proposes two methods so as to perform graph mining and visualization. Based on previous works related to graph mining, the current thesis suggests a set of novel graph features that are particularly efficient in identifying the behavioral patterns of the nodes on the graph. The specific features proposed, are able to capture the interaction of the neighborhoods with other nodes on the graph. Moreover, unlike previous approaches, the graph features introduced herein, include information from multiple node neighborhood sizes, thus capture long-range correlations between the nodes, and are able to depict the behavioral aspects of each node with high accuracy. Experimental evaluation on multiple datasets, shows that the use of the proposed graph features for the graph mining procedure, provides better results than the use of other state-of-the-art graph features. Thereafter, the focus is laid on the improvement of graph visualization methods towards enhanced human insight. In order to achieve this, the current thesis uses non-linear deformations so as to reduce visual clutter. Non-linear deformations have been previously used to magnify significant/cluttered regions in data or images for reducing clutter and enhancing the perception of patterns. Extending previous approaches, this work introduces a hierarchical approach for non-linear deformation that aims to reduce visual clutter by magnifying significant regions, and leading to enhanced visualizations of one/two/three-dimensional datasets. In this context, an energy function is utilized, which aims to determine the optimal deformation for every local region in the data, taking the information from multiple single-layer significance maps into consideration. The problem is subsequently transformed into an optimization problem for the minimization of the energy function under specific spatial constraints. Extended experimental evaluation provides evidence that the proposed hierarchical approach for the generation of the significance map surpasses current methods, and manages to effectively identify significant regions and deliver better results. The thesis is concluded with a discussion outlining the major achievements of the current work, as well as some possible drawbacks and other open issues of the proposed approaches that could be addressed in future works.Open Acces

    Data Mining Meets HCI: Making Sense of Large Graphs

    Full text link

    Analyzing and Processing Big Real Graphs

    Get PDF
    As fundamental abstractions of network structures, graphs are everywhere, ranging from biological protein interaction networks and Internet routing networks, to emerging online social networks. Studying graphs is critical to understanding the fundamental processes behind the networks, and of practical importance in experimental research. Although many studies on graphs have been carried out in decades, most of the work focused on small or synthetic graphs. In recent years, because of the unprecedented increase of existing networks and the emergence of new complex networks, more and more big real graphs are becoming available. Compared to the graphs studied in prior work, the graphs from these networks are significantly different in scale, level of dynamics and structure.In this dissertation, we tackle three important graph research problems caused by the significant differences of the big real graphs: efficient node distance computation, graph dynamic analysis and modeling, and graph privacy.First, we target on a fundamental graph analysis problem, i.e. node distance computation. As a primitive of graph analysis and network applications, the computation of shortest path or random walk distances is computationally expensive, and difficult to scale with the sheer size of big real graphs. To address the scalability issue, we design a novel node distance computation method, named graph coordinate systems, to efficiently estimate node distances with high accuracy.Our second work is to understand and model the dynamic processes in big real graphs. Specifically, we propose methods to analyze graph dynamics at multiple network scales and explore temporal properties of network growth. Through measurements on Renren first two-year dynamic data, we find independent and predictable processes at different network levels, and detect self-similar properties in its edge creation process. Based on the observations, we propose a new dynamic graph model to capture both temporal and spatial properties. Calibrated with the Renren dataset, our model successfully produces synthetic graphs showing similar dynamic properties.Finally, to address privacy issue in sharing graphs, we design a graph privacy system to guarantee the required level of privacy. The goal of our work is to design a system that can both maintain a meaningful graph structure and provide strong privacy guarantee. To navigate the tradeoff between the strength of privacy and graph structure utility, we propose a differentially-private graph model. Our rigorous proof shows that the graphs produced by the system can achieve the required level of privacy. By running the system on real graphs collected from Facebook, Internet, and Web, the results demonstrate that the generated synthetic graphs match the original graphs in terms of graph structural metrics and application-level performance.In summary, to analyze and process the graphs from today's large complex networks, we work on three important problems, including efficiently computing node distances in massive graphs, analyzing and modeling high volume of dynamics in big real graphs, and protecting graph privacy in sharing graphs. We propose novel solutions to address these problems. Through our extensive experiments, we show that our designs perform consistently well on big real graphs
    corecore