52,659 research outputs found

    Network Sampling: From Static to Streaming Graphs

    Network sampling is integral to the analysis of social, information, and biological networks. Since many real-world networks are massive in size, continuously evolving, and/or distributed in nature, the network structure is often sampled in order to facilitate study. For these reasons, a more thorough and complete understanding of network sampling is critical to support the field of network science. In this paper, we outline a framework for the general problem of network sampling by highlighting the different objectives, population and units of interest, and classes of network sampling methods. In addition, we propose a spectrum of computational models for network sampling methods, ranging from the traditionally studied model based on the assumption of a static domain to a more challenging model that is appropriate for streaming domains. We design a family of sampling methods based on the concept of graph induction that generalize across the full spectrum of computational models (from static to streaming) while efficiently preserving many of the topological properties of the input graphs. Furthermore, we demonstrate how traditional static sampling algorithms can be modified for graph streams for each of the three main classes of sampling methods: node, edge, and topology-based sampling. Our experimental results indicate that our proposed family of sampling methods more accurately preserves the underlying properties of the graph for both static and streaming graphs. Finally, we study the impact of network sampling algorithms on the parameter estimation and performance evaluation of relational classification algorithms.
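
    The streaming methods described above combine edge sampling with graph induction. As a rough illustration of that idea (not the authors' exact algorithm; the node budget target_nodes and the two-phase growth/induction split are assumptions of this sketch), a minimal single-pass version in Python might look as follows:

        def partially_induced_edge_sample(edge_stream, target_nodes):
            """Sketch: edge sampling with graph induction over an edge stream.

            Nodes are admitted through their incident edges until the node
            budget is reached; afterwards only edges whose endpoints are both
            already sampled are kept (graph induction). Illustrative only,
            not the paper's exact method.
            """
            sampled_nodes, sampled_edges = set(), set()
            for u, v in edge_stream:
                if len(sampled_nodes) < target_nodes:
                    # Growth phase: admit new nodes via their incident edges.
                    sampled_nodes.update((u, v))
                    sampled_edges.add((u, v))
                elif u in sampled_nodes and v in sampled_nodes:
                    # Induction phase: keep only sample-induced edges.
                    sampled_edges.add((u, v))
            return sampled_nodes, sampled_edges

        # Example on a small synthetic stream of edges.
        stream = [(0, 1), (1, 2), (2, 3), (0, 2), (3, 4), (1, 3), (4, 0)]
        print(partially_induced_edge_sample(stream, target_nodes=4))

    A single pass over the edge stream and a membership test against the sampled node set are all that is required, which is what makes this style of sampling attractive for streaming domains.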

    Scalable Methods and Algorithms for Very Large Graphs Based on Sampling

    Analyzing real-life networks is a computationally intensive task due to the sheer size of the networks. Direct analysis is even impossible when the network data is not entirely accessible; for instance, the user networks of Twitter and Facebook are not available for third parties to explore directly. Thus, sampling-based algorithms are indispensable. This dissertation addresses the confidence interval (CI) and bias problems in real-world network analysis. It uses estimation of the number of triangles (hereafter ∆) and the clustering coefficient (hereafter C) as a case study. The metric ∆ is an important measurement for understanding a graph. It is also directly related to C, which is one of the most important indicators for social networks. The methods proposed in this dissertation can also be utilized in other sampling problems. First, we proposed two new methods to estimate ∆ based on random edge sampling in both streaming and non-streaming models. These methods consistently outperformed the state-of-the-art methods and can be better by orders of magnitude when the graph is very large. More importantly, we proved the improvement ratio analytically and verified the result extensively on real-world networks. The analytical results were achieved by simplifying the variances of the estimators under the assumption that the graph is very large. We believe that such a big-data assumption can lead to interesting results not only in triangle estimation but also in other sampling problems. Next, we studied the estimation of C in both streaming and non-streaming sampling models. Despite the numerous algorithms proposed in this area, the bias and variance of the estimators remain an open problem. We quantified the bias using a Taylor expansion and found that the bias is determined by the structure of the sampled data. Based on this understanding of the bias, we gave new estimators that correct it. The results were derived analytically and verified on 56 real networks of different sizes and structures. The experiments reveal that the bias varies widely from dataset to dataset: the relative bias can be as high as 4% in the non-streaming model and 2% in the streaming model, or it can be negative. We also derived the variances of the estimators, and estimators for those variances. Our simplified estimators can be used in practice to control the accuracy of estimations.
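
    For intuition, the simplest edge-sampling estimator of ∆ keeps each edge independently with probability p and rescales the sampled triangle count by 1/p^3, since a triangle survives only if all three of its edges are kept. The sketch below illustrates this classic estimator rather than the dissertation's refined estimators or variance formulas; the 5-clique test case is an assumption chosen only for demonstration.

        import itertools
        import random
        from collections import defaultdict

        def count_triangles(edges):
            """Exact triangle count via common neighbours of each edge."""
            adj = defaultdict(set)
            for u, v in edges:
                adj[u].add(v)
                adj[v].add(u)
            # Each triangle is counted once per edge, i.e. three times in total.
            return sum(len(adj[u] & adj[v]) for u, v in edges) // 3

        def estimate_triangles(edges, p, seed=0):
            """Estimate ∆ from independent edge sampling with rate p."""
            rng = random.Random(seed)
            sample = [e for e in edges if rng.random() < p]
            # A triangle survives with probability p**3, so rescale the count.
            return count_triangles(sample) / p ** 3

        # Toy check: a 5-clique contains exactly 10 triangles.
        clique = list(itertools.combinations(range(5), 2))
        print(count_triangles(clique), estimate_triangles(clique, p=0.8))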

    Statistical Analysis of Networks

    This book is a general introduction to the statistical analysis of networks and can serve both as a research monograph and as a textbook. Numerous fundamental tools and concepts needed for the analysis of networks are presented, such as network modeling, community detection, graph-based semi-supervised learning, and sampling in networks. The description of these concepts is self-contained, with both theoretical justifications and applications provided for the presented algorithms. Researchers, including postgraduate students, working in the areas of network science, complex network analysis, or social network analysis will find up-to-date statistical methods relevant to their research tasks. This book can also serve as textbook material for courses on the statistical approach to the analysis of complex networks. In general, the chapters are fairly independent and self-supporting, and the book could be used for course composition “à la carte”. Nevertheless, Chapter 2 is needed to a certain degree for all parts of the book. It is also recommended to read Chapter 4 before reading Chapters 5 and 6, but this is not absolutely necessary. Reading Chapter 3 can also be helpful before reading Chapters 5 and 7. As prerequisites, basic knowledge of probability and linear algebra and elementary notions of graph theory are advised. Appendices describing the required notions from the above-mentioned disciplines have been added to help readers gain further understanding.

    Algorithmic and Statistical Perspectives on Large-Scale Data Analysis

    In recent years, ideas from statistics and scientific computing have begun to interact in increasingly sophisticated and fruitful ways with ideas from computer science and the theory of algorithms, aiding the development of improved worst-case algorithms that are useful for large-scale scientific and Internet data analysis problems. In this chapter, I describe two recent examples that drew on ideas from both areas and that may serve as a model for exploiting complementary algorithmic and statistical perspectives in order to solve applied large-scale data analysis problems: one concerns selecting good columns or features from a (DNA Single Nucleotide Polymorphism) data matrix, and the other concerns selecting good clusters or communities from a data graph (representing a social or information network). Comment: 33 pages. To appear in Uwe Naumann and Olaf Schenk, editors, "Combinatorial Scientific Computing," Chapman and Hall/CRC Press, 201
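
    The column-selection problem in this line of work is commonly approached with statistical leverage scores, which measure how much each column influences the best rank-k approximation of the data matrix. The sketch below illustrates that general idea under stated assumptions (the rank k, the sample size c, and the SVD-based computation are choices of this sketch, not necessarily the chapter's exact procedure):

        import numpy as np

        def leverage_score_column_sample(A, k, c, seed=0):
            """Sample c columns of A with probability proportional to their
            rank-k statistical leverage scores (illustrative sketch)."""
            rng = np.random.default_rng(seed)
            # Thin SVD; the rows of Vt are the right singular vectors of A.
            _, _, Vt = np.linalg.svd(A, full_matrices=False)
            scores = np.sum(Vt[:k, :] ** 2, axis=0)   # leverage of each column
            probs = scores / scores.sum()
            cols = rng.choice(A.shape[1], size=c, replace=False, p=probs)
            return np.sort(cols)

        # Toy example: 100 samples (rows) by 50 features (columns).
        A = np.random.default_rng(1).normal(size=(100, 50))
        print(leverage_score_column_sample(A, k=5, c=10))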

    Computing Vertex Centrality Measures in Massive Real Networks with a Neural Learning Model

    Vertex centrality measures are a multi-purpose analysis tool, commonly used in many application environments to retrieve information and unveil knowledge from graphs and their structural properties. However, the algorithms for computing such metrics are expensive in terms of computational resources when running in real-time applications or on massive real-world networks. Thus, approximation techniques have been developed and used to compute the measures in such scenarios. In this paper, we demonstrate and analyze the use of neural network learning algorithms to tackle this task and compare their performance, in terms of solution quality and computation time, with other techniques from the literature. Our work offers several contributions. We highlight both the pros and cons of approximating centralities through neural learning. Using empirical evidence and statistics, we then show that the regression model generated with a feedforward neural network trained by the Levenberg-Marquardt algorithm is not only the best option in terms of computational resources, but also achieves the best solution quality for relevant applications and large-scale networks. Keywords: Vertex Centrality Measures, Neural Networks, Complex Network Models, Machine Learning, Regression Model. Comment: 8 pages, 5 tables, 2 figures, version accepted at IJCNN 2018. arXiv admin note: text overlap with arXiv:1810.1176
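
    As a hedged illustration of the regression setup described above, the sketch below trains a small feedforward network to predict an expensive centrality (betweenness) from cheap local features. The feature choice (degree and local clustering), the Barabasi-Albert test graph, and the use of L-BFGS in place of Levenberg-Marquardt (which scikit-learn does not provide) are all assumptions of this sketch rather than the paper's exact setup.

        import networkx as nx
        import numpy as np
        from sklearn.neural_network import MLPRegressor

        # Synthetic scale-free graph; cheap local features vs. an expensive target.
        G = nx.barabasi_albert_graph(500, 3, seed=42)
        degree = dict(G.degree())
        clustering = nx.clustering(G)
        betweenness = nx.betweenness_centrality(G)

        X = np.array([[degree[v], clustering[v]] for v in G])  # cheap features
        y = np.array([betweenness[v] for v in G])              # expensive target

        # Small feedforward regressor; L-BFGS stands in for Levenberg-Marquardt.
        model = MLPRegressor(hidden_layer_sizes=(16, 16), solver="lbfgs",
                             max_iter=2000, random_state=0)
        model.fit(X[:400], y[:400])            # fit on part of the vertices
        print(model.score(X[400:], y[400:]))   # R^2 on the held-out vertices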