Selecting Generative Models for Networks using Classification with Machine Learning

Abstract

By representing data entities as a map of edges and vertices, where each edge encodes a relationship between two vertices, networks can capture relationships and patterns that are difficult to see by inspecting the raw data. Because these patterns often reflect key aspects of the data, a significant portion of network science is devoted to detecting and distinguishing networks by their topological features. Machine learning is a popular approach to classifying networks; research in this area includes techniques ranging from k-Nearest Neighbors to language-modeling-inspired deep learning methods. Another area of interest with respect to networks is model selection, which can provide unique insights into a graph’s topological and probabilistic properties. This thesis combines the two areas of network classification with machine learning and generative model selection by using the popular algorithm known as “random forests” as a potential model selection criterion. First, we perform a series of experiments designed to characterize the discriminatory power of random forests on a wide variety of synthetic graphs generated by dozens of Stochastic Block Models (SBMs). Then, we take advantage of well-known network structural properties and compare the generative model of best fit selected by random forests to the model chosen by a previously established selection criterion known as Integrated Completed Likelihood (ICL). In applying these techniques to selecting Erdős-Rényi mixture models for a macaque brain connectivity dataset and using the model that maximizes the ICL criterion as the “gold standard,” we observed that random forests serve as a comparable model selection method when using topological network statistics as the feature space, selecting the same best-fit model chosen by ICL over 95% of the time.

Bachelor of Science
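
To make the idea of a topological feature space concrete, the following is a minimal sketch of sampling a graph from a Stochastic Block Model and summarizing it with a few network statistics. It is an illustration only and not the thesis's implementation: the function names `sample_sbm` and `topological_features`, the block sizes, the edge probabilities, and the particular statistics (density, mean degree, transitivity) are all assumptions chosen for the example.

```python
import random
from itertools import combinations


def sample_sbm(block_sizes, edge_probs, seed=None):
    """Sample an undirected SBM graph as an adjacency dict {node: set(neighbors)}.

    block_sizes: number of nodes in each block (hypothetical example input).
    edge_probs:  symmetric matrix; edge_probs[a][b] is the probability of an
                 edge between a node in block a and a node in block b.
    """
    rng = random.Random(seed)
    # Assign a block label to every node, block by block.
    labels = [b for b, size in enumerate(block_sizes) for _ in range(size)]
    adj = {v: set() for v in range(len(labels))}
    # Each unordered node pair gets an independent Bernoulli edge.
    for u, v in combinations(range(len(labels)), 2):
        if rng.random() < edge_probs[labels[u]][labels[v]]:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def topological_features(adj):
    """Compute a small, illustrative feature vector for one graph."""
    n = len(adj)
    degrees = [len(neighbors) for neighbors in adj.values()]
    m = sum(degrees) // 2  # every edge is counted once from each endpoint
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    mean_degree = sum(degrees) / n if n else 0.0
    # Transitivity: closed neighbor pairs divided by all neighbor pairs,
    # summed over every vertex (equivalently 3 * triangles / connected triples).
    closed = sum(
        1
        for neighbors in adj.values()
        for a, b in combinations(neighbors, 2)
        if b in adj[a]
    )
    triples = sum(d * (d - 1) // 2 for d in degrees)
    transitivity = closed / triples if triples else 0.0
    return {
        "density": density,
        "mean_degree": mean_degree,
        "transitivity": transitivity,
    }


if __name__ == "__main__":
    # Two assortative blocks: dense within blocks, sparse between them.
    graph = sample_sbm([20, 20], [[0.6, 0.05], [0.05, 0.6]], seed=1)
    print(topological_features(graph))
```

Under these assumptions, one feature vector per synthetic graph, labeled with the SBM that generated it, could serve as training data for a random forest classifier (for instance scikit-learn's `RandomForestClassifier`), whose predicted label for an observed network would then play the role of the selected model.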
