
    Essays in the economics and econometrics of networks and peer effects

    Defence date: 23 May 2023. Examining Board: Prof. Andrea Ichino (European University Institute, supervisor); Prof. Sule Alan (European University Institute, co-supervisor); Prof. Eric Auerbach (Northwestern University); Prof. Yann Bramoullé (Aix-Marseille School of Economics).

    This thesis contributes to the understanding of peer effects, both methodologically and empirically. The endogeneity of network formation has been a major obstacle to the study of peer influence. The first and second chapters of the thesis propose a causal identification solution in the potential outcome framework. Combining results from causal inference and statistical network analysis, I show that confounding can be addressed by inferring propensity scores of network link formation from the adjacency matrix. This identification strategy imposes minimal restrictions on the data-generating process and, unlike existing econometric solutions, does not rely on any parametric modelling. As an application, I estimate the effect of high school friendships on bachelor’s degree attainment. While previous literature finds that exposure to more high-achieving boys makes girls less likely to obtain a bachelor’s degree, I show that if the girls consider the boys as friends, their interactions induce a positive impact instead. Since friendship endogeneity has been addressed, the estimated effect is causal. The third chapter looks at the peer effects generated by group competition. It focuses on gender differences in the preference for competition in a setting where the competition does not involve face-to-face confrontation and effort is the only determinant of the final ranking. I first develop a model of group competition with heterogeneous preferences for ranking. With empirical implications generated from the theoretical model, I then test for a gender difference in the preference parameter using web-scraped data from Duolingo, a free online foreign-language learning platform with over 300 million users. Every week, language learners on Duolingo are randomly allocated to groups of 30 people to compete on the number of language lessons completed during that week. The empirical results suggest that, in this setting, females have a stronger preference for ranking than males.

    Contents: 1. The linking effect: causal identification and estimation of the effect of peer relationship -- 2. Extensions, theoretical proofs, and additional results on the linking effect -- 3. Gender difference in preference for competition -- 4. References
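    The identification idea described above, inferring link-formation propensity scores from the adjacency matrix and using them to adjust peer exposure, can be sketched in code. The sketch below is a hedged illustration and not the thesis' estimator: the spectral recovery of link propensities, the "linked to at least one treated peer" exposure definition, and all names (link_propensities, ipw_peer_effect, peer_treated) are assumptions introduced here.

```python
import numpy as np

def link_propensities(A, dim=2):
    """Crude link-formation propensities recovered from the adjacency matrix.

    Hypothetical illustration: a rank-`dim` spectral reconstruction of A,
    clipped away from 0 and 1, is read as the probability that each pair of
    nodes forms a link.
    """
    vals, vecs = np.linalg.eigh(A.astype(float))
    keep = np.argsort(np.abs(vals))[-dim:]
    P = vecs[:, keep] @ np.diag(vals[keep]) @ vecs[:, keep].T
    P = np.clip(P, 1e-3, 1 - 1e-3)
    np.fill_diagonal(P, 0.0)          # no self-links
    return P

def ipw_peer_effect(A, peer_treated, outcome, dim=2):
    """Inverse-probability-weighted contrast of outcomes by peer exposure.

    peer_treated: 0/1 vector marking which units count as "treated" peers.
    Exposure is defined as having at least one link to a treated peer, and the
    exposure propensity is derived from the estimated link propensities
    (a simplified stand-in for the identification strategy in the abstract).
    """
    P = link_propensities(A, dim)
    exposed = (A @ peer_treated > 0).astype(float)
    # probability of at least one link to a treated peer, per unit
    e = 1.0 - np.prod(1.0 - P * peer_treated[None, :], axis=1)
    e = np.clip(e, 1e-3, 1 - 1e-3)
    w1, w0 = exposed / e, (1.0 - exposed) / (1.0 - e)
    return np.sum(w1 * outcome) / np.sum(w1) - np.sum(w0 * outcome) / np.sum(w0)

# toy data in which exposure to a treated peer shifts the outcome by 0.5
rng = np.random.default_rng(0)
n = 300
A = (rng.random((n, n)) < 0.01).astype(int)
A = np.triu(A, 1)
A = A + A.T
peer_treated = rng.integers(0, 2, n)
outcome = 0.5 * (A @ peer_treated > 0) + rng.normal(size=n)
print(ipw_peer_effect(A, peer_treated, outcome))
```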

    Neural function approximation on graphs: shape modelling, graph discrimination & compression

    Graphs serve as a versatile mathematical abstraction of real-world phenomena in numerous scientific disciplines. This thesis is part of the Geometric Deep Learning subject area, a family of learning paradigms that capitalise on the increasing volume of non-Euclidean data so as to solve real-world tasks in a data-driven manner. In particular, we focus on the topic of graph function approximation using neural networks, which lies at the heart of many relevant methods.

    In the first part of the thesis, we contribute to the understanding and design of Graph Neural Networks (GNNs). Initially, we investigate the problem of learning on signals supported on a fixed graph. We show that treating graph signals as elements of general graph spaces is restrictive and that conventional GNNs have limited expressivity. Instead, we expose a more enlightening perspective by drawing parallels between graph signals and signals on Euclidean grids, such as images and audio. Accordingly, we propose a permutation-sensitive GNN based on an operator analogous to shifts in grids and instantiate it on 3D meshes for shape modelling (Spiral Convolutions). We then focus on learning on general graph spaces, and in particular on functions that are invariant to graph isomorphism. We identify a fundamental trade-off between invariance, expressivity and computational complexity, which we address with a symmetry-breaking mechanism based on substructure encodings (Graph Substructure Networks). Substructures are shown to be a powerful tool that provably improves expressivity while controlling computational complexity, and a useful inductive bias in network science and chemistry.

    In the second part of the thesis, we discuss the problem of graph compression, where we analyse the information-theoretic principles and the connections with graph generative models. We show that another inevitable trade-off surfaces, now between computational complexity and compression quality, due to graph isomorphism. We propose a substructure-based dictionary coder, Partition and Code (PnC), with theoretical guarantees, which can be adapted to different graph distributions by estimating its parameters from observations. Additionally, contrary to the majority of neural compressors, PnC is parameter- and sample-efficient and is therefore of wide practical relevance. Finally, within this framework, substructures are further illustrated as a decisive archetype for learning problems on graph spaces.
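    As a rough illustration of the substructure-encoding idea, the sketch below counts the triangles each node participates in and appends that count to the node features, which could then be fed to any message-passing layer. It is not the thesis' Graph Substructure Network implementation: the function names and the choice of triangles as the only substructure are assumptions made here.

```python
import numpy as np

def triangle_counts(A):
    """Number of triangles each node participates in: diag(A^3) / 2."""
    A = A.astype(float)
    return np.diag(A @ A @ A) / 2.0

def add_substructure_features(X, A):
    """Append per-node triangle counts to the node-feature matrix X.

    A minimal stand-in for substructure encodings: a full model would count
    several rooted substructures and pass the augmented features to a GNN.
    """
    return np.hstack([X, triangle_counts(A)[:, None]])

# toy usage: a 4-cycle with one chord contains two triangles
A = np.array([[0, 1, 1, 1],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [1, 0, 1, 0]])
X = np.eye(4)
print(add_substructure_features(X, A))
```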

    Healing failures and improving generalization in deep generative modelling

    Deep generative modeling is a crucial and rapidly developing area of machine learning, with numerous potential applications, including data generation, anomaly detection, data compression, and more. Despite the significant empirical success of many generative models, some limitations still need to be addressed to improve their performance in certain cases. This thesis focuses on understanding the limitations of generative modeling in common scenarios and proposes corresponding techniques to alleviate these limitations and improve performance in practical generative modeling applications. Specifically, the thesis is divided into two sub-topics: one focusing on the training and the other on the generalization of generative models. A brief introduction to each sub-topic is provided below.

    Generative models are typically trained by optimizing their fit to the data distribution. This is achieved by minimizing a statistical divergence between the model and data distributions. However, there are cases where these divergences fail to accurately capture the differences between the model and data distributions, resulting in poor performance of the trained model. In the first part of the thesis, we discuss two situations where the classic divergences are ineffective for training the models: 1. the KL divergence fails to train implicit models for manifold modeling tasks; 2. the Fisher divergence cannot distinguish mixture proportions when modeling multi-modal target distributions. For both failure modes, we investigate the theoretical reasons underlying the failures of the KL and Fisher divergences in modeling certain types of data distributions. We propose techniques that address the limitations of these divergences, enabling more reliable estimation of the underlying data distributions.

    While the generalization of classification or regression models has been extensively studied in machine learning, the generalization of generative models is a relatively under-explored area. In the second part of this thesis, we aim to address this gap by investigating the generalization properties of generative models. Specifically, we investigate two generalization scenarios: 1. in-distribution (ID) generalization of probabilistic models, where the test data and the training data come from the same distribution; 2. out-of-distribution (OOD) generalization of probabilistic models, where the test data and the training data can come from different distributions. In the context of ID generalization, our emphasis rests on the Variational Auto-Encoder (VAE) model, and for OOD generalization, we primarily explore autoregressive models. By studying the generalization properties of these models, we demonstrate how to design new models or training criteria that improve the performance of practical applications, such as lossless compression and OOD detection.

    The findings of this thesis shed light on the intricate challenges faced by generative models in both training and generalization scenarios. Our investigations into the inefficacies of classic divergences like KL and Fisher highlight the importance of tailoring modeling techniques to the specific characteristics of data distributions. Additionally, by delving into the generalization aspects of generative models, this work pioneers insights into the ID and OOD scenarios, a domain not extensively covered in the current literature. Collectively, the insights and techniques presented in this thesis provide valuable contributions to the community, fostering an environment for the development of more robust and reliable generative models. It is our hope that these take-home messages will serve as a foundation for future research and applications in the realm of deep generative modeling.
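    The claim that the Fisher divergence cannot distinguish mixture proportions can be illustrated numerically. The sketch below is an illustration constructed here rather than taken from the thesis: it compares the KL and Fisher divergences between two well-separated Gaussian mixtures that differ only in their mixture weights. The KL divergence is clearly positive, while the Fisher divergence is nearly zero because the score functions differ only where the density is negligible.

```python
import numpy as np

def gauss(x, mu, sigma=1.0):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def mixture(x, weights, mus):
    return sum(w * gauss(x, m) for w, m in zip(weights, mus))

x = np.linspace(-30.0, 30.0, 60001)
dx = x[1] - x[0]
mus = [-10.0, 10.0]                      # two far-apart modes
p = mixture(x, [0.5, 0.5], mus)          # "data" distribution
q = mixture(x, [0.9, 0.1], mus)          # model with the wrong mixture weights

# KL(p || q): sensitive to the wrong mixture weights
kl = np.sum(p * (np.log(p) - np.log(q))) * dx

# Fisher divergence E_p[(d/dx log p - d/dx log q)^2]: nearly blind to them
score_p = np.gradient(np.log(p), x)
score_q = np.gradient(np.log(q), x)
fisher = np.sum(p * (score_p - score_q) ** 2) * dx

print(f"KL divergence     : {kl:.4f}")      # roughly 0.51
print(f"Fisher divergence : {fisher:.2e}")  # essentially zero
```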

    Graphon Estimation in bipartite graphs with observable edge labels and unobservable node labels

    Many real-world data sets can be presented in the form of a matrix whose entries correspond to the interaction between two entities of different natures (number of times a web user visits a web page, a student's grade in a subject, a patient's rating of a doctor, etc.). We assume in this paper that the mentioned interaction is determined by unobservable latent variables describing each entity. Our objective is to estimate the conditional expectation of the data matrix given the unobservable variables. This is presented as a problem of estimation of a bivariate function referred to as graphon. We study the cases of piecewise constant and Hölder-continuous graphons. We establish finite sample risk bounds for the least squares estimator and the exponentially weighted aggregate. These bounds highlight the dependence of the estimation error on the size of the data set, the maximum intensity of the interactions, and the level of noise. As the analyzed least-squares estimator is intractable, we propose an adaptation of Lloyd's alternating minimization algorithm to compute an approximation of the least-squares estimator. Finally, we present numerical experiments in order to illustrate the empirical performance of the graphon estimator on synthetic data sets.
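    A minimal sketch of a Lloyd-style alternating minimization for the piecewise-constant case might look as follows. It captures the spirit of approximating the least-squares estimator (alternate between block means and label reassignments), but the function name, initialization, and stopping rule are simplifications assumed here and are not the authors' exact procedure.

```python
import numpy as np

def fit_block_graphon(Y, k, l, n_iter=30, seed=0):
    """Lloyd-style alternating minimization for a piecewise-constant bipartite
    graphon: rows receive one of k latent labels, columns one of l, and each
    block is summarized by its mean."""
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    z = rng.integers(0, k, n)          # row labels
    w = rng.integers(0, l, m)          # column labels
    Q = np.zeros((k, l))
    for _ in range(n_iter):
        # 1) block means given the current labels
        for a in range(k):
            for b in range(l):
                block = Y[np.ix_(z == a, w == b)]
                Q[a, b] = block.mean() if block.size else Y.mean()
        # 2) reassign every row to the label with the smallest squared error
        M = Q[:, w]                                                   # (k, m)
        z = ((Y[:, None, :] - M[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        # 3) reassign every column analogously
        N = Q[z, :]                                                   # (n, l)
        w = ((Y[:, :, None] - N[:, None, :]) ** 2).sum(axis=0).argmin(axis=1)
    return Q, z, w

# toy usage: noisy 2x2 block matrix of interaction counts
rng = np.random.default_rng(1)
truth = np.kron(np.array([[1.0, 3.0], [3.0, 5.0]]), np.ones((30, 40)))
Y = rng.poisson(truth)
Q, z, w = fit_block_graphon(Y, k=2, l=2)
print(np.round(Q, 2))
```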

    Nonparametric Two-Sample Test for Networks Using Joint Graphon Estimation

    This paper focuses on the comparison of networks on the basis of statistical inference. For that purpose, we rely on smooth graphon models as a nonparametric modeling strategy that is able to capture complex structural patterns. The graphon itself can be viewed more broadly as a density or intensity function on networks, making the model a natural choice for comparison purposes. Extending graphon estimation towards modeling multiple networks simultaneously consequently provides substantial information about the (dis-)similarity between networks. Fitting such a joint model, which can be accomplished by applying an EM-type algorithm, provides a joint graphon estimate plus a corresponding prediction of the node positions for each network. In particular, it entails a generalized network alignment, where nearby nodes play similar structural roles in their respective domains. Given that, we construct a chi-squared test on the equivalence of network structures. Simulation studies and real-world examples support the applicability of our network comparison strategy.
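    As a simplified stand-in for the comparison step, not the paper's EM-based joint graphon fit or its exact statistic, the sketch below compares block-level edge probabilities of two networks, assuming aligned node group labels (playing the role of the predicted node positions) are already available, and sums two-proportion chi-squared contributions over blocks.

```python
import numpy as np
from scipy.stats import chi2

def _block_edges(A, lab, a, b):
    """Edge count and dyad count between groups a and b (undirected, no self-loops)."""
    idx_a, idx_b = np.where(lab == a)[0], np.where(lab == b)[0]
    sub = A[np.ix_(idx_a, idx_b)]
    if a == b:
        return np.triu(sub, 1).sum(), len(idx_a) * (len(idx_a) - 1) / 2
    return sub.sum(), len(idx_a) * len(idx_b)

def blockwise_chi2_test(A1, A2, labels1, labels2, k):
    """Chi-squared comparison of block-level edge probabilities of two networks
    whose nodes carry k aligned group labels (taken as given in this sketch)."""
    stat, df = 0.0, 0
    for a in range(k):
        for b in range(a, k):
            e1, n1 = _block_edges(A1, labels1, a, b)
            e2, n2 = _block_edges(A2, labels2, a, b)
            if n1 == 0 or n2 == 0:
                continue
            p_pool = (e1 + e2) / (n1 + n2)
            if p_pool <= 0.0 or p_pool >= 1.0:
                continue
            # squared two-proportion z statistic for this block
            stat += (e1 / n1 - e2 / n2) ** 2 / (p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
            df += 1
    p_value = 1.0 - chi2.cdf(stat, df) if df > 0 else 1.0
    return stat, p_value
```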

    3rd International Conference on Sustainable Futures: Environmental, Technological, Social and Economic Matters (ICSF 2022), 24-27 May 2022, Kryvyi Rih, Ukraine

    Proceedings of the 3rd International Conference on Sustainable Futures: Environmental, Technological, Social and Economic Matters (ICSF 2022), 24-27 May 2022, Kryvyi Rih, Ukraine.

    Higher-order accurate two-sample network inference and network hashing

    Two-sample hypothesis testing for comparing two networks is an important yet difficult problem. Major challenges include: potentially different sizes and sparsity levels; non-repeated observations of adjacency matrices; computational scalability; and theoretical investigations, especially on finite-sample accuracy and minimax optimality. In this article, we propose the first provably higher-order accurate two-sample inference method by comparing network moments. Our method extends the classical two-sample t-test to the network setting. We make weak modeling assumptions and can effectively handle networks of different sizes and sparsity levels. We establish strong finite-sample theoretical guarantees, including rate-optimality properties. Our method is easy to implement and fast to compute. We also devise a novel nonparametric framework of offline hashing and fast querying that is particularly effective for maintaining and querying very large network databases. We demonstrate the effectiveness of our method through comprehensive simulations. We apply our method to two real-world data sets and discover interesting novel structures.
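    A rough sketch of the flavor of such a moment-based comparison, not the paper's higher-order accurate statistic, is given below: it contrasts the triangle densities of two networks and studentizes the difference with node-subsampling variance estimates, assuming the moment's variance shrinks roughly like 1/n. All names and tuning choices are assumptions made here.

```python
import numpy as np

def triangle_density(A):
    """Fraction of node triples that form a triangle."""
    n = A.shape[0]
    triangles = np.trace(A @ A @ A) / 6.0
    return triangles / (n * (n - 1) * (n - 2) / 6.0)

def network_moment_t_stat(A1, A2, n_sub=200, frac=0.5, seed=0):
    """t-style two-sample statistic for a network moment (triangle density),
    with variances estimated by node subsampling and a crude 1/n rescaling."""
    rng = np.random.default_rng(seed)

    def subsample_variance(A):
        n = A.shape[0]
        m = max(4, int(frac * n))
        vals = [triangle_density(A[np.ix_(idx, idx)])
                for idx in (rng.choice(n, size=m, replace=False) for _ in range(n_sub))]
        return np.var(vals, ddof=1) * m / n   # rescale from subsample size m to n

    diff = triangle_density(A1) - triangle_density(A2)
    return diff / np.sqrt(subsample_variance(A1) + subsample_variance(A2))

# toy usage: two Erdos-Renyi graphs with the same density
rng = np.random.default_rng(2)
def er(n, p):
    A = (rng.random((n, n)) < p).astype(int)
    A = np.triu(A, 1)
    return A + A.T
print(network_moment_t_stat(er(150, 0.1), er(200, 0.1)))
```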

    Computation in Complex Networks

    Complex networks are one of the most challenging research areas across disciplines including physics, mathematics, biology, medicine, engineering, and computer science, among others. Interest in complex networks keeps growing, due to their ability to model many everyday systems, such as technological networks, the Internet, and communication, chemical, neural, social, political and financial networks. The Special Issue "Computation in Complex Networks" of Entropy offers a multidisciplinary view of how some complex systems behave, providing a collection of original and high-quality papers within the research fields of:
    • Community detection
    • Complex network modelling
    • Complex network analysis
    • Node classification
    • Information spreading and control
    • Network robustness
    • Social networks
    • Network medicine

    Joint Network Topology Inference via a Shared Graphon Model

    We consider the problem of estimating the topology of multiple networks from nodal observations, where these networks are assumed to be drawn from the same (unknown) random graph model. We adopt a graphon as our random graph model, which is a nonparametric model from which graphs of potentially different sizes can be drawn. The versatility of graphons allows us to tackle the joint inference problem even in cases where the graphs to be recovered contain different numbers of nodes and lack precise alignment across the graphs. Our solution is based on combining a maximum likelihood penalty with graphon estimation schemes and can be used to augment existing network inference methods. The proposed joint network and graphon estimation is further enhanced with the introduction of a robust method for noisy graph sampling information. We validate our proposed approach by comparing its performance against competing methods in synthetic and real-world datasets.
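    A simplified sketch of the pooling idea, not the authors' method, is given below: nodes of each graph are placed on [0,1] by degree rank and edges are averaged on a common grid, yielding a single graphon estimate that could then serve as an edge-probability prior when re-estimating each individual topology. The binning scheme and function name are assumptions made here, and the self-pair correction on diagonal blocks is ignored for brevity.

```python
import numpy as np

def pooled_graphon_estimate(graphs, n_bins=10):
    """Histogram ('sorting-and-binning') graphon estimate pooled over several
    graphs of possibly different sizes."""
    num = np.zeros((n_bins, n_bins))
    den = np.zeros((n_bins, n_bins))
    for A in graphs:
        n = A.shape[0]
        ranks = np.argsort(np.argsort(A.sum(axis=1)))          # degree ranks 0..n-1
        pos = (ranks + 0.5) / n                                 # latent positions in (0,1)
        bins = np.minimum((pos * n_bins).astype(int), n_bins - 1)
        for a in range(n_bins):
            for b in range(n_bins):
                block = A[np.ix_(bins == a, bins == b)]
                num[a, b] += block.sum()
                den[a, b] += block.size
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)

# toy usage: two graphs of different sizes drawn from the same graphon
rng = np.random.default_rng(3)
def sample_graph(n, W=lambda u, v: 0.1 + 0.4 * u * v):
    u = rng.random(n)
    P = W(u[:, None], u[None, :])
    A = (rng.random((n, n)) < P).astype(int)
    A = np.triu(A, 1)
    return A + A.T
print(np.round(pooled_graphon_estimate([sample_graph(150), sample_graph(250)]), 2))
```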

    Large-Scale Structure of Multi-Optimised Networks
