
    Prediction, evolution and privacy in social and affiliation networks

    In the last few years, there has been growing interest in studying online social and affiliation networks, leading to a new category of inference problems that consider both actors' characteristics and their social environments. These problems have a variety of applications, from creating more effective marketing campaigns to designing better personalized services. Predictive statistical models can learn hidden information in these networks automatically, but they also raise many privacy concerns. Three of the main challenges that I address in my thesis are understanding 1) how the complex observed and unobserved relationships among actors can help in building better behavior models and in designing more accurate predictive algorithms, 2) the processes that drive network growth and link formation, and 3) the implications of predictive algorithms for the privacy of users who share content online. Most previous work on prediction, evolution and privacy in online social networks has concentrated on single-mode networks that form around user-user links, such as friendship and email communication. However, single-mode networks often co-exist with two-mode affiliation networks, in which users are linked to other entities such as social groups, online content and events. We study the interplay between these two types of networks and show that analyzing these higher-order interactions can reveal dependencies that are difficult to extract from pairwise interactions alone. In particular, we present our contributions to the challenging problems of collective classification, link prediction, network evolution, anonymization and privacy preservation in social and affiliation networks. We evaluate our models on real-world data sets from well-known online social networks such as Flickr, Facebook, Dogster and LiveJournal.
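As a rough illustration of why two-mode affiliation data can add signal to single-mode link prediction, the sketch below scores candidate pairs by their number of common friends plus a weighted count of shared groups. This is not the thesis's model: the additive score, the weight `alpha` and the toy users and groups are all hypothetical.

```python
def affiliation_aware_score(u, v, friends, groups, alpha=0.5):
    """Hypothetical score: common friends plus alpha times shared affiliations."""
    common_friends = len(friends.get(u, set()) & friends.get(v, set()))
    shared_groups = len(groups.get(u, set()) & groups.get(v, set()))
    return common_friends + alpha * shared_groups


def rank_candidates(u, friends, groups, alpha=0.5):
    """Rank users not yet linked to u, best candidate first (ties broken by name)."""
    candidates = (set(friends) | set(groups)) - {u} - friends.get(u, set())
    return sorted(
        candidates,
        key=lambda v: (-affiliation_aware_score(u, v, friends, groups, alpha), v),
    )


# Toy single-mode (friendship) and two-mode (group membership) data.
friends = {"ann": {"bob"}, "bob": {"ann", "cat"}, "cat": {"bob"}, "dan": set()}
groups = {"ann": {"hiking", "jazz"}, "bob": {"chess"}, "cat": {"chess"}, "dan": {"hiking"}}

print(rank_candidates("ann", friends, groups))  # ['cat', 'dan']
```

With friendship links alone, `dan` would score 0 and be indistinguishable from any stranger; the shared `hiking` affiliation gives a weak positive signal that a purely pairwise, single-mode score cannot see.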

    Visual network storytelling

    We love networks! Networks are powerful conceptual tools, encapsulating in a single item multiple affordances for computation (networks as graphs), visualization (networks as maps) and manipulation of data (networks as interfaces). In mathematics, graph theory has been around since Euler’s walk over Königsberg’s bridges (Euler, 1736). But it was not until the end of the last century that networks acquired multidisciplinary popularity. Graph computation is certainly powerful, but it is also very demanding, and for many years its advantages remained the privilege of scholars with solid mathematical foundations. In the last few decades, however, networks have acquired a new set of affordances and reached a larger audience, thanks to the growing availability of tools to design them. Drawn on paper or on screen, networks became easier to handle and gained properties that calculation could not express. Far from being merely aesthetic, the graphical representation of networks has an intrinsic hermeneutic value: networks can become maps and be read as such. Combining the computational power of graphs with the visual expressivity of maps and the interactivity of computer interfaces, networks can be used for Exploratory Data Analysis (Tukey, 1977). Navigating through data becomes so fluid that zooming in on a single data point and out to a landscape of a million traces is just a click away. Increasingly specialized software has been designed to support the exploration of network data. Tools like Pajek (vlado.fmf.uni-lj.si/pub/networks/pajek), NetDraw (sites.google.com/site/netdrawsoftware), Ucinet (www.analytictech.com/ucinet), Guess (graphexploration.cond.org) and, more recently, Gephi (gephi.org) have progressively smoothed out the difficulties of graph mathematics, turning a complex mathematical formalism into a more user-friendly point-and-click interface (1).
If visual exploration of networks can feed into confirmatory statistics, what about sharing a network exploration with others? We developed Manylines (https://github.com/medialab/manylines), a tool that lets you share the visual analysis of a network with a wide audience by publishing it on the web. With Manylines, you can not only easily publish a network on the web but also share its exploration by describing the network’s key visual findings. Through a set of examples, we illustrate how the narrative opportunities of Manylines can contribute to the articulation of a visual grammar of networks.
(1) A simple look at the URLs of the successive tools reveals the efforts deployed to make network-manipulation tools user-friendly and thereby available to a larger public.

    Exploring the Organizational Effects of Directors’ Embeddedness in Board Networks

    In this dissertation, I explore how top executives’ and directors’ embeddedness in corporate elite networks, within and between organizations’ boards of directors, influences organizational strategy and policy. In the first study, I conduct a comprehensive review of the governance literature using both a traditional narrative approach and a bibliometric main path analysis, which traces the development and diffusion of scholarly knowledge on corporate elite networks. In the second study, drawing on network theory and behavioral governance research, I introduce a methodology that allows researchers to model intraboard networks by measuring the strength of ties among board members based on objective formative indicators of the constructs of social similarity, social status, social exchange, and social history. I then use this technique to explore the antecedents and consequences of the intraorganizational network characteristics of boards. Finally, in the third study, I examine the joint influence of interlocking directorates and intraorganizational board networks on interorganizational imitation of corporate strategic activity. Results show that directors’ centralities within a focal organization’s board and within the boards of its alters are important predictors of interorganizational imitation of corporate strategic activity. I contribute to the strategic management and organization theory literatures by advancing our understanding of the relationship between corporate elite networks and organizational strategy and policy, and by introducing a new approach to modeling directors’ networks in corporate governance research.
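Bibliometric main path analysis, as mentioned above, is commonly built on search path count (SPC) weights: for each citation, the number of source-to-sink paths in the citation DAG that pass through it. The sketch below computes only that weighting step, assuming edges point from cited to citing work; the toy graph is hypothetical, and the dissertation may use a different traversal weight or path-selection rule.

```python
from collections import defaultdict
from graphlib import TopologicalSorter


def search_path_count(edges):
    """Return the SPC weight of each edge (u, v) in an acyclic citation graph."""
    succ, pred, nodes = defaultdict(list), defaultdict(list), set()
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
        nodes.update((u, v))
    # TopologicalSorter takes node -> predecessors; it raises CycleError on cycles.
    order = list(TopologicalSorter({n: pred[n] for n in nodes}).static_order())
    # n_minus[x]: number of paths from any source to x (a source counts as 1).
    n_minus = {}
    for x in order:
        n_minus[x] = sum(n_minus[p] for p in pred[x]) if pred[x] else 1
    # n_plus[x]: number of paths from x to any sink (a sink counts as 1).
    n_plus = {}
    for x in reversed(order):
        n_plus[x] = sum(n_plus[s] for s in succ[x]) if succ[x] else 1
    # Paths through (u, v) = ways to reach u times ways to leave from v.
    return {(u, v): n_minus[u] * n_plus[v] for u, v in edges}


edges = [("A", "C"), ("B", "C"), ("C", "D"), ("C", "E"), ("D", "F"), ("D", "G")]
print(search_path_count(edges)[("C", "D")])  # 4
```

A main path is then typically traced by starting from the heaviest edge out of a source and repeatedly following the heaviest outgoing edge; that greedy step is omitted here to keep the sketch short.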

    User Information Modelling in Social Communities and Networks

    User modelling underpins social network analysis tasks such as community detection and expert finding. The aim of this research is to model user information, including user-generated content and social ties. Many community detection algorithms exist; however, they give little consideration to the rich hidden knowledge within the communities of social networks. In this research, we propose to discover communities and the hidden (latent) knowledge within them simultaneously, focusing on jointly modelling communities, user sentiment topics and social links. We also learn to recommend experts to askers based on newly posted questions in online question answering communities. Specifically, we first propose a new probabilistic model that depicts users' expertise based on their answers and their descriptive ability based on their questions. To exploit social information in community question answering (CQA), link analysis is also incorporated. We further propose a user expertise model based on tags rather than general topics. On CQA sites, it is very common for several users to share the same user name, so once an ambiguous user name is recommended, it is difficult for the asker to identify the intended user directly on a large-scale CQA site. We propose a simple but effective method that disambiguates user names by ranking their tag-based relevance to a query question. We evaluate the proposed models and methods on real-world datasets. For community discovery, our models not only identify communities with different topic-sentiment distributions but also achieve comparable detection performance. For expert recommendation in CQA, the unified modelling of user topics/tags and abilities improves recommendation performance. Moreover, for user name disambiguation in CQA, the proposed method can help question askers match ambiguous user names to the right people with high accuracy.
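The user name disambiguation step described above ranks users who share a name by the relevance of their tags to the query question. A minimal sketch follows, assuming cosine similarity over tag counts as the relevance function; the thesis's actual relevance measure is not specified in the abstract, and the user IDs and tag counts are hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse tag-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_namesakes(question_tags, candidates):
    """Order users sharing one display name by tag relevance to the question."""
    q = Counter(question_tags)
    return sorted(candidates, key=lambda uid: (-cosine(q, candidates[uid]), uid))


# Two hypothetical users who both display the name "john".
candidates = {
    "john#1": Counter({"python": 5, "pandas": 3}),
    "john#2": Counter({"java": 4, "spring": 2}),
}
print(rank_namesakes(["python", "numpy"], candidates))  # ['john#1', 'john#2']
```

Cosine similarity is used here because it ignores how prolific a user is and compares only the direction of their tag profile; a frequency-weighted or probabilistic relevance score would be an equally plausible choice.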

    Multidimensional Network analysis

    This thesis focuses on the study of multidimensional networks. A multidimensional network is a network in which nodes may be connected by multiple relations of different qualitative and quantitative kinds. Traditionally, complex network analysis has focused on networks with only one kind of relation. Even with this constraint, monodimensional networks have posed many analytic challenges, as they represent complex systems that are ubiquitous in nature. However, it is a matter of common experience that considering only one relation at a time limits the set of real-world phenomena that complex networks can represent. When multiple different relations act at the same time, traditional complex network analysis cannot provide suitable analytic tools. Providing suitable tools for this scenario is precisely the aim of this thesis: the creation and study of a Multidimensional Network Analysis that extends the toolbox of complex network analysis and grasps the complexity of real-world phenomena. We first present the need for a multidimensional network analysis, along with empirical evidence of the ubiquity of this multifaceted reality in different complex networks and some related works proposed in the last two years in this novel setting, which has yet to be systematically defined. We then tackle the foundations of the multidimensional setting at different levels, both by examining basic extensions of the known models and by developing novel algorithms and frameworks for well-understood and useful problems such as community discovery (our main case study), temporal analysis and link prediction.
We conclude this thesis with two real-world scenarios: a monodimensional study of international trade, which may be improved with our proposed multidimensional analysis, and an analysis of the literature and bibliography in the field of classical archaeology, used to show how natural and useful a multidimensional network analysis strategy is for a problem traditionally tackled with different techniques.
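One way to make the idea of several relations between the same nodes concrete is to label every edge with its dimension. The sketch below is a minimal, hypothetical data structure with per-dimension degree and neighbourhood queries, loosely modelled on measures such as dimension relevance from the multidimensional-network literature; the exact definitions used in the thesis may differ, and the dimension names are made up.

```python
from collections import defaultdict


class MultidimensionalGraph:
    """Undirected graph whose edges carry a dimension label (e.g. 'email')."""

    def __init__(self):
        # node -> dimension -> set of neighbours reachable in that dimension
        self.adj = defaultdict(lambda: defaultdict(set))

    def add_edge(self, u, v, dim):
        self.adj[u][dim].add(v)
        self.adj[v][dim].add(u)

    def _dims(self, u, dims):
        return self.adj[u].keys() if dims is None else dims

    def neighbors(self, u, dims=None):
        """Distinct neighbours of u across the selected dimensions (all if None)."""
        out = set()
        for d in self._dims(u, dims):
            out |= self.adj[u].get(d, set())
        return out

    def degree(self, u, dims=None):
        """Number of labelled edges at u; a pair linked in two dimensions counts twice."""
        return sum(len(self.adj[u].get(d, ())) for d in self._dims(u, dims))

    def dimension_relevance(self, u, dim):
        """Share of u's neighbours that are reachable through dimension `dim`."""
        everyone = self.neighbors(u)
        return len(self.neighbors(u, [dim])) / len(everyone) if everyone else 0.0


g = MultidimensionalGraph()
g.add_edge("a", "b", "coauthor")
g.add_edge("a", "b", "email")
g.add_edge("a", "c", "email")
print(g.degree("a"), len(g.neighbors("a")), g.dimension_relevance("a", "coauthor"))  # 3 2 0.5
```

The gap between `degree` (3) and the number of distinct neighbours (2) is exactly the information a monodimensional projection would discard.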

    Statistical analysis of networks with community structure and bootstrap methods for big data

    This dissertation is divided into two parts, concerning two areas of statistical methodology. The first part concerns the statistical analysis of networks with community structure; the second concerns bootstrap methods for big data. Statistical analysis of networks with community structure: Networks are ubiquitous in today's world; network data arise in fields as varied as scientific studies, sociology, technology, social media and the Internet, to name a few. An interesting aspect of many real-world networks is the presence of community structure, and with it the problem of detecting that structure. In the first chapter, we consider heterogeneous networks, which seem not to have been considered in the statistical community detection literature. We propose a blockmodel for heterogeneous networks with community structure and introduce a heterogeneous spectral clustering algorithm for community detection in such networks. We study the theoretical properties of the clustering algorithm under the proposed model, along with a simulation study and data analysis. A network feature closely associated with community structure is the popularity of nodes in different communities. Neither the classical stochastic blockmodel nor its degree-corrected extension can satisfactorily capture the dynamics of node popularity. In the second chapter, we propose a popularity-adjusted blockmodel for flexible modeling of node popularity. We establish the consistency of likelihood modularity for community detection under the proposed model and illustrate the improved empirical insights this methodology can yield by analyzing the political blogs network and the British MP network, as well as through simulation studies. Bootstrap methods for big data: Resampling methods provide a powerful way of evaluating the precision of a wide variety of statistical inference methods.
The complexity and massive size of big data make it infeasible to apply traditional resampling methods. In the first chapter, we consider the problem of resampling irregularly spaced dependent data. Traditional block-based resampling or subsampling schemes for stationary data are difficult to implement when the data are irregularly spaced, as partitioning the sampling region into complete and incomplete blocks takes careful programming effort. We develop a resampling method called Dependent Random Weighting (DRW) for irregularly spaced dependent data, which uses random weights instead of blocks to resample the data. By allowing the random weights to be dependent, the dependency structure of the data can be preserved in the resamples. We study the theoretical properties of this resampling method as well as its numerical performance in simulations. In the second chapter, we consider the problem of resampling massive data, where traditional methods such as the bootstrap (for independent data) or the moving block bootstrap (for dependent data) can be computationally infeasible, since each resample has an effective size of the same order as the sample. We develop a new resampling method called the subsampled double bootstrap (SDB) for both independent and stationary data. SDB works by choosing small random subsets of the massive data and then constructing a single resample from each subset using the bootstrap (for independent data) or the moving block bootstrap (for stationary data). We study the theoretical properties of SDB as well as its numerical performance on simulated and real data. Extending the ideas of the second chapter, we introduce two new resampling strategies for big data in the third chapter. The first strategy, aggregation of little bootstraps (ALB), is a generalized resampling technique that includes SDB as a special case.
The second strategy, subsampled residual bootstrap (SRB), is a fast version of the residual bootstrap intended for massive regression models. We study both methods through simulations.
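The SDB procedure described in this abstract can be sketched for the simplest case: the standard error of a sample mean on independent data. This is a minimal sketch under stated assumptions, not the dissertation's implementation. The statistic, the function name and the parameters `b` (subset size) and `n_subsets` are illustrative choices, and each resample is centred at its own subset's estimate.

```python
import numpy as np


def sdb_standard_error(x, b, n_subsets, seed=0):
    """Subsampled double bootstrap (SDB) estimate of the standard error of the mean.

    For each of n_subsets iterations: draw a random subset of size b without
    replacement, then build ONE bootstrap resample of full size n from that
    subset, represented by multinomial counts over the b subset points.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = x.size
    deviations = np.empty(n_subsets)
    for s in range(n_subsets):
        subset = rng.choice(x, size=b, replace=False)
        # How many times each subset point appears in a size-n resample.
        counts = rng.multinomial(n, np.full(b, 1.0 / b))
        resample_mean = counts @ subset / n
        # Centre at the subset's own estimate, since the resample came from it.
        deviations[s] = resample_mean - subset.mean()
    return deviations.std(ddof=1)


x = np.random.default_rng(1).normal(size=100_000)
print(sdb_standard_error(x, b=2_000, n_subsets=300), x.std() / np.sqrt(x.size))
```

Each resample touches at most `b` distinct points, so the cost per replicate scales with the subset size rather than with `n`, which is the computational point of SDB; the two printed numbers should be close.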

    Collective attention in online social networks

    Social media is an ever-present tool in modern society, and its widespread usage makes it a valuable source of insights into society at large. The study of collective attention is one application that particularly benefits from the scale of social media data. In this thesis, we investigate how collective attention manifests on social media and how it can be understood, approaching this challenge from several perspectives across network and data science. We first focus on a period of increased media attention to climate change to see how robust previously observed polarised structures are during a collective attention event. Our experiments show that while the level of engagement with the climate change debate increases, there is little disruption to the existing polarised structure of the communication network. Understanding the climate media debate requires addressing a methodological question: which method of weighting bipartite network projections is most effective with respect to the accuracy of community detection? We test seven weighting schemes on constructed networks with known community structure and then use the preferred methodology to study collective attention in the climate change debate on Twitter. Following on from this, we investigate how collective attention changes over a longer period during a single event, namely the COVID-19 pandemic. We measure how the disruption to in-person social interactions caused by attempts to limit the spread of COVID-19 in England and Wales has affected social interaction patterns as they appear on Twitter. Using a dataset of location-tagged tweets, we examine how spatial attention to locations and collective attention to discussion topics are affected by social distancing and population movement restrictions at different stages of the pandemic.
Finally, we present a new analysis framework for collective attention events that allows direct comparisons across different time and volume scales, such as those seen in the climate change and COVID-19 experiments. We demonstrate that this approach performs better than traditional approaches that rely on binning the time series at particular resolutions, and comment on the mechanistic properties highlighted by our new methodology.
Engineering and Physical Sciences Research Council (EPSRC)
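To make "weighting bipartite network projections" concrete, the sketch below projects a two-mode network (for example, users linked to the items they share) onto one node set under two common weighting schemes: a plain count of shared items, and Newman's collaboration weighting, where an item shared by k users adds 1/(k - 1) to each pair. These two schemes are illustrative and are not claimed to be among the seven tested in the thesis; the item and user names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def project(bipartite, scheme="count"):
    """Project a two-mode network onto one node set.

    `bipartite` maps each item (e.g. a hashtag or URL) to the set of users
    linked to it. Returns {(u, v): weight} for every co-linked user pair.
      scheme="count":  weight = number of shared items
      scheme="newman": an item with k users adds 1 / (k - 1) to each pair
    """
    if scheme not in ("count", "newman"):
        raise ValueError(f"unknown scheme: {scheme}")
    weights = defaultdict(float)
    for users in bipartite.values():
        k = len(users)
        if k < 2:
            continue  # an item linked to a single user creates no pair
        contribution = 1.0 if scheme == "count" else 1.0 / (k - 1)
        for u, v in combinations(sorted(users), 2):
            weights[(u, v)] += contribution
    return dict(weights)


items = {"#a": {"x", "y"}, "#b": {"x", "y", "z"}}
print(project(items, "count"))   # {('x', 'y'): 2.0, ('x', 'z'): 1.0, ('y', 'z'): 1.0}
print(project(items, "newman"))  # {('x', 'y'): 1.5, ('x', 'z'): 0.5, ('y', 'z'): 0.5}
```

The Newman scheme down-weights widely shared items, so a viral hashtag links its many users more weakly than a niche one links its few; how such choices change the communities recovered downstream is the kind of question the seven-scheme comparison addresses.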

    On Business Analytics: Dynamic Network Analysis for Descriptive Analytics and Multicriteria Decision Analysis for Prescriptive Analytics.

    Jules Ferry. Collèges communaux: Classement des professeurs [Communal colleges: ranking of teachers]. In: Bulletin administratif de l'instruction publique, vol. 24, no. 467, 1881, pp. 836-842.