20,987 research outputs found

    Inferring Social Status and Rich Club Effects in Enterprise Communication Networks

    Full text link
    Social status, defined as the relative rank or position that an individual holds in a social hierarchy, is known to be among the most important motivating forces in social behaviors. In this paper, we consider the notion of status from the perspective of a position or title held by a person in an enterprise. We study the intersection of social status and social networks in an enterprise. We study whether enterprise communication logs can help reveal how social interactions and individual status manifest themselves in social networks. To that end, we use two enterprise datasets with three communication channels --- voice call, short message, and email --- to demonstrate the social-behavioral differences among individuals with different status. We have several interesting findings and based on these findings we also develop a model to predict social status. On the individual level, high-status individuals are more likely to be spanned as structural holes by linking to people in parts of the enterprise networks that are otherwise not well connected to one another. On the community level, the principle of homophily, social balance and clique theory generally indicate a "rich club" maintained by high-status individuals, in the sense that this community is much more connected, balanced and dense. Our model can predict social status of individuals with 93% accuracy.Comment: 13 pages, 4 figure

    Identifying Graphs from Noisy Observational Data

    Get PDF
    There is a growing amount of data describing networks -- examples include social networks, communication networks, and biological networks. As the amount of available data increases, so does our interest in analyzing the properties and characteristics of these networks. However, in most cases the data is noisy, incomplete, and the result of passively acquired observational data; naively analyzing these networks without taking these errors into account can result in inaccurate and misleading conclusions. In my dissertation, I study the tasks of entity resolution, link prediction, and collective classification to address these deficiencies. I describe these tasks in detail and discuss my own work on each of these tasks. For entity resolution, I develop a method for resolving the identities of name mentions in email communications. For link prediction, I develop a method for inferring subordinate-manager relationships between individuals in an email communication network. For collective classification, I propose an adaptive active surveying method to address node labeling in a query-driven setting on network data. In many real-world settings, however, these deficiencies are not found in isolation and all need to be addressed to infer the desired complete and accurate network. Furthermore, because of the dependencies typically found in these tasks, the tasks are inherently inter-related and must be performed jointly. I define the general problem of graph identification which simultaneously performs these tasks; removing the noise and missing values in the observed input network and inferring the complete and accurate output network. I present a novel approach to graph identification using a collection of Coupled Collective Classifiers, C3, which, in addition to capturing the variety of features typically used for each task, can capture the intra- and inter-dependencies required to correctly infer nodes, edges, and labels in the output network. I discuss variants of C3 using different learning and inference paradigms and show the superior performance of C3, in terms of both prediction quality and runtime performance, over various previous approaches. I then conclude by presenting the Graph Alignment, Identification, and Analysis (GAIA) open-source software library which not only provides an implementation of C3 but also algorithms for various tasks in network data such as entity resolution, link prediction, collective classification, clustering, active learning, data generation, and analysis

    Characterizing and Detecting Unrevealed Elements of Network Systems

    Get PDF
    This dissertation addresses the problem of discovering and characterizing unknown elements in network systems. Klir (1985) provides a general definition of a system as “... a set of some things and a relation among the things (p. 4). A system, where the `things\u27, i.e. nodes, are related through links is a network system (Klir, 1985). The nodes can represent a range of entities such as machines or people (Pearl, 2001; Wasserman & Faust, 1994). Likewise, links can represent abstract relationships such as causal influence or more visible ties such as roads (Pearl, 1988, pp. 50-51; Wasserman & Faust, 1994; Winston, 1994, p. 394). It is not uncommon to have incomplete knowledge of network systems due to either passive circumstances, e.g. limited resources to observe a network, active circumstances, e.g. intentional acts of concealment, or some combination of active and passive influences (McCormick & Owen, 2000, p. 175; National Research Council, 2005, pp. 7, 11). This research provides statistical and graph theoretic approaches for such situations, including those in which nodes are causally related (Geiger & Pearl, 1990, pp. 3, 10; Glymour, Scheines, Spirtes, & Kelly, 1987, pp. 75-86, 178183; Murphy, 1998; Verma & Pearl, 1991, pp. 257, 260, 264-265). A related aspect of this research is accuracy assessment. It is possible an analyst could fail to detect a network element, or be aware of network elements, but incorrectly conclude the associated network system structure (Borgatti, Carley, & Krackhardt, 2006). The possibilities require assessment of the accuracy of the observed and conjectured network systems, and this research provides a means to do so (Cavallo & Klir, 1979, p. 143; Kelly, 1957, p. 968)

    Reconstructing propagation networks with natural diversity and identifying hidden sources

    Get PDF
    Our ability to uncover complex network structure and dynamics from data is fundamental to understanding and controlling collective dynamics in complex systems. Despite recent progress in this area, reconstructing networks with stochastic dynamical processes from limited time series remains to be an outstanding problem. Here we develop a framework based on compressed sensing to reconstruct complex networks on which stochastic spreading dynamics take place. We apply the methodology to a large number of model and real networks, finding that a full reconstruction of inhomogeneous interactions can be achieved from small amounts of polarized (binary) data, a virtue of compressed sensing. Further, we demonstrate that a hidden source that triggers the spreading process but is externally inaccessible can be ascertained and located with high confidence in the absence of direct routes of propagation from it. Our approach thus establishes a paradigm for tracing and controlling epidemic invasion and information diffusion in complex networked systems.Comment: 20 pages and 5 figures. For Supplementary information, please see http://www.nature.com/ncomms/2014/140711/ncomms5323/full/ncomms5323.html#

    Detecting hierarchical relationships and roles from online interaction networks

    Get PDF
    In social networks, analysing the explicit interactions among users can help in inferring hierarchical relationships and roles that may be implicit. In this thesis, we focus on two objectives: detecting hierarchical relationships between users and inferring the hierarchical roles of users interacting via the same online communication medium. In both cases, we show that considering the temporal dimension of interaction substantially improves the detection of relationships and roles. The first focus of this thesis is on the problem of inferring implicit relationships from interactions between users. Based on promising results obtained by standard link-analysis methods such as PageRank and Rooted-PageRank (RPR), we introduce three novel time-based approaches, \Time-F" based on a defined time function, Filter and Refine (FiRe) which is a hybrid approach based on RPR and Time-F, and Time-sensitive Rooted-PageRank (T-RPR) which applies RPR in a way that takes into account the time-dimension of interactions in the process of detecting hierarchical ties. We experiment on two datasets, the Enron email dataset to infer managersubordinate relationships from email exchanges, and a scientific publication coauthorship dataset to detect PhD advisor-advisee relationships from paper co-authorships. Our experiments demonstrate that time-based methods perform better in terms of recall. In particular T-RPR turns out to be superior over most recent competitor methods as well as all other approaches we propose. The second focus of this thesis is examining the online communication behaviour of users working on the same activity in order to identify the different hierarchical roles played by the users. We propose two approaches. In the first approach, supervised learning is used to train different classification algorithms. In the second approach, we address the problem as a sequence classification problem. A novel sequence classification framework is defined that generates time-dependent features based on frequent patterns at multiple levels of time granularity. Our framework is a exible technique for sequence classification to be applied in different domains. We experiment on an educational dataset collected from an asynchronous communication tool used by students to accomplish an underlying group project. Our experimental findings show that the first supervised approach achieves the best mapping of students to their roles when the individual attributes of the students, information about the reply relationships among them as well as quantitative time-based features are considered. Similarly, our multi-granularity pattern-based framework shows competitive performance in detecting the students' roles. Both approaches are significantly better than the baselines considered

    CIPCaD-Bench: Continuous Industrial Process datasets for benchmarking Causal Discovery methods

    Full text link
    Causal relationships are commonly examined in manufacturing processes to support faults investigations, perform interventions, and make strategic decisions. Industry 4.0 has made available an increasing amount of data that enable data-driven Causal Discovery (CD). Considering the growing number of recently proposed CD methods, it is necessary to introduce strict benchmarking procedures on publicly available datasets since they represent the foundation for a fair comparison and validation of different methods. This work introduces two novel public datasets for CD in continuous manufacturing processes. The first dataset employs the well-known Tennessee Eastman simulator for fault detection and process control. The second dataset is extracted from an ultra-processed food manufacturing plant, and it includes a description of the plant, as well as multiple ground truths. These datasets are used to propose a benchmarking procedure based on different metrics and evaluated on a wide selection of CD algorithms. This work allows testing CD methods in realistic conditions enabling the selection of the most suitable method for specific target applications. The datasets are available at the following link: https://github.com/giovanniMenComment: Supplementary Materials at: https://github.com/giovanniMen/CPCaD-Benc
    • …
    corecore