    Identifying Graphs from Noisy Observational Data

    There is a growing amount of data describing networks, including social networks, communication networks, and biological networks. As the amount of available data increases, so does our interest in analyzing the properties and characteristics of these networks. However, in most cases the data is noisy, incomplete, and passively acquired through observation; naively analyzing these networks without taking these errors into account can lead to inaccurate and misleading conclusions. In my dissertation, I study the tasks of entity resolution, link prediction, and collective classification to address these deficiencies. I describe these tasks in detail and discuss my own work on each of them. For entity resolution, I develop a method for resolving the identities of name mentions in email communications. For link prediction, I develop a method for inferring subordinate-manager relationships between individuals in an email communication network. For collective classification, I propose an adaptive active surveying method to address node labeling in a query-driven setting on network data. In many real-world settings, however, these deficiencies are not found in isolation, and all of them need to be addressed to infer the desired complete and accurate network. Furthermore, because of the dependencies typically found in these tasks, the tasks are inherently inter-related and must be performed jointly. I define the general problem of graph identification, which performs these tasks simultaneously, removing the noise and missing values in the observed input network and inferring the complete and accurate output network. I present a novel approach to graph identification using a collection of Coupled Collective Classifiers, C3, which, in addition to capturing the variety of features typically used for each task, can capture the intra- and inter-dependencies required to correctly infer nodes, edges, and labels in the output network. I discuss variants of C3 using different learning and inference paradigms and show the superior performance of C3, in terms of both prediction quality and runtime performance, over various previous approaches. I then conclude by presenting the Graph Alignment, Identification, and Analysis (GAIA) open-source software library, which provides not only an implementation of C3 but also algorithms for various tasks in network data such as entity resolution, link prediction, collective classification, clustering, active learning, data generation, and analysis.

    MPA network design based on graph network theory and emergent properties of larval dispersal

    Despite the recognised effectiveness of networks of Marine Protected Areas (MPAs) as a biodiversity conservation instrument, current MPA network design frequently disregards the importance of connectivity patterns. In the case of sedentary marine populations, connectivity stems not only from the stochastic nature of the physical environment that affects the dispersal of early-life stages, but also from the spawning stock attributes that affect the reproductive output (e.g., passive eggs and larvae) and its survivorship. Early-life stages are virtually impossible to track in the ocean. Therefore, numerical ocean current simulations coupled to egg and larval Lagrangian transport models remain the most common approach for the assessment of marine larval connectivity. Inferred larval connectivity may differ depending on the type of connectivity considered; consequently, the prioritisation of sites for the conservation of marine populations might also differ. Here, we introduce a framework for evaluating and designing MPA networks based on the identification of connectivity hotspots using graph theoretic analysis. As a case study, we use a network of open-access areas and MPAs off Mallorca Island (Spain) and test its effectiveness for the protection of the painted comber Serranus scriba. Outputs from network analysis are used to: (1) identify critical areas for improving overall larval connectivity; (2) assess the impact of species' biological parameters on network connectivity; and (3) explore alternative MPA configurations to improve average network connectivity. Results demonstrate the potential of graph theory to identify non-trivial egg/larval dispersal patterns and emerging collective properties of the MPA network which are relevant for increasing protection efficiency. Comment: 8 figures, 3 tables, 1 Supplementary material (including 4 tables, 3 figures and supplementary methods)

    Exact dimension estimation of interacting qubit systems assisted by a single quantum probe

    Estimating the dimension of a Hilbert space is an important component of quantum system identification. In quantum technologies, the dimension of a quantum system (or its corresponding accessible Hilbert space) is an important resource, as larger dimensions determine, e.g., the performance of quantum computation protocols or the sensitivity of quantum sensors. Despite being a critical task in quantum system identification, estimating the Hilbert space dimension is experimentally challenging. While there have been proposals for various dimension witnesses capable of putting a lower bound on the dimension from measuring collective observables that encode correlations, in many practical scenarios, especially for multiqubit systems, the experimental control might not be able to engineer the required initialization, dynamics and observables. Here we propose a more practical strategy that relies not on directly measuring an unknown multiqubit target system, but on the indirect interaction with a local quantum probe under the experimenter's control. Assuming only that the interaction model is given and the evolution correlates all the qubits with the probe, we combine a graph-theoretical approach and realization theory to demonstrate that the dimension of the Hilbert space can be exactly estimated from the model order of the system. We further analyze the robustness of the proposed realization-theory-based estimation method in the presence of background noise, finding that despite stringent constraints on the allowed noise level, exact dimension estimation can still be achieved. Comment: v3: accepted version. We would like to offer our gratitude to the editors and referees for their helpful and insightful opinions and feedback

    Spectral identification of networks using sparse measurements

    We propose a new method to recover global information about a network of interconnected dynamical systems based on observations made at a small number (possibly one) of its nodes. In contrast to classical identification of full graph topology, we focus on the identification of the spectral graph-theoretic properties of the network, a framework that we call spectral network identification. The main theoretical results connect the spectral properties of the network to the spectral properties of the dynamics, which are well-defined in the context of the so-called Koopman operator and can be extracted from data through the Dynamic Mode Decomposition algorithm. These results are obtained for networks of diffusively-coupled units that admit a stable equilibrium state. For large networks, a statistical approach is considered, which focuses on spectral moments of the network and is well-suited to the case of heterogeneous populations. Our framework provides efficient numerical methods to infer global information on the network from sparse local measurements at a few nodes. Numerical simulations show, for instance, the possibility of detecting the mean number of connections or the addition of a new vertex using measurements made at a single node that need not be representative of the other nodes' properties. Comment: 3

    Uncovering collective listening habits and music genres in bipartite networks

    In this paper, we analyze web-downloaded data on people sharing their music libraries, which we use as their individual musical signatures (IMS). The system is represented by a bipartite network, the nodes being the music groups and the listeners. The audience size of music groups follows a power law, whereas the size of individual music libraries follows an exponential distribution with deviations at small values. In order to extract structures from the network, we focus on correlation matrices, which we filter by removing the least correlated links. This percolation-based method reveals the emergence of social communities and music genres, which are visualised by a branching representation. Evidence of collective listening habits that do not fit the usual neat genres defined by the music industry indicates an alternative way of classifying listeners and music groups. The structure of the network is also studied by a more refined method, based upon a random walk exploration of its properties. Finally, a personal identification - community imitation model (PICI) for growing bipartite networks is outlined, following Potts ingredients. Simulation results reproduce the empirical data quite well. Comment: submitted to PR
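    The filtering step described in this abstract (build a correlation matrix, drop the least correlated links, read off the groups that remain) can be sketched as follows. This is a minimal illustration, not the authors' procedure: the Pearson correlation, the union-find grouping, the function names, and the four toy listener libraries are all assumptions made for the example.

```python
from itertools import combinations
from math import sqrt

def pearson(u, v):
    """Pearson correlation of two equal-length 0/1 library vectors."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su and sv else 0.0

def communities(libraries, threshold):
    """Keep listener-listener links with correlation above `threshold`
    and return the connected components of what is left."""
    parent = {name: name for name in libraries}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in combinations(libraries, 2):
        if pearson(libraries[a], libraries[b]) > threshold:
            parent[find(a)] = find(b)  # link survives the filter

    groups = {}
    for name in libraries:
        groups.setdefault(find(name), set()).add(name)
    return sorted(groups.values(), key=sorted)

# Hypothetical toy data: each row marks which of six music groups
# a listener owns (1) or does not own (0).
libraries = {
    "ann": [1, 1, 1, 0, 0, 0],
    "bob": [1, 1, 0, 0, 0, 0],
    "cat": [0, 0, 0, 1, 1, 1],
    "dan": [0, 0, 1, 1, 1, 0],
}

loose = communities(libraries, threshold=0.3)
strict = communities(libraries, threshold=0.5)
print(loose)
print(strict)
```

    Raising the threshold removes progressively stronger links, so larger groups split into smaller ones; recording the splits at each threshold gives a branching picture of the kind the abstract mentions.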

    Consensus, Cohesion and Connectivity

    Social life clusters into groups held together by ties that also transmit information. When collective problems occur, group members use their ties to discuss what to do and to establish an agreement, which must be reached quickly enough to prevent discounting the value of the group decision. The speed at which a group reaches consensus can be predicted by the algebraic connectivity of the network, which also imposes a lower bound on the group's cohesion. This specific measure of connectivity is put to the test by re-using experimental data, which confirm the prediction.
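    The link between algebraic connectivity and consensus speed can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes a standard linear averaging model (each member moves toward its neighbours' values), and the graphs, step size, tolerance, and function names are illustrative choices. The algebraic connectivity is the second-smallest eigenvalue of the graph Laplacian, for which closed forms are known in both cases used here: n for the complete graph on n nodes, and 2(1 - cos(pi/n)) for the path on n nodes.

```python
import math

def consensus_steps(adj, x0, eps, tol=1e-6, max_steps=100_000):
    """Iterate x_i <- x_i + eps * sum_j (x_j - x_i) over neighbours j
    and return the number of steps until all values agree within tol."""
    x = list(x0)
    for step in range(max_steps):
        if max(x) - min(x) < tol:
            return step
        x = [xi + eps * sum(x[j] - xi for j in adj[i])
             for i, xi in enumerate(x)]
    return max_steps

def path_graph(n):
    """Adjacency lists of a chain: each node talks to its neighbours only."""
    return [[j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]

def complete_graph(n):
    """Adjacency lists of a clique: everyone talks to everyone."""
    return [[j for j in range(n) if j != i] for i in range(n)]

n = 8
x0 = [float(i) for i in range(n)]  # initial opinions 0..7
eps = 0.1  # assumed step size, small enough for both graphs to converge

# Closed-form algebraic connectivity for the two graph families.
lambda2_path = 2 * (1 - math.cos(math.pi / n))
lambda2_complete = float(n)

steps_path = consensus_steps(path_graph(n), x0, eps)
steps_complete = consensus_steps(complete_graph(n), x0, eps)

print(f"path:     lambda2 = {lambda2_path:.3f}, steps = {steps_path}")
print(f"complete: lambda2 = {lambda2_complete:.3f}, steps = {steps_complete}")
```

    In this model the disagreement shrinks by roughly a factor of (1 - eps * lambda2) per step, so the graph with the larger algebraic connectivity reaches agreement in far fewer steps.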

    Theories for influencer identification in complex networks

    In social and biological systems, the structural heterogeneity of interaction networks gives rise to a small set of influential nodes, or influencers, in a series of dynamical processes. Although far fewer in number than the nodes of the entire network, these influencers have been observed to shape the collective dynamics of large populations in different contexts. As such, the successful identification of influencers should have profound implications for various real-world spreading dynamics such as viral marketing, epidemic outbreaks and cascading failure. In this chapter, we first summarize the centrality-based approach to finding single influencers in complex networks, and then discuss the more complicated problem of locating multiple influencers from a collective point of view. Progress rooted in collective influence theory, belief-propagation and computer science will be presented. Finally, we present some applications of influencer identification in diverse real-world systems, including online social platforms, scientific publication, brain networks and socioeconomic systems. Comment: 24 pages, 6 figures

    Statistical interaction modeling of bovine herd behaviors

    While there has been interest in modeling the group behavior of herds or flocks, much of this work has focused on simulating their collective spatial motion patterns without accounting for individuality in the herd, instead assuming a homogenized role for all members or sub-groups of the herd. Animal behavior experts have noted that domestic animals exhibit behaviors that are indicative of social hierarchy: leader/follower type behaviors are present, as well as dominance and subordination, aggression and rank order, and specific social affiliations may also exist. Both wild and domestic cattle are social species, and group behaviors are likely to be influenced by the expression of specific social interactions. In this paper, Global Positioning System coordinate fixes gathered from a herd of beef cows tracked in open fields over several days at a time are used to learn a model that focuses on the interactions within the herd as well as its overall movement. Using these data in this way tests the validity of existing group behavior models against actual herding behaviors. Domain knowledge, location geography, and human observations are used to explain the causes of deviations from the idealized behavior.