Identifying Graphs from Noisy Observational Data
There is a growing amount of data describing networks -- examples include social networks, communication networks, and biological networks. As the amount of available data increases, so does our interest in analyzing the properties and characteristics of these networks. However, in most cases the data is noisy, incomplete, and passively acquired through observation; naively analyzing these networks without taking these errors into account can result in inaccurate and misleading conclusions. In my dissertation, I study the tasks of entity resolution, link prediction, and collective classification to address these deficiencies. I describe these tasks in detail and discuss my own work on each of them. For entity resolution, I develop a method for resolving the identities of name mentions in email communications. For link prediction, I develop a method for inferring subordinate-manager relationships between individuals in an email communication network. For collective classification, I propose an adaptive active surveying method to address node labeling in a query-driven setting on network data. In many real-world settings, however, these deficiencies are not found in isolation, and all need to be addressed to infer the desired complete and accurate network. Furthermore, because of the dependencies typically found in these tasks, the tasks are inherently inter-related and must be performed jointly. I define the general problem of graph identification, which simultaneously performs these tasks, removing the noise and missing values in the observed input network and inferring the complete and accurate output network. I present a novel approach to graph identification using a collection of Coupled Collective Classifiers, C3, which, in addition to capturing the variety of features typically used for each task, can capture the intra- and inter-dependencies required to correctly infer nodes, edges, and labels in the output network. 
I discuss variants of C3 using different learning and inference paradigms and show the superior performance of C3, in terms of both prediction quality and runtime performance, over various previous approaches. I then conclude by presenting the Graph Alignment, Identification, and Analysis (GAIA) open-source software library, which provides not only an implementation of C3 but also algorithms for various tasks on network data such as entity resolution, link prediction, collective classification, clustering, active learning, data generation, and analysis.
MPA network design based on graph network theory and emergent properties of larval dispersal
Despite the recognised effectiveness of networks of Marine Protected Areas
(MPAs) as a biodiversity conservation instrument, nowadays MPA network design
frequently disregards the importance of connectivity patterns. In the case of
sedentary marine populations, connectivity stems not only from the stochastic
nature of the physical environment that affects early-life stages dispersal,
but also from the spawning stock attributes that affect the reproductive output
(e.g., passive eggs and larvae) and its survivorship. Early-life stages are
virtually impossible to track in the ocean. Therefore, numerical ocean current
simulations coupled to egg and larval Lagrangian transport models remain the
most common approach for the assessment of marine larval connectivity. Inferred
larval connectivity may be different depending on the type of connectivity
considered; consequently, the prioritisation of sites for marine populations'
conservation might also differ. Here, we introduce a framework for evaluating
and designing MPA networks based on the identification of connectivity hotspots
using graph theoretic analysis. As a case study, we use a network of
open-access areas and MPAs off Mallorca Island (Spain) and test its
effectiveness for the protection of the painted comber Serranus scriba. Outputs
from network analysis are used to: (1) identify critical areas for improving
overall larval connectivity; (2) assess the impact of species' biological
parameters in network connectivity; and (3) explore alternative MPA
configurations to improve average network connectivity. Results demonstrate the
potential of graph theory to identify non-trivial egg/larval dispersal patterns
and emerging collective properties of the MPA network which are relevant for
increasing protection efficiency.
Comment: 8 figures, 3 tables, 1 Supplementary material (including 4 tables, 3
figures and supplementary methods)
Exact dimension estimation of interacting qubit systems assisted by a single quantum probe
Estimating the dimension of a Hilbert space is an important component of
quantum system identification. In quantum technologies, the dimension of a
quantum system (or its corresponding accessible Hilbert space) is an important
resource, as larger dimensions determine e.g. the performance of quantum
computation protocols or the sensitivity of quantum sensors. Despite being a
critical task in quantum system identification, estimating the Hilbert space
dimension is experimentally challenging. While there have been proposals for
various dimension witnesses capable of putting a lower bound on the dimension
from measuring collective observables that encode correlations, in many
practical scenarios, especially for multiqubit systems, the experimental
control might not be able to engineer the required initialization, dynamics and
observables.
Here we propose a more practical strategy that relies not on directly
measuring an unknown multiqubit target system, but on the indirect interaction
with a local quantum probe under the experimenter's control. Assuming only that
the interaction model is given and the evolution correlates all the qubits with
the probe, we combine a graph-theoretical approach and realization theory to
demonstrate that the dimension of the Hilbert space can be exactly estimated
from the model order of the system. We further analyze the robustness in the
presence of background noise of the proposed estimation method based on
realization theory, finding that despite stringent constraints on the allowed
noise level, exact dimension estimation can still be achieved.
Comment: v3: accepted version. We would like to offer our gratitude to the
editors and referees for their helpful and insightful opinions and feedback
Spectral identification of networks using sparse measurements
We propose a new method to recover global information about a network of
interconnected dynamical systems based on observations made at a small number
(possibly one) of its nodes. In contrast to classical identification of full
graph topology, we focus on the identification of the spectral graph-theoretic
properties of the network, a framework that we call spectral network
identification.
The main theoretical results connect the spectral properties of the network
to the spectral properties of the dynamics, which are well-defined in the
context of the so-called Koopman operator and can be extracted from data
through the Dynamic Mode Decomposition algorithm. These results are obtained
for networks of diffusively-coupled units that admit a stable equilibrium
state. For large networks, a statistical approach is considered, which focuses
on spectral moments of the network and is well-suited to the case of
heterogeneous populations.
Our framework provides efficient numerical methods to infer global
information on the network from sparse local measurements at a few nodes.
Numerical simulations show for instance the possibility of detecting the mean
number of connections or the addition of a new vertex using measurements made
at one single node, that need not be representative of the other nodes'
properties.
Comment: 3
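The idea of reading network spectra from a single node can be illustrated with a minimal sketch. It makes several simplifying assumptions that are not taken from the abstract: the dynamics are linear diffusion dx/dt = -L x on an undirected graph, discretized with an explicit Euler step; the number of nodes is known and used as the number of delays; and the data are noise-free. The function name `laplacian_spectrum_from_one_node` and its parameters are hypothetical. Dynamic Mode Decomposition is applied to a delay-embedded time series of one node, and the Laplacian eigenvalues are recovered from the DMD eigenvalues.

```python
import numpy as np

def laplacian_spectrum_from_one_node(adjacency, node, x0, dt=0.1, steps=30):
    """Estimate Laplacian eigenvalues from a time series measured at one node.

    Assumptions (illustrative only): linear diffusive coupling, explicit
    Euler discretization, known number of nodes, no measurement noise.
    """
    adjacency = np.asarray(adjacency, dtype=float)
    n = adjacency.shape[0]
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    step = np.eye(n) - dt * laplacian  # one Euler step of dx/dt = -L x

    # Simulate the network but record only the chosen node.
    x = np.asarray(x0, dtype=float)
    y = []
    for _ in range(steps):
        y.append(x[node])
        x = step @ x
    y = np.array(y)

    # Delay embedding: column k holds [y_k, y_{k+1}, ..., y_{k+n-1}].
    hankel = np.array([y[i:i + steps - n + 1] for i in range(n)])
    past, future = hankel[:, :-1], hankel[:, 1:]

    # DMD: least-squares operator mapping each delay vector to the next one.
    dmd_operator = future @ np.linalg.pinv(past)
    mu = np.linalg.eigvals(dmd_operator).real

    # Each DMD eigenvalue is mu = 1 - dt * lambda for a Laplacian eigenvalue.
    return np.sort((1.0 - mu) / dt)

# Path graph on three nodes; its Laplacian eigenvalues are 0, 1 and 3.
path_graph = [[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]]
estimated = laplacian_spectrum_from_one_node(path_graph, node=0, x0=[1.0, 0.0, 0.0])
```

In this toy setting the measured node and the initial condition excite every eigenmode, which is what makes the full spectrum visible from one node; a mode with zero weight at the measured node would not be recovered.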
Uncovering collective listening habits and music genres in bipartite networks
In this paper, we analyze web-downloaded data on people sharing their music
libraries, which we use as their individual musical signatures (IMS). The system
is represented by a bipartite network whose nodes are the music groups and the
listeners. The audience size of music groups follows a power law, whereas the
size of individual music libraries follows an exponential distribution with
deviations at small values. In order to extract structures from the network, we
focus on correlation matrices, which we filter by removing the least correlated links.
This percolation-based method reveals the emergence of social communities
and music genres, which are visualised by a branching representation. Evidence
of collective listening habits that do not fit the neat usual genres defined by
the music industry indicates an alternative way of classifying listeners/music
groups. The structure of the network is also studied by a more refined method,
based upon a random walk exploration of its properties. Finally, a personal
identification - community imitation model (PICI) for growing bipartite
networks is outlined, following Potts ingredients. Simulation results
reproduce the empirical data quite well.
Comment: submitted to PR
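The correlation-filtering step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' procedure: the binary library matrix, the choice to correlate listeners rather than music groups, the threshold values, and the function name `filtered_communities` are all assumptions. Links whose correlation falls below the threshold are removed, and the connected components that remain are read as communities.

```python
import numpy as np

def filtered_communities(libraries, threshold):
    """Group rows whose pairwise correlation is at least `threshold`.

    `libraries` is a binary matrix (rows: listeners, columns: music groups).
    Returns the connected components of the thresholded correlation graph.
    """
    corr = np.corrcoef(np.asarray(libraries, dtype=float))
    linked = corr >= threshold  # keep only the most correlated links

    unvisited = set(range(corr.shape[0]))
    communities = []
    while unvisited:
        start = min(unvisited)
        unvisited.remove(start)
        stack, members = [start], [start]
        # Depth-first search over the links that survived the filter.
        while stack:
            i = stack.pop()
            for j in sorted(unvisited):
                if linked[i, j]:
                    unvisited.remove(j)
                    stack.append(j)
                    members.append(j)
        communities.append(sorted(members))
    return communities

# Hypothetical toy data: four listeners, six music groups.
toy_libraries = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
]
loose = filtered_communities(toy_libraries, threshold=0.5)
strict = filtered_communities(toy_libraries, threshold=0.8)
```

Raising the threshold splits components into smaller ones, and tracking those splits across thresholds is one way to obtain a branching picture of the kind the abstract mentions.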
Consensus, Cohesion and Connectivity
Social life clusters into groups held together by ties that also transmit
information. When collective problems occur, group members use their ties to
discuss what to do and to establish an agreement, to be reached quickly enough to
prevent discounting the value of the group decision. The speed at which a group
reaches consensus can be predicted by the algebraic connectivity of the
network, which also imposes a lower bound on the group's cohesion. This
specific measure of connectivity is put to the test by re-using experimental
data, which confirm the prediction.
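For readers unfamiliar with the term, the algebraic connectivity of an undirected graph is the second-smallest eigenvalue of its Laplacian matrix. The sketch below computes it with NumPy for two hypothetical four-member groups; the example graphs and the function name `algebraic_connectivity` are illustrative and not taken from the paper.

```python
import numpy as np

def algebraic_connectivity(adjacency):
    """Second-smallest eigenvalue of the graph Laplacian L = D - A."""
    adjacency = np.asarray(adjacency, dtype=float)
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    eigenvalues = np.linalg.eigvalsh(laplacian)  # returned in ascending order
    return eigenvalues[1]

# A chain of four members versus a group in which everyone is tied to everyone.
chain = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]
clique = [[0, 1, 1, 1],
          [1, 0, 1, 1],
          [1, 1, 0, 1],
          [1, 1, 1, 0]]

chain_value = algebraic_connectivity(chain)
clique_value = algebraic_connectivity(clique)
```

Under the standard linear consensus model dx/dt = -L x (an assumption here, not necessarily the model used in the paper), disagreement decays at a rate set by this eigenvalue, so the fully tied group would be expected to agree faster than the chain.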
Theories for influencer identification in complex networks
In social and biological systems, the structural heterogeneity of interaction
networks gives rise to the emergence of a small set of influential nodes, or
influencers, in a series of dynamical processes. Although much smaller than the
entire network, these influencers were observed to be able to shape the
collective dynamics of large populations in different contexts. As such, the
successful identification of influencers should have profound implications in
various real-world spreading dynamics such as viral marketing, epidemic
outbreaks and cascading failure. In this chapter, we first summarize the
centrality-based approach in finding single influencers in complex networks,
and then discuss the more complicated problem of locating multiple influencers
from a collective point of view. Progress rooted in collective influence
theory, belief-propagation and computer science will be presented. Finally, we
present some applications of influencer identification in diverse real-world
systems, including online social platforms, scientific publication, brain
networks and socioeconomic systems.
Comment: 24 pages, 6 figures
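The contrast between single-node centrality and a collective view can be made concrete with a small sketch. It uses the collective influence score as commonly defined in the literature, CI_l(i) = (k_i - 1) times the sum of (k_j - 1) over nodes j at distance exactly l from i, where k is the degree. The toy graph and the function name `collective_influence` are assumptions for illustration; the chapter itself may present the measure differently.

```python
from collections import deque

def collective_influence(graph, node, radius):
    """Collective influence of `node` at the given ball radius (radius >= 1).

    `graph` maps each node to a list of its neighbours (undirected graph).
    """
    degree = {v: len(neighbours) for v, neighbours in graph.items()}
    distance = {node: 0}
    queue = deque([node])
    frontier_sum = 0
    # Breadth-first search out to the frontier of the ball around `node`.
    while queue:
        v = queue.popleft()
        if distance[v] == radius:
            frontier_sum += degree[v] - 1
            continue  # do not expand beyond the frontier
        for w in graph[v]:
            if w not in distance:
                distance[w] = distance[v] + 1
                queue.append(w)
    return (degree[node] - 1) * frontier_sum

# Hypothetical graph: two small hubs (0 and 4) joined through node 3.
toy_graph = {
    0: [1, 2, 3],
    1: [0],
    2: [0],
    3: [0, 4],
    4: [3, 5, 6],
    5: [4],
    6: [4],
}
scores = {v: collective_influence(toy_graph, v, radius=1) for v in toy_graph}
top_by_degree = max(len(neighbours) for neighbours in toy_graph.values())
```

In this toy graph the hubs 0 and 4 have the highest degree, yet node 3, which bridges them, receives the highest collective influence score at radius 1.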
Statistical interaction modeling of bovine herd behaviors
While there has been interest in modeling the group behavior of herds or flocks, much of this work has focused on simulating their collective spatial motion patterns and has not accounted for individuality in the herd, instead assuming a homogenized role for all members or sub-groups of the herd. Animal behavior experts have noted that domestic animals exhibit behaviors that are indicative of social hierarchy: leader/follower type behaviors are present, as well as dominance and subordination, aggression and rank order, and specific social affiliations may also exist. Both wild and domestic cattle are social species, and group behaviors are likely to be influenced by the expression of specific social interactions. In this paper, Global Positioning System coordinate fixes gathered from a herd of beef cows tracked in open fields over several days at a time are utilized to learn a model that focuses on the interactions within the herd as well as its overall movement. Using these data in this way explores the validity of existing group behavior models against actual herding behaviors. Domain knowledge, location geography, and human observations are utilized to explain the causes of these deviations from this idealized behavior.