Comparing Community Structure to Characteristics in Online Collegiate Social Networks
We study the structure of social networks of students by examining the graphs
of Facebook "friendships" at five American universities at a single point in
time. We investigate each single-institution network's community structure and
employ graphical and quantitative tools, including standardized pair-counting
methods, to measure the correlations between the network communities and a set
of self-identified user characteristics (residence, class year, major, and high
school). We review the basic properties and statistics of the pair-counting
indices employed and recall, in simplified notation, a useful analytical
formula for the z-score of the Rand coefficient. Our study illustrates how to
examine different instances of social networks constructed in similar
environments, emphasizes the array of social forces that combine to form
"communities," and leads to comparative observations about online social lives
that can be used to infer comparisons about offline social structures. In our
illustration of this methodology, we calculate the relative contributions of
different characteristics to the community structure of individual universities
and subsequently compare these relative contributions at different
universities, measuring for example the importance of common high school
affiliation to large state universities and the varying degrees of influence
common major can have on the social structure at different universities. The
heterogeneity of communities that we observe indicates that these networks
typically have multiple organizing factors rather than a single dominant one.
Comment: Version 3 (17 pages, 5 multi-part figures), accepted in SIAM Review
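As an illustration of the pair-counting idea mentioned in this abstract, the following is a minimal sketch of the plain (unstandardized) Rand coefficient between a community assignment and a user characteristic. It does not reproduce the z-score formula the abstract refers to. The function name `rand_index` and the toy labels are assumptions made for this example, not data from the study.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of node pairs on which two partitions agree.

    A pair counts as an agreement if both partitions put the two nodes
    together, or both put them apart.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same nodes")
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a == same_b:
            agree += 1
        total += 1
    # With fewer than two nodes there are no pairs to disagree on.
    return agree / total if total else 1.0


# Hypothetical toy data: detected communities vs. self-reported class year.
communities = [0, 0, 0, 1, 1, 1]
class_year = [2005, 2005, 2006, 2006, 2007, 2007]
print(rand_index(communities, class_year))
```

The double loop is quadratic in the number of nodes, which is fine for a sketch; for large networks the same counts are usually derived from a contingency table instead.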
Community Structure in Congressional Cosponsorship Networks
We study the United States Congress by constructing networks between Members
of Congress based on the legislation that they cosponsor. Using the concept of
modularity, we identify the community structure of Congressmen, as connected
via sponsorship/cosponsorship of the same legislation, to investigate the
collaborative communities of legislators in both chambers of Congress. This
analysis yields an explicit and conceptually clear measure of political
polarization, demonstrating a sharp increase in partisan polarization which
preceded and then culminated in the 104th Congress (1995-1996), when
Republicans took control of both chambers. Although polarization has since
waned in the U.S. Senate, it remains at historically high levels in the House
of Representatives.
Comment: 8 pages, 4 figures (some with multiple parts), to appear in Physica
A; additional background info and explanations added from last version
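For readers unfamiliar with modularity, here is a minimal sketch of the standard Newman-Girvan modularity score for a given partition of an undirected, unweighted graph without self-loops. It is an assumption-laden simplification: the cosponsorship networks in the paper may be weighted, and the paper's own community-detection procedure is not shown here. The function name `modularity` and the toy graph are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q = sum over communities of (e_c / m) - (d_c / 2m)^2.

    edges:     list of (u, v) pairs, each undirected edge listed once
    community: dict mapping node -> community label
    e_c is the number of edges inside community c, d_c the sum of the
    degrees of its nodes, and m the total number of edges.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    degree = defaultdict(int)
    internal = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    degree_sum = defaultdict(int)
    for node, k in degree.items():
        degree_sum[community[node]] += k
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


# Hypothetical toy graph: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_blocs = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(modularity(edges, two_blocs))
```

A higher score for the best partition indicates that connections concentrate within groups rather than between them, which is the sense in which modularity can serve as a polarization measure.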
A network-specific approach to percolation in networks with bidirectional links
Methods for determining the percolation threshold usually study the behavior
of network ensembles and are often restricted to a particular type of
probabilistic node/link removal strategy. We propose a network-specific method
to determine the connectivity of nodes below the percolation threshold and
offer an estimate of the percolation threshold in networks with bidirectional
links. Our analysis does not require the assumption that a network belongs to a
specific ensemble and can at the same time easily handle arbitrary removal
strategies (previously an open problem for undirected networks). In validating
our analysis, we find that it predicts the effects of many known complex
structures (e.g., degree correlations) and may be used to study both
probabilistic and deterministic attacks.
Comment: 6 pages, 8 figures
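The analytical method proposed in this abstract is not reproduced here. As a point of comparison only, the sketch below shows the brute-force check that such a method would be validated against: remove a chosen set of nodes from one specific undirected network and measure the largest surviving connected component. The function name `largest_component_fraction` and the toy path graph are assumptions for this example.

```python
def largest_component_fraction(n, edges, removed):
    """Fraction of the original n nodes in the largest surviving component.

    n:       number of nodes, labelled 0 .. n-1
    edges:   list of undirected (u, v) pairs
    removed: iterable of nodes deleted by the attack or failure strategy
    """
    removed = set(removed)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        if u in removed or v in removed:
            continue
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v

    sizes = {}
    for node in range(n):
        if node in removed:
            continue
        root = find(node)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values(), default=0) / n if n else 0.0


# Hypothetical toy network: a path 0-1-2-3-4.
path = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(largest_component_fraction(5, path, removed=[2]))  # deterministic attack on the middle node
```

Because `removed` can be any set, the same function covers both random removal (sample the set) and deterministic attacks (pick nodes by degree or another rule).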
Influence of wiring cost on the large-scale architecture of human cortical connectivity
In the past two decades some fundamental properties of cortical connectivity have been discovered: small-world structure, pronounced hierarchical and modular organisation, and strong core and rich-club structures. A common assumption when interpreting results of this kind is that the observed structural properties are present to enable the brain's function. However, the brain is also embedded into the limited space of the skull and its wiring has associated developmental and metabolic costs. These basic physical and economic aspects place separate, often conflicting, constraints on the brain's connectivity, which must be characterized in order to understand the true relationship between brain structure and function. To address this challenge, here we ask which, and to what extent, aspects of the structural organisation of the brain are conserved if we preserve specific spatial and topological properties of the brain but otherwise randomise its connectivity. We perform a comparative analysis of a connectivity map of the cortical connectome at both high and low resolutions utilising three different types of surrogate networks: spatially unconstrained (‘random’), connection length preserving (‘spatial’), and connection length optimised (‘reduced’) surrogates. We find that unconstrained randomisation markedly diminishes all investigated architectural properties of cortical connectivity. By contrast, spatial and reduced surrogates largely preserve most properties and, interestingly, often more so in the reduced surrogates. Specifically, our results suggest that the cortical network is less tightly integrated than its spatial constraints would allow, but more strongly segregated than its spatial constraints would necessitate. We additionally find that hierarchical organisation and rich-club structure of the cortical connectivity are largely preserved in spatial and reduced surrogates and hence may be partially attributable to cortical wiring constraints.
In contrast, the high modularity and strong s-core of the high-resolution cortical network are significantly stronger than in the surrogates, underlining their potential functional relevance in the brain.
Dynamics and Control of Diseases in Networks with Community Structure
The dynamics of infectious diseases spread via direct person-to-person transmission (such as influenza, smallpox, HIV/AIDS, etc.) depend on the underlying host contact network. Human contact networks exhibit strong community structure. Understanding how such community structure affects epidemics may provide insights for preventing the spread of disease between communities by changing the structure of the contact network through pharmaceutical or non-pharmaceutical interventions. We use empirical and simulated networks to investigate the spread of disease in networks with community structure. We find that community structure has a major impact on disease dynamics, and we show that in networks with strong community structure, immunization interventions targeted at individuals bridging communities are more effective than those simply targeting highly connected individuals. Because the structure of relevant contact networks is generally not known, and vaccine supply is often limited, there is great need for efficient vaccination algorithms that do not require full knowledge of the network. We developed an algorithm that acts only on locally available network information and is able to quickly identify targets for successful immunization intervention. The algorithm generally outperforms existing algorithms when vaccine supply is limited, particularly in networks with strong community structure. Understanding the spread of infectious diseases and designing optimal control strategies is a major goal of public health. Social networks show marked patterns of community structure, and our results, based on empirical and simulated data, demonstrate that community structure strongly affects disease dynamics. These results have implications for the design of control strategies.
Unsupervised record matching with noisy and incomplete data
We consider the problem of duplicate detection in noisy and incomplete data: given a large data set in which each record has multiple entries (attributes), detect which distinct records refer to the same real world entity. This task is complicated by noise (such as misspellings) and missing data, which can lead to records being different, despite referring to the same entity. Our method consists of three main steps: creating a similarity score between records, grouping records together into "unique entities", and refining the groups. We compare various methods for creating similarity scores between noisy records, considering different combinations of string matching, term frequency-inverse document frequency methods, and n-gram techniques. In particular, we introduce a vectorized soft term frequency-inverse document frequency method, with an optional refinement step. We also discuss two methods to deal with missing data in computing similarity scores.
We test our method on the Los Angeles Police Department Field Interview Card data set, the Cora Citation Matching data set, and two sets of restaurant review data. The results show that the methods that use words as the basic units are preferable to those that use 3-grams. Moreover, in some (but certainly not all) parameter ranges soft term frequency-inverse document frequency methods can outperform the standard term frequency-inverse document frequency method. The results also confirm that our method for automatically determining the number of groups typically works well and allows for accurate results in the absence of a priori knowledge of the number of unique entities in the data set.
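To make the first step (a similarity score between records) concrete, here is a minimal sketch of the standard word-level term frequency-inverse document frequency score with cosine similarity. It is not the vectorized soft TF-IDF method introduced in the paper, and it ignores the missing-data handling the abstract mentions. The function names, the particular TF and IDF weighting, and the example records are all assumptions made up for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(records):
    """Map each record (a string) to a sparse dict of word -> TF-IDF weight.

    TF is the within-record relative frequency and IDF is log(N / df),
    one common variant; words present in every record get weight zero.
    """
    tokenized = [record.lower().split() for record in records]
    n = len(tokenized)
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            word: (count / len(tokens)) * math.log(n / doc_freq[word])
            for word, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical records: the second is a noisy duplicate of the first.
records = [
    "golden dragon restaurant main street",
    "golden dragon resturant main st",
    "blue moon cafe ocean avenue",
]
vecs = tfidf_vectors(records)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

In this toy case the misspelled word contributes nothing to the match, which is the kind of gap that soft or n-gram variants are meant to narrow.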