Considerations about multistep community detection
The problem of community detection in networks and its implications have
attracted considerable attention because of important applications in both
natural and social sciences. A number of algorithms have been developed to
solve this problem, addressing either speed optimization or the quality of the
partitions calculated. In this paper we propose a multi-step procedure bridging
the fastest but less accurate algorithms (coarse clustering) with the slowest,
most effective ones (refinement). By adopting a heuristic ranking of the nodes,
and classifying a fraction of them as `critical', a refinement step can be
restricted to this subset of the network, thus saving computational time.
Preliminary numerical results are discussed, showing improvement of the final
partition. Comment: 12 pages
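The multi-step idea above (coarse partition, heuristic ranking, refinement restricted to the `critical' nodes) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not state which ranking heuristic or refinement algorithm the authors use, so the share of neighbours outside a node's own community stands in for the ranking, and a majority vote among neighbours stands in for the slower refinement method. The function names and the `fraction` parameter are hypothetical.

```python
from collections import Counter

def external_fraction(graph, labels, node):
    """Share of a node's neighbours that sit in a different community."""
    neighbours = graph[node]
    if not neighbours:
        return 0.0
    outside = sum(1 for n in neighbours if labels[n] != labels[node])
    return outside / len(neighbours)

def critical_nodes(graph, labels, fraction):
    """Rank nodes by the (assumed) heuristic and keep the top `fraction`."""
    ranked = sorted(graph, key=lambda v: external_fraction(graph, labels, v),
                    reverse=True)
    keep = max(1, int(round(fraction * len(ranked))))
    return ranked[:keep]

def refine(graph, labels, nodes):
    """Reassign only the given nodes to their neighbours' majority community.

    This is a stand-in for a slower, more accurate refinement algorithm.
    """
    refined = dict(labels)
    for v in nodes:
        counts = Counter(refined[n] for n in graph[v])
        if counts:
            refined[v] = counts.most_common(1)[0][0]
    return refined

# Two triangles joined by one edge; node 2 is deliberately mislabelled
# to mimic an error left behind by a fast coarse clustering.
graph = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
coarse = {0: "a", 1: "a", 2: "b", 3: "b", 4: "b", 5: "b"}
critical = critical_nodes(graph, coarse, fraction=0.2)
final = refine(graph, coarse, critical)
```

In this toy graph only one node is re-examined, which is the point of the restriction: the cost of the refinement step scales with the size of the critical subset rather than with the whole network.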
Semi-Supervised Overlapping Community Finding based on Label Propagation with Pairwise Constraints
Algorithms for detecting communities in complex networks are generally
unsupervised, relying solely on the structure of the network. However, these
methods can often fail to uncover meaningful groupings that reflect the
underlying communities in the data, particularly when those structures are
highly overlapping. One way to improve the usefulness of these algorithms is by
incorporating additional background information, which can be used as a source
of constraints to direct the community detection process. In this work, we
explore the potential of semi-supervised strategies to improve algorithms for
finding overlapping communities in networks. Specifically, we propose a new
method, based on label propagation, for finding communities using a limited
number of pairwise constraints. Evaluations on synthetic and real-world
datasets demonstrate the potential of this approach for uncovering meaningful
community structures in cases where each node can potentially belong to more
than one community. Comment: Fix table
On the Hardness of SAT with Community Structure
Recent attempts to explain the effectiveness of Boolean satisfiability (SAT)
solvers based on conflict-driven clause learning (CDCL) on large industrial
benchmarks have focused on the concept of community structure. Specifically,
industrial benchmarks have been empirically found to have good community
structure, and experiments seem to show a correlation between such structure
and the efficiency of CDCL. However, in this paper we establish hardness
results suggesting that community structure is not sufficient to explain the
success of CDCL in practice. First, we formally characterize a property shared
by a wide class of metrics capturing community structure, including
"modularity". Next, we show that the SAT instances with good community
structure according to any metric with this property are still NP-hard.
Finally, we study a class of random instances generated from the
"pseudo-industrial" community attachment model of Giráldez-Cru and Levy. We
prove that, with high probability, instances from this model that have
relatively few communities but are still highly modular require exponentially
long resolution proofs and so are hard for CDCL. We also present experimental
evidence that our result continues to hold for instances with many more
communities. This indicates that actual industrial instances easily solved by
CDCL may have some other relevant structure not captured by the community
attachment model. Comment: 23 pages. Full version of a SAT 2016 paper
Outlier Edge Detection Using Random Graph Generation Models and Applications
Outliers are samples that are generated by different mechanisms from other
normal data samples. Graphs, in particular social network graphs, may contain
nodes and edges that are made by scammers, malicious programs or mistakenly by
normal users. Detecting outlier nodes and edges is important for data mining
and graph analytics. However, previous research in the field has focused only
on detecting outlier nodes. In this article, we study the properties of edges
and propose outlier edge detection algorithms using two random graph generation
models. We found that the edge-ego-network, which can be defined as the induced
graph that contains two end nodes of an edge, their neighboring nodes and the
edges that link these nodes, contains critical information to detect outlier
edges. We evaluated the proposed algorithms by injecting outlier edges into
some real-world graph data. Experimental results show that the proposed
algorithms can effectively detect outlier edges. In particular, the algorithm
based on the Preferential Attachment Random Graph Generation model consistently
gives good performance regardless of the test graph data. Furthermore, the
proposed algorithms are not limited to the area of outlier edge detection. We
demonstrate three different applications that benefit from the proposed
algorithms: 1) a preprocessing tool that improves the performance of graph
clustering algorithms; 2) an outlier node detection algorithm; and 3) a novel
noisy data clustering algorithm. These applications show the great potential of
the proposed outlier edge detection techniques. Comment: 14 pages, 5 figures, journal paper
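The edge-ego-network defined above (the two end nodes of an edge, their neighboring nodes, and the edges linking these nodes) can be extracted with a few set operations. The following is a minimal sketch, assuming an undirected graph stored as a dictionary of neighbour sets; the function name `edge_ego_network` and the toy graph are illustrative and not taken from the article, and the outlier scoring itself is not shown.

```python
def edge_ego_network(graph, u, v):
    """Return (nodes, edges) of the subgraph induced by u, v and their neighbours."""
    nodes = {u, v} | graph[u] | graph[v]
    # Keep an edge only when both of its end nodes are inside the node set.
    edges = {
        frozenset((a, b))
        for a in nodes
        for b in graph[a]
        if b in nodes
    }
    return nodes, edges

# Small undirected graph: a triangle 1-2-3 with a tail 2-4-5.
graph = {
    1: {2, 3},
    2: {1, 3, 4},
    3: {1, 2},
    4: {2, 5},
    5: {4},
}
nodes, edges = edge_ego_network(graph, 1, 2)
```

For the edge (1, 2), node 5 and the edge 4-5 fall outside the induced graph, because node 5 is not a neighbour of either end node.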
Female economic dependence and the morality of promiscuity
This article is made available through the Brunel Open Access Publishing Fund. Copyright © The Author(s) 2014. In environments in which female economic dependence on a male mate is higher, male parental investment is more essential. In such environments, therefore, both sexes should value paternity certainty more and thus object more to promiscuity (because promiscuity undermines paternity certainty). We tested this theory of anti-promiscuity morality in two studies (N = 656 and N = 4,626) using U.S. samples. In both, we examined whether opposition to promiscuity was higher among people who perceived greater female economic dependence in their social network. In Study 2, we also tested whether economic indicators of female economic dependence (e.g., female income, welfare availability) predicted anti-promiscuity morality at the state level. Results from both studies supported the proposed theory. At the individual level, perceived female economic dependence explained significant variance in anti-promiscuity morality, even after controlling for variance explained by age, sex, religiosity, political conservatism, and the anti-promiscuity views of geographical neighbors. At the state level, median female income was strongly negatively related to anti-promiscuity morality and this relationship was fully mediated by perceived female economic dependence. These results were consistent with the view that anti-promiscuity beliefs may function to promote paternity certainty in circumstances where male parental investment is particularly important.
Language comparison via network topology
Modeling relations between languages can offer understanding of language
characteristics and uncover similarities and differences between languages.
Automated methods applied to large textual corpora can be seen as opportunities
for novel statistical studies of language development over time, as well as for
improving cross-lingual natural language processing techniques. In this work,
we first propose how to represent textual data as a directed, weighted network
by the text2net algorithm. We next explore how various fast,
network-topological metrics, such as network community structure, can be used
for cross-lingual comparisons. In our experiments, we employ eight different
network topology metrics and show empirically, on a parallel corpus, how
the methods can be used for modeling the relations between nine selected
languages. We demonstrate that the proposed method scales to large corpora
consisting of hundreds of thousands of aligned sentences on an off-the-shelf
laptop. We observe that some properties, such as communities, capture
some of the known differences between the languages, while others can be seen
as novel opportunities for linguistic studies.
Dynamic Community Detection into Analyzing of Wildfires Events
The study and comprehension of complex systems are crucial intellectual and
scientific challenges of the 21st century. In this scenario, network science
has emerged as a mathematical tool to support the study of such systems.
Examples include environmental processes such as wildfires, which are known for
their considerable impact on human life. However, there is a considerable lack
of studies of wildfire from a network science perspective. Here, employing the
chronological network concept -- a temporal network where nodes are linked if
two consecutive events occur between them -- we investigate the information
that dynamic community structures reveal about the wildfires' dynamics.
In particular, we explore a two-phase dynamic community detection approach:
we first apply the Louvain algorithm to a series of snapshots and then use
the Jaccard similarity coefficient to match communities across adjacent
snapshots. Experiments with the MODIS dataset of fire events in the Amazon
basin were conducted. Our results show that the dynamic communities can reveal
wildfire patterns observed throughout the year. Comment: 16 pages, 8 figures
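The second phase described above, matching communities across adjacent snapshots with the Jaccard similarity coefficient, can be sketched as follows. This is an illustration under stated assumptions: the per-snapshot communities are taken as already computed (for example by a Louvain implementation, not shown), the best-match rule and the `threshold` value of 0.3 are hypothetical choices, and the abstract does not specify how the authors resolve ties or unmatched communities.

```python
def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two node sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def match_communities(previous, current, threshold=0.3):
    """For each current community, find the most similar previous one.

    Returns a dict mapping an index in `current` to an index in `previous`,
    or to None when no previous community reaches the threshold.
    """
    matches = {}
    for j, cur in enumerate(current):
        best_index, best_score = None, threshold
        for i, prev in enumerate(previous):
            score = jaccard(prev, cur)
            if score >= best_score:
                best_index, best_score = i, score
        matches[j] = best_index
    return matches

# Communities (as node sets) found independently in two adjacent snapshots.
snapshot_t0 = [{1, 2, 3}, {4, 5, 6}]
snapshot_t1 = [{4, 5, 6, 7}, {1, 2}, {8, 9}]
matches = match_communities(snapshot_t0, snapshot_t1)
```

A current community mapped to `None` can be read as newly born in that snapshot, while chains of matches across several snapshots trace one dynamic community through time.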
Community landscapes: an integrative approach to determine overlapping network module hierarchy, identify key nodes and predict network dynamics
Background: Network communities help the functional organization and
evolution of complex networks. However, developing a method that is
both fast and accurate and that provides modular overlaps and partitions of a
heterogeneous network has proven to be rather difficult. Methodology/Principal
Findings: Here we introduce the novel concept of ModuLand, an integrative
method family determining overlapping network modules as hills of an influence
function-based, centrality-type community landscape, and including several
widely used modularization methods as special cases. As various adaptations of
the method family, we developed several algorithms, which provide an efficient
analysis of weighted and directed networks, and (1) determine pervasively
overlapping modules with high resolution; (2) uncover a detailed hierarchical
network structure allowing an efficient, zoom-in analysis of large networks;
(3) allow the determination of key network nodes and (4) help to predict
network dynamics. Conclusions/Significance: The concept opens a wide range of
possibilities to develop new approaches and applications including network
routing, classification, comparison and prediction. Comment: 25 pages with 6 figures and a Glossary + Supporting Information
containing pseudo-codes of all algorithms used, 14 Figures, 5 Tables (with 18
module definitions, 129 different modularization methods, 13 module
comparison methods) and 396 references. All algorithms can be downloaded
from this web-site: http://www.linkgroup.hu/modules.ph
Emerging landscape of oncogenic signatures across human cancers.
Cancer therapy is challenged by the diversity of molecular implementations of oncogenic processes and by the resulting variation in therapeutic responses. Projects such as The Cancer Genome Atlas (TCGA) provide molecular tumor maps in unprecedented detail. The interpretation of these maps remains a major challenge. Here we distilled thousands of genetic and epigenetic features altered in cancers to ∼500 selected functional events (SFEs). Using this simplified description, we derived a hierarchical classification of 3,299 TCGA tumors from 12 cancer types. The top classes are dominated by either mutations (M class) or copy number changes (C class). This distinction is clearest at the extremes of genomic instability, indicating the presence of different oncogenic processes. The full hierarchy shows functional event patterns characteristic of multiple cross-tissue groups of tumors, termed oncogenic signature classes. Targetable functional events in a tumor class are suggestive of class-specific combination therapy. These results may assist in the definition of clinical trials to match actionable oncogenic signatures with personalized therapies