Median evidential c-means algorithm and its application to community detection
Median clustering is of great value for partitioning relational data. This
paper proposes a new prototype-based clustering method, Median Evidential
C-Means (MECM), which extends median c-means and median fuzzy c-means
within the theoretical framework of belief functions. The
median variant relaxes the restriction of a metric space embedding for the
objects but constrains the prototypes to be in the original data set. Due to
these properties, MECM could be applied to graph clustering problems. A
community detection scheme for social networks based on MECM is investigated
and the obtained credal partitions of graphs, which are more refined than crisp
and fuzzy ones, enable us to have a better understanding of the graph
structures. An initial prototype-selection scheme based on evidential
semi-centrality is presented to avoid local premature convergence and an
evidential modularity function is defined to choose the optimal number of
communities. Finally, experiments on synthetic and real data sets illustrate
the performance of MECM and show how it differs from other methods.
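The "median" constraint described above (prototypes must be objects of the data set, and only pairwise dissimilarities are needed) can be illustrated with a minimal sketch of crisp median c-means. This is not MECM itself: the evidential (belief-function) memberships, the semi-centrality initialisation, and the evidential modularity are not reproduced here. The function name `median_c_means`, the random initialisation, and the toy data are assumptions made for illustration only.

```python
import random


def median_c_means(D, c, max_iter=100, seed=0):
    """Crisp median c-means on an n x n dissimilarity matrix D (illustrative sketch).

    Prototypes are indices of data objects, so no metric-space embedding
    of the objects is required, only their pairwise dissimilarities.
    """
    n = len(D)
    rng = random.Random(seed)
    # Assumption: plain random initialisation (the paper uses a different scheme).
    prototypes = rng.sample(range(n), c)

    def assign(protos):
        # Each object goes to the cluster of its nearest prototype.
        return [min(range(c), key=lambda k: D[i][protos[k]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(prototypes)
        new_prototypes = []
        for k in range(c):
            members = [i for i in range(n) if labels[i] == k]
            if not members:
                # Keep the old prototype if a cluster happens to be empty.
                new_prototypes.append(prototypes[k])
                continue
            # The new prototype is the member with the smallest total
            # dissimilarity to the other members of its cluster.
            best = min(members, key=lambda j: sum(D[i][j] for i in members))
            new_prototypes.append(best)
        if new_prototypes == prototypes:
            break
        prototypes = new_prototypes

    return prototypes, assign(prototypes)


# Toy relational data: dissimilarities between six objects on a line.
points = [0, 1, 2, 10, 11, 12]
D = [[abs(a - b) for b in points] for a in points]
prototypes, labels = median_c_means(D, c=2)
```

Because the update step only searches over existing objects, the same loop applies to graph data once a node-to-node dissimilarity (for example a shortest-path distance) is available.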
Regional surname affinity: a spatial network approach
OBJECTIVE
We investigate surname affinities among areas of modern-day China by constructing a spatial network and performing community detection. The result is a geographical genealogy of the Chinese population that reflects population origins, historical migrations, and societal evolution.
MATERIALS AND METHODS
We acquire data from the census records supplied by China's National Citizen Identity Information System, including the surname and regional information of 1.28 billion registered Chinese citizens. We propose a multilayer minimum spanning tree (MMST) to construct a spatial network based on the matrix of isonymic distances, which is often used to characterize the dissimilarity of surname structure among areas. We use the fast unfolding algorithm to detect network communities.
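One plausible reading of the multilayer minimum spanning tree, sketched below, is a union of edge-disjoint spanning trees: compute a minimum spanning tree on the distance matrix, remove its edges, compute the next tree on what remains, and repeat. This reading is an assumption rather than the authors' stated procedure, although it agrees with the reported network size, since 10 layers over 362 nodes give 10 × 361 = 3,610 edges. The function names and the toy distance matrix are hypothetical.

```python
def minimum_spanning_tree(n, edges):
    """Kruskal's algorithm. `edges` holds (weight, u, v) tuples; returns tree edges."""
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((w, u, v))
    return tree


def multilayer_mst(D, layers):
    """Union of up to `layers` edge-disjoint spanning trees (assumed MMST reading)."""
    n = len(D)
    remaining = {(D[u][v], u, v) for u in range(n) for v in range(u + 1, n)}
    network = []
    for _ in range(layers):
        tree = minimum_spanning_tree(n, remaining)
        if len(tree) < n - 1:
            break  # the leftover edges no longer connect all nodes
        network.extend(tree)
        remaining.difference_update(tree)
    return network


# Toy example: six areas with distances taken from positions on a line.
positions = [0, 1, 3, 6, 10, 15]
D = [[abs(a - b) for b in positions] for a in positions]
network = multilayer_mst(D, layers=2)
```

The resulting edge list can then be passed to any community detection routine; the fast unfolding method named above is commonly known as the Louvain algorithm.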
RESULTS
We obtain a 10-layer MMST network of 362 prefecture nodes and 3,610 edges derived from the matrix of the Euclidean distances among these areas. These prefectures are divided into eight groups in the spatial network via community detection. We measure the partition by comparing the inter-distances and intra-distances of the communities and obtain a meaningful regional ethnicity classification.
DISCUSSION
The visualization of the resulting communities on the map indicates that the prefectures in the same community are usually geographically adjacent. The formation of this partition is influenced by geographical factors, historic migrations, trade and economic factors, as well as isolation of culture and language. The MMST algorithm proves to be effective in geo-genealogy and ethnicity classification for it retains essential information about surname affinity and highlights the geographical consanguinity of the population.
National Natural Science Foundation of China, Grant/Award Numbers: 61773069, 71731002; National Social Science Foundation of China, Grant/Award Number: 14BSH024; Foundation of China of China Scholarships Council, Grant/Award Numbers: 201606045048, 201706040188, 201706040015; DOE, Grant/Award Number: DE-AC07-05Id14517; DTRA, Grant/Award Number: HDTRA1-14-1-0017; NSF, Grant/Award Numbers: CHE-1213217, CMMI-1125290, PHY-1505000
Bayesian nonparametric dependent model for partially replicated data: the influence of fuel spills on species diversity
We introduce a dependent Bayesian nonparametric model for the probabilistic
modeling of membership of subgroups in a community based on partially
replicated data. The focus here is on species-by-site data, i.e. community data
where observations at different sites are classified in distinct species. Our
aim is to study the impact of additional covariates, for instance environmental
variables, on the data structure, and in particular on the community diversity.
To that purpose, we introduce dependence a priori across the covariates, and
show that it improves posterior inference. We use a dependent version of the
Griffiths-Engen-McCloskey distribution defined via the stick-breaking
construction. This distribution is obtained by transforming a Gaussian process
whose covariance function controls the desired dependence. The resulting
posterior distribution is sampled by Markov chain Monte Carlo. We illustrate
the application of our model to a soil microbial dataset acquired across a
hydrocarbon contamination gradient at the site of a fuel spill in Antarctica.
This method allows for inference on a number of quantities of interest in
ecotoxicology, such as diversity or effective concentrations, and is broadly
applicable to the general problem of communities' response to environmental
variables.
Comment: Main Paper: 22 pages, 6 figures. Supplementary Material: 11 pages, 1
figure
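The stick-breaking construction mentioned in the abstract can be sketched for the standard, covariate-free one-parameter GEM distribution: each weight takes a Beta(1, θ) fraction of the stick left over by the previous weights. The dependent version in the paper, which obtains the breaks by transforming a Gaussian process, is not reproduced here. The function name, the truncation level, and the parameter values are assumptions for illustration.

```python
import random


def gem_stick_breaking(theta, truncation, seed=0):
    """Draw the first `truncation` weights of a GEM(theta) sequence.

    Returns the weights and the stick mass left unallocated by the truncation.
    """
    rng = random.Random(seed)
    weights = []
    remaining = 1.0
    for _ in range(truncation):
        v = rng.betavariate(1.0, theta)  # fraction broken off the remaining stick
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights, remaining


# Hypothetical values: theta = 2.0 and 20 retained species proportions.
weights, leftover = gem_stick_breaking(theta=2.0, truncation=20)
```

In a species-by-site setting the weights play the role of species proportions at one site; larger values of θ spread the mass over more species, which corresponds to higher diversity.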