Link-Prediction Enhanced Consensus Clustering for Complex Networks
Many real networks that are inferred or collected from data are incomplete
due to missing edges. Missing edges can be inherent to the dataset (Facebook
friend links will never be complete) or the result of sampling (one may only
have access to a portion of the data). The consequence is that downstream
analyses that consume the network will often yield less accurate results than
if the edges were complete. Community detection algorithms, in particular,
often suffer when critical intra-community edges are missing. We propose a
novel consensus clustering algorithm to enhance community detection on
incomplete networks. Our framework utilizes existing community detection
algorithms that process networks imputed by our link prediction based
algorithm. The framework then merges their multiple outputs into a final
consensus output. On average our method boosts performance of existing
algorithms by 7% on artificial data and 17% on ego networks collected from
Facebook.
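The imputation step described in this abstract can be illustrated with a minimal sketch. It assumes a common-neighbours score as the link predictor, which is a simple stand-in and not necessarily the predictor the authors use; the function name `impute_edges` and the toy graph are hypothetical.

```python
from itertools import combinations


def impute_edges(adjacency, top_k):
    """Return a copy of the graph with the top_k best-scoring missing edges added.

    adjacency: dict mapping node -> set of neighbours (undirected graph).
    Score: number of common neighbours (an assumed, simple link predictor).
    """
    scores = []
    for u, v in combinations(sorted(adjacency), 2):
        if v in adjacency[u]:
            continue  # edge already observed
        common = len(adjacency[u] & adjacency[v])
        if common > 0:
            scores.append((common, u, v))
    # Highest score first; ties broken by node label for determinism.
    scores.sort(key=lambda item: (-item[0], item[1], item[2]))
    imputed = {node: set(nbrs) for node, nbrs in adjacency.items()}
    for _, u, v in scores[:top_k]:
        imputed[u].add(v)
        imputed[v].add(u)
    return imputed


# Toy graph: nodes 0-3 form a near-clique with the 0-3 edge missing,
# and node 4 hangs off node 3.
graph = {
    0: {1, 2},
    1: {0, 2, 3},
    2: {0, 1, 3},
    3: {1, 2, 4},
    4: {3},
}
completed = impute_edges(graph, top_k=1)
```

Any existing community detection algorithm could then be run on `completed` instead of `graph`, and the resulting partitions merged into a consensus.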
Enhanced reconstruction of weighted networks from strengths and degrees
Network topology plays a key role in many phenomena, from the spreading of
diseases to that of financial crises. Whenever the whole structure of a network
is unknown, one must resort to reconstruction methods that identify the least
biased ensemble of networks consistent with the partial information available.
A challenging case, frequently encountered due to privacy issues in the
analysis of interbank flows and Big Data, is when there is only local
(node-specific) aggregate information available. For binary networks, the
relevant ensemble is one where the degree (number of links) of each node is
constrained to its observed value. However, for weighted networks the problem
is much more complicated. While the naive approach is to constrain the
strengths (total link weights) of all nodes, recent counter-intuitive results
suggest that in weighted networks the degrees are often more informative than
the strengths. This implies that the reconstruction of weighted networks would
be significantly enhanced by the specification of both strengths and degrees, a
computationally hard and bias-prone procedure. Here we solve this problem by
introducing an analytical and unbiased maximum-entropy method that works in the
shortest possible time and does not require the explicit generation of
reconstructed samples. We consider several real-world examples and show that,
while the strengths alone give poor results, the additional knowledge of the
degrees yields accurately reconstructed networks. Information-theoretic
criteria rigorously confirm that the degree sequence, as soon as it is
non-trivial, is irreducible to the strength sequence. Our results have strong
implications for the analysis of motifs and communities and whenever the
reconstructed ensemble is required as a null model to detect higher-order
patterns.
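For the binary case mentioned in this abstract, the standard maximum-entropy ensemble with constrained degrees can be written out as background. This is the textbook binary configuration model, not the paper's new weighted method; the symbols $x_i$ (hidden variables), $a_{ij}$ (adjacency entries) and $k_i$ (observed degrees) are notation introduced here.

```latex
P(G) = \prod_{i<j} p_{ij}^{a_{ij}} \, (1 - p_{ij})^{1 - a_{ij}},
\qquad
p_{ij} = \frac{x_i x_j}{1 + x_i x_j},
\qquad
k_i = \sum_{j \neq i} \frac{x_i x_j}{1 + x_i x_j} \quad \text{for every node } i .
```

The last set of equations fixes the $x_i$ by requiring each expected degree to equal its observed value, so expected network properties follow analytically without sampling graphs.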
Model-free reconstruction of neuronal network connectivity from calcium imaging signals
A systematic assessment of global neural network connectivity through direct
electrophysiological assays has remained technically unfeasible even in
dissociated neuronal cultures. We introduce an improved algorithmic approach
based on Transfer Entropy to reconstruct approximations to network structural
connectivities from network activity monitored through calcium fluorescence
imaging. Based on information theory, our method requires no prior assumptions
on the statistics of neuronal firing and neuronal connections. The performance
of our algorithm is benchmarked on surrogate time-series of calcium
fluorescence generated by the simulated dynamics of a network with known
ground-truth topology. We find that the effective network topology revealed by
Transfer Entropy depends qualitatively on the time-dependent dynamic state of
the network (e.g., bursting or non-bursting). We thus demonstrate how
conditioning with respect to the global mean activity improves the performance
of our method. [...] Compared to other reconstruction strategies such as
cross-correlation or Granger Causality methods, our method based on improved
Transfer Entropy is remarkably more accurate. In particular, it provides a good
reconstruction of the network clustering coefficient, making it possible to discriminate
between weakly and strongly clustered topologies, whereas an
approach based on cross-correlations would invariably detect artificially high
levels of clustering. Finally, we present the applicability of our method to
real recordings of in vitro cortical cultures. We demonstrate that these
networks are characterized by an elevated level of clustering compared to a
random graph (although not extreme) and by a markedly non-local connectivity.
Comment: 54 pages, 8 figures (+9 supplementary figures), 1 table; submitted
for publicatio
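As a rough illustration of the quantity this abstract builds on, the sketch below computes a plain plug-in Transfer Entropy estimate between two discrete time series with a history length of one. It is the basic textbook definition only: it omits the conditioning on global mean activity and the other improvements the authors describe, and the function name and synthetic data are assumptions.

```python
import random
from collections import Counter
from math import log2


def transfer_entropy(source, target):
    """Plug-in estimate of TE(source -> target) in bits, history length 1."""
    triples = Counter()   # counts of (y_next, y_now, x_now)
    pairs_yx = Counter()  # counts of (y_now, x_now)
    pairs_yy = Counter()  # counts of (y_next, y_now)
    singles = Counter()   # counts of y_now
    n = len(target) - 1
    for t in range(n):
        y_next, y_now, x_now = target[t + 1], target[t], source[t]
        triples[(y_next, y_now, x_now)] += 1
        pairs_yx[(y_now, x_now)] += 1
        pairs_yy[(y_next, y_now)] += 1
        singles[y_now] += 1
    te = 0.0
    for (y_next, y_now, x_now), count in triples.items():
        p_joint = count / n
        p_given_both = count / pairs_yx[(y_now, x_now)]
        p_given_self = pairs_yy[(y_next, y_now)] / singles[y_now]
        te += p_joint * log2(p_given_both / p_given_self)
    return te


# Synthetic check: y copies x with a one-step delay, so information
# flows from x to y and not in the other direction.
rng = random.Random(0)
x = [rng.randint(0, 1) for _ in range(500)]
y = [0] + x[:-1]
te_x_to_y = transfer_entropy(x, y)
te_y_to_x = transfer_entropy(y, x)
```

On this synthetic pair the estimate from `x` to `y` should be far larger than the reverse, which is the asymmetry a reconstruction method thresholds to infer directed links.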
Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.
LinkCluE: A MATLAB Package for Link-Based Cluster Ensembles
Cluster ensembles have emerged as a powerful meta-learning paradigm that provides improved accuracy and robustness by aggregating several input data clusterings. In particular, link-based similarity methods have recently been introduced with superior performance to the conventional co-association approach. This paper presents a MATLAB package, LinkCluE, that implements the link-based cluster ensemble framework. A variety of functional methods for evaluating clustering results, based on both internal and external criteria, are also provided. Additionally, the underlying algorithms together with the sample uses of the package with interesting real and synthetic datasets are demonstrated herein.
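The conventional co-association approach that this abstract contrasts with link-based methods can be sketched as follows. This is an illustrative Python sketch, not LinkCluE's MATLAB API and not its link-based similarity; the function names, the 0.5 threshold, and the connected-components consensus step are assumptions.

```python
def co_association(partitions):
    """Fraction of partitions in which each pair of items shares a cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(partitions) for c in row] for row in counts]


def consensus_clusters(matrix, threshold=0.5):
    """Group items connected by co-association values above the threshold."""
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first search over the thresholded graph
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Three input clusterings of five items; the second disagrees about item 2.
partitions = [
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
]
matrix = co_association(partitions)
consensus = consensus_clusters(matrix)
```

Cluster label values differ between the input partitions, which is why the aggregation works on pairwise co-membership and not on the labels themselves.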
A GDP-driven model for the binary and weighted structure of the International Trade Network
Recent events such as the global financial crisis have renewed the interest
in the topic of economic networks. One of the main channels of shock
propagation among countries is the International Trade Network (ITN). Two
important models for the ITN structure, the classical gravity model of trade
(more popular among economists) and the fitness model (more popular among
network scientists), are both limited to the characterization of only one
representation of the ITN. The gravity model satisfactorily predicts the volume
of trade between connected countries, but cannot reproduce the observed missing
links (i.e. the topology). On the other hand, the fitness model can
successfully replicate the topology of the ITN, but cannot predict the volumes.
This paper tries to make an important step forward in the unification of those
two frameworks, by proposing a new GDP-driven model which can simultaneously
reproduce the binary and the weighted properties of the ITN. Specifically, we
adopt a maximum-entropy approach where both the degree and the strength of each
node are preserved. We then identify strong nonlinear relationships between the
GDP and the parameters of the model. This ultimately results in a weighted
generalization of the fitness model of trade, where the GDP plays the role of a
`macroeconomic fitness' shaping the binary and the weighted structure of the
ITN simultaneously. Our model mathematically highlights an important asymmetry
in the role of binary and weighted network properties, namely the fact that
binary properties can be inferred without the knowledge of weighted ones, while
the opposite is not true.
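The binary fitness model that this abstract takes as a starting point can be sketched briefly. The code assumes the commonly used form in which the link probability between two countries is z·GDP_i·GDP_j / (1 + z·GDP_i·GDP_j), with the single parameter z calibrated so that the expected number of links matches the observed one. It does not implement the paper's weighted generalization, and the GDP values are made up.

```python
def link_probability(z, gdp_i, gdp_j):
    """Connection probability in the assumed binary fitness model."""
    product = z * gdp_i * gdp_j
    return product / (1.0 + product)


def expected_links(z, gdps):
    """Expected number of undirected links for a given z."""
    n = len(gdps)
    return sum(
        link_probability(z, gdps[i], gdps[j])
        for i in range(n)
        for j in range(i + 1, n)
    )


def calibrate_z(gdps, observed_links, iterations=200):
    """Bisection on a log scale: expected_links increases monotonically with z."""
    low, high = 1e-12, 1e12
    for _ in range(iterations):
        middle = (low * high) ** 0.5
        if expected_links(middle, gdps) < observed_links:
            low = middle
        else:
            high = middle
    return (low * high) ** 0.5


# Hypothetical GDP values (arbitrary units) and an observed link count.
gdps = [1.0, 2.0, 5.0, 10.0]
z = calibrate_z(gdps, observed_links=3)
p_rich_pair = link_probability(z, gdps[2], gdps[3])
p_poor_pair = link_probability(z, gdps[0], gdps[1])
```

With z fixed this way, pairs of high-GDP countries receive higher connection probabilities than pairs of low-GDP countries, which is how the model reproduces topology from GDP alone.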
A Method to Improve the Analysis of Cluster Ensembles
Clustering is fundamental to understand the structure of data. In the past decade the cluster ensemble problem has been introduced, which combines a set of partitions (an ensemble) of the data to obtain a single consensus solution that outperforms all the ensemble members. However, there is disagreement about which are the best ensemble characteristics to obtain a good performance: some authors have suggested that highly different partitions within the ensemble are beneficial for the final performance, whereas others have stated that medium diversity among them is better. While there are several measures to quantify the diversity, a better method to analyze the best ensemble characteristics is necessary. This paper introduces a new ensemble generation strategy and a method to make slight changes in its structure. Experimental results on six datasets suggest that this is an important step towards a more systematic approach to analyze the impact of the ensemble characteristics on the overall consensus performance.
Affiliation: Pividori, Milton Damián. Universidad Tecnologica Nacional. Facultad Regional Santa Fe. Centro de Investigacion y Desarrollo de Ingenieria en Sistemas de Informacion; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina
Affiliation: Stegmayer, Georgina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina. Universidad Tecnologica Nacional. Facultad Regional Santa Fe. Centro de Investigacion y Desarrollo de Ingenieria en Sistemas de Informacion; Argentina
Affiliation: Milone, Diego Humberto. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentin
A Socio-Informatic Approach to Automated Account Classification on Social Media
Automated accounts on social media have become increasingly problematic. We
propose a key feature in combination with existing methods to improve machine
learning algorithms for bot detection. We successfully improve classification
performance through including the proposed feature.
Comment: International Conference on Social Media and Societ