Multiplex Decomposition of Non-Markovian Dynamics and the Hidden Layer Reconstruction Problem
Elements composing complex systems usually interact in several different ways
and as such the interaction architecture is well modelled by a multiplex
network. However, this architecture is often hidden, as one usually only has
experimental access to an aggregated projection. A fundamental challenge is
thus to determine whether the hidden underlying architecture of complex systems
is better modelled as a single interaction layer or results from the
aggregation and interplay of multiple layers. Here we show that using local
information provided by a random walker navigating the aggregated network one
can decide in a robust way if the underlying structure is a multiplex or not
and, in the former case, determine the most probable number of hidden
layers. As a byproduct, we show that the mathematical formalism also provides a
principled solution for the optimal decomposition and projection of complex,
non-Markovian dynamics into a Markov switching combination of diffusive modes.
We validate the proposed methodology with numerical simulations of both (i)
random walks navigating hidden multiplex networks (thereby reconstructing the
true hidden architecture) and (ii) Markovian and non-Markovian continuous
stochastic processes (thereby reconstructing an effective multiplex
decomposition where each layer accounts for a different diffusive mode). We
also state and prove two existence theorems guaranteeing that an exact
reconstruction of the dynamics in terms of these hidden jump-Markov models is
always possible for arbitrary finite-order Markovian and fully non-Markovian
processes. Finally, we showcase the applicability of the method to experimental
recordings from (i) the mobility dynamics of human players in an online
multiplayer game and (ii) the dynamics of RNA polymerases at the
single-molecule level.
Comment: 40 pages, 24 figures
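The setting described in this abstract can be illustrated with a minimal sketch. It is not the authors' reconstruction method: it only simulates the kind of data the abstract refers to, a walker moving on a hidden multiplex while an observer records nothing but the node sequence on the aggregated network. The two-layer graph, the `switch_prob` parameter, and the function names are all assumptions made for illustration.

```python
import random

# Hypothetical two-layer multiplex on 4 shared nodes (adjacency lists per layer).
LAYERS = [
    {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]},   # layer 0: path 0-1-2-3
    {0: [2, 3], 1: [3], 2: [0], 3: [0, 1]},   # layer 1: edges 0-2, 0-3, 1-3
]


def aggregated_edges(layers):
    """Union of undirected edges over all layers (the observable projection)."""
    return {
        frozenset((u, v))
        for layer in layers
        for u, nbrs in layer.items()
        for v in nbrs
    }


def walk(layers, steps, switch_prob, seed=0):
    """Random walk on a multiplex.

    Before each move the walker re-draws its layer uniformly at random with
    probability switch_prob (an assumed switching rule), then steps to a
    uniformly chosen neighbour in the current layer. Only the node sequence
    is returned; the layer labels stay hidden from the observer.
    """
    rng = random.Random(seed)
    node, layer = 0, 0
    trajectory = [node]
    for _ in range(steps):
        if rng.random() < switch_prob:
            layer = rng.randrange(len(layers))
        node = rng.choice(layers[layer][node])
        trajectory.append(node)
    return trajectory


trajectory = walk(LAYERS, steps=1000, switch_prob=0.1)
```

Every step of `trajectory` lies on an edge of the aggregated network, yet the sequence is in general not a first-order Markov chain on that network, because the next move depends on the unobserved layer.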
X-CNN: Cross-modal Convolutional Neural Networks for Sparse Datasets
In this paper we propose cross-modal convolutional neural networks (X-CNNs),
a novel biologically inspired type of CNN architecture, treating gradient
descent-specialised CNNs as individual units of processing in a larger-scale
network topology, while allowing for unconstrained information flow and/or
weight sharing between analogous hidden layers of the network---thus
generalising the already well-established concept of neural network ensembles
(where information typically may flow only between the output layers of the
individual networks). The constituent networks are individually designed to
learn the output function on their own subset of the input data;
cross-connections are then introduced after each pooling operation to
periodically allow information exchange between them. This injection of
knowledge into a model (by prior partition of the input data through domain
knowledge or unsupervised methods) is expected to yield greatest returns in
sparse data environments, which are typically less suitable for training CNNs.
For evaluation purposes, we have compared a standard four-layer CNN as well as
a sophisticated FitNet4 architecture against their cross-modal variants on the
CIFAR-10 and CIFAR-100 datasets with differing percentages of the training data
being removed, and find that at lower levels of data availability, the X-CNNs
significantly outperform their baselines (typically providing a 2--6% benefit,
depending on the dataset size and whether data augmentation is used), while
still maintaining an edge on all of the full dataset tests.
Comment: To appear in the 7th IEEE Symposium Series on Computational
Intelligence (IEEE SSCI 2016), 8 pages, 6 figures. Minor revisions, in
response to reviewers' comments
PATTERNA: transcriptome-wide search for functional RNA elements via structural data signatures.
Establishing a link between RNA structure and function remains a great challenge in RNA biology. The emergence of high-throughput structure profiling experiments is revolutionizing our ability to decipher structure, yet principled approaches for extracting information on structural elements directly from these data sets are lacking. We present PATTERNA, an unsupervised pattern recognition algorithm that rapidly mines RNA structure motifs from profiling data. We demonstrate that PATTERNA detects motifs with an accuracy comparable to commonly used thermodynamic models and highlight its utility in automating data-directed structure modeling from large data sets. PATTERNA is versatile and compatible with diverse profiling techniques and experimental conditions.
Deep Learning in Single-Cell Analysis
Single-cell technologies are revolutionizing the entire field of biology. The
large volumes of data generated by single-cell technologies are
high-dimensional, sparse, heterogeneous, and have complicated dependency
structures, making analyses using conventional machine learning approaches
challenging and impractical. In tackling these challenges, deep learning often
demonstrates superior performance compared to traditional machine learning
methods. In this work, we give a comprehensive survey on deep learning in
single-cell analysis. We first introduce background on single-cell technologies
and their development, as well as fundamental concepts of deep learning
including the most popular deep architectures. We present an overview of the
single-cell analytic pipeline pursued in research applications while noting
divergences due to data sources or specific applications. We then review seven
popular tasks spanning different stages of the single-cell analysis
pipeline, including multimodal integration, imputation, clustering, spatial
domain identification, cell-type deconvolution, cell segmentation, and
cell-type annotation. Under each task, we describe the most recent developments
in classical and deep learning methods and discuss their advantages and
disadvantages. Deep learning tools and benchmark datasets are also summarized
for each task. Finally, we discuss the future directions and the most recent
challenges. This survey will serve as a reference for biologists and computer
scientists, encouraging collaborations.
Comment: 77 pages, 11 figures, 15 tables, deep learning, single-cell analysis
Algebraic Statistics in Practice: Applications to Networks
Algebraic statistics uses tools from algebra (especially from multilinear
algebra, commutative algebra and computational algebra), geometry and
combinatorics to provide insight into knotty problems in mathematical
statistics. In this survey we illustrate this on three problems related to
networks, namely network models for relational data, causal structure discovery
and phylogenetics. For each problem we give an overview of recent results in
algebraic statistics with emphasis on the statistical achievements made
possible by these tools and their practical relevance for applications to other
scientific disciplines.
Sparse graphical models for cancer signalling
Protein signalling networks play a key role in cellular function, and their dysregulation is central to many diseases, including cancer. Recent advances in biochemical technology have begun to allow high-throughput, data-driven studies of signalling. In this thesis, we investigate multivariate statistical methods, rooted in sparse graphical models, aimed at probing questions in cancer signalling.
First, we propose a Bayesian variable selection method for identifying subsets of proteins that jointly influence an output of interest, such as drug response. Ancillary biological information is incorporated into inference using informative prior distributions. Prior information is selected and weighted in an automated manner using an empirical Bayes formulation. We present examples of informative pathway and network-based priors, and illustrate the proposed method on both synthetic and drug response data.
Second, we use dynamic Bayesian networks to perform structure learning of context-specific signalling network topology from proteomic time-course data. We exploit a connection between variable selection and network structure learning to efficiently carry out exact inference. Existing biology is incorporated using informative network priors, weighted automatically by an empirical Bayes approach. The overall approach is computationally efficient and essentially free of user-set parameters.
We show results from an empirical investigation, comparing the approach to several existing methods, and from an application to breast cancer cell line data. Hypotheses are generated regarding novel signalling links, some of which are validated by independent experiments.
Third, we describe a network-based clustering approach for the discovery of cancer subtypes that differ in terms of subtype-specific signalling network structure.
Model-based clustering is combined with penalised likelihood estimation of undirected graphical models to allow simultaneous learning of cluster assignments and cluster-specific network structure. Results are shown from an empirical investigation comparing several penalisation regimes, and an application to breast cancer proteomic data.
CLADAG 2021 BOOK OF ABSTRACTS AND SHORT PAPERS
The book collects the short papers presented at the 13th Scientific Meeting of the Classification and Data Analysis Group (CLADAG) of the Italian Statistical Society (SIS). The meeting has been organized by the Department of Statistics, Computer Science and Applications of the University of Florence, under the auspices of the Italian Statistical Society and the International Federation of Classification Societies (IFCS). CLADAG is a member of the IFCS, a federation of national, regional, and linguistically-based classification societies. It is a non-profit, non-political scientific organization, whose aims are to further classification research