Simulation Intelligence: Towards a New Generation of Scientific Methods
The original "Seven Motifs" set forth a roadmap of essential methods for the
field of scientific computing, where a motif is an algorithmic method that
captures a pattern of computation and data movement. We present the "Nine
Motifs of Simulation Intelligence", a roadmap for the development and
integration of the essential algorithms necessary for a merger of scientific
computing, scientific simulation, and artificial intelligence. We call this
merger simulation intelligence (SI), for short. We argue the motifs of
simulation intelligence are interconnected and interdependent, much like the
components within the layers of an operating system. Using this metaphor, we
explore the nature of each layer of the simulation intelligence operating
system stack (SI-stack) and the motifs therein: (1) Multi-physics and
multi-scale modeling; (2) Surrogate modeling and emulation; (3)
Simulation-based inference; (4) Causal modeling and inference; (5) Agent-based
modeling; (6) Probabilistic programming; (7) Differentiable programming; (8)
Open-ended optimization; (9) Machine programming. We believe coordinated
efforts between motifs offer immense opportunity to accelerate scientific
discovery, from solving inverse problems in synthetic biology and climate
science, to directing nuclear energy experiments and predicting emergent
behavior in socioeconomic settings. We elaborate on each layer of the SI-stack,
detailing the state-of-the-art methods, presenting examples to highlight challenges
and opportunities, and advocating for specific ways to advance the motifs and
the synergies from their combinations. Advancing and integrating these
technologies can enable a robust and efficient hypothesis-simulation-analysis
type of scientific method, which we introduce with several use-cases for
human-machine teaming and automated science.
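The surrogate modeling and emulation motif (2) can be illustrated with a minimal sketch: a cheap regressor is fit to input-output pairs from an expensive simulator and then queried in its place. The simulator below is a toy stand-in of our own devising, not a model from the text, and polynomial ridge regression stands in for the richer emulators the motif envisions.

```python
# Minimal surrogate-modeling sketch: emulate an "expensive" simulator
# with a cheap polynomial ridge regression (toy stand-in, not a method
# from the text).
import numpy as np

rng = np.random.default_rng(0)

def expensive_simulator(x):
    """Toy stand-in for a costly physics simulation."""
    return np.sin(x) + 0.5 * x**2

# 1) Run the simulator on a small design of inputs.
X_train = rng.uniform(-2, 2, size=40)
y_train = expensive_simulator(X_train)

# 2) Fit a cheap surrogate (polynomial features + ridge regression).
degree, lam = 8, 1e-3
Phi = np.vander(X_train, degree + 1)
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(degree + 1), Phi.T @ y_train)

# 3) Query the surrogate instead of the simulator.
X_test = np.linspace(-2, 2, 200)
y_pred = np.vander(X_test, degree + 1) @ w
rmse = np.sqrt(np.mean((y_pred - expensive_simulator(X_test)) ** 2))
print(f"surrogate RMSE on held-out grid: {rmse:.4f}")
```

Once trained, the surrogate amortizes the cost of the simulator: downstream tasks such as inverse problems or optimization can evaluate it thousands of times at negligible cost.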
Bioinformatics
This book is divided into research areas relevant to Bioinformatics, such as biological networks, next-generation sequencing, high-performance computing, molecular modeling, structural bioinformatics, and intelligent data analysis. Each section introduces the basic concepts and then explains their application to problems of great relevance, so that both novice and expert readers can benefit from the information and research works presented here.
Statistical methods for the integrative analysis of single-cell multi-omics data
Single-cell profiling techniques have provided an unprecedented opportunity to study cellular heterogeneity at the molecular level. This represents a remarkable advance over traditional bulk sequencing methods, particularly to study lineage diversification and cell fate commitment events in heterogeneous biological processes. While the large majority of single-cell studies are focused on quantifying RNA expression, transcriptomic readouts provide only a single dimension of cellular heterogeneity. Recently, technological advances have enabled multiple biological layers to be probed in parallel one cell at a time, unveiling a powerful approach for investigating multiple dimensions of cellular heterogeneity. However, the increasing availability of multi-modal data sets needs to be accompanied by the development of suitable integrative strategies to fully exploit the data generated. In this thesis I worked in collaboration with different research groups to introduce innovative experimental and computational strategies for the integrative study of multi-omics at single-cell resolution.
The first contribution is the development of scNMT-seq, a protocol for the simultaneous profiling of RNA expression, DNA methylation and chromatin accessibility in single cells. I demonstrate how this assay provides a powerful approach for investigating regulatory relationships between the epigenome and the transcriptome within individual cells.
The second contribution is Multi-Omics Factor Analysis (MOFA), a statistical framework for the unsupervised integration of multi-omics data sets. MOFA is a Bayesian latent variable model that can be viewed as a statistically rigorous generalization of Principal Component Analysis to multi-omics data. The method provides a principled approach to retrieve, in an unsupervised manner, the underlying sources of sample heterogeneity while at the same time disentangling which axes of heterogeneity are shared across multiple modalities and which are specific to individual data modalities.
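The core idea of such a multi-omics factor model is that each modality's data matrix decomposes as shared latent factors times modality-specific loadings, Y_m ≈ Z W_m^T + noise. The sketch below illustrates only this generative structure; it uses alternating least squares on simulated data with hypothetical dimensions, not MOFA's Bayesian variational inference.

```python
# Toy multi-omics factor model sketch (NOT the MOFA implementation):
# two modalities share latent factors Z; loadings W_m are modality-specific.
# Fit here by alternating least squares rather than Bayesian inference.
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_factors = 100, 3
dims = [50, 30]  # hypothetical feature counts for the two omics layers

# Simulate from the generative model Y_m = Z @ W_m.T + noise.
Z_true = rng.normal(size=(n_samples, n_factors))
W_true = [rng.normal(size=(d, n_factors)) for d in dims]
Ys = [Z_true @ W.T + 0.1 * rng.normal(size=(n_samples, W.shape[0]))
      for W in W_true]

# Alternating least squares: update loadings given factors, then vice versa.
Z = rng.normal(size=(n_samples, n_factors))
for _ in range(50):
    Ws = [np.linalg.lstsq(Z, Y, rcond=None)[0].T for Y in Ys]
    W_cat = np.vstack(Ws)          # stack loadings across modalities
    Y_cat = np.hstack(Ys)          # concatenate observations
    Z = np.linalg.solve(W_cat.T @ W_cat, W_cat.T @ Y_cat.T).T

for m, (Y, W) in enumerate(zip(Ys, Ws)):
    err = np.linalg.norm(Y - Z @ W.T) / np.linalg.norm(Y)
    print(f"modality {m}: relative reconstruction error {err:.3f}")
```

The per-modality loadings make visible which factors carry weight in which omics layer; in the full model this is what disentangles shared from modality-specific axes of heterogeneity.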
The third contribution is the generation of a comprehensive molecular roadmap of mouse gastrulation at single-cell resolution. We employed scNMT-seq to simultaneously profile RNA expression, DNA methylation and chromatin accessibility for hundreds of cells, spanning multiple time points from the exit from pluripotency to primary germ layer specification. Using MOFA, and other tools, I performed an integrative analysis of the multi-modal measurements, revealing novel insights into the role of the epigenome in regulating this key developmental process.
The fourth contribution is an extended formulation of the MOFA model tailored to the analysis of large-scale single-cell data with complex experimental designs. I extended the model to incorporate a flexible regularisation that enables the joint analysis of multiple omics as well as multiple sample groups (batches and/or experimental conditions). In addition, I implemented a GPU-accelerated stochastic variational inference framework, thus enabling the scalable analysis of potentially millions of samples.
CORNN: Convex optimization of recurrent neural networks for rapid inference of neural dynamics
Advances in optical and electrophysiological recording technologies have made
it possible to record the dynamics of thousands of neurons, opening up new
possibilities for interpreting and controlling large neural populations in
behaving animals. A promising way to extract computational principles from
these large datasets is to train data-constrained recurrent neural networks
(dRNNs). Performing this training in real-time could open doors for research
techniques and medical applications to model and control interventions at
single-cell resolution and drive desired forms of animal behavior. However,
existing training algorithms for dRNNs are inefficient and have limited
scalability, making it a challenge to analyze large neural recordings even in
offline scenarios. To address these issues, we introduce a training method
termed Convex Optimization of Recurrent Neural Networks (CORNN). In studies of
simulated recordings, CORNN attained training speeds ~100-fold faster than
traditional optimization approaches while maintaining or enhancing modeling
accuracy. We further validated CORNN on simulations with thousands of cells
that performed simple computations such as those of a 3-bit flip-flop or the
execution of a timed response. Finally, we showed that CORNN can robustly
reproduce network dynamics and underlying attractor structures despite
mismatches between generator and inference models, severe subsampling of
observed neurons, or mismatches in neural time-scales. Overall, by training
dRNNs with millions of parameters in subminute processing times on a standard
computer, CORNN constitutes a first step towards real-time network reproduction
constrained on large-scale neural recordings and a powerful computational tool
for advancing the understanding of neural computation.
Comment: Accepted at NeurIPS 202
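A much-simplified sketch of why data-constrained RNN fitting can be convex (this is an illustrative reduction, not CORNN's actual objective): if the observed rates follow r_{t+1} = tanh(W r_t + noise), applying the inverse nonlinearity to the targets turns weight recovery into one ridge regression per neuron, a convex problem with a closed-form solution.

```python
# Simplified convex dRNN fitting (illustrative, not the CORNN objective):
# invert the tanh nonlinearity, then solve a ridge regression per neuron.
import numpy as np

rng = np.random.default_rng(2)
n_neurons, T = 60, 2000
g = 1.5  # gain > 1 gives rich, non-trivial dynamics

# Ground-truth "generator" network and its simulated rate trajectory.
W_true = rng.normal(scale=g / np.sqrt(n_neurons), size=(n_neurons, n_neurons))
R = np.empty((T, n_neurons))
R[0] = rng.normal(scale=0.1, size=n_neurons)
for t in range(T - 1):
    # small process noise keeps the trajectory exploring state space
    R[t + 1] = np.tanh(W_true @ R[t] + 0.05 * rng.normal(size=n_neurons))

# arctanh(r_{t+1}) = W r_t + noise  ->  one ridge regression per neuron.
X = R[:-1]
Y = np.arctanh(np.clip(R[1:], -1 + 1e-12, 1 - 1e-12))
lam = 1e-6
W_hat = np.linalg.solve(X.T @ X + lam * np.eye(n_neurons), X.T @ Y).T

err = np.linalg.norm(W_hat - W_true) / np.linalg.norm(W_true)
print(f"relative weight-recovery error: {err:.3f}")
```

Because each neuron's regression is independent, the whole fit parallelizes trivially, which is one intuition for how convex formulations reach the training speeds reported in the abstract.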
Recent Advances of Deep Learning in Bioinformatics and Computational Biology
Extracting inherent valuable knowledge from omics big data remains a daunting problem in bioinformatics and computational biology. Deep learning, an emerging branch of machine learning, has exhibited unprecedented performance in quite a few applications from academia and industry. We highlight the differences and similarities among widely utilized deep learning models by discussing their basic structures, and review their diverse applications and disadvantages. We anticipate that this work can serve as a meaningful perspective for the further development of deep learning theory, algorithms, and applications in bioinformatics and computational biology.
Generative Model based Training of Deep Neural Networks for Event Detection in Microscopy Data
Several imaging techniques employed in the life sciences heavily rely on machine learning methods
to make sense of the data that they produce. These include calcium imaging and multi-electrode
recordings of neural activity, single molecule localization microscopy, spatially-resolved transcriptomics and particle tracking, among others. All of them only produce indirect readouts of the
spatiotemporal events they aim to record. The objective when analysing data from these methods
is the identification of patterns that indicate the location of the sought-after events, e.g. spikes in
neural recordings or fluorescent particles in microscopy data.
Existing approaches for this task invert a forward model, i.e. a mathematical description of the
process that generates the observed patterns for a given set of underlying events, using established
methods like MCMC or variational inference. Perhaps surprisingly, for a long time deep learning
saw little use in this domain, even though it became the dominant approach in the field of pattern
recognition over the previous decade. The principal reason is that in the absence of labeled data
needed for supervised optimization it remains unclear how neural networks can be trained to solve
these tasks. To unlock the potential of deep learning, this thesis proposes different methods for
training neural networks using forward models and without relying on labeled data. The thesis
rests on two publications:
In the first publication we introduce an algorithm for spike extraction from calcium imaging
time traces. Building on the variational autoencoder framework, we simultaneously train a neural
network that performs spike inference and optimize the parameters of the forward model. This
approach combines several advantages that previously could not be obtained together: it is fast at test time,
can be applied to different non-linear forward models, and produces samples from the posterior
distribution over spike trains.
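Calcium-imaging forward models in this literature are commonly autoregressive: each spike adds a fluorescence transient that decays geometrically. The sketch below is a generic AR(1) model with hypothetical parameter values, an assumption for illustration rather than the exact model used in the thesis; it shows the generative direction that the inference network must invert.

```python
# Generic AR(1) calcium forward model (an illustrative assumption, not
# necessarily the thesis's exact model): each spike adds a transient
# that decays with factor gamma; fluorescence is a noisy affine readout.
import numpy as np

rng = np.random.default_rng(3)
T, rate, gamma = 1000, 0.05, 0.95  # hypothetical values

spikes = (rng.random(T) < rate).astype(float)   # latent binary spike train
calcium = np.zeros(T)
for t in range(1, T):
    calcium[t] = gamma * calcium[t - 1] + spikes[t]
fluor = 1.5 * calcium + 0.2 + 0.1 * rng.normal(size=T)  # observed trace
print(f"{int(spikes.sum())} spikes simulated, trace mean {fluor.mean():.2f}")
```

In the VAE framing, this forward model plays the role of the decoder, a recognition network approximates the posterior over the latent spike train, and parameters such as gamma and the affine readout are optimized jointly with the network.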
The second publication deals with the localization of fluorescent particles in single molecule
localization microscopy. We show that an accurate forward model can be used to generate simulations that act as a surrogate for labeled training data. Careful design of the output representation
and loss function results in a method with outstanding precision across experimental designs and
imaging conditions.
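The second publication's strategy, using the forward model as a factory for labeled training data, can be sketched generically. The toy below is a 1-D localization problem with a Gaussian spot and ridge regression standing in for the thesis's network and point-spread-function model, which are not specified here: simulate observations with known ground truth, then train supervised on the simulated pairs.

```python
# Sketch of "forward model as labeled-data factory" (toy 1-D localization;
# the spot model and regressor are illustrative stand-ins): simulate images
# with known particle positions, then fit a supervised regressor on them.
import numpy as np

rng = np.random.default_rng(4)
n_pix, sigma = 32, 1.5
grid = np.arange(n_pix)

def forward(pos):
    """Toy imaging model: Gaussian spot at `pos` plus pixel noise."""
    return np.exp(-0.5 * ((grid - pos) / sigma) ** 2) + 0.02 * rng.normal(size=n_pix)

# 1) The simulator plays the role of an annotated training set.
pos_train = rng.uniform(5, n_pix - 5, size=2000)
X_train = np.stack([forward(p) for p in pos_train])

# 2) Supervised training (ridge regression standing in for a deep net).
lam = 1e-3
A = np.hstack([X_train, np.ones((len(X_train), 1))])
w = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ pos_train)

# 3) Evaluate on fresh simulations with known ground truth.
pos_test = rng.uniform(5, n_pix - 5, size=500)
X_test = np.stack([forward(p) for p in pos_test])
pred = np.hstack([X_test, np.ones((500, 1))]) @ w
mae = np.abs(pred - pos_test).mean()
print(f"mean localization error: {mae:.2f} pixels")
```

Because the simulator supplies unlimited labeled pairs, the accuracy of the trained model is bounded mainly by how faithfully the forward model matches the real imaging process, which is why the thesis emphasizes an accurate forward model.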
Overall this thesis highlights how neural networks can be applied for precise, fast and flexible model inversion on this class of problems and how this opens up new avenues to achieve
performance beyond what was previously possible.