Parametric inference of recombination in HIV genomes
Recombination is an important event in the evolution of HIV. It affects the
global spread of the pandemic as well as evolutionary escape from host immune
response and from drug therapy within single patients. Comprehensive
computational methods are needed for detecting recombinant sequences in large
databases, and for inferring the parental sequences.
We present a hidden Markov model to annotate a query sequence as a
recombinant of a given set of aligned sequences. Parametric inference is used
to determine all optimal annotations for all parameters of the model. We show
that the inferred annotations recover most features of established hand-curated
annotations. Thus, parametric analysis of the hidden Markov model is feasible
for HIV full-length genomes, and it improves the detection and annotation of
recombinant forms.
All computational results, reference alignments, and C++ source code are
available at http://bio.math.berkeley.edu/recombination/.
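The paper's own implementation is the C++ code linked above. As an illustrative sketch only, the toy example below shows the core idea of annotating a query via Viterbi decoding of an HMM whose hidden states are candidate parental sequences; the sequences and the parameter values `p_switch` and `p_match` are invented for the example.

```python
import numpy as np

def viterbi_annotate(query, parents, p_switch=0.01, p_match=0.95):
    """Annotate each position of `query` with the most likely parent.

    Hidden states = parental sequences; transitions model recombination
    breakpoints; emissions reward agreement with the parent's base.
    """
    k, n = len(parents), len(query)
    log_stay = np.log(1 - p_switch)
    log_switch = np.log(p_switch / (k - 1))
    # Emission log-probabilities: does the parent's base match the query's?
    emit = np.array([[np.log(p_match) if parents[s][i] == query[i]
                      else np.log((1 - p_match) / 3)
                      for i in range(n)] for s in range(k)])
    score = np.full((k, n), -np.inf)
    back = np.zeros((k, n), dtype=int)
    score[:, 0] = -np.log(k) + emit[:, 0]          # uniform start
    for i in range(1, n):
        for s in range(k):
            cand = score[:, i - 1] + np.where(np.arange(k) == s,
                                              log_stay, log_switch)
            back[s, i] = int(np.argmax(cand))
            score[s, i] = cand[back[s, i]] + emit[s, i]
    # Trace back the optimal state path (one parent index per position).
    path = [int(np.argmax(score[:, -1]))]
    for i in range(n - 1, 0, -1):
        path.append(back[path[-1], i])
    return path[::-1]

# Toy example: the query copies parent 0, then parent 1 after a breakpoint.
parents = ["ACGTACGTACGT", "TTTTACGGTTTT"]
query   = "ACGTACGGTTTT"
print(viterbi_annotate(query, parents))  # e.g. a block of 0s, then 1s
```

Parametric inference, as in the paper, would go further and enumerate how this optimal annotation changes across all values of the transition and emission parameters rather than fixing them.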
Hurricane Forecasting: A Novel Multimodal Machine Learning Framework
This paper describes a machine learning (ML) framework for tropical cyclone
intensity and track forecasting, combining multiple distinct ML techniques and
utilizing diverse data sources. Our framework, which we refer to as Hurricast
(HURR), is built upon the combination of distinct data processing techniques
using gradient-boosted trees and novel encoder-decoder architectures, including
CNN, GRU, and Transformer components. We propose a deep-feature-extraction
methodology to combine spatio-temporal data with statistical data efficiently.
Our multimodal framework enables forecasts that draw on a wide range of data
sources, including historical storm data and visual data such as reanalysis
atmospheric images. We evaluate our models against current operational
forecasts in the North Atlantic and Eastern Pacific basins over 2016-2019
at a 24-hour lead time, and show our models consistently outperform
statistical-dynamical models and compete with the best dynamical models, while
computing forecasts in seconds. Furthermore, including Hurricast in an
operational forecast consensus model yields a significant improvement of
5-15% over NHC's official forecast, highlighting its complementarity with
existing approaches. In summary, our work demonstrates that combining
different data sources and distinct machine learning methodologies can lead
to superior tropical cyclone forecasting. We hope that this work opens the
door for further use of machine learning in meteorological forecasting.
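Hurricast's code is not shown here; the following is a minimal sketch of the multimodal pattern the abstract describes: a small CNN-GRU encoder turns an image sequence into deep features, which are concatenated with statistical features and fed to gradient-boosted trees. All layer sizes, feature dimensions, and data are invented stand-ins.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.ensemble import GradientBoostingRegressor

class CNNGRUEncoder(nn.Module):
    """Encode a sequence of atmospheric image snapshots into one feature vector."""
    def __init__(self, channels=3, hidden=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(channels, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),   # -> (batch*time, 32, 1, 1)
        )
        self.gru = nn.GRU(32, hidden, batch_first=True)

    def forward(self, x):              # x: (batch, time, C, H, W)
        b, t = x.shape[:2]
        frames = self.cnn(x.flatten(0, 1)).view(b, t, 32)
        _, h = self.gru(frames)        # final hidden state summarizes the track
        return h[-1]                   # (batch, hidden)

# Toy data: 8 storms, 6 time steps of 3x32x32 "reanalysis" images,
# plus 10 statistical features and a scalar intensity target (all random).
images = torch.randn(8, 6, 3, 32, 32)
stats = np.random.randn(8, 10)
target = np.random.randn(8)

with torch.no_grad():
    deep_feats = CNNGRUEncoder()(images).numpy()

# Mix deep spatial-temporal features with statistical features,
# then fit gradient-boosted trees on the combined representation.
features = np.hstack([deep_feats, stats])
model = GradientBoostingRegressor().fit(features, target)
print(model.predict(features[:2]))
```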
Distributing the Kalman Filter for Large-Scale Systems
This paper derives a \emph{distributed} Kalman filter to estimate a sparsely
connected, large-scale, $n$-dimensional, dynamical system monitored by a
network of $N$ sensors. Local Kalman filters are implemented on the
($n_l$-dimensional, where $n_l \ll n$) sub-systems that are obtained after
spatially decomposing the large-scale system. The resulting sub-systems
overlap, which, along with an assimilation procedure on the local Kalman
filters, preserves an $L$th order Gauss-Markovian structure of the centralized
error processes. The information loss due to the $L$th order Gauss-Markovian
approximation is controllable, as it can be characterized by a divergence that
decreases as $L$ increases. The order of the approximation, $L$, leads to a
lower bound on the dimension of the sub-systems, hence providing a criterion
for sub-system selection. The assimilation procedure is carried out on the
local error covariances with a distributed iterate collapse inversion (DICI)
algorithm that we introduce. The DICI algorithm computes the (approximated)
centralized Riccati and Lyapunov equations iteratively with only local
communication and low-order computation. We fuse the observations that are
common among the local Kalman filters using bipartite fusion graphs and
consensus averaging algorithms. The proposed algorithm achieves full
distribution of the Kalman filter that is coherent with the centralized Kalman
filter with an $L$th order Gauss-Markovian structure on the centralized
error processes. Nowhere is storage, communication, or computation of
$n$-dimensional vectors and matrices needed; only $n_l$-dimensional
vectors and matrices are communicated or used in the computation at the
sensors.
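The DICI algorithm itself is not reproduced here. As a loose, hedged illustration of the overall decompose / filter-locally / fuse-the-overlap pattern, the toy sketch below runs standard Kalman filters on two overlapping sub-systems of a small chain system and averages the estimates of the shared states, a crude stand-in for the paper's consensus-based fusion. The system, noise levels, and decomposition are invented, and truncating the dynamics to each block ignores cross-block coupling.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 4-state chain system, decomposed into two overlapping 3-state blocks.
n = 4
A = np.eye(n) + 0.1 * np.eye(n, k=1) + 0.1 * np.eye(n, k=-1)
blocks = [np.array([0, 1, 2]), np.array([1, 2, 3])]  # overlap: states 1, 2

Q, R = 0.01 * np.eye(3), 0.1 * np.eye(3)
x_true = rng.normal(size=n)
x_hat = [np.zeros(3) for _ in blocks]
P = [np.eye(3) for _ in blocks]

for _ in range(50):
    x_true = A @ x_true + rng.multivariate_normal(np.zeros(n), 0.01 * np.eye(n))
    for i, idx in enumerate(blocks):
        A_l = A[np.ix_(idx, idx)]                 # truncated local dynamics
        z = x_true[idx] + rng.multivariate_normal(np.zeros(3), R)
        # Local predict/update (standard Kalman filter on the sub-system).
        x_pred = A_l @ x_hat[i]
        P_pred = A_l @ P[i] @ A_l.T + Q
        K = P_pred @ np.linalg.inv(P_pred + R)    # H = I locally
        x_hat[i] = x_pred + K @ (z - x_pred)
        P[i] = (np.eye(3) - K) @ P_pred
    # Crude consensus step: average the estimates of the overlapping states.
    for s in (1, 2):
        vals = [x_hat[i][list(idx).index(s)] for i, idx in enumerate(blocks)]
        avg = np.mean(vals)
        for i, idx in enumerate(blocks):
            x_hat[i][list(idx).index(s)] = avg

print("true states 1-2:", x_true[1:3], " fused estimate:", x_hat[0][1:])
```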
Faster all-pairs shortest paths via circuit complexity
We present a new randomized method for computing the min-plus product
(a.k.a., tropical product) of two $n \times n$ matrices, yielding a faster
algorithm for solving the all-pairs shortest path problem (APSP) in dense
$n$-node directed graphs with arbitrary edge weights. On the real RAM, where
additions and comparisons of reals are unit cost (but all other operations have
typical logarithmic cost), the algorithm runs in time $n^3/2^{\Omega(\log n)^{1/2}}$
and is correct with high probability.
On the word RAM, the algorithm runs in time $n^3/2^{\Omega(\log n)^{1/2}} + n^{2+o(1)}\log M$
for edge weights in $([0, M] \cap \mathbb{Z}) \cup \{\infty\}$. Prior algorithms
used either $n^3/(\log^c n)$ time for various $c \leq 2$, or $O(M^{\alpha} n^{\beta})$
time for various $\alpha > 0$ and $\beta > 2$.
The new algorithm applies a tool from circuit complexity, namely the
Razborov-Smolensky polynomials for approximately representing $AC^0[p]$
circuits, to efficiently reduce a matrix product over the $(\min, +)$ algebra to
a relatively small number of rectangular matrix products over $\mathbb{F}_2$,
each of which is computable using a particularly efficient method due to
Coppersmith. We also give a deterministic version of the algorithm running in
$n^3/2^{\log^{\delta} n}$ time for some $\delta > 0$, which utilizes the
Yao-Beigel-Tarui translation of $AC^0[m]$ circuits into "nice" depth-two
circuits.
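For contrast with the paper's result, here is the textbook baseline it improves on: a cubic-time min-plus product, used inside repeated squaring for APSP. This is not the paper's algorithm, just the standard reference point.

```python
import numpy as np

INF = float("inf")

def min_plus_product(A, B):
    """(min, +) product: C[i][j] = min_k A[i][k] + B[k][j].

    This cubic loop is the classical baseline; the paper's contribution is
    a substantially faster method via circuit-complexity tools.
    """
    n = len(A)
    C = np.full((n, n), INF)
    for k in range(n):
        # Vectorized relaxation over intermediate vertex k.
        C = np.minimum(C, A[:, k][:, None] + B[k, :][None, :])
    return C

def apsp(W):
    """All-pairs shortest paths by repeated min-plus squaring: O(n^3 log n)."""
    n = len(W)
    D = W.copy()
    steps = 1
    while steps < n - 1:     # paths of length <= n-1 suffice
        D = min_plus_product(D, D)
        steps *= 2
    return D

W = np.array([[0, 3, INF],
              [INF, 0, 1],
              [2, INF, 0]])
print(apsp(W))   # shortest-path distance matrix of the 3-node example
```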
A model reduction method for biochemical reaction networks
Background: In this paper we propose a model reduction method for biochemical reaction networks governed by a variety of reversible and irreversible enzyme kinetic rate laws, including reversible Michaelis-Menten and Hill kinetics. The method proceeds by a stepwise reduction in the number of complexes, defined as the left- and right-hand sides of the reactions in the network. It is based on the Kron reduction of the weighted Laplacian matrix, which describes the graph structure of the complexes and reactions in the network. It does not rely on prior knowledge of the dynamic behaviour of the network and hence can be automated, as we demonstrate. The reduced network has fewer complexes, reactions, variables, and parameters than the original network, and yet the behaviour of a preselected set of significant metabolites in the reduced network resembles that of the original network. Moreover, the reduced network largely retains the structure and kinetics of the original model.
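As a hedged illustration of the Kron-reduction step the method builds on (not the authors' pipeline), the sketch below eliminates nodes from a weighted Laplacian via the Schur complement. The symmetric toy Laplacian is invented; real reaction-network Laplacians may be directed.

```python
import numpy as np

def kron_reduce(L, keep):
    """Kron reduction: Schur complement of a weighted Laplacian L,
    eliminating every node not in `keep` while preserving the effective
    connectivity among the kept nodes."""
    n = L.shape[0]
    elim = [i for i in range(n) if i not in keep]
    L_kk = L[np.ix_(keep, keep)]
    L_ke = L[np.ix_(keep, elim)]
    L_ek = L[np.ix_(elim, keep)]
    L_ee = L[np.ix_(elim, elim)]
    return L_kk - L_ke @ np.linalg.solve(L_ee, L_ek)

# Toy weighted Laplacian of a 4-complex reaction graph (rows sum to zero).
L = np.array([[ 2., -1., -1.,  0.],
              [-1.,  3., -1., -1.],
              [-1., -1.,  3., -1.],
              [ 0., -1., -1.,  2.]])
# Keep complexes 0 and 3; complexes 1 and 2 are eliminated.
print(kron_reduce(L, keep=[0, 3]))   # a valid 2x2 reduced Laplacian
```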
Results: We apply our method to a yeast glycolysis model and a rat liver fatty acid beta-oxidation model. When the number of state variables in the yeast model is reduced from 12 to 7, the difference between metabolite concentrations in the reduced and the full model, averaged over time and species, is only 8%. Likewise, when the number of state variables in the rat-liver beta-oxidation model is reduced from 42 to 29, the difference between the reduced model and the full model is 7.5%.
Conclusions: The method has improved our understanding of the dynamics of the two networks. We found that, contrary to common expectation, the first few metabolites deleted from the network during our stepwise reduction are not those with the shortest convergence times. This shows that our reduction approach behaves differently from approaches based on time-scale separation. The method can be used to facilitate fitting of the parameters or to embed a detailed model of interest in a more coarse-grained yet realistic environment.
Network and Algebraic Topology of Influenza Evolution
Evolution is a force that has molded human existence since our lineage diverged from chimpanzees about 5.4 million years ago. In that same amount of time, an influenza virus, which replicates every six hours, would have undergone an equivalent number of generations over only a hundred years. The fast replication times of influenza, coupled with its high mutation rate, make the virus a perfect model to study real-time evolution at a mega-Darwin scale, more than a million times faster than human evolution. While recent developments in high-throughput sequencing provide an optimal opportunity to dissect influenza's genetic evolution, a concurrent growth in computational tools is necessary to analyze the large influx of complex genomic data. In my thesis, I present novel computational methods to examine different aspects of influenza evolution.
I first focus on seasonal influenza, particularly the problems that hamper public health initiatives to combat the virus. I introduce two new approaches: (1) the q2-coefficient, a method for quantifying pathogen surveillance, and (2) FluGraph, a technique that employs network topology to track the spread of seasonal influenza around the world.
The second chapter of my thesis examines how mutations and reassortment combine to alter the course of influenza evolution towards pandemic formation. I highlight inherent deficiencies in the current phylogenetic paradigm for analyzing evolution and offer a novel methodology based on algebraic topology that comprehensively reconstructs both vertical and horizontal evolutionary events. I apply this method to viruses, with emphasis on influenza, but foresee broader application to cancer cells, bacteria, eukaryotes, and other taxa.
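The abstract does not spell out the construction, but topology-based methods of this kind typically compute persistent homology on pairwise genetic distances, with one-dimensional cycles flagging horizontal (reticulate) events that no tree can generate. A minimal sketch under that assumption, using the `ripser` package and random stand-in distances:

```python
import numpy as np
from ripser import ripser  # pip install ripser

rng = np.random.default_rng(1)

# Stand-in for a matrix of pairwise genetic (e.g., Hamming) distances
# between viral sequences; real input would come from an alignment.
points = rng.normal(size=(30, 5))
D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

# Persistent homology up to dimension 1 on the distance matrix.
dgms = ripser(D, distance_matrix=True, maxdim=1)["dgms"]

# H0 bars track clonal (vertical, tree-like) clustering; long-lived H1
# bars indicate loops that a tree cannot produce, i.e., candidate
# horizontal events such as reassortment.
h1 = dgms[1]
persistence = h1[:, 1] - h1[:, 0] if len(h1) else np.array([])
print(f"{len(h1)} one-dimensional features; max persistence "
      f"{persistence.max() if len(persistence) else 0:.3f}")
```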