    Parametric inference of recombination in HIV genomes

    Recombination is an important event in the evolution of HIV. It affects the global spread of the pandemic as well as evolutionary escape from host immune response and from drug therapy within single patients. Comprehensive computational methods are needed for detecting recombinant sequences in large databases, and for inferring the parental sequences. We present a hidden Markov model to annotate a query sequence as a recombinant of a given set of aligned sequences. Parametric inference is used to determine all optimal annotations for all parameters of the model. We show that the inferred annotations recover most features of established hand-curated annotations. Thus, parametric analysis of the hidden Markov model is feasible for HIV full-length genomes, and it improves the detection and annotation of recombinant forms. All computational results, reference alignments, and C++ source code are available at http://bio.math.berkeley.edu/recombination/. Comment: 20 pages, 5 figures
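
To make the hidden Markov model idea concrete, here is a minimal sketch under stated assumptions, not the authors' C++ implementation. It treats the index of the parental reference sequence as the hidden state at each alignment column and runs ordinary Viterbi decoding for one fixed parameter choice. The function name `annotate_recombinant` and the parameters `p_switch` and `p_mismatch` are hypothetical, and the parametric inference described in the abstract (finding the optimal annotations for all parameter values) is not attempted here.

```python
import math


def annotate_recombinant(query, references, p_switch=0.01, p_mismatch=0.05):
    """Viterbi annotation of `query` as a mosaic of aligned `references`.

    Hidden state at column t = index of the reference acting as parent.
    p_switch and p_mismatch are illustrative parameters, not taken from the paper.
    Returns one parent index per alignment column.
    """
    n_ref = len(references)
    log_stay = math.log(1.0 - p_switch)
    # Probability of switching is shared evenly among the other references.
    log_switch = math.log(p_switch / (n_ref - 1)) if n_ref > 1 else -math.inf
    log_match = math.log(1.0 - p_mismatch)
    log_miss = math.log(p_mismatch)

    def log_emit(r, t):
        return log_match if references[r][t] == query[t] else log_miss

    def log_trans(s, r):
        return log_stay if s == r else log_switch

    # Uniform prior over the starting parent.
    score = [log_emit(r, 0) - math.log(n_ref) for r in range(n_ref)]
    back = []  # back[t-1][r] = best predecessor of state r at column t
    for t in range(1, len(query)):
        new_score, pointers = [], []
        for r in range(n_ref):
            best = max(range(n_ref), key=lambda s: score[s] + log_trans(s, r))
            new_score.append(score[best] + log_trans(best, r) + log_emit(r, t))
            pointers.append(best)
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(range(n_ref), key=lambda r: score[r])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    refs = ["AAAAAAAA", "CCCCCCCC"]
    print(annotate_recombinant("AAAACCCC", refs))
```

In the toy call, the query matches the first reference on its left half and the second on its right half, so a single breakpoint in the middle is cheaper than four mismatches.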

    Hurricane Forecasting: A Novel Multimodal Machine Learning Framework

    This paper describes a machine learning (ML) framework for tropical cyclone intensity and track forecasting, combining multiple distinct ML techniques and utilizing diverse data sources. Our framework, which we refer to as Hurricast (HURR), is built upon the combination of distinct data processing techniques using gradient-boosted trees and novel encoder-decoder architectures, including CNN, GRU and Transformer components. We propose a deep-feature extractor methodology to mix spatial-temporal data with statistical data efficiently. Our multimodal framework unleashes the potential of making forecasts based on a wide range of data sources, including historical storm data, and visual data such as reanalysis atmospheric images. We evaluate our models against current operational forecasts in the North Atlantic and Eastern Pacific basins over 2016-2019 at a 24-hour lead time, and show that our models consistently outperform statistical-dynamical models and compete with the best dynamical models, while computing forecasts in seconds. Furthermore, the inclusion of Hurricast into an operational forecast consensus model leads to a significant improvement of 5%-15% over NHC's official forecast, thus highlighting the complementary properties with existing approaches. In summary, our work demonstrates that combining different data sources and distinct machine learning methodologies can lead to superior tropical cyclone forecasting. We hope that this work opens the door for further use of machine learning in meteorological forecasting. Comment: Under revision by the AMS' Weather and Forecasting journal

    Distributing the Kalman Filter for Large-Scale Systems

    This paper derives a \emph{distributed} Kalman filter to estimate a sparsely connected, large-scale, $n$-dimensional, dynamical system monitored by a network of $N$ sensors. Local Kalman filters are implemented on the ($n_l$-dimensional, where $n_l \ll n$) sub-systems that are obtained after spatially decomposing the large-scale system. The resulting sub-systems overlap, which along with an assimilation procedure on the local Kalman filters, preserve an $L$th order Gauss-Markovian structure of the centralized error processes. The information loss due to the $L$th order Gauss-Markovian approximation is controllable as it can be characterized by a divergence that decreases as $L \uparrow$. The order of the approximation, $L$, leads to a lower bound on the dimension of the sub-systems, hence, providing a criterion for sub-system selection. The assimilation procedure is carried out on the local error covariances with a distributed iterate collapse inversion (DICI) algorithm that we introduce. The DICI algorithm computes the (approximated) centralized Riccati and Lyapunov equations iteratively with only local communication and low-order computation. We fuse the observations that are common among the local Kalman filters using bipartite fusion graphs and consensus averaging algorithms. The proposed algorithm achieves full distribution of the Kalman filter that is coherent with the centralized Kalman filter with an $L$th order Gaussian-Markovian structure on the centralized error processes. Nowhere storage, communication, or computation of $n$-dimensional vectors and matrices is needed; only $n_l \ll n$ dimensional vectors and matrices are communicated or used in the computation at the sensors.

    Faster all-pairs shortest paths via circuit complexity

    We present a new randomized method for computing the min-plus product (a.k.a., tropical product) of two $n \times n$ matrices, yielding a faster algorithm for solving the all-pairs shortest path problem (APSP) in dense $n$-node directed graphs with arbitrary edge weights. On the real RAM, where additions and comparisons of reals are unit cost (but all other operations have typical logarithmic cost), the algorithm runs in time $n^3/2^{\Omega(\log n)^{1/2}}$ and is correct with high probability. On the word RAM, the algorithm runs in $n^3/2^{\Omega(\log n)^{1/2}} + n^{2+o(1)}\log M$ time for edge weights in $([0,M] \cap {\mathbb Z}) \cup \{\infty\}$. Prior algorithms used either $n^3/(\log^c n)$ time for various $c \leq 2$, or $O(M^{\alpha}n^{\beta})$ time for various $\alpha > 0$ and $\beta > 2$. The new algorithm applies a tool from circuit complexity, namely the Razborov-Smolensky polynomials for approximately representing ${\sf AC}^0[p]$ circuits, to efficiently reduce a matrix product over the $(\min,+)$ algebra to a relatively small number of rectangular matrix products over ${\mathbb F}_2$, each of which is computable using a particularly efficient method due to Coppersmith. We also give a deterministic version of the algorithm running in $n^3/2^{\log^{\delta} n}$ time for some $\delta > 0$, which utilizes the Yao-Beigel-Tarui translation of ${\sf AC}^0[m]$ circuits into "nice" depth-two circuits. Comment: 24 pages. Updated version now has slightly faster running time. To appear in ACM Symposium on Theory of Computing (STOC), 201
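
For readers unfamiliar with the min-plus product, the following sketch shows the naive cubic-time baseline and how APSP reduces to repeated min-plus squaring. It is only the textbook definition, not the faster algorithm of the abstract: the polynomial-method reduction to rectangular products over ${\mathbb F}_2$ is not attempted. The function names are hypothetical, and the example assumes the graph has no negative cycles.

```python
import math

INF = math.inf


def min_plus_product(a, b):
    """Naive min-plus product: c[i][j] = min over k of a[i][k] + b[k][j]."""
    inner = len(b)
    cols = len(b[0])
    return [
        [min(row[k] + b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def apsp_by_squaring(weights):
    """All-pairs shortest paths by repeated min-plus squaring.

    weights[i][j] is the edge weight from i to j, or INF if there is no edge.
    Assumes no negative cycles (an assumption of this sketch).
    """
    n = len(weights)
    dist = [row[:] for row in weights]
    for i in range(n):
        dist[i][i] = 0  # the empty path has length 0
    covered = 1  # dist currently accounts for paths of at most `covered` edges
    while covered < n - 1:
        dist = min_plus_product(dist, dist)
        covered *= 2
    return dist


if __name__ == "__main__":
    w = [
        [0, 3, INF],
        [INF, 0, -1],
        [2, INF, 0],
    ]
    for row in apsp_by_squaring(w):
        print(row)
```

Each squaring doubles the number of edges a path may use, so about $\log_2 n$ products suffice; any speedup of the product therefore carries over to APSP up to that logarithmic factor.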

    A model reduction method for biochemical reaction networks

    Background: In this paper we propose a model reduction method for biochemical reaction networks governed by a variety of reversible and irreversible enzyme kinetic rate laws, including reversible Michaelis-Menten and Hill kinetics. The method proceeds by a stepwise reduction in the number of complexes, defined as the left and right-hand sides of the reactions in the network. It is based on the Kron reduction of the weighted Laplacian matrix, which describes the graph structure of the complexes and reactions in the network. It does not rely on prior knowledge of the dynamic behaviour of the network and hence can be automated, as we demonstrate. The reduced network has fewer complexes, reactions, variables and parameters than the original network, and yet the behaviour of a preselected set of significant metabolites in the reduced network resembles that of the original network. Moreover, the reduced network largely retains the structure and kinetics of the original model. Results: We apply our method to a yeast glycolysis model and a rat liver fatty acid beta-oxidation model. When the number of state variables in the yeast model is reduced from 12 to 7, the difference between metabolite concentrations in the reduced and the full model, averaged over time and species, is only 8%. Likewise, when the number of state variables in the rat-liver beta-oxidation model is reduced from 42 to 29, the difference between the reduced model and the full model is 7.5%. Conclusions: The method has improved our understanding of the dynamics of the two networks. We found that, contrary to common expectation, the first few metabolites deleted from the network during our stepwise reduction are not those with the shortest convergence times. This shows that our reduction approach performs differently from other approaches that are based on time-scale separation. The method can be used to facilitate fitting of the parameters or to embed a detailed model of interest in a more coarse-grained yet realistic environment.
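
The Kron reduction mentioned above is a Schur complement of the Laplacian with respect to the eliminated nodes. The sketch below removes one node at a time, which loosely mirrors the stepwise deletion of complexes. It is a generic illustration on a symmetric toy Laplacian with a hypothetical function name; the paper's Laplacian is built from the graph of complexes with kinetic weights, and that construction is not reproduced here.

```python
def kron_reduce(laplacian, k):
    """Eliminate node k from a weighted Laplacian via the Schur complement.

    New entry: L'[i][j] = L[i][j] - L[i][k] * L[k][j] / L[k][k] for i, j != k.
    Returns a new matrix one dimension smaller; the input is left unchanged.
    """
    pivot = laplacian[k][k]
    if pivot == 0:
        raise ValueError("node k has zero diagonal entry and cannot be eliminated")
    keep = [i for i in range(len(laplacian)) if i != k]
    return [
        [laplacian[i][j] - laplacian[i][k] * laplacian[k][j] / pivot for j in keep]
        for i in keep
    ]


if __name__ == "__main__":
    # Path graph 0 - 1 - 2 with unit edge weights.
    path_laplacian = [
        [1, -1, 0],
        [-1, 2, -1],
        [0, -1, 1],
    ]
    # Removing the middle node leaves a single edge between nodes 0 and 2.
    print(kron_reduce(path_laplacian, 1))
```

In the toy case, the two unit-weight edges in series collapse into one edge of weight 0.5, and the reduced matrix still has zero row sums, so it is again a Laplacian.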