Parametric inference of recombination in HIV genomes
Recombination is an important event in the evolution of HIV. It affects the
global spread of the pandemic as well as evolutionary escape from host immune
response and from drug therapy within single patients. Comprehensive
computational methods are needed for detecting recombinant sequences in large
databases, and for inferring the parental sequences.
We present a hidden Markov model to annotate a query sequence as a
recombinant of a given set of aligned sequences. Parametric inference is used
to determine all optimal annotations for all parameters of the model. We show
that the inferred annotations recover most features of established hand-curated
annotations. Thus, parametric analysis of the hidden Markov model is feasible
for HIV full-length genomes, and it improves the detection and annotation of
recombinant forms.
All computational results, reference alignments, and C++ source code are
available at http://bio.math.berkeley.edu/recombination/.
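The paper's own implementation is the C++ code linked above. As an illustrative sketch only, the toy example below shows the core idea of annotating a query via Viterbi decoding of an HMM whose hidden states are candidate parental sequences; the sequences and the parameter values `p_switch` and `p_match` are invented for the example.

```python
import numpy as np

def viterbi_annotate(query, parents, p_switch=0.01, p_match=0.95):
    """Annotate each position of `query` with the most likely parent.

    Hidden states = parental sequences; transitions model recombination
    breakpoints; emissions reward agreement with the parent's base.
    """
    k, n = len(parents), len(query)
    log_stay = np.log(1 - p_switch)
    log_switch = np.log(p_switch / (k - 1))
    # Emission log-probabilities: does the parent's base match the query's?
    emit = np.array([[np.log(p_match) if parents[s][i] == query[i]
                      else np.log((1 - p_match) / 3)
                      for i in range(n)] for s in range(k)])
    score = np.full((k, n), -np.inf)
    back = np.zeros((k, n), dtype=int)
    score[:, 0] = -np.log(k) + emit[:, 0]          # uniform start
    for i in range(1, n):
        for s in range(k):
            cand = score[:, i - 1] + np.where(np.arange(k) == s,
                                              log_stay, log_switch)
            back[s, i] = int(np.argmax(cand))
            score[s, i] = cand[back[s, i]] + emit[s, i]
    # Trace back the optimal state path (one parent index per position).
    path = [int(np.argmax(score[:, -1]))]
    for i in range(n - 1, 0, -1):
        path.append(back[path[-1], i])
    return path[::-1]

# Toy example: the query copies parent 0, then parent 1 after a breakpoint.
parents = ["ACGTACGTACGT", "TTTTACGGTTTT"]
query   = "ACGTACGGTTTT"
print(viterbi_annotate(query, parents))  # e.g. a block of 0s, then 1s
```

Parametric inference, as in the paper, would go further and enumerate how this optimal annotation changes across all values of the transition and emission parameters rather than fixing them.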
Hurricane Forecasting: A Novel Multimodal Machine Learning Framework
This paper describes a machine learning (ML) framework for tropical cyclone
intensity and track forecasting, combining multiple distinct ML techniques and
utilizing diverse data sources. Our framework, which we refer to as Hurricast
(HURR), is built upon the combination of distinct data processing techniques
using gradient-boosted trees and novel encoder-decoder architectures, including
CNN, GRU, and Transformer components. We propose a deep-feature-extraction
methodology to combine spatio-temporal data with statistical data efficiently.
Our multimodal framework enables forecasts that draw on a wide range of data
sources, including historical storm data and visual data such as reanalysis
atmospheric images. We evaluate our models against current operational
forecasts in the North Atlantic and Eastern Pacific basins over 2016-2019
at a 24-hour lead time, and show our models consistently outperform
statistical-dynamical models and compete with the best dynamical models, while
computing forecasts in seconds. Furthermore, including Hurricast in an
operational forecast consensus model yields a significant improvement of
5-15% over NHC's official forecast, highlighting its complementarity with
existing approaches. In summary, our work demonstrates that combining
different data sources and distinct machine learning methodologies can lead
to superior tropical cyclone forecasting. We hope that this work opens the
door for further use of machine learning in meteorological forecasting.
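Hurricast's code is not shown here; the following is a minimal sketch of the multimodal pattern the abstract describes: a small CNN-GRU encoder turns an image sequence into deep features, which are concatenated with statistical features and fed to gradient-boosted trees. All layer sizes, feature dimensions, and data are invented stand-ins.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.ensemble import GradientBoostingRegressor

class CNNGRUEncoder(nn.Module):
    """Encode a sequence of atmospheric image snapshots into one feature vector."""
    def __init__(self, channels=3, hidden=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(channels, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),   # -> (batch*time, 32, 1, 1)
        )
        self.gru = nn.GRU(32, hidden, batch_first=True)

    def forward(self, x):              # x: (batch, time, C, H, W)
        b, t = x.shape[:2]
        frames = self.cnn(x.flatten(0, 1)).view(b, t, 32)
        _, h = self.gru(frames)        # final hidden state summarizes the track
        return h[-1]                   # (batch, hidden)

# Toy data: 8 storms, 6 time steps of 3x32x32 "reanalysis" images,
# plus 10 statistical features and a scalar intensity target (all random).
images = torch.randn(8, 6, 3, 32, 32)
stats = np.random.randn(8, 10)
target = np.random.randn(8)

with torch.no_grad():
    deep_feats = CNNGRUEncoder()(images).numpy()

# Mix deep spatial-temporal features with statistical features,
# then fit gradient-boosted trees on the combined representation.
features = np.hstack([deep_feats, stats])
model = GradientBoostingRegressor().fit(features, target)
print(model.predict(features[:2]))
```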
Distributing the Kalman Filter for Large-Scale Systems
This paper derives a \emph{distributed} Kalman filter to estimate a sparsely
connected, large-scale, $n$-dimensional, dynamical system monitored by a
network of $N$ sensors. Local Kalman filters are implemented on the
($n_l$-dimensional, where $n_l \ll n$) sub-systems that are obtained after
spatially decomposing the large-scale system. The resulting sub-systems
overlap, which, along with an assimilation procedure on the local Kalman
filters, preserves an $L$th order Gauss-Markovian structure of the centralized
error processes. The information loss due to the $L$th order Gauss-Markovian
approximation is controllable, as it can be characterized by a divergence that
decreases as $L$ increases. The order of the approximation, $L$, leads to a
lower bound on the dimension of the sub-systems, hence providing a criterion
for sub-system selection. The assimilation procedure is carried out on the
local error covariances with a distributed iterate collapse inversion (DICI)
algorithm that we introduce. The DICI algorithm computes the (approximated)
centralized Riccati and Lyapunov equations iteratively with only local
communication and low-order computation. We fuse the observations that are
common among the local Kalman filters using bipartite fusion graphs and
consensus averaging algorithms. The proposed algorithm achieves full
distribution of the Kalman filter that is coherent with the centralized Kalman
filter with an $L$th order Gauss-Markovian structure on the centralized
error processes. Nowhere is storage, communication, or computation of
$n$-dimensional vectors and matrices needed; only $n_l$-dimensional
vectors and matrices are communicated or used in the computation at the
sensors.
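The DICI algorithm itself is not reproduced here. As a loose, hedged illustration of the overall decompose / filter-locally / fuse-the-overlap pattern, the toy sketch below runs standard Kalman filters on two overlapping sub-systems of a small chain system and averages the estimates of the shared states, a crude stand-in for the paper's consensus-based fusion. The system, noise levels, and decomposition are invented, and truncating the dynamics to each block ignores cross-block coupling.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 4-state chain system, decomposed into two overlapping 3-state blocks.
n = 4
A = np.eye(n) + 0.1 * np.eye(n, k=1) + 0.1 * np.eye(n, k=-1)
blocks = [np.array([0, 1, 2]), np.array([1, 2, 3])]  # overlap: states 1, 2

Q, R = 0.01 * np.eye(3), 0.1 * np.eye(3)
x_true = rng.normal(size=n)
x_hat = [np.zeros(3) for _ in blocks]
P = [np.eye(3) for _ in blocks]

for _ in range(50):
    x_true = A @ x_true + rng.multivariate_normal(np.zeros(n), 0.01 * np.eye(n))
    for i, idx in enumerate(blocks):
        A_l = A[np.ix_(idx, idx)]                 # truncated local dynamics
        z = x_true[idx] + rng.multivariate_normal(np.zeros(3), R)
        # Local predict/update (standard Kalman filter on the sub-system).
        x_pred = A_l @ x_hat[i]
        P_pred = A_l @ P[i] @ A_l.T + Q
        K = P_pred @ np.linalg.inv(P_pred + R)    # H = I locally
        x_hat[i] = x_pred + K @ (z - x_pred)
        P[i] = (np.eye(3) - K) @ P_pred
    # Crude consensus step: average the estimates of the overlapping states.
    for s in (1, 2):
        vals = [x_hat[i][list(idx).index(s)] for i, idx in enumerate(blocks)]
        avg = np.mean(vals)
        for i, idx in enumerate(blocks):
            x_hat[i][list(idx).index(s)] = avg

print("true states 1-2:", x_true[1:3], " fused estimate:", x_hat[0][1:])
```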
Faster all-pairs shortest paths via circuit complexity
We present a new randomized method for computing the min-plus product
(a.k.a., tropical product) of two $n \times n$ matrices, yielding a faster
algorithm for solving the all-pairs shortest path problem (APSP) in dense
$n$-node directed graphs with arbitrary edge weights. On the real RAM, where
additions and comparisons of reals are unit cost (but all other operations have
typical logarithmic cost), the algorithm runs in time $n^3/2^{\Omega(\log n)^{1/2}}$
and is correct with high probability.
On the word RAM, the algorithm runs in time $n^3/2^{\Omega(\log n)^{1/2}} + n^{2+o(1)}\log M$
for edge weights in $([0, M] \cap \mathbb{Z}) \cup \{\infty\}$. Prior algorithms
used either $n^3/(\log^c n)$ time for various $c \leq 2$, or $O(M^{\alpha} n^{\beta})$
time for various $\alpha > 0$ and $\beta > 2$.
The new algorithm applies a tool from circuit complexity, namely the
Razborov-Smolensky polynomials for approximately representing $AC^0[p]$
circuits, to efficiently reduce a matrix product over the $(\min, +)$ algebra to
a relatively small number of rectangular matrix products over $\mathbb{F}_2$,
each of which is computable using a particularly efficient method due to
Coppersmith. We also give a deterministic version of the algorithm running in
$n^3/2^{\log^{\delta} n}$ time for some $\delta > 0$, which utilizes the
Yao-Beigel-Tarui translation of $AC^0[m]$ circuits into "nice" depth-two
circuits.
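For contrast with the paper's result, here is the textbook baseline it improves on: a cubic-time min-plus product, used inside repeated squaring for APSP. This is not the paper's algorithm, just the standard reference point.

```python
import numpy as np

INF = float("inf")

def min_plus_product(A, B):
    """(min, +) product: C[i][j] = min_k A[i][k] + B[k][j].

    This cubic loop is the classical baseline; the paper's contribution is
    a substantially faster method via circuit-complexity tools.
    """
    n = len(A)
    C = np.full((n, n), INF)
    for k in range(n):
        # Vectorized relaxation over intermediate vertex k.
        C = np.minimum(C, A[:, k][:, None] + B[k, :][None, :])
    return C

def apsp(W):
    """All-pairs shortest paths by repeated min-plus squaring: O(n^3 log n)."""
    n = len(W)
    D = W.copy()
    steps = 1
    while steps < n - 1:     # paths of length <= n-1 suffice
        D = min_plus_product(D, D)
        steps *= 2
    return D

W = np.array([[0, 3, INF],
              [INF, 0, 1],
              [2, INF, 0]])
print(apsp(W))   # shortest-path distance matrix of the 3-node example
```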
A model reduction method for biochemical reaction networks
Background: In this paper we propose a model reduction method for biochemical reaction networks governed by a variety of reversible and irreversible enzyme kinetic rate laws, including reversible Michaelis-Menten and Hill kinetics. The method proceeds by a stepwise reduction in the number of complexes, defined as the left- and right-hand sides of the reactions in the network. It is based on the Kron reduction of the weighted Laplacian matrix, which describes the graph structure of the complexes and reactions in the network. It does not rely on prior knowledge of the dynamic behaviour of the network and hence can be automated, as we demonstrate. The reduced network has fewer complexes, reactions, variables, and parameters than the original network, and yet the behaviour of a preselected set of significant metabolites in the reduced network resembles that of the original network. Moreover, the reduced network largely retains the structure and kinetics of the original model.
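As a hedged illustration of the Kron-reduction step the method builds on (not the authors' pipeline), the sketch below eliminates nodes from a weighted Laplacian via the Schur complement. The symmetric toy Laplacian is invented; real reaction-network Laplacians may be directed.

```python
import numpy as np

def kron_reduce(L, keep):
    """Kron reduction: Schur complement of a weighted Laplacian L,
    eliminating every node not in `keep` while preserving the effective
    connectivity among the kept nodes."""
    n = L.shape[0]
    elim = [i for i in range(n) if i not in keep]
    L_kk = L[np.ix_(keep, keep)]
    L_ke = L[np.ix_(keep, elim)]
    L_ek = L[np.ix_(elim, keep)]
    L_ee = L[np.ix_(elim, elim)]
    return L_kk - L_ke @ np.linalg.solve(L_ee, L_ek)

# Toy weighted Laplacian of a 4-complex reaction graph (rows sum to zero).
L = np.array([[ 2., -1., -1.,  0.],
              [-1.,  3., -1., -1.],
              [-1., -1.,  3., -1.],
              [ 0., -1., -1.,  2.]])
# Keep complexes 0 and 3; complexes 1 and 2 are eliminated.
print(kron_reduce(L, keep=[0, 3]))   # a valid 2x2 reduced Laplacian
```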
Results: We apply our method to a yeast glycolysis model and a rat liver fatty acid beta-oxidation model. When the number of state variables in the yeast model is reduced from 12 to 7, the difference between metabolite concentrations in the reduced and the full model, averaged over time and species, is only 8%. Likewise, when the number of state variables in the rat-liver beta-oxidation model is reduced from 42 to 29, the difference between the reduced model and the full model is 7.5%.
Conclusions: The method has improved our understanding of the dynamics of the two networks. We found that, contrary to common expectation, the first few metabolites deleted from the network during our stepwise reduction are not those with the shortest convergence times. This shows that our reduction approach behaves differently from approaches based on time-scale separation. The method can be used to facilitate fitting of the parameters or to embed a detailed model of interest in a more coarse-grained yet realistic environment.
Network and Algebraic Topology of Influenza Evolution
Evolution is a force that has molded human existence since our lineage diverged from chimpanzees about 5.4 million years ago. In that same amount of time, an influenza virus, which replicates every six hours, would have undergone an equivalent number of generations over only a hundred years. The fast replication times of influenza, coupled with its high mutation rate, make the virus a perfect model to study real-time evolution at a mega-Darwin scale, more than a million times faster than human evolution. While recent developments in high-throughput sequencing provide an optimal opportunity to dissect influenza's genetic evolution, a concurrent growth in computational tools is necessary to analyze the large influx of complex genomic data. In my thesis, I present novel computational methods to examine different aspects of influenza evolution.
I first focus on seasonal influenza, particularly the problems that hamper public health initiatives to combat the virus. I introduce two new approaches: (1) the q2-coefficient, a method for quantifying pathogen surveillance, and (2) FluGraph, a technique that employs network topology to track the spread of seasonal influenza around the world.
The second chapter of my thesis examines how mutations and reassortment combine to alter the course of influenza evolution towards pandemic formation. I highlight inherent deficiencies in the current phylogenetic paradigm for analyzing evolution and offer a novel methodology based on algebraic topology that comprehensively reconstructs both vertical and horizontal evolutionary events. I apply this method to viruses, with emphasis on influenza, but foresee broader application to cancer cells, bacteria, eukaryotes, and other taxa.
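The abstract does not spell out the construction, but topology-based methods of this kind typically compute persistent homology on pairwise genetic distances, with one-dimensional cycles flagging horizontal (reticulate) events that no tree can generate. A minimal sketch under that assumption, using the `ripser` package and random stand-in distances:

```python
import numpy as np
from ripser import ripser  # pip install ripser

rng = np.random.default_rng(1)

# Stand-in for a matrix of pairwise genetic (e.g., Hamming) distances
# between viral sequences; real input would come from an alignment.
points = rng.normal(size=(30, 5))
D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

# Persistent homology up to dimension 1 on the distance matrix.
dgms = ripser(D, distance_matrix=True, maxdim=1)["dgms"]

# H0 bars track clonal (vertical, tree-like) clustering; long-lived H1
# bars indicate loops that a tree cannot produce, i.e., candidate
# horizontal events such as reassortment.
h1 = dgms[1]
persistence = h1[:, 1] - h1[:, 0] if len(h1) else np.array([])
print(f"{len(h1)} one-dimensional features; max persistence "
      f"{persistence.max() if len(persistence) else 0:.3f}")
```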