A novel algebraic approach to time-reversible evolutionary models
In recent years, algebraic tools have proven useful in phylogenetic
reconstruction and model selection through the study of phylogenetic
invariants. However, until now, the models studied from an algebraic
viewpoint have been either too general or too restrictive (such as group-based
models with a uniform stationary distribution) to be used in practice.
In this paper we provide a new framework to work with time-reversible models,
which are the most widely used by biologists. In our approach we consider
algebraic time-reversible models on phylogenetic trees (as defined by Allman
and Rhodes) and introduce a new inner product to make all transition matrices
of the process diagonalizable through the same orthogonal eigenbasis. This
framework generalizes the Fourier transform widely used to work with
group-based models and recovers some well-known results. As an
illustration, we combine our technique with tools from algebraic
geometry to provide relevant phylogenetic invariants for trees evolving
under the Tamura-Nei model of nucleotide substitution.
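The Fourier diagonalization that this framework generalizes can be sketched for a group-based model on Z2 x Z2 (the Kimura-type setting). The transition probabilities below are illustrative values, not taken from the paper; the script only checks the classical fact that a group-circulant matrix is diagonalized by the Hadamard (character) matrix:

```python
import numpy as np

# Character (Fourier/Hadamard) matrix of the group Z2 x Z2
H2 = np.array([[1, 1], [1, -1]])
H = np.kron(H2, H2)  # 4x4 Hadamard matrix, H @ H = 4*I

# Group-based transition matrix: P[g, h] = f(g XOR h), where f is a
# probability function on the group elements (illustrative values)
f = np.array([0.88, 0.08, 0.02, 0.02])
P = np.array([[f[g ^ h] for h in range(4)] for g in range(4)])

# Conjugating by H diagonalizes every matrix of this form simultaneously
D = H @ P @ H / 4
assert np.allclose(D, np.diag(np.diag(D)))
print(np.diag(D))  # eigenvalues = Fourier transform of f
```

Since every group-based transition matrix shares this eigenbasis, the whole process on the tree diagonalizes at once — the property the paper's new inner product recreates for time-reversible models.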
Computing Implicitizations of Multi-Graded Polynomial Maps
In this paper, we focus on computing the kernel of a map of polynomial
rings. This core problem in symbolic computation is known as
implicitization. While there are extremely effective Gr\"obner basis methods
used to solve this problem, these methods can become infeasible as the number
of variables increases. In the case when the map is multigraded, we
consider an alternative approach. We demonstrate how to quickly compute a
matrix of maximal rank for which the map has a positive multigrading. Then in
each graded component we compute the minimal generators of the kernel in that
multidegree with linear algebra. We have implemented our techniques in
Macaulay2 and show that our implementation can compute many generators of low
degree in examples where Gr\"obner techniques have failed. This includes
several examples coming from phylogenetics where even a complete list of
quadrics and cubics was unknown. When the multigrading refines total degree,
our algorithm is \emph{embarrassingly parallel}, and a fully parallelized
version of our algorithm is forthcoming in OSCAR.
Comment: 16 pages, 2 figures. An implementation of our main algorithm can be found on our MathRepo page as well as on our GitHub.
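The implicitization problem can be illustrated at toy scale with the classical Gröbner-basis elimination approach — the very method whose scaling limits motivate the paper's multigraded alternative. The map and variable names below are a made-up example, using sympy for a self-contained sketch:

```python
from sympy import symbols, groebner

t, x, y = symbols('t x y')

# Toy map of polynomial rings: x -> t^2, y -> t^3.
# Its kernel is the implicit ideal of the cuspidal cubic.
# Eliminate t from the graph ideal (x - t^2, y - t^3) using lex order t > x > y:
G = groebner([x - t**2, y - t**3], t, x, y, order='lex')

# By the elimination theorem, the basis elements free of t generate the kernel
kernel_gens = [g for g in G.exprs if t not in g.free_symbols]
print(kernel_gens)  # contains x**3 - y**2
```

The paper's approach sidesteps this elimination entirely: with a positive multigrading, each graded component of the kernel is a finite-dimensional linear-algebra problem that can be solved independently (hence the embarrassing parallelism).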
New Directions for Contact Integrators
Contact integrators are a family of geometric numerical schemes which
guarantee the conservation of the contact structure. In this work we review the
construction of both the variational and Hamiltonian versions of these methods.
We illustrate some of the advantages of geometric integration in the
dissipative setting by focusing on models inspired by recent studies in
celestial mechanics and cosmology.
Comment: To appear as Chapter 24 in GSI 2021, Springer LNCS 1282
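A standard dissipative test problem for such schemes is the damped oscillator. The scheme below is a plain central-difference (variational-style) discretization offered only as an illustration of integrating a dissipative system with a two-step method — it is not claimed to be one of the specific integrators from this work, and the parameters are made up:

```python
import numpy as np

# Damped oscillator: x'' + a*x' + x = 0 (a = damping, h = step size)
a, h, N = 0.1, 0.01, 1000
w = np.sqrt(1 - a**2 / 4)          # damped frequency
t = h * np.arange(N + 1)
exact = np.exp(-a * t / 2) * np.cos(w * t)

# Central differences for x'' and x' give a two-step recurrence
x = np.empty(N + 1)
x[0], x[1] = exact[0], exact[1]    # seed the two-step scheme
for n in range(1, N):
    x[n + 1] = ((2 - h**2) * x[n] - (1 - a*h/2) * x[n - 1]) / (1 + a*h/2)

print(np.max(np.abs(x - exact)))   # small: the scheme is second-order in h
```

Geometric (contact) integrators go further than raw accuracy: they reproduce the qualitative dissipative behavior exactly at the discrete level, which is what makes them attractive for long-time simulations in celestial mechanics and cosmology.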
Biology in the information age: computational methods to understand and engineer the central dogma
The rise of NGS, big data, and ‘-omics’ has ushered biology into a new age, with the power to fundamentally change how research is approached. Rather than starting from a single hypothesis, we can now incorporate data-driven methods that drive new biological insights, explain emergent biological phenomena, and/or derive novel functionality. This thesis highlights the changing role of computation, both in learning more about biological systems and in leveraging data-intensive computational techniques to create new proteins and enzymes.
The ability of computational approaches to drive biological understanding is presented in three studies. First, the laboratory evolution of DNA polymerases, the workhorses of replication, toward novel functionality is explored. For the three polymerases created, modeling and large-scale approaches are used to demonstrate the additional capabilities of each new enzyme. Next, two independent studies of the genomic adaptations needed for E. coli cells to accommodate a 21st amino acid (selenocysteine or nitrotyrosine) are presented. Next-generation sequencing is used to better understand how cells accommodate the increased fitness burden imposed by an orthogonal translation system. Lastly, community-wide changes in the oral microbiome are studied across the progression toward periodontitis, with implications for potential therapeutic targets.
The capstone of this thesis leverages big-data techniques to engineer novel proteins, the chief functional units within cells. Protein structural data are used to train a convolutional neural network that associates amino acids with neighboring chemical microenvironments at state-of-the-art accuracy. This algorithm enables the identification of gain-of-function mutations, and subsequent experiments confirm substantive improvements in stability-associated phenotypes in vivo across three diverse proteins. This work is the first demonstration of using deep learning to empirically improve protein function and opens a new avenue for protein engineering.
Generalized averaged Gaussian quadrature and applications
A simple numerical method for constructing the optimal generalized averaged Gaussian quadrature formulas is presented. These formulas exist in many cases in which real positive Gauss-Kronrod formulas do not, and can be used as an adequate alternative for estimating the error of a Gaussian rule. We also investigate the conditions under which the optimal averaged Gaussian quadrature formulas and their truncated variants are internal.
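The role such averaged formulas play — supplying an error estimate for a Gaussian rule — can be sketched with the crudest stand-in: comparing successive Gauss-Legendre rules. The averaged and Gauss-Kronrod constructions are more economical refinements of this idea; numpy's `leggauss` supplies the nodes and weights here:

```python
import numpy as np

def gauss_legendre(f, n):
    """Approximate the integral of f over [-1, 1] with the n-point Gauss rule."""
    nodes, weights = np.polynomial.legendre.leggauss(n)
    return weights @ f(nodes)

f = np.exp
I5, I6 = gauss_legendre(f, 5), gauss_legendre(f, 6)
error_estimate = abs(I6 - I5)            # proxy for the error of the 5-point rule
true_error = abs(I5 - (np.e - 1 / np.e)) # exact integral of exp on [-1, 1]
print(error_estimate, true_error)
```

An averaged rule plays the role of the finer rule above while reusing the Gaussian nodes, which is why its internality (all nodes inside the integration interval) matters in practice.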
Algebraic Statistics in Practice: Applications to Networks
Algebraic statistics uses tools from algebra (especially from multilinear
algebra, commutative algebra and computational algebra), geometry and
combinatorics to provide insight into knotty problems in mathematical
statistics. In this survey we illustrate this on three problems related to
networks, namely network models for relational data, causal structure discovery
and phylogenetics. For each problem we give an overview of recent results in
algebraic statistics with emphasis on the statistical achievements made
possible by these tools and their practical relevance for applications to other
scientific disciplines.
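A minimal, textbook instance of the kind of polynomial invariant these applications rely on (chosen for illustration, not taken from the survey): under the independence model for two binary variables, the joint distribution is a rank-one matrix, so its determinant vanishes:

```python
import numpy as np

# Independence model: p_ij = q_i * r_j for marginal distributions q, r
q = np.array([0.3, 0.7])   # illustrative marginals
r = np.array([0.6, 0.4])
P = np.outer(q, r)         # joint distribution, a rank-one 2x2 matrix

# The 2x2 determinant is the model's defining polynomial invariant
invariant = P[0, 0] * P[1, 1] - P[0, 1] * P[1, 0]
print(invariant)  # 0 up to floating-point error
```

Network models, causal discovery, and phylogenetics each generalize this picture: the model is a parametrized family of distributions, and the polynomials vanishing on it (its invariants) become tools for testing model fit and identifying structure.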
MS FT-2-2 7 Orthogonal polynomials and quadrature: Theory, computation, and applications
Quadrature rules find many applications in science and engineering. Their analysis is a classical area of applied mathematics and continues to attract considerable attention. This seminar brings together speakers with expertise in a large variety of quadrature rules. Its aim is to provide an overview of recent developments in the analysis of quadrature rules. The computation of error estimates and novel applications are also described.
Computational Toxinology
Venoms are complex mixtures of biological macromolecules and other compounds that are used for predatory and defensive purposes by hundreds of thousands of known species worldwide. Throughout human history, venoms and venom components have been used to treat a vast array of illnesses, causing them to be of great clinical, economic, and academic interest to the drug discovery and toxinology communities. In spite of major computational advances that facilitate data-driven drug discovery, most therapeutic venom effects are still discovered via tedious trial-and-error, or simply by accident. In this dissertation, I describe a body of work that aims to establish a new subdiscipline of translational bioinformatics, which I name “computational toxinology”.
To accomplish this goal, I present three integrated components that span a wide range of informatics techniques: (1) VenomKB, (2) VenomSeq, and (3) VenomKB’s Semantic API. To provide a platform for structuring, representing, retrieving, and integrating venom data relevant to drug discovery, VenomKB provides a database-backed web application and knowledge base for computational toxinology. VenomKB is structured according to a fully-featured ontology of venoms, and provides data aggregated from many popular web resources. VenomSeq is a biotechnology workflow designed to generate new high-throughput sequencing data for incorporation into VenomKB. Specifically, we expose human cells to controlled doses of crude venoms, conduct RNA-Sequencing, and build profiles of differential gene expression, which we then compare to publicly available differential expression data for known diseases and for drugs with known effects, using those comparisons to hypothesize ways in which the venoms themselves could act therapeutically. These data are then integrated into VenomKB, where they can be retrieved and evaluated alongside existing data and known therapeutic associations. VenomKB’s Semantic API further develops this functionality by providing an intelligent, powerful, and user-friendly interface for querying the complex underlying data in VenomKB in a way that reflects the intuitive, human-understandable meaning of those data. The Semantic API is designed to cater to the needs of advanced users as well as laypersons and bench scientists without previous expertise in computational biology and semantic data analysis.
In each chapter of the dissertation, I describe how we evaluated these three components. We demonstrate the utility of VenomKB and the Semantic API by testing a number of practical use-cases for each, designed to highlight their ability to rediscover existing knowledge as well as to suggest potential areas for future exploration. We use statistics and data-science techniques to evaluate VenomSeq on 25 diverse species of venomous animals, and propose biologically feasible explanations for significant findings. In evaluating the Semantic API, I show how observations on VenomSeq data can be interpreted and placed into the context of past research by members of the larger toxinology community.
Computational toxinology is a toolbox designed to be used by multiple stakeholders (toxinologists, computational biologists, and systems pharmacologists, among others) to improve the return rate of clinically significant findings from manual experimentation. It aims to achieve this goal by enabling access to data, providing means for easy validation of results, and suggesting specific hypotheses that are preliminarily supported by rigorous inferential statistics. All components of the research I describe are open-access and publicly available, to improve reproducibility and encourage widespread adoption.