60,024 research outputs found
Parametric Alignment of Drosophila Genomes
The classic algorithms of Needleman--Wunsch and Smith--Waterman find a
maximum a posteriori probability alignment for a pair hidden Markov model
(PHMM). In order to process large genomes that have undergone complex genome
rearrangements, almost all existing whole genome alignment methods apply fast
heuristics to divide genomes into small pieces which are suitable for
Needleman--Wunsch alignment. In these alignment methods, it is standard
practice to fix the parameters and to produce a single alignment for subsequent
analysis by biologists.
Our main result is the construction of a whole genome parametric alignment of
Drosophila melanogaster and Drosophila pseudoobscura. Parametric alignment
resolves the issue of robustness to changes in parameters by finding all
optimal alignments for all possible parameters in a PHMM. Our alignment draws
on existing heuristics for dividing whole genomes into small pieces for
alignment, and it relies on advances we have made in computing convex polytopes
that allow us to parametrically align non-coding regions using biologically
realistic models. We demonstrate the utility of our parametric alignment for
biological inference by showing that cis-regulatory elements are more conserved
between Drosophila melanogaster and Drosophila pseudoobscura than previously
thought. We also show how whole genome parametric alignment can be used to
quantitatively assess the dependence of branch length estimates on alignment
parameters.
The alignment polytopes, software, and supplementary material can be
downloaded at http://bio.math.berkeley.edu/parametric/.Comment: 19 pages, 3 figure
Recommended from our members
OMMA enables population-scale analysis of complex genomic features and phylogenomic relationships from nanochannel-based optical maps.
BackgroundOptical mapping is an emerging technology that complements sequencing-based methods in genome analysis. It is widely used in improving genome assemblies and detecting structural variations by providing information over much longer (up to 1 Mb) reads. Current standards in optical mapping analysis involve assembling optical maps into contigs and aligning them to a reference, which is limited to pairwise comparison and becomes bias-prone when analyzing multiple samples.FindingsWe present a new method, OMMA, that extends optical mapping to the study of complex genomic features by simultaneously interrogating optical maps across many samples in a reference-independent manner. OMMA captures and characterizes complex genomic features, e.g., multiple haplotypes, copy number variations, and subtelomeric structures when applied to 154 human samples across the 26 populations sequenced in the 1000 Genomes Project. For small genomes such as pathogenic bacteria, OMMA accurately reconstructs the phylogenomic relationships and identifies functional elements across 21 Acinetobacter baumannii strains.ConclusionsWith the increasing data throughput of optical mapping system, the use of this technology in comparative genome analysis across many samples will become feasible. OMMA is a timely solution that can address such computational need. The OMMA software is available at https://github.com/TF-Chan-Lab/OMTools
A control algorithm for autonomous optimization of extracellular recordings
This paper develops a control algorithm that can autonomously position an electrode so as to find and then maintain an optimal extracellular recording position. The algorithm was developed and tested in a two-neuron computational model representative of the cells found in cerebral cortex. The algorithm is based on a stochastic optimization of a suitably defined signal quality metric and is shown capable of finding the optimal recording position along representative sampling directions, as well as maintaining the optimal signal quality in the face of modeled tissue movements. The application of the algorithm to acute neurophysiological recording experiments and its potential implications to chronic recording electrode arrays are discussed
Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data
Subsequence clustering of multivariate time series is a useful tool for
discovering repeated patterns in temporal data. Once these patterns have been
discovered, seemingly complicated datasets can be interpreted as a temporal
sequence of only a small number of states, or clusters. For example, raw sensor
data from a fitness-tracking application can be expressed as a timeline of a
select few actions (i.e., walking, sitting, running). However, discovering
these patterns is challenging because it requires simultaneous segmentation and
clustering of the time series. Furthermore, interpreting the resulting clusters
is difficult, especially when the data is high-dimensional. Here we propose a
new method of model-based clustering, which we call Toeplitz Inverse
Covariance-based Clustering (TICC). Each cluster in the TICC method is defined
by a correlation network, or Markov random field (MRF), characterizing the
interdependencies between different observations in a typical subsequence of
that cluster. Based on this graphical representation, TICC simultaneously
segments and clusters the time series data. We solve the TICC problem through
alternating minimization, using a variation of the expectation maximization
(EM) algorithm. We derive closed-form solutions to efficiently solve the two
resulting subproblems in a scalable way, through dynamic programming and the
alternating direction method of multipliers (ADMM), respectively. We validate
our approach by comparing TICC to several state-of-the-art baselines in a
series of synthetic experiments, and we then demonstrate on an automobile
sensor dataset how TICC can be used to learn interpretable clusters in
real-world scenarios.Comment: This revised version fixes two small typos in the published versio
A Seeded Genetic Algorithm for RNA Secondary Structural Prediction with Pseudoknots
This work explores a new approach in using genetic algorithm to predict RNA secondary structures with pseudoknots. Since only a small portion of most RNA structures is comprised of pseudoknots, the majority of structural elements from an optimal pseudoknot-free structure are likely to be part of the true structure. Thus seeding the genetic algorithm with optimal pseudoknot-free structures will more likely lead it to the true structure than a randomly generated population. The genetic algorithm uses the known energy models with an additional augmentation to allow complex pseudoknots. The nearest-neighbor energy model is used in conjunction with Turner’s thermodynamic parameters for pseudoknot-free structures, and the H-type pseudoknot energy estimation for simple pseudoknots. Testing with known pseudoknot sequences from PseudoBase shows that it out performs some of the current popular algorithms
- …