60,024 research outputs found

    Parametric Alignment of Drosophila Genomes

    Get PDF
    The classic algorithms of Needleman--Wunsch and Smith--Waterman find a maximum a posteriori probability alignment for a pair hidden Markov model (PHMM). In order to process large genomes that have undergone complex genome rearrangements, almost all existing whole genome alignment methods apply fast heuristics to divide genomes into small pieces which are suitable for Needleman--Wunsch alignment. In these alignment methods, it is standard practice to fix the parameters and to produce a single alignment for subsequent analysis by biologists. Our main result is the construction of a whole genome parametric alignment of Drosophila melanogaster and Drosophila pseudoobscura. Parametric alignment resolves the issue of robustness to changes in parameters by finding all optimal alignments for all possible parameters in a PHMM. Our alignment draws on existing heuristics for dividing whole genomes into small pieces for alignment, and it relies on advances we have made in computing convex polytopes that allow us to parametrically align non-coding regions using biologically realistic models. We demonstrate the utility of our parametric alignment for biological inference by showing that cis-regulatory elements are more conserved between Drosophila melanogaster and Drosophila pseudoobscura than previously thought. We also show how whole genome parametric alignment can be used to quantitatively assess the dependence of branch length estimates on alignment parameters. The alignment polytopes, software, and supplementary material can be downloaded at http://bio.math.berkeley.edu/parametric/.Comment: 19 pages, 3 figure

    A control algorithm for autonomous optimization of extracellular recordings

    Get PDF
    This paper develops a control algorithm that can autonomously position an electrode so as to find and then maintain an optimal extracellular recording position. The algorithm was developed and tested in a two-neuron computational model representative of the cells found in cerebral cortex. The algorithm is based on a stochastic optimization of a suitably defined signal quality metric and is shown capable of finding the optimal recording position along representative sampling directions, as well as maintaining the optimal signal quality in the face of modeled tissue movements. The application of the algorithm to acute neurophysiological recording experiments and its potential implications to chronic recording electrode arrays are discussed

    Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data

    Full text link
    Subsequence clustering of multivariate time series is a useful tool for discovering repeated patterns in temporal data. Once these patterns have been discovered, seemingly complicated datasets can be interpreted as a temporal sequence of only a small number of states, or clusters. For example, raw sensor data from a fitness-tracking application can be expressed as a timeline of a select few actions (i.e., walking, sitting, running). However, discovering these patterns is challenging because it requires simultaneous segmentation and clustering of the time series. Furthermore, interpreting the resulting clusters is difficult, especially when the data is high-dimensional. Here we propose a new method of model-based clustering, which we call Toeplitz Inverse Covariance-based Clustering (TICC). Each cluster in the TICC method is defined by a correlation network, or Markov random field (MRF), characterizing the interdependencies between different observations in a typical subsequence of that cluster. Based on this graphical representation, TICC simultaneously segments and clusters the time series data. We solve the TICC problem through alternating minimization, using a variation of the expectation maximization (EM) algorithm. We derive closed-form solutions to efficiently solve the two resulting subproblems in a scalable way, through dynamic programming and the alternating direction method of multipliers (ADMM), respectively. We validate our approach by comparing TICC to several state-of-the-art baselines in a series of synthetic experiments, and we then demonstrate on an automobile sensor dataset how TICC can be used to learn interpretable clusters in real-world scenarios.Comment: This revised version fixes two small typos in the published versio

    A Seeded Genetic Algorithm for RNA Secondary Structural Prediction with Pseudoknots

    Get PDF
    This work explores a new approach in using genetic algorithm to predict RNA secondary structures with pseudoknots. Since only a small portion of most RNA structures is comprised of pseudoknots, the majority of structural elements from an optimal pseudoknot-free structure are likely to be part of the true structure. Thus seeding the genetic algorithm with optimal pseudoknot-free structures will more likely lead it to the true structure than a randomly generated population. The genetic algorithm uses the known energy models with an additional augmentation to allow complex pseudoknots. The nearest-neighbor energy model is used in conjunction with Turner’s thermodynamic parameters for pseudoknot-free structures, and the H-type pseudoknot energy estimation for simple pseudoknots. Testing with known pseudoknot sequences from PseudoBase shows that it out performs some of the current popular algorithms
    corecore