50,230 research outputs found

    Regular expression constrained sequence alignment revisited

    Get PDF
    International audienceImposing constraints in the form of a finite automaton or a regular expression is an effective way to incorporate additional a priori knowledge into sequence alignment procedures. With this motivation, the Regular Expression Constrained Sequence Alignment Problem was introduced, which proposed an O(n^2t^4) time and O(n^2t^2) space algorithm for solving it, where n is the length of the input strings and t is the number of states in the input non-deterministic automaton. A faster O(n^2t^3) time algorithm for the same problem was subsequently proposed. In this article, we further speed up the algorithms for Regular Language Constrained Sequence Alignment by reducing their worst case time complexity bound to O(n^2t^3/log t). This is done by establishing an optimal bound on the size of Straight-Line Programs solving the maxima computation subproblem of the basic dynamic programming algorithm. We also study another solution based on a Steiner Tree computation. While it does not improve worst case, our simulations show that both approaches are efficient in practice, especially when the input automata are dense

    CSA-X: Modularized Constrained Multiple Sequence Alignment

    Get PDF
    Imposing additional constraints on multiple sequence alignment (MSA) algorithms can often produce more biologically meaningful alignments. Hence, various constrained multiple sequence alignment (CMSA) algorithms have been developed in the literature, where researchers used anchor points, regular expressions, or context-free-grammars to specify the constraints, wherein alignments produced are forced to align around segments that match the constraints. In this thesis, we propose CSA-X, a modularized program of constrained multiple sequence alignment that accepts constraints in the form of regular expressions. It uses an arbitrary underlying multiple sequence alignment program to generate alignments, and is therefore modular. The name CSA-X refers to our proposed program generally, where the letter X is substituted with the name of a (non-constrained) multiple sequence alignment algorithm which is used as underlying MSA engine in the proposed program. We compare the accuracy of our program with another constrained multiple sequence alignment program called RE-MuSiC that similarly uses regular expressions for constraints. In addition, comparisons are also made to the underlying MSA programs (without constraints). The BAliBASE 3.0 benchmark database is used to assess the performance of the proposed program CSA-X, other MSA programs, and CMSA programs considered in this study. Based on the results presented herein, CSA-X outperforms RE-MuSiC, and scores well against the underlying alignment programs. It also shows that the use of regular expression constraints, if chosen well, created from the least conserved region of the correct alignments, improves the alignment accuracy. In this study, ProbCons and T-Coffee are used as the underlying MSA programs in CSA-X, and the accuracy of the alignments are measured in terms of Q score and TC score. On average, CSA-X used with constraints identified from the least conserved regions of the correct alignments achieves results that are 17.65% more for Q score, and 23.7% more for TC score compared to RE-MuSiC. In fact, CSA-X with ProbCons (CSA-PC) achieves a higher score in over 97.9% of the cases for Q score, and over 96.4% of the cases for TC score. In addition, CSA-X with T-Coffee (CSA-TCOF) achieves a higher score in over 97.7% of the cases for Q score, and over 94.8% of the cases for TC score. Furthermore, CSA-X with regular expressions created from the least conserved regions of the correct alignments achieves higher accuracy scores compared to standalone ProbCons and T-Coffee. To measure the statistical significance of CSA-X results, the Wilcoxon rank-sum test and Wilcoxon signed-rank test are performed, and these tests show that CSA-X results for the least conserved regular expression constraint sets from the correct BAliBASE 3.0 alignments are significantly different than those from RE-MuSiC, ProbCons, and T-Coffee

    Developing and applying heterogeneous phylogenetic models with XRate

    Get PDF
    Modeling sequence evolution on phylogenetic trees is a useful technique in computational biology. Especially powerful are models which take account of the heterogeneous nature of sequence evolution according to the "grammar" of the encoded gene features. However, beyond a modest level of model complexity, manual coding of models becomes prohibitively labor-intensive. We demonstrate, via a set of case studies, the new built-in model-prototyping capabilities of XRate (macros and Scheme extensions). These features allow rapid implementation of phylogenetic models which would have previously been far more labor-intensive. XRate's new capabilities for lineage-specific models, ancestral sequence reconstruction, and improved annotation output are also discussed. XRate's flexible model-specification capabilities and computational efficiency make it well-suited to developing and prototyping phylogenetic grammar models. XRate is available as part of the DART software package: http://biowiki.org/DART .Comment: 34 pages, 3 figures, glossary of XRate model terminolog

    iLIR : a web resource for prediction of Atg8-family interacting proteins

    Get PDF
    Macroautophagy was initially considered to be a nonselective process for bulk breakdown of cytosolic material. However, recent evidence points toward a selective mode of autophagy mediated by the so-called selective autophagy receptors (SARs). SARs act by recognizing and sorting diverse cargo substrates (e.g., proteins, organelles, pathogens) to the autophagic machinery. Known SARs are characterized by a short linear sequence motif (LIR-, LRS-, or AIM-motif) responsible for the interaction between SARs and proteins of the Atg8 family. Interestingly, many LIR-containing proteins (LIRCPs) are also involved in autophagosome formation and maturation and a few of them in regulating signaling pathways. Despite recent research efforts to experimentally identify LIRCPs, only a few dozen of this class of—often unrelated—proteins have been characterized so far using tedious cell biological, biochemical, and crystallographic approaches. The availability of an ever-increasing number of complete eukaryotic genomes provides a grand challenge for characterizing novel LIRCPs throughout the eukaryotes. Along these lines, we developed iLIR, a freely available web resource, which provides in silico tools for assisting the identification of novel LIRCPs. Given an amino acid sequence as input, iLIR searches for instances of short sequences compliant with a refined sensitive regular expression pattern of the extended LIR motif (xLIR-motif) and retrieves characterized protein domains from the SMART database for the query. Additionally, iLIR scores xLIRs against a custom position-specific scoring matrix (PSSM) and identifies potentially disordered subsequences with protein interaction potential overlapping with detected xLIR-motifs. Here we demonstrate that proteins satisfying these criteria make good LIRCP candidates for further experimental verification. Domain architecture is displayed in an informative graphic, and detailed results are also available in tabular form. We anticipate that iLIR will assist with elucidating the full complement of LIRCPs in eukaryotes

    Calipso: Physics-based Image and Video Editing through CAD Model Proxies

    Get PDF
    We present Calipso, an interactive method for editing images and videos in a physically-coherent manner. Our main idea is to realize physics-based manipulations by running a full physics simulation on proxy geometries given by non-rigidly aligned CAD models. Running these simulations allows us to apply new, unseen forces to move or deform selected objects, change physical parameters such as mass or elasticity, or even add entire new objects that interact with the rest of the underlying scene. In Calipso, the user makes edits directly in 3D; these edits are processed by the simulation and then transfered to the target 2D content using shape-to-image correspondences in a photo-realistic rendering process. To align the CAD models, we introduce an efficient CAD-to-image alignment procedure that jointly minimizes for rigid and non-rigid alignment while preserving the high-level structure of the input shape. Moreover, the user can choose to exploit image flow to estimate scene motion, producing coherent physical behavior with ambient dynamics. We demonstrate Calipso's physics-based editing on a wide range of examples producing myriad physical behavior while preserving geometric and visual consistency.Comment: 11 page

    Quantitative test of the barrier nucleosome model for statistical positioning of nucleosomes up- and downstream of transcription start sites

    Get PDF
    The positions of nucleosomes in eukaryotic genomes determine which parts of the DNA sequence are readily accessible for regulatory proteins and which are not. Genome-wide maps of nucleosome positions have revealed a salient pattern around transcription start sites, involving a nucleosome-free region (NFR) flanked by a pronounced periodic pattern in the average nucleosome density. While the periodic pattern clearly reflects well-positioned nucleosomes, the positioning mechanism is less clear. A recent experimental study by Mavrich et al. argued that the pattern observed in S. cerevisiae is qualitatively consistent with a `barrier nucleosome model', in which the oscillatory pattern is created by the statistical positioning mechanism of Kornberg and Stryer. On the other hand, there is clear evidence for intrinsic sequence preferences of nucleosomes, and it is unclear to what extent these sequence preferences affect the observed pattern. To test the barrier nucleosome model, we quantitatively analyze yeast nucleosome positioning data both up- and downstream from NFRs. Our analysis is based on the Tonks model of statistical physics which quantifies the interplay between the excluded-volume interaction of nucleosomes and their positional entropy. We find that although the typical patterns on the two sides of the NFR are different, they are both quantitatively described by the same physical model, with the same parameters, but different boundary conditions. The inferred boundary conditions suggest that the first nucleosome downstream from the NFR (the +1 nucleosome) is typically directly positioned while the first nucleosome upstream is statistically positioned via a nucleosome-repelling DNA region. These boundary conditions, which can be locally encoded into the genome sequence, significantly shape the statistical distribution of nucleosomes over a range of up to ~1000 bp to each side.Comment: includes supporting materia

    Geometry Processing of Conventionally Produced Mouse Brain Slice Images

    Full text link
    Brain mapping research in most neuroanatomical laboratories relies on conventional processing techniques, which often introduce histological artifacts such as tissue tears and tissue loss. In this paper we present techniques and algorithms for automatic registration and 3D reconstruction of conventionally produced mouse brain slices in a standardized atlas space. This is achieved first by constructing a virtual 3D mouse brain model from annotated slices of Allen Reference Atlas (ARA). Virtual re-slicing of the reconstructed model generates ARA-based slice images corresponding to the microscopic images of histological brain sections. These image pairs are aligned using a geometric approach through contour images. Histological artifacts in the microscopic images are detected and removed using Constrained Delaunay Triangulation before performing global alignment. Finally, non-linear registration is performed by solving Laplace's equation with Dirichlet boundary conditions. Our methods provide significant improvements over previously reported registration techniques for the tested slices in 3D space, especially on slices with significant histological artifacts. Further, as an application we count the number of neurons in various anatomical regions using a dataset of 51 microscopic slices from a single mouse brain. This work represents a significant contribution to this subfield of neuroscience as it provides tools to neuroanatomist for analyzing and processing histological data.Comment: 14 pages, 11 figure
    corecore