243 research outputs found

    The EM Algorithm and the Rise of Computational Biology

    In the past decade computational biology has grown from a cottage industry with a handful of researchers to an attractive interdisciplinary field, catching the attention and imagination of many quantitatively minded scientists. Of interest to us is the key role played by the EM algorithm during this transformation. We survey the use of the EM algorithm in a few important computational biology problems surrounding the "central dogma" of molecular biology: from DNA to RNA and then to proteins. Topics of this article include sequence motif discovery, protein sequence alignment, population genetics, evolutionary models and mRNA expression microarray data analysis. Comment: Published at http://dx.doi.org/10.1214/09-STS312 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org).

    An Efficient Alignment Algorithm for Searching Simple Pseudoknots over Long Genomic Sequence


    Molecular modeling of proteins and peptides related to cell attachment in vivo and in vitro

    Polypeptides constitute half of the dry mass of the cell, they form the bulk of the extracellular matrix (ECM), and they are a common element of extra- and intracellular signaling pathways. There is increasing interest in the development of computational methods in polypeptide and protein engineering on all length scales. This research concerns the development of computational methods for the study of polypeptide interactions related to cell attachment in vivo and in vitro. Polypeptides are inherently biocompatible, and an astronomical range of unique sequences can be designed and realized in massive quantities by modern methods of synthesis and purification. These macromolecules therefore constitute an intriguing class of polyelectrolyte for biomedically oriented multilayer film engineering (Haynie et al., 2005). Applications of such films include artificial cells, drug delivery systems, implant device coatings, and cell/tissue scaffolds (ECM mimics). The plasma membrane-associated cytoplasmic protein tensin is involved in cell attachment, cell migration, embryogenesis, and wound healing. The tensin polypeptide comprises several modular domains implicated in signal transduction. It has been shown that the N-terminal region of tensin is a close homolog of a tumor suppressor that is highly mutated in glioblastomas, breast cancer, and other cancers. There are two related areas of development in this work: polypeptide multilayer films, a type of ECM mimic, and the molecular physiology of tensin. Two studies have been carried out on polypeptide multilayer films: one on aggregates of the model polypeptides poly(L-lysine) (PLL) and poly(L-glutamic acid) (PLGA), and one on interpolyelectrolyte complexes (IPECs) of designed peptides. Molecular models of all known domains of tensin have been developed by homology modeling. The binding properties of two domains of tensin have been studied.
    Molecular dynamics (MD) simulations of PLL/PLGA aggregates suggest that both hydrophobic and electrostatic interactions play a significant role in stabilizing polypeptide multilayer structures. The approach provides a general means to determine how non-covalent interactions contribute to the structure and stability of polypeptide multilayer films. MD simulations of designed polypeptide complexes have been carried out in vacuum and in implicit solvent. The simulation results correlate with experimental data on the same peptides. Energy minimization and MD studies of tensin domain-peptide complexes have provided insight into the biofunctionality of the tensin molecule and thereby its role in cell adhesion. Such knowledge will be important for determining the molecular basis of cell adhesion in health and disease and for engineering treatments of abnormalities involving cell attachment.

    A list of parameterized problems in bioinformatics

    In this report we present a list of problems that originated in bioinformatics. Our aim is to collect information on such problems that have been analyzed from the point of view of Parameterized Complexity. For every problem we give its definition and biological motivation together with known complexity results.

    A Probabilistic Model of Local Sequence Alignment That Simplifies Statistical Significance Estimation

    Sequence database searches require accurate estimation of the statistical significance of scores. Optimal local sequence alignment scores follow Gumbel distributions, but determining an important parameter of the distribution (λ) requires time-consuming computational simulation. Moreover, optimal alignment scores are less powerful than probabilistic scores that integrate over alignment uncertainty (“Forward” scores), but the expected distribution of Forward scores remains unknown. Here, I conjecture that both expected score distributions have simple, predictable forms when full probabilistic modeling methods are used. For a probabilistic model of local sequence alignment, optimal alignment bit scores (“Viterbi” scores) are Gumbel-distributed with constant λ = log 2, and the high-scoring tail of Forward scores is exponential with the same constant λ. Simulation studies support these conjectures over a wide range of profile/sequence comparisons, using 9,318 profile-hidden Markov models from the Pfam database. This enables efficient and accurate determination of expectation values (E-values) for both Viterbi and Forward scores for probabilistic local alignments.
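
    To make the conjecture concrete, the following is a minimal sketch of how P-values and E-values might be computed from bit scores once λ is fixed at log 2. The function names and the location parameters `mu` (Gumbel location) and `tau` (start of the exponential tail) are hypothetical; the abstract does not say how those parameters are obtained, so here they are simply passed in.

```python
import math

# Slope conjectured in the abstract for bit scores: lambda = log 2.
LAMBDA = math.log(2)


def viterbi_pvalue(score, mu):
    """P(S >= score) for a Gumbel distribution with location mu and slope LAMBDA.

    mu is an assumed, externally fitted location parameter.
    """
    # 1 - exp(-y), written with expm1 so tiny tail probabilities stay accurate.
    y = math.exp(-LAMBDA * (score - mu))
    return -math.expm1(-y)


def forward_pvalue(score, tau):
    """Exponential-tail approximation of P(S >= score).

    Only meaningful in the high-scoring tail; tau is an assumed offset.
    The result is capped at 1 for scores below tau.
    """
    return min(1.0, math.exp(-LAMBDA * (score - tau)))


def evalue(pvalue, n_targets):
    """Expected number of hits at least this good among n_targets comparisons."""
    return n_targets * pvalue
```

    With λ = log 2, each additional bit of score halves the tail probability, so `forward_pvalue(tau + 10, tau)` equals 2^-10. For large scores the Gumbel tail approaches the same exponential form.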

    Exploiting bounded signal flow for graph orientation based on cause-effect pairs

    Background: We consider the following problem: Given an undirected network and a set of sender–receiver pairs, direct all edges such that the maximum number of “signal flows” defined by the pairs can be routed respecting edge directions. This problem has applications in understanding protein-interaction-based cell regulation mechanisms. Since this problem is NP-hard, research so far has concentrated on polynomial-time approximation algorithms and tractable special cases. Results: We take the viewpoint of parameterized algorithmics and examine several parameters related to the maximum signal flow over vertices or edges. We provide several fixed-parameter tractability results, and in one case a sharp complexity dichotomy between a linear-time solvable case and a slightly more general NP-hard case. We examine the value of these parameters for several real-world network instances. Conclusions: Several biologically relevant special cases of the NP-hard problem can be solved to optimality. In this way, parameterized analysis yields both deeper insight into the computational complexity and practical solving strategies.
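
    The problem statement can be illustrated with a brute-force sketch. This is not the parameterized algorithm from the paper: it simply tries every orientation, so it is exponential in the number of edges and only usable on toy instances. It assumes a pair counts as routed when a directed path from sender to receiver exists; the names `reachable` and `best_orientation` are hypothetical.

```python
from collections import deque
from itertools import product


def reachable(adj, source, target):
    """Breadth-first search along directed edges from source to target."""
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def best_orientation(edges, pairs):
    """Try all 2^m orientations of the undirected edges.

    Returns (number of routed pairs, one orientation achieving it).
    Assumption: a sender-receiver pair is routed if a directed path exists.
    """
    best_count, best_directed = -1, None
    for flips in product((False, True), repeat=len(edges)):
        directed = [(v, u) if flip else (u, v)
                    for (u, v), flip in zip(edges, flips)]
        adj = {}
        for u, v in directed:
            adj.setdefault(u, []).append(v)
        count = sum(reachable(adj, s, t) for s, t in pairs)
        if count > best_count:
            best_count, best_directed = count, directed
    return best_count, best_directed
```

    On a path a-b-c with the pairs (a, c) and (c, a), only one pair can be routed under any orientation, whereas a triangle oriented as a directed cycle routes all three pairs (a, b), (b, c) and (c, a).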

    Supervised Detection of Conserved Motifs in DNA Sequences with cosmo

    A number of computational methods have been proposed for identifying transcription factor binding sites from a set of unaligned sequences that are thought to share the motif in question. Here we introduce an algorithm, called cosmo, that allows this search to be supervised by specifying a set of constraints that the position weight matrix of the unknown motif must satisfy. Such constraints may be formulated, for example, on the basis of prior knowledge about the structure of the transcription factor in question. The algorithm is based on the same two-component multinomial mixture model used by MEME, with stronger reliance, however, on the likelihood principle instead of more ad hoc criteria like the E-value. The intensity parameter in the ZOOPS and TCM models, for instance, is estimated based on a profile-likelihood approach, and the width of the unknown motif is selected based on BIC. These changes allow cosmo to outperform MEME even in the absence of any constraints, as evidenced by 2- to 3-fold greater sensitivity in some simulation studies. Additional improvements in performance can be achieved by selecting the model type (OOPS, ZOOPS, or TCM) data-adaptively or by supplying correctly specified constraints, especially if the motif appears only as a weak signal in the data. The algorithm can data-adaptively choose between working in a given constrained model or in the completely unconstrained model, guarding against the risk of supplying mis-specified constraints. Simulation studies suggest that this approach can offer 3 to 3.5 times greater sensitivity than MEME. The algorithm has been implemented in the form of a stand-alone C program as well as a web application that can be accessed at http://cosmoweb.berkeley.edu. An R package is available through Bioconductor (http://bioconductor.org).

    Quantum computing algorithms: getting closer to critical problems in computational biology

    Recent biotechnological progress has allowed life scientists and physicians to access an unprecedented, massive amount of data at all levels (molecular, supramolecular, cellular and so on) of biological complexity. So far, mostly classical computational efforts have been dedicated to the simulation, prediction or de novo design of biomolecules, in order to improve the understanding of their function or to develop novel therapeutics. At a higher level of complexity, the progress of omics disciplines (genomics, transcriptomics, proteomics and metabolomics) has prompted researchers to develop informatics means to describe and annotate new biomolecules identified with a resolution down to the single cell, but also with high-throughput speed. Machine learning approaches have been applied to both the modelling studies and the handling of biomedical data. Quantum computing (QC) approaches hold the promise to resolve, speed up or refine the analysis of a wide range of these computational problems. Here, we review and comment on recently developed QC algorithms for biocomputing, with a particular focus on multi-scale modelling and genomic analyses. Indeed, unlike other computational approaches such as protein structure prediction, these problems have been shown to be adequately mapped onto quantum architectures, the main limit for their immediate use being the number of qubits and decoherence effects in the available quantum machines. Possible advantages over the classical counterparts are highlighted, along with a description of some hybrid classical/quantum approaches, which could be the closest to being realistically applied in biocomputation.