243 research outputs found

The EM Algorithm and the Rise of Computational Biology

Author: Jun S. Liu
Xiaodan Fan
Yuan Yuan
Publication venue: Institute of Mathematical Statistics
Publication date: 01/01/2010

In the past decade computational biology has grown from a cottage industry with a handful of researchers to an attractive interdisciplinary field, catching the attention and imagination of many quantitatively-minded scientists. Of interest to us is the key role played by the EM algorithm during this transformation. We survey the use of the EM algorithm in a few important computational biology problems surrounding the "central dogma" of molecular biology: from DNA to RNA and then to proteins. Topics of this article include sequence motif discovery, protein sequence alignment, population genetics, evolutionary models and mRNA expression microarray data analysis. Comment: Published at http://dx.doi.org/10.1214/09-STS312 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org).

arXiv.org e-Print Archive

CiteSeerX

Crossref

An Efficient Alignment Algorithm for Searching Simple Pseudoknots over Long Genomic Sequence

Author: Hon W
Lam TW
Ma CCC
Sadakane K
Wong KF
Yiu SM
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2012


HKU Scholars Hub

Molecular modeling of proteins and peptides related to cell attachment in vivo and in vitro

Author: Zhao Wanhua
Publication venue: Louisiana Tech Digital Commons
Publication date: 01/07/2006

Polypeptides constitute half of the dry mass of the cell, they form the bulk of the extracellular matrix (ECM), and they are a common element of extra- and intracellular signaling pathways. There is increasing interest in the development of computational methods in polypeptide and protein engineering on all length scales. This research concerns the development of computational methods for the study of polypeptide interactions related to cell attachment in vivo and in vitro. Polypeptides are inherently biocompatible, and an astronomical range of unique sequences can be designed and realized in massive quantities by modern methods of synthesis and purification. These macromolecules therefore constitute an intriguing class of polyelectrolyte for biomedically-oriented multilayer film engineering (Haynie et al., 2005). Applications of such films include artificial cells, drug delivery systems, implant device coatings, and cell/tissue scaffolds (ECM mimics). The plasma membrane-associated cytoplasmic protein tensin is involved in cell attachment, cell migration, embryogenesis, and wound healing. The tensin polypeptide comprises several modular domains implicated in signal transduction. It has been shown that the N-terminal region of tensin is a close homolog of a tumor suppressor that is highly mutated in glioblastomas, breast cancer, and other cancers. There are two related areas of development in this work: polypeptide multilayer films, a type of ECM mimic, and the molecular physiology of tensin. Two studies have been carried out on polypeptide multilayer films: aggregates of the model polypeptides poly(L-lysine) (PLL) and poly(L-glutamic acid) (PLGA), and interpolyelectrolyte complexes (IPECs) of designed peptides. Molecular models of all known domains of tensin have been developed by homology modeling. The binding properties of two domains of tensin have been studied.
Molecular dynamics (MD) simulations of PLL/PLGA aggregates suggest that both hydrophobic interactions and electrostatic interactions play a significant role in stabilizing polypeptide multilayer structures. The approach provides a general means to determine how non-covalent interactions contribute to the structure and stability of polypeptide multilayer films. MD simulations of designed polypeptide complexes have been carried out in vacuum and in implicit solvent. The simulation results correlate with experimental data on the same peptides. Energy minimization and MD studies of tensin domain-peptide complexes have provided insight into the biofunctionality of the tensin molecule and thereby its role in cell adhesion. Such knowledge will be important for determining the molecular basis of cell adhesion in health and disease and for engineering treatments of abnormalities involving cell attachment.

Louisiana Tech Digital Commons

A list of parameterized problems in bioinformatics

Author: Félix Ávila Liliana
García Chacón Alina
Serna Iglesias María José
Thilikos Touloupas Dimitrios
Publication venue
Publication date: 01/01/2006

In this report we present a list of problems that originated in bioinformatics. Our aim is to collect information on such problems that have been analyzed from the point of view of Parameterized Complexity. For every problem we give its definition and biological motivation together with known complexity results. Postprint (published version)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

A Probabilistic Model of Local Sequence Alignment That Simplifies Statistical Significance Estimation

Author: Sean R. Eddy
Publication venue: Public Library of Science
Publication date: 01/05/2008

Sequence database searches require accurate estimation of the statistical significance of scores. Optimal local sequence alignment scores follow Gumbel distributions, but determining an important parameter of the distribution (λ) requires time-consuming computational simulation. Moreover, optimal alignment scores are less powerful than probabilistic scores that integrate over alignment uncertainty (“Forward” scores), but the expected distribution of Forward scores remains unknown. Here, I conjecture that both expected score distributions have simple, predictable forms when full probabilistic modeling methods are used. For a probabilistic model of local sequence alignment, optimal alignment bit scores (“Viterbi” scores) are Gumbel-distributed with constant λ = log 2, and the high-scoring tail of Forward scores is exponential with the same constant λ. Simulation studies support these conjectures over a wide range of profile/sequence comparisons, using 9,318 profile hidden Markov models from the Pfam database. This enables efficient and accurate determination of expectation values (E-values) for both Viterbi and Forward scores for probabilistic local alignments.
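
As a rough illustration of how the conjectured distributions in the abstract above could be turned into E-values, here is a minimal Python sketch. It assumes the standard Gumbel survival function and a simple exponential tail; the function names, the location parameters `mu` and `tau` (which would have to be fitted for each model) and the database size `n_comparisons` are hypothetical and do not come from the abstract.

```python
import math

# Slope conjectured in the abstract for bit scores: lambda = log 2 (natural log).
LAMBDA = math.log(2)


def gumbel_pvalue(score, mu, lam=LAMBDA):
    """P(S >= score) for a Gumbel distribution with location mu and slope lam.

    Intended for optimal-alignment ("Viterbi") bit scores. mu is an assumed,
    separately fitted location parameter.
    """
    # 1 - exp(-exp(-lam * (score - mu))), computed stably with expm1.
    return -math.expm1(-math.exp(-lam * (score - mu)))


def exponential_tail_pvalue(score, tau, lam=LAMBDA):
    """P(S >= score) for an exponential high-scoring tail with offset tau.

    Intended for "Forward" scores. Only meaningful in the tail, so the value
    is capped at 1. tau is an assumed, separately fitted offset.
    """
    return min(1.0, math.exp(-lam * (score - tau)))


def evalue(pvalue, n_comparisons):
    """Expected number of hits at least this good among n_comparisons targets."""
    return pvalue * n_comparisons
```

With λ = log 2, each additional bit of score halves the tail probability under either form, which is what makes the constant convenient for bit scores.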

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Exploiting bounded signal flow for graph orientation based on cause-effect pairs

Author: Dorn Britta
Hüffner Falk
Krüger Dominikus
Niedermeier Rolf
Uhlmann Johannes
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: We consider the following problem: given an undirected network and a set of sender–receiver pairs, direct all edges such that the maximum number of “signal flows” defined by the pairs can be routed respecting edge directions. This problem has applications in understanding protein-interaction-based cell regulation mechanisms. Since this problem is NP-hard, research so far has concentrated on polynomial-time approximation algorithms and tractable special cases. Results: We take the viewpoint of parameterized algorithmics and examine several parameters related to the maximum signal flow over vertices or edges. We provide several fixed-parameter tractability results, and in one case a sharp complexity dichotomy between a linear-time solvable case and a slightly more general NP-hard case. We examine the value of these parameters for several real-world network instances. Conclusions: Several biologically relevant special cases of the NP-hard problem can be solved to optimality. In this way, parameterized analysis yields both deeper insight into the computational complexity and practical solving strategies.

CiteSeerX

DepositOnce

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Supervised Detection of Conserved Motifs in DNA Sequences with cosmo

Author: Bembom Oliver
Keles Sunduz
van der Laan Mark J.
Publication venue: Collection of Biostatistics Research Archive
Publication date: 26/07/2006

A number of computational methods have been proposed for identifying transcription factor binding sites from a set of unaligned sequences that are thought to share the motif in question. We here introduce an algorithm, called cosmo, that allows this search to be supervised by specifying a set of constraints that the position weight matrix of the unknown motif must satisfy. Such constraints may be formulated, for example, on the basis of prior knowledge about the structure of the transcription factor in question. The algorithm is based on the same two-component multinomial mixture model used by MEME, with stronger reliance, however, on the likelihood principle instead of more ad hoc criteria like the E-value. The intensity parameter in the ZOOPS and TCM models, for instance, is estimated based on a profile-likelihood approach, and the width of the unknown motif is selected based on BIC. These changes allow cosmo to outperform MEME even in the absence of any constraints, as evidenced by 2- to 3-fold greater sensitivity in some simulation studies. Additional improvements in performance can be achieved by selecting the model type (OOPS, ZOOPS, or TCM) data-adaptively or by supplying correctly specified constraints, especially if the motif appears only as a weak signal in the data. The algorithm can data-adaptively choose between working in a given constrained model or in the completely unconstrained model, guarding against the risk of supplying mis-specified constraints. Simulation studies suggest that this approach can offer 3 to 3.5 times greater sensitivity than MEME. The algorithm has been implemented in the form of a stand-alone C program as well as a web application that can be accessed at http://cosmoweb.berkeley.edu. An R package is available through Bioconductor (http://bioconductor.org).

Collection Of Biostatistics Research Archive

Quantum computing algorithms: getting closer to critical problems in computational biology

Author: Banterle Francesco
Cappello Valentina
D'Elia Massimo
Da Pozzo Eleonora
Marchetti Laura
Martelli Pier Luigi
Martini Claudia
Nifosì Riccardo
Trincavelli Maria Letizia
Publication venue
Publication date: 01/01/2022

Recent biotechnological progress has allowed life scientists and physicians to access an unprecedented, massive amount of data at all levels (molecular, supramolecular, cellular and so on) of biological complexity. So far, mostly classical computational efforts have been dedicated to the simulation, prediction or de novo design of biomolecules, in order to improve the understanding of their function or to develop novel therapeutics. At a higher level of complexity, the progress of omics disciplines (genomics, transcriptomics, proteomics and metabolomics) has prompted researchers to develop informatics means to describe and annotate new biomolecules identified with a resolution down to the single cell, but also with a high-throughput speed. Machine learning approaches have been applied to both the modelling studies and the handling of biomedical data. Quantum computing (QC) approaches hold the promise to resolve, speed up or refine the analysis of a wide range of these computational problems. Here, we review and comment on recently developed QC algorithms for biocomputing, with a particular focus on multi-scale modelling and genomic analyses. Indeed, unlike other computational approaches such as protein structure prediction, these problems have been shown to be adequately mapped onto quantum architectures, the main limit for their immediate use being the number of qubits and decoherence effects in the available quantum machines. Possible advantages over the classical counterparts are highlighted, along with a description of some hybrid classical/quantum approaches, which could be the closest to being realistically applied in biocomputation.

PubMed Central

Archivio della Ricerca - Università di Pisa

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna