Discussion of "EQUI-energy sampler" by Kou, Zhou and Wong
Novel sampling algorithms can significantly impact open questions in
computational biology, most notably the in silico protein folding problem.
Computational protein folding aims to find the three-dimensional
structure of a protein chain given the sequence of its amino acid building
blocks. The complexity of the problem strongly depends on the protein
representation and its energy function. The more detailed the model, the more
complex its corresponding energy function and the greater the challenge it poses for
sampling algorithms. Kou, Zhou and Wong [math.ST/0507080] have introduced a
novel sampling method, which could contribute significantly to the field of
structural prediction. Comment: Published at http://dx.doi.org/10.1214/009053606000000470 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org
A Topic Modeling Toolbox Using Belief Propagation
Latent Dirichlet allocation (LDA) is an important hierarchical Bayesian model
for probabilistic topic modeling, which attracts worldwide interest and
touches on many important applications in text mining, computer vision and
computational biology. This paper introduces a topic modeling toolbox (TMBP)
based on belief propagation (BP) algorithms. The TMBP toolbox is implemented in
MEX C++/Matlab/Octave for either Windows 7 or Linux. Compared with existing
topic modeling packages, the novelty of this toolbox lies in the BP algorithms
for learning LDA-based topic models. The current version includes BP algorithms
for latent Dirichlet allocation (LDA), author-topic models (ATM), relational
topic models (RTM), and labeled LDA (LaLDA). This toolbox is an ongoing project
and more BP-based algorithms for various topic models will be added in the near
future. Interested users may also extend BP algorithms for learning more
complicated topic models. The source code is freely available under the GNU
General Public Licence, Version 1.0 at https://mloss.org/software/view/399/. Comment: 4 page
The prospects of quantum computing in computational molecular biology
Quantum computers can in principle solve certain problems exponentially more
quickly than their classical counterparts. We have not yet reached the advent
of useful quantum computation, but when we do, it will affect nearly all
scientific disciplines. In this review, we examine how current quantum
algorithms could revolutionize computational biology and bioinformatics. There
are potential benefits across the entire field, from the ability to process
vast amounts of information and run machine learning algorithms far more
efficiently, to algorithms for quantum simulation that are poised to improve
computational calculations in drug discovery, to quantum algorithms for
optimization that may advance fields from protein structure prediction to
network analysis. However, these exciting prospects are susceptible to "hype",
and it is also important to recognize the caveats and challenges in this new
technology. Our aim is to introduce the promise and limitations of emerging
quantum computing technologies in the areas of computational molecular biology
and bioinformatics. Comment: 23 pages, 3 figure
The posterior-Viterbi: a new decoding algorithm for hidden Markov models
Background: Hidden Markov models (HMM) are powerful machine learning tools
successfully applied to problems of computational molecular biology. In a
predictive task, the HMM is endowed with a decoding algorithm in order to
assign the most probable state path, and in turn the class labeling, to an
unknown sequence. The Viterbi and the posterior decoding algorithms are the
most common. The former is very efficient when one path dominates, while the
latter, even though it is not guaranteed to preserve the automaton grammar, is
more effective when several concurring paths have similar probabilities. A
third good alternative is 1-best, which was shown to perform as well as or
better than Viterbi. Results: In this paper we introduce the posterior-Viterbi
(PV), a new decoding algorithm that combines the posterior and Viterbi
algorithms. PV is a two-step process: first the posterior probability of each
state is computed and
then the best posterior allowed path through the model is evaluated by a
Viterbi algorithm.
Conclusions: We show that PV decoding performs better than other algorithms
first on toy models and then on the computational biological problem of the
prediction of the topology of beta-barrel membrane proteins. Comment: 23 pages, 3 figure
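
The two-step process described in this abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation. It assumes a discrete-emission HMM, treats a transition as "allowed" when its probability is nonzero, and scores a path by the product of its posterior state probabilities. The function name `posterior_viterbi` and its parameters are hypothetical.

```python
def posterior_viterbi(init, trans, emit, obs):
    """Illustrative posterior-Viterbi decoding for a discrete HMM.

    init[i]     : start probability of state i
    trans[i][j] : transition probability i -> j (0 means forbidden)
    emit[i][o]  : probability that state i emits symbol o
    obs         : list of observed symbol indices
    Assumes the observation sequence has nonzero probability under the model.
    """
    n, T = len(init), len(obs)

    def normalize(row):
        total = sum(row)
        return [x / total for x in row]

    # Step 1: posterior probability of each state at each position,
    # computed with a forward-backward pass rescaled at every position.
    fwd = [normalize([init[i] * emit[i][obs[0]] for i in range(n)])]
    for t in range(1, T):
        fwd.append(normalize([
            emit[j][obs[t]] * sum(fwd[-1][i] * trans[i][j] for i in range(n))
            for j in range(n)
        ]))
    bwd = [[1.0] * n]
    for t in range(T - 2, -1, -1):
        # bwd[0] currently holds the backward values for position t + 1.
        bwd.insert(0, normalize([
            sum(trans[i][j] * emit[j][obs[t + 1]] * bwd[0][j] for j in range(n))
            for i in range(n)
        ]))
    post = [normalize([f * b for f, b in zip(fwd[t], bwd[t])]) for t in range(T)]

    # Step 2: Viterbi-style dynamic programming over the posteriors,
    # following only transitions the model allows.
    score = [post[0][i] if init[i] > 0 else 0.0 for i in range(n)]
    pointers = []
    for t in range(1, T):
        new_score, ptr = [], []
        for j in range(n):
            allowed = [i for i in range(n) if trans[i][j] > 0]
            best_i = max(allowed, key=lambda i: score[i]) if allowed else 0
            best = score[best_i] if allowed else 0.0
            new_score.append(best * post[t][j])
            ptr.append(best_i)
        top = max(new_score)  # rescale to avoid underflow on long sequences
        score = [s / top for s in new_score]
        pointers.append(ptr)

    # Backtrack from the best final state.
    path = [max(range(n), key=lambda i: score[i])]
    for ptr in reversed(pointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path
```

Unlike plain posterior decoding, which picks the most probable state independently at each position, the second step here can only return a state path that the model's transition structure permits.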
Efficient Algorithms for the Closest Pair Problem and Applications
The closest pair problem (CPP) is a well-studied and fundamental
problem in computing. Given a set of points in a metric space, the problem is
to identify the pair of closest points. Another closely related problem is the
fixed radius nearest neighbors problem (FRNNP). Given a set of points and a
radius r, the problem is, for every input point p, to identify all the
other input points that are within a distance of r from p. A naive
deterministic algorithm can solve these problems in quadratic time. CPP as well
as FRNNP play a vital role in computational biology, computational finance,
share market analysis, weather prediction, entomology, electrocardiography,
N-body simulations, molecular simulations, etc. As a result, any improvements
made in solving CPP and FRNNP will have immediate implications for the solution
of numerous problems in these domains. We live in an era of big data, and
processing these data takes large amounts of time. Speeding up data processing
algorithms is thus much more essential now than ever before. In this paper we
present algorithms for CPP and FRNNP that improve (in theory and/or practice)
the best-known algorithms reported in the literature for CPP and FRNNP. These
algorithms also improve the best-known algorithms for related applications
including time series motif mining and the two locus problem in Genome Wide
Association Studies (GWAS).
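
For reference, the naive quadratic-time baseline mentioned in this abstract can be sketched as below. This is only the brute-force approach that the paper sets out to improve on, not the paper's algorithms. It assumes Euclidean distance and points given as coordinate tuples; the function names are hypothetical.

```python
from itertools import combinations
from math import dist  # Euclidean distance, available in Python 3.8+


def closest_pair(points):
    """Return the pair of points with the smallest distance (O(n^2) pairs)."""
    return min(combinations(points, 2), key=lambda pair: dist(*pair))


def fixed_radius_neighbors(points, radius):
    """Map each point index to the indices of all other points within radius."""
    neighbors = {i: [] for i in range(len(points))}
    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= radius:
            neighbors[i].append(j)
            neighbors[j].append(i)
    return neighbors
```

Both functions examine every pair of points once, which is where the quadratic running time comes from.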
Minimum Information About a Simulation Experiment (MIASE)
Reproducibility of experiments is a basic requirement for science. Minimum Information (MI) guidelines have proved a helpful means of enabling reuse of existing work in modern biology. The Minimum Information Required in the Annotation of Models (MIRIAM) guidelines promote the exchange and reuse of biochemical computational models. However, information about a model alone is not sufficient to enable its efficient reuse in a computational setting. Advanced numerical algorithms and complex modeling workflows used in modern computational biology make reproduction of simulations difficult. It is therefore essential to define the core information necessary to perform simulations of those models. The Minimum Information About a Simulation Experiment (MIASE, Glossary in Box 1) describes the minimal set of information that must be provided to make the description of a simulation experiment available to others. It includes the list of models to use and their modifications, all the simulation procedures to apply and in which order, the processing of the raw numerical results, and the description of the final output. MIASE allows for the reproduction of any simulation experiment. The provision of this information, along with a set of required models, guarantees that the simulation experiment represents the intention of the original authors. Following MIASE guidelines will thus improve the quality of scientific reporting, and will also allow collaborative, more distributed efforts in computational modeling and simulation of biological processes
Computational Methods for Protein Identification from Mass Spectrometry Data
Protein identification using mass spectrometry is an indispensable computational tool in the life sciences. A dramatic increase in the use of proteomic strategies to understand the biology of living systems generates an ongoing need for more effective, efficient, and accurate computational methods for protein identification. A wide range of computational methods, each with various implementations, are available to complement different proteomic approaches. A solid knowledge of the range of algorithms available and, more critically, the accuracy and effectiveness of these techniques is essential to ensure as many of the proteins as possible, within any particular experiment, are correctly identified. Here, we undertake a systematic review of the currently available methods and algorithms for interpreting, managing, and analyzing biological data associated with protein identification. We summarize the advances in computational solutions as they have responded to corresponding advances in mass spectrometry hardware. The evolution of scoring algorithms and metrics for automated protein identification is also discussed with a focus on the relative performance of different techniques. We also consider the relative advantages and limitations of different techniques in particular biological contexts. Finally, we present our perspective on future developments in the area of computational protein identification by considering the most recent literature on new and promising approaches to the problem as well as identifying areas yet to be explored and the potential application of methods from other areas of computational biology.