Development and Implementation of a Bioinformatics Online Distance Education Learning Tool for Africa
Bioinformatics refers to the creation and advancement of algorithms, computational and statistical techniques, and theory for solving formal and practical problems arising from the management and analysis of biological data. However, some parts of the African continent have not been adequately sensitized to the bio-scientific and computing fields. Appropriate strategies are therefore needed to introduce the basic components of this emerging scientific field to parts of the African populace, and one such strategy is an online distance education learning tool. This study involved the design of a bioinformatics online distance education tool and its implementation using a programming approach. Design and implementation were carried out with Borland Delphi 7 Enterprise Edition in its Integrated Development Environment. The advantage of using Delphi to implement this bioinformatics web tool is that it is an object-oriented programming language with many additional facilities for extending technical functionality, which plain HTML cannot handle. The development and use of bioinformatics distance education software as a teaching tool in some African countries holds great promise for meeting the needs of people who live in cities, small towns and remote areas.
Bayesian regularization of hidden Markov models with an application to bioinformatics
This paper discusses a Bayesian approach to regularizing hidden Markov models and demonstrates an application of this scheme to bioinformatics.
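One common way to regularize HMM training in a Bayesian manner is to place Dirichlet priors on the rows of the transition and emission matrices and take the MAP estimate in the M-step, which amounts to adding pseudocounts to the expected counts. The sketch below assumes a symmetric prior with a hypothetical concentration parameter `alpha` and hypothetical expected counts; it illustrates the general idea and is not necessarily the scheme used in this paper.

```python
def map_row(expected_counts, alpha):
    """MAP estimate of one categorical distribution under a symmetric Dirichlet(alpha) prior.

    expected_counts: expected counts for one row (e.g. from a Baum-Welch E-step).
    alpha: hypothetical concentration parameter; must be >= 1 so the posterior mode is defined.
    """
    if alpha < 1.0:
        raise ValueError("alpha must be >= 1 for the posterior-mode formula")
    # Posterior mode: (n_i + alpha - 1) / sum_j (n_j + alpha - 1)
    adjusted = [c + alpha - 1.0 for c in expected_counts]
    total = sum(adjusted)
    if total == 0.0:
        # No data and a flat prior (alpha == 1): fall back to a uniform row.
        return [1.0 / len(adjusted)] * len(adjusted)
    return [a / total for a in adjusted]


def map_matrix(expected_counts_matrix, alpha):
    """Apply the MAP update row by row (transition or emission matrix)."""
    return [map_row(row, alpha) for row in expected_counts_matrix]


# Hypothetical expected transition counts from one E-step.
counts = [[8.0, 0.0, 2.0],
          [1.0, 5.0, 4.0]]
print(map_matrix(counts, alpha=1.0))  # maximum likelihood: the zero count stays zero
print(map_matrix(counts, alpha=2.0))  # one pseudocount per cell: no zero transitions
```

With `alpha = 1` the update reduces to maximum likelihood; larger values pull every row toward uniform, so transitions never seen in training are not assigned zero probability.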
Online Learning in Discrete Hidden Markov Models
We present and analyse three online algorithms for learning in discrete
Hidden Markov Models (HMMs) and compare them with the Baldi-Chauvin algorithm.
Using the Kullback-Leibler divergence as a measure of generalisation error we
draw learning curves in simplified situations. The performance for learning
drifting concepts of one of the presented algorithms is analysed and compared
with the Baldi-Chauvin algorithm in the same situations. A brief discussion
about learning and symmetry breaking based on our results is also presented.
Comment: 8 pages, 6 figures
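For small toy models, a Kullback-Leibler generalisation error between a "teacher" HMM and a learned "student" HMM can be computed exactly by enumerating every observation sequence of a fixed length and scoring each one with the forward algorithm. The sketch below makes that concrete; the parameter values are hypothetical, and the paper may define the error differently (for example per symbol, or in a long-sequence limit).

```python
import itertools
import math


def sequence_probability(seq, pi, A, B):
    """Forward algorithm: P(seq) for an HMM with initial probs pi, transitions A, emissions B."""
    n = len(pi)
    alpha = [pi[i] * B[i][seq[0]] for i in range(n)]
    for symbol in seq[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][symbol]
                 for j in range(n)]
    return sum(alpha)


def kl_divergence(teacher, student, length, n_symbols):
    """KL(teacher || student) in nats over all sequences of the given length (brute force)."""
    kl = 0.0
    for seq in itertools.product(range(n_symbols), repeat=length):
        p = sequence_probability(seq, *teacher)
        if p == 0.0:
            continue  # sequences the teacher never emits contribute nothing
        q = sequence_probability(seq, *student)
        if q == 0.0:
            return math.inf
        kl += p * math.log(p / q)
    return kl


# Hypothetical two-state, two-symbol teacher and a student with a wrong transition matrix.
teacher = ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.8, 0.2], [0.2, 0.8]])
student = ([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]], [[0.8, 0.2], [0.2, 0.8]])
print(kl_divergence(teacher, student, length=6, n_symbols=2) / 6)  # per-symbol KL
```

Enumeration costs `n_symbols ** length` forward passes, so this exact computation is only practical for small, simplified settings.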
An empirical comparison of supervised machine learning techniques in bioinformatics
Research in bioinformatics is driven by experimental data.
Current biological databases are populated by vast amounts of
experimental data. Machine learning has been widely applied to
bioinformatics and has achieved considerable success in this research
area. With the many learning algorithms now available in the
literature, researchers face difficulty in choosing the method
best suited to their data. We performed an empirical
study of 7 individual learning systems and 9 different combined
methods on 4 different biological data sets, and we suggest
issues to consider when answering the following
questions: (i) How does one choose which algorithm is best
suited to a given data set? (ii) Are combined methods better than
a single approach? (iii) How does one compare the effectiveness
of a particular algorithm with that of the others?
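Question (ii) can be made concrete with the simplest kind of combined method: plurality voting over the predictions of several individual classifiers. The sketch below is a generic illustration, not one of the nine combined methods evaluated in the study (the abstract does not name them), and the labels and predictions are hypothetical toy values.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine label predictions from several classifiers by plurality vote.

    predictions_per_model: list of equally long prediction lists, one per classifier.
    Ties are broken in favour of the label predicted by the earliest model.
    """
    combined = []
    for votes in zip(*predictions_per_model):
        counts = Counter(votes)
        best = max(counts.values())
        # First label (in model order) that reaches the top count wins.
        combined.append(next(v for v in votes if counts[v] == best))
    return combined


def accuracy(predicted, truth):
    """Fraction of positions where the prediction equals the true label."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


# Hypothetical true labels and predictions from three individual classifiers.
truth = [1, 0, 1, 1, 0]
m1 = [1, 0, 0, 1, 0]
m2 = [1, 1, 1, 0, 0]
m3 = [0, 0, 1, 1, 1]
for name, preds in [("m1", m1), ("m2", m2), ("m3", m3)]:
    print(name, accuracy(preds, truth))
print("vote", accuracy(majority_vote([m1, m2, m3]), truth))
```

On these toy labels the vote is correct everywhere even though each individual model makes at least one error; on real data the outcome depends on how correlated the individual models' errors are, which is why an empirical comparison is needed.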
How Random is a Coin Toss? Bayesian Inference and the Symbolic Dynamics of Deterministic Chaos
Symbolic dynamics has proven to be an invaluable tool in analyzing the
mechanisms that lead to unpredictability and random behavior in nonlinear
dynamical systems. Surprisingly, a discrete partition of continuous state space
can produce a coarse-grained description of the behavior that accurately
describes the invariant properties of an underlying chaotic attractor. In
particular, measures of the rate of information production--the topological and
metric entropy rates--can be estimated from the outputs of Markov or generating
partitions. Here we develop Bayesian inference for k-th order Markov chains as
a method for finding generating partitions and estimating entropy rates from
finite samples of discretized data produced by coarse-grained dynamical
systems.
Comment: 8 pages, 1 figure; http://cse.ucdavis.edu/~cmg/compmech/pubs/hrct.ht
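As a rough illustration of the quantities involved, the sketch below fits a k-th order Markov chain to a symbol sequence with a symmetric Dirichlet prior on each context's next-symbol distribution, then plugs the posterior-mean probabilities into a conditional-entropy estimate of the entropy rate (in bits per symbol). This is a simplified stand-in, not the paper's full Bayesian treatment; the prior strength `alpha` and all helper names are assumptions. The logistic-map symbols use the partition at the critical point x = 1/2, which is generating for the fully chaotic case r = 4.

```python
import math
import random
from collections import defaultdict


def posterior_mean_transitions(symbols, k, alphabet, alpha=1.0):
    """Count k-th order transitions and return posterior-mean next-symbol probabilities.

    Each context (tuple of the previous k symbols) gets a symmetric Dirichlet(alpha) prior.
    Returns (context_counts, probs) where probs[context][symbol] is the posterior mean.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for t in range(k, len(symbols)):
        counts[tuple(symbols[t - k:t])][symbols[t]] += 1
    context_counts, probs = {}, {}
    for context, nexts in counts.items():
        total = sum(nexts.values())
        context_counts[context] = total
        denom = total + alpha * len(alphabet)
        probs[context] = {s: (nexts.get(s, 0) + alpha) / denom for s in alphabet}
    return context_counts, probs


def entropy_rate(symbols, k, alphabet, alpha=1.0):
    """Estimate H[X_t | previous k symbols] in bits per symbol."""
    context_counts, probs = posterior_mean_transitions(symbols, k, alphabet, alpha)
    n = sum(context_counts.values())
    h = 0.0
    for context, total in context_counts.items():
        weight = total / n  # empirical probability of this context
        h -= weight * sum(p * math.log2(p) for p in probs[context].values() if p > 0)
    return h


def logistic_symbols(n, r=4.0, x0=0.3):
    """Binary symbolization of the logistic map x -> r x (1 - x), partitioned at x = 1/2."""
    x, out = x0, []
    for _ in range(n):
        out.append(0 if x < 0.5 else 1)
        x = r * x * (1.0 - x)
    return out


rng = random.Random(0)
coin = [rng.randint(0, 1) for _ in range(100_000)]
print(entropy_rate(coin, k=2, alphabet=(0, 1)))          # close to 1 bit per symbol
print(entropy_rate([0, 1] * 500, k=1, alphabet=(0, 1)))  # close to 0 bits per symbol
print(entropy_rate(logistic_symbols(100_000), k=2, alphabet=(0, 1)))
```

The fair coin and the period-2 sequence bracket the range: a random source produces one new bit per symbol, while a periodic one produces essentially none once the previous symbol is known.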
The posterior-Viterbi: a new decoding algorithm for hidden Markov models
Background: Hidden Markov models (HMMs) are powerful machine learning tools
successfully applied to problems in computational molecular biology. In a
predictive task, the HMM is equipped with a decoding algorithm in order to
assign the most probable state path, and in turn the class labeling, to an
unknown sequence. The Viterbi and the posterior decoding algorithms are the
most common. The former is very efficient when one path dominates, while the
latter, although it does not guarantee that the automaton grammar is preserved,
is more effective when several competing paths have similar probabilities. A
third good alternative is 1-best, which was shown to perform as well as or
better than Viterbi.
Results: In this paper we introduce posterior-Viterbi (PV), a new decoding
algorithm that combines the posterior and Viterbi algorithms. PV is a
two-step process: first the posterior probability of each state is computed,
and then the best allowed path through the model according to these
posteriors is found with a Viterbi algorithm.
Conclusions: We show that PV decoding performs better than other algorithms,
first on toy models and then on the computational biology problem of
predicting the topology of beta-barrel membrane proteins.
Comment: 23 pages, 3 figures
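The two-step procedure described above can be sketched as follows. The code computes state posteriors with the forward-backward algorithm and then runs a Viterbi-style recursion that maximizes the product of posteriors while considering only transitions with nonzero probability in the model (the "allowed" paths). It is a minimal sketch based on the abstract's description, using unscaled probabilities that are suitable only for short sequences; the five-state toy model is hypothetical and chosen so that plain posterior decoding produces a path the model forbids.

```python
import math


def forward_backward(obs, pi, A, B):
    """Posterior P(state_t = i | obs) for every position (unscaled; short sequences only)."""
    n, T = len(pi), len(obs)
    fwd = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for t in range(1, T):
        fwd.append([sum(fwd[-1][i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                    for j in range(n)])
    bwd = [[1.0] * n]
    for t in range(T - 2, -1, -1):
        nxt = bwd[0]  # backward values at position t + 1
        bwd.insert(0, [sum(A[i][j] * B[j][obs[t + 1]] * nxt[j] for j in range(n))
                       for i in range(n)])
    evidence = sum(fwd[-1])
    return [[fwd[t][i] * bwd[t][i] / evidence for i in range(n)] for t in range(T)]


def safe_log(x):
    return math.log(x) if x > 0 else -math.inf


def posterior_viterbi(obs, pi, A, B):
    """Allowed path (A[i][j] > 0 at every step) maximizing the product of state posteriors."""
    post = forward_backward(obs, pi, A, B)
    n, T = len(pi), len(obs)
    score = [safe_log(post[0][i]) if pi[i] > 0 else -math.inf for i in range(n)]
    back = []
    for t in range(1, T):
        pointers, new_score = [], []
        for j in range(n):
            # Only predecessors with a nonzero transition into j are admissible.
            best, arg = max(((score[i], i) for i in range(n) if A[i][j] > 0),
                            default=(-math.inf, 0))
            new_score.append(best + safe_log(post[t][j]))
            pointers.append(arg)
        score = new_score
        back.append(pointers)
    state = max(range(n), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical 5-state model; every state emits the single symbol 0, so the
# posteriors equal the prior path probabilities: 0-1-3 (0.35), 0-1-4 (0.25), 0-2-2 (0.40).
pi = [1.0, 0.0, 0.0, 0.0, 0.0]
A = [[0.0, 0.6, 0.4, 0.0, 0.0],
     [0.0, 0.0, 0.0, 7 / 12, 5 / 12],
     [0.0, 0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, 0.0, 1.0]]
B = [[1.0]] * 5
obs = [0, 0, 0]

posterior_path = [max(range(5), key=lambda i: p[i]) for p in forward_backward(obs, pi, A, B)]
print(posterior_path)                    # [0, 1, 2]: uses the forbidden transition 1 -> 2
print(posterior_viterbi(obs, pi, A, B))  # [0, 1, 3]
```

In this toy model the single most probable path is 0-2-2 (probability 0.40, what Viterbi decoding would return), posterior decoding yields the forbidden 0-1-2, and posterior-Viterbi returns 0-1-3, the allowed path whose per-position posteriors have the largest product (0.6 x 0.35 = 0.21).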
DNA Steganalysis Using Deep Recurrent Neural Networks
Recent advances in next-generation sequencing technologies have facilitated
the use of deoxyribonucleic acid (DNA) as a novel covert channel in
steganography. Various methods exist in other domains to detect
hidden messages in conventional covert channels; however, they have not been
applied to DNA steganography. The most common current detection approaches,
namely frequency analysis-based methods, often overlook important signals when
directly applied to DNA steganography because those methods depend on the
distribution of sequence character counts. To address this limitation,
we propose a general sequence learning-based DNA steganalysis framework. The
proposed approach learns the intrinsic distribution of coding and non-coding
sequences and detects hidden messages by exploiting the distribution variations
that hiding these messages introduces. Using deep recurrent neural networks
(RNNs), our framework identifies these distribution variations through the
classification score that predicts whether a sequence is coding or non-coding.
We compare our proposed method with various existing methods and with biological
sequence analysis methods implemented on top of our framework. According to our
experimental results, our approach delivers robust detection performance
compared with other tools.
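The limitation of count-based frequency analysis can be illustrated with a toy example. If a message were hidden by rearranging bases without changing their counts (a hypothetical embedding chosen only for illustration), a statistic built on mononucleotide frequencies sees no difference, while an order-1 statistic such as dinucleotide frequencies does. This is not the paper's RNN-based framework, only a sketch of why sequence context carries signal that character counts miss; both sequences are hypothetical.

```python
from collections import Counter


def base_frequencies(seq):
    """Mononucleotide composition: what count-based frequency analysis looks at."""
    counts = Counter(seq)
    return {b: counts[b] / len(seq) for b in "ACGT"}


def dinucleotide_frequencies(seq):
    """Frequencies of overlapping base pairs, i.e. first-order sequence context."""
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = len(seq) - 1
    return {a + b: pairs[a + b] / total for a in "ACGT" for b in "ACGT"}


def total_variation(p, q):
    """Total variation distance between two distributions over the same keys."""
    return 0.5 * sum(abs(p[key] - q[key]) for key in p)


cover = "ACGTACGTACGT"  # hypothetical original sequence
stego = "AAACCCGGGTTT"  # hypothetical modified sequence with identical base counts
print(total_variation(base_frequencies(cover), base_frequencies(stego)))                  # 0.0
print(total_variation(dinucleotide_frequencies(cover), dinucleotide_frequencies(stego)))  # 8/11, about 0.727
```

A learned sequence model generalizes this idea: instead of a fixed pair statistic, it captures longer-range structure of coding and non-coding sequences, so that modifications show up as shifts in its output score.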
Prediction of peptides binding to MHC class I alleles by partial periodic pattern mining
MHC (Major Histocompatibility Complex) is a key player in the immune response of an organism. It is important to be able to predict which antigenic peptides will bind to a specific MHC allele and which will not, as this creates possibilities for controlling the immune response and for applications in immunotherapy. However, a problem encountered by computational binding prediction methods for MHC class I is the presence of bulges and loops in the peptides, which change their total length. Most machine learning methods in use today require the sequences to be of the same length in order to mine the binding motifs successfully. We propose using time-based data mining methods for motif mining so that motifs can be mined position-independently. In addition, information from both binding and non-binding peptides is used, in contrast to other methods, which rely only on binding peptides. The prediction results are between 70% and 80% for the tested alleles.
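The two ideas above, position-independent motifs and the use of non-binders as well as binders, can be illustrated with a simplified sketch. It counts gapped residue-pair patterns such as `('L', 1, 'V')` (read "L*V", one wildcard position) wherever they occur in a peptide, so peptides of different lengths are handled uniformly, and ranks patterns by their support among binders minus their support among non-binders. This is a stand-in for illustration, not the paper's partial periodic pattern mining algorithm; the function names, the `max_gap` parameter and the peptides are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def gapped_pairs(peptide, max_gap):
    """Set of (residue, gap, residue) patterns occurring anywhere in the peptide."""
    found = set()
    for i, j in combinations(range(len(peptide)), 2):
        gap = j - i - 1  # number of wildcard positions between the two residues
        if gap <= max_gap:
            found.add((peptide[i], gap, peptide[j]))
    return found


def support(peptides, max_gap):
    """Fraction of peptides containing each pattern (counted once per peptide)."""
    counts = Counter()
    for p in peptides:
        counts.update(gapped_pairs(p, max_gap))
    return {pat: c / len(peptides) for pat, c in counts.items()}


def discriminative_patterns(binders, non_binders, max_gap=2, top=5):
    """Rank patterns by support in binders minus support in non-binders."""
    sb = support(binders, max_gap)
    sn = support(non_binders, max_gap)
    scored = {pat: sb.get(pat, 0.0) - sn.get(pat, 0.0) for pat in set(sb) | set(sn)}
    return sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


# Hypothetical peptides of different lengths; every binder contains L*V at a different offset.
binders = ["SLYVKA", "AKLQVTSA", "LTVGG"]
non_binders = ["SLVKA", "AKQTSA", "GGLTTV"]
print(discriminative_patterns(binders, non_binders, top=3))  # first entry: (('L', 1, 'V'), 1.0)
```

Subtracting non-binder support penalizes patterns that are common in peptides generally, which is one simple way to exploit negative examples that a binders-only method ignores.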
- …