Bioinformatics applications can address the transfer of information at several stages
of the central dogma of molecular biology, including transcription and translation.
This dissertation focuses on using Bayesian models to interpret biological data in
bioinformatics, with Markov chain Monte Carlo (MCMC) as the inference method.
First, we use our approach to interpret data at the transcription level. We propose
a two-level hierarchical Bayesian model for variable selection on cDNA microarray
data. A cDNA microarray quantifies the mRNA levels of thousands of genes
simultaneously in a single sample. By observing the expression patterns of genes under
various treatment conditions, important clues about gene function can be obtained.
We consider a multivariate Bayesian regression model and assign priors that favor
sparseness in terms of the number of variables (genes) used. Within a unified
two-level hierarchical Bayesian model, we use different priors to promote different
degrees of sparseness. Second, we apply our method to a problem related to
the translation level. We develop hidden Markov models to model linker/non-linker
sequence regions in a protein sequence. We use a linker index that exploits differences
in amino acid composition between these regions, using sequence information alone. A goal
of protein structure prediction is to take an amino acid sequence (represented as
a sequence of letters) and predict its tertiary structure. The identification of linker
regions in a protein sequence is valuable in predicting the three-dimensional structure.
Because of the complexity of both models in practice, we employ MCMC methods,
particularly Gibbs sampling (Gelfand and Smith, 1990), for parameter estimation.