Bayesian model-based approaches with MCMC computation to some bioinformatics problems
Bioinformatics applications can address the transfer of information at several stages
of the central dogma of molecular biology, including transcription and translation.
This dissertation focuses on using Bayesian models to interpret biological data in
bioinformatics, using Markov chain Monte Carlo (MCMC) for the inference method.
First, we use our approach to interpret data at the transcription level. We propose
a two-level hierarchical Bayesian model for variable selection on cDNA Microarray
data. A cDNA microarray quantifies the mRNA levels of thousands of genes
simultaneously in one sample. By observing the expression patterns of genes under
various treatment conditions, important clues about gene function can be obtained.
We consider a multivariate Bayesian regression model and assign priors that favor
sparseness in terms of number of variables (genes) used. We introduce the use of
different priors to promote different degrees of sparseness using a unified two-level
hierarchical Bayesian model. Second, we apply our method to a problem related to
the translation level. We develop hidden Markov models to model linker/non-linker
sequence regions in a protein sequence. We use a linker index to exploit differences
in amino acid composition between regions from sequence information alone. A goal
of protein structure prediction is to take an amino acid sequence (represented as
a sequence of letters) and predict its tertiary structure. The identification of linker
regions in a protein sequence is valuable in predicting the three-dimensional structure.
Because both models are complex in practice, we employ the
Markov chain Monte Carlo (MCMC) method, particularly Gibbs sampling (Gelfand
and Smith, 1990), for parameter estimation.
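As a rough illustration of the Gibbs sampling idea mentioned in the abstract above, the following sketch alternates draws from two full conditional distributions. It targets a toy bivariate standard normal with correlation `rho`, not the dissertation's hierarchical regression or hidden Markov models; the function names, the burn-in length, and the seed are all assumptions made for this example.

```python
import random


def gibbs_bivariate_normal(rho, n_samples, burn_in=500, seed=0):
    """Gibbs sampler for a bivariate standard normal with correlation rho.

    Toy target chosen for illustration only: each full conditional is
    x | y ~ N(rho * y, 1 - rho^2) and y | x ~ N(rho * x, 1 - rho^2).
    """
    rng = random.Random(seed)
    cond_sd = (1.0 - rho ** 2) ** 0.5  # standard deviation of each conditional
    x, y = 0.0, 0.0                    # arbitrary starting point
    draws = []
    for i in range(burn_in + n_samples):
        x = rng.gauss(rho * y, cond_sd)  # update x given the current y
        y = rng.gauss(rho * x, cond_sd)  # update y given the new x
        if i >= burn_in:                 # discard the burn-in iterations
            draws.append((x, y))
    return draws


def sample_correlation(draws):
    """Pearson correlation of the sampled (x, y) pairs."""
    n = len(draws)
    mx = sum(x for x, _ in draws) / n
    my = sum(y for _, y in draws) / n
    sxy = sum((x - mx) * (y - my) for x, y in draws)
    sxx = sum((x - mx) ** 2 for x, _ in draws)
    syy = sum((y - my) ** 2 for _, y in draws)
    return sxy / (sxx * syy) ** 0.5


draws = gibbs_bivariate_normal(rho=0.8, n_samples=20000)
print(round(sample_correlation(draws), 2))
```

With enough iterations the sample correlation of the draws should settle near the chosen `rho`, which is a simple check that the chain is exploring the intended target.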
A hierarchical Bayesian model for inference of copy number variants and their association to gene expression
A number of statistical models have been successfully developed for the
analysis of high-throughput data from a single source, but few methods are
available for integrating data from different sources. Here we focus on
integrating gene expression levels with comparative genomic hybridization (CGH)
array measurements collected on the same subjects. We specify a measurement
error model that relates the gene expression levels to latent copy number
states which, in turn, are related to the observed surrogate CGH measurements
via a hidden Markov model. We employ selection priors that exploit the
dependencies across adjacent copy number states and investigate MCMC stochastic
search techniques for posterior inference. Our approach results in a unified
modeling framework for simultaneously inferring copy number variants (CNV) and
identifying their significant associations with mRNA transcript abundance. We
show performance on simulated data and illustrate an application to data from a
genomic study on human cancer cell lines.
Comment: Published at http://dx.doi.org/10.1214/13-AOAS705 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
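To make the hidden Markov model layer in the abstract above more concrete, here is a minimal sketch of a three-state chain (loss, neutral, gain) with Gaussian emissions standing in for CGH-style measurements. Note the substitution: the paper performs posterior inference with MCMC stochastic search, whereas this example uses plain Viterbi decoding because it is short and deterministic. The emission means, standard deviation, transition probabilities, and input values are all hypothetical.

```python
import math


def viterbi(obs, means, sd, stay_prob):
    """Most likely state path for a Gaussian-emission HMM (log domain).

    Assumptions for this sketch: uniform initial distribution, a shared
    emission standard deviation, and a 'sticky' transition matrix where
    each state keeps itself with probability stay_prob.
    """
    k = len(means)
    switch_prob = (1.0 - stay_prob) / (k - 1)

    def log_emit(x, s):
        z = (x - means[s]) / sd
        return -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi))

    def log_trans(a, b):
        return math.log(stay_prob if a == b else switch_prob)

    # Best log score of any path ending in each state at the first position.
    score = [math.log(1.0 / k) + log_emit(obs[0], s) for s in range(k)]
    back = []  # back[t][s] = best predecessor of state s at position t + 1
    for x in obs[1:]:
        prev, score, pointers = score, [], []
        for s in range(k):
            best = max(range(k), key=lambda a: prev[a] + log_trans(a, s))
            pointers.append(best)
            score.append(prev[best] + log_trans(best, s) + log_emit(x, s))
        back.append(pointers)

    # Trace the best path backwards from the best final state.
    state = max(range(k), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# States: 0 = loss, 1 = neutral, 2 = gain (hypothetical log-ratio means).
log_ratios = [0.0, 0.05, 1.0, 0.95, 1.05, 0.0, -1.0, -1.1]
states = viterbi(log_ratios, means=[-1.0, 0.0, 1.0], sd=0.3, stay_prob=0.9)
print(states)
```

The sticky transition matrix is one simple way to encode the dependence between adjacent copy number states that the abstract refers to; the selection priors and measurement error model of the paper are not represented here.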
Bayesian Gene Set Analysis
Gene expression microarray technologies provide the simultaneous measurements
of a large number of genes. Typical analyses of such data focus on the
individual genes, but recent work has demonstrated that evaluating changes in
expression across predefined sets of genes often increases statistical power
and produces more robust results. We introduce a new methodology for
identifying gene sets that are differentially expressed under varying
experimental conditions. Our approach uses a hierarchical Bayesian framework
where a hyperparameter measures the significance of each gene set. Using
simulated data, we compare our proposed method to alternative approaches, such
as Gene Set Enrichment Analysis (GSEA) and Gene Set Analysis (GSA). Our
approach provides the best overall performance. We also discuss the application
of our method to experimental data based on p53 mutation status.
Bayesian Tobit quantile regression using g-prior distribution with ridge parameter
A Bayesian approach is proposed for coefficient estimation in the Tobit quantile regression model. The
proposed approach is based on placing a g-prior distribution that depends on the quantile level on the regression
coefficients. The prior is generalized by introducing a ridge parameter to address important challenges
that may arise with censored data, such as multicollinearity and overfitting problems. Then, a stochastic
search variable selection approach is proposed for the Tobit quantile regression model based on the g-prior. An
expression for the hyperparameter g is proposed to calibrate the modified g-prior with a ridge parameter to
the corresponding g-prior. Some possible extensions of the proposed approach are discussed, including the
continuous and binary responses in quantile regression. The methods are illustrated using several simulation
studies and a microarray study, both of which indicate that the proposed
approach performs well.
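The abstract above does not spell out the likelihood or the exact form of the modified g-prior, so the sketch below covers only the general building block: the check (pinball) loss of quantile regression applied to a response left-censored at zero, which is the standard Tobit quantile regression objective. The function names and the toy data are assumptions for illustration, and no prior or variable selection step is included.

```python
def check_loss(u, tau):
    """Quantile check loss: rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def tobit_quantile_objective(y, X, beta, tau):
    """Sum of check losses for a response left-censored at zero.

    The fitted value for row x is max(0, x . beta), i.e. the linear
    predictor passed through the same censoring as the observed response.
    """
    total = 0.0
    for y_i, x_i in zip(y, X):
        linear = sum(x_ij * b_j for x_ij, b_j in zip(x_i, beta))
        fitted = max(0.0, linear)  # censoring at zero (assumed)
        total += check_loss(y_i - fitted, tau)
    return total


# Toy data: intercept plus one covariate; the first response is censored.
X = [[1.0, -2.0], [1.0, 0.0], [1.0, 2.0]]
y = [0.0, 1.0, 3.0]
print(tobit_quantile_objective(y, X, beta=[1.0, 1.0], tau=0.5))
print(tobit_quantile_objective(y, X, beta=[0.0, 0.0], tau=0.5))
```

On this toy data the coefficients `[1.0, 1.0]` reproduce the censored responses exactly, so the objective is zero there and strictly larger for the all-zero coefficients.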
- …