Integrative Bioinformatics Approaches toward Systems-level Understanding of Breast Cancers
Genome-wide expression profiling technologies, such as microarray expression and next-generation sequencing, have created unprecedented opportunities to study complex diseases at the systems level. However, the ever-increasing amounts of high-throughput genomic data are extremely heterogeneous, and each individual experiment provides a different aspect of the phenotype of interest. Meanwhile, numerous bioinformatics tools have been developed for genomic data analysis. Current methods for integrating data and analysis tools are scattered. Developing a systematic approach for the integration of various experiments would greatly benefit the research community.
In this thesis, we present a variety of integrative data-driven bioinformatics strategies to facilitate data analysis and derive biological knowledge from gene expression data of breast cancer. First, we show that meta-analysis of multiple microarray profiles of estrogen receptor positive and estrogen receptor negative breast cancers reveals important biological functions not found by the individual analyses. By applying a network analysis, we identify the change of gene expression between Luminal A and Luminal B breast cancer subtypes and the genes representing the change. Next, we demonstrate a bioinformatics strategy to detect genes that play important roles in endocrine resistance in estrogen receptor positive breast cancers. By combining the analyses of differentially expressed genes, enriched gene sets, co-expressed genes, the expression of drug-treated cancer cell lines, and clinical information, we demonstrate how our proposed strategy identifies the key genes in tamoxifen-resistant tumors and potential new therapeutics against the resistance. Lastly, by using matched mRNA and microRNA expression, we develop an integrative approach for the prediction of important transcription factors and microRNAs that are involved in dysregulated pathways in breast cancer. Our method employs random forests and robust rank aggregation to derive a reliable importance ranking for candidate regulators predicted by other bioinformatics tools. In conclusion, this thesis demonstrates that the proposed integrative bioinformatics strategies can efficiently combine heterogeneous genomic data and provide new insights into breast cancers.
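The abstract above mentions robust rank aggregation for combining candidate-regulator rankings. The following is a minimal sketch of the order-statistic scoring commonly associated with that technique, not the thesis's actual implementation: each item's ranks are normalized to (0, 1], and its score is the smallest probability that uniformly random ranks would place it at least that high. The regulator names and input rankings are hypothetical, and the usual multiple-testing correction of the score is omitted.

```python
from math import comb


def beta_score(x, k, n):
    """P(at least k of n independent Uniform(0,1) draws are <= x)."""
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(k, n + 1))


def rho_score(normalized_ranks):
    """Smallest order-statistic probability over the sorted normalized ranks."""
    r = sorted(normalized_ranks)
    n = len(r)
    return min(beta_score(x, k, n) for k, x in enumerate(r, start=1))


def aggregate(rankings):
    """Return (ordered items, scores); every ranking lists the same items best-first."""
    size = len(rankings[0])
    scores = {}
    for item in rankings[0]:
        normalized = [(ranking.index(item) + 1) / size for ranking in rankings]
        scores[item] = rho_score(normalized)
    # Lower score = more consistently ranked near the top.
    return sorted(scores, key=scores.get), scores


# Hypothetical rankings of candidate regulators from three tools.
rankings = [
    ["TF_A", "miR_B", "TF_C", "miR_D"],
    ["TF_A", "TF_C", "miR_B", "miR_D"],
    ["TF_A", "miR_B", "TF_C", "miR_D"],
]
order, scores = aggregate(rankings)
print(order)
```

An item ranked first by every tool receives a small score, while an item ranked last everywhere receives a score of 1, so sorting by score gives the aggregated ranking.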
Automated Force Field Parameterization for Nonpolarizable and Polarizable Atomic Models Based on Ab Initio Target Data
Classical molecular dynamics (MD) simulations based on atomistic models are increasingly used to study a wide range of biological systems. A prerequisite for meaningful results from such simulations is an accurate molecular mechanical force field. Most biomolecular simulations are currently based on the widely used AMBER and CHARMM force fields, which were parametrized and optimized to cover a small set of basic compounds corresponding to the natural amino acids and nucleic acid bases. Atomic models of additional compounds are commonly generated by analogy to the parameter set of a given force field. While this procedure yields models that are internally consistent, the accuracy of the resulting models can be limited. In this work, we propose a method, general automated atomic model parameterization (GAAMP), for automatically generating the parameters of atomic models of small molecules using the results from ab initio quantum mechanical (QM) calculations as target data. Force fields that were previously developed for a wide range of model compounds serve as initial guesses, although any of the final parameters can be optimized. The electrostatic parameters (partial charges, polarizabilities, and shielding) are optimized on the basis of QM electrostatic potential (ESP) and, if applicable, the interaction energies between the compound and water molecules. The soft dihedrals are automatically identified and parametrized by targeting QM dihedral scans as well as the energies of stable conformers. To validate the approach, the solvation free energy is calculated for more than 200 small molecules and MD simulations of three different proteins are carried out.
Extra comparison between ncECE-SVD(DANEOsf) and ncMCE-SVD(SP) on interaction prediction for human.
Smooth Scalar-on-Image Regression via Spatial Bayesian Variable Selection
We develop scalar-on-image regression models when images are registered multidimensional manifolds. We propose a fast and scalable Bayes’ inferential procedure to estimate the image coefficient. The central idea is the combination of an Ising prior distribution, which controls a latent binary indicator map, and an intrinsic Gaussian Markov random field, which controls the smoothness of the nonzero coefficients. The model is fit using a single-site Gibbs sampler, which allows fitting within minutes for hundreds of subjects with predictor images containing thousands of locations. The code is simple and is provided in the online Appendix (see the “Supplementary Materials” section). We apply this method to a neuroimaging study where cognitive outcomes are regressed on measures of white-matter microstructure at every voxel of the corpus callosum for hundreds of subjects.
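To illustrate what a single-site Gibbs update on a latent binary indicator map can look like, here is a minimal sketch that samples from an Ising prior alone on a small 2-D grid. It is not the authors' code. It assumes the parameterization p(γ) ∝ exp(a Σ γ_i + b Σ_{i~j} 1[γ_i = γ_j]) with 4-nearest-neighbor structure; the values of `a` and `b` and the grid size are hypothetical. In the full regression model, the log-odds of each site would also include a likelihood term, which is omitted here.

```python
import math
import random


def neighbors(i, j, rows, cols):
    """Yield the 4-nearest-neighbor sites of (i, j) inside the grid."""
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        ni, nj = i + di, j + dj
        if 0 <= ni < rows and 0 <= nj < cols:
            yield ni, nj


def prob_on(gamma, i, j, a, b):
    """P(gamma[i][j] = 1 | neighbors) under the assumed Ising prior."""
    rows, cols = len(gamma), len(gamma[0])
    # Log-odds = a + b * (#neighbors on - #neighbors off).
    s = sum(2 * gamma[ni][nj] - 1 for ni, nj in neighbors(i, j, rows, cols))
    return 1.0 / (1.0 + math.exp(-(a + b * s)))


def gibbs_sweep(gamma, a, b, rng):
    """Update every site once, each conditional on the current state of the rest."""
    for i in range(len(gamma)):
        for j in range(len(gamma[0])):
            gamma[i][j] = 1 if rng.random() < prob_on(gamma, i, j, a, b) else 0
    return gamma


rng = random.Random(0)
gamma = [[rng.randint(0, 1) for _ in range(8)] for _ in range(8)]
for _ in range(50):
    gibbs_sweep(gamma, a=-0.2, b=0.8, rng=rng)
print("\n".join("".join(str(v) for v in row) for row in gamma))
```

With a positive interaction parameter `b`, neighboring indicators tend to agree, so the sampled map forms spatially contiguous on and off regions.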
Box plot of ROC scores for human PPI prediction based on EDE-MDS(DANEOsf), MCE-MDS(SP) [12] and Kuchaiev [16].
ROC curves of different embedding dimension.
An example to show the principle of our algorithm.
(a) Ground-truth network; (b) Maximum connected component; (c) Minimum spanning tree (training set); (d) Evolved network (dashed lines are elementary predictions based on evolutionary analysis); (e) Distance matrix based on shortest path in MST; (f) Distance matrix based on evolutionary distance; (g) Coordinates in geometric space; (h) Euclidean distances in geometric space and corresponding confidence scores.
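Steps (c) and (e) of the caption above can be sketched as follows. This is an illustrative sketch only: it builds a minimum spanning tree with Kruskal's algorithm and then fills a distance matrix with hop counts along the tree. The edge weights, the use of hop counts as the shortest-path distance, and the toy network are assumptions, and the input graph is assumed to be connected (as after step (b)). The embedding and scoring steps (g) and (h) are not covered.

```python
from collections import deque


def minimum_spanning_tree(n, weighted_edges):
    """Kruskal's algorithm; weighted_edges holds (weight, u, v) tuples."""
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for _, u, v in sorted(weighted_edges):
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge joins two components, so it cannot form a cycle
            parent[ru] = rv
            tree.append((u, v))
    return tree


def tree_distance_matrix(n, tree_edges):
    """All-pairs hop distances along the tree, by breadth-first search from each node."""
    adj = [[] for _ in range(n)]
    for u, v in tree_edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = [[None] * n for _ in range(n)]
    for src in range(n):
        dist[src][src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[src][v] is None:
                    dist[src][v] = dist[src][u] + 1
                    queue.append(v)
    return dist


# Hypothetical 4-protein network with made-up edge weights.
edges = [(1, 0, 1), (2, 1, 2), (3, 0, 2), (1, 2, 3)]
tree = minimum_spanning_tree(4, edges)
dist = tree_distance_matrix(4, tree)
for row in dist:
    print(row)
```

A matrix like `dist` is the kind of input that a multidimensional scaling step would then embed into coordinates.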
P values of paired-sample t-Test for ROC score vectors.
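For readers unfamiliar with the paired-sample t-test named in this caption, the sketch below computes the test statistic and degrees of freedom for two matched score vectors using only the standard library. The ROC score values are made up for illustration and are not taken from the paper. Converting the statistic to a P value requires the Student t distribution, which the standard library does not provide, so that step is left out.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(a, b):
    """Return (t, degrees of freedom) for matched samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    # Standard error of the mean difference (sample standard deviation, n - 1).
    # Note: identical differences give a zero standard error and a division error.
    se = stdev(diffs) / sqrt(len(diffs))
    return mean(diffs) / se, len(diffs) - 1


# Hypothetical ROC scores of two methods on the same four test splits.
method_a = [0.90, 0.80, 0.85, 0.95]
method_b = [0.80, 0.75, 0.80, 0.85]
t, df = paired_t_statistic(method_a, method_b)
print(f"t = {t:.3f}, df = {df}")
```

The test works on the per-split differences, which is why both vectors must come from the same splits in the same order.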
Precision-Recall curves of interaction prediction for yeast.