Investigation of Feature Selection Methods in High-Throughput Omics Data Analysis
High-throughput technologies, such as microarrays and next-generation sequencing, have accelerated the discovery of new biomarkers and the development of novel diagnostic approaches in precision medicine. Because large numbers of biomarkers can now be measured simultaneously in a single experiment, collecting enough biological samples has become the bottleneck in data accumulation. Feature selection is a common strategy to tackle this 'small n, large p' scenario. Most current feature selection methods are based purely on statistical theory. However, based on my experience analyzing high-throughput data in various projects, I believe biological knowledge can play an important role in feature selection. Therefore, in this dissertation, I present computational investigations of feature selection methods that integrate biological knowledge when dealing with high-dimensional omics data.
First, I present two bioinformatics case studies of high-throughput data analysis in biomedical research: characterization of the H3K27ac profile across different PM2.5 exposures, and investigation of batch stability in iPSC technology. Inspired by these experiences, I then design three feature selection methods that integrate biomedical knowledge for high-dimensional omics data analysis. (1) To integrate domain knowledge, I develop SKI, in which two ranks are generated before feature selection: one based on marginal correlation in the omics data at hand, and another based on external knowledge provided by domain experts, the literature, or databases. The two ranks are combined into a new rank that is used to prescreen biomarkers, after which a further feature selection approach such as LASSO is applied. In a simulation study, I show that SKI outperforms methods without knowledge integration. I then apply SKI to a gene expression dataset to predict drug response in different cell lines, where it achieves higher prediction accuracy than the regular LASSO-based method. (2) To integrate multi-omics data, such as methylation and copy number variants, for survival data analysis, I develop two methods, SKI-Cox and wLASSO-Cox. Cox regression is a common model for survival data analysis. SKI-Cox prescreens genes based on different levels of omics data and then selects genes in a transcriptome-based Cox regression model. wLASSO-Cox uses the marginal utilities derived from Cox regression models on the other omics data as penalty factors in a penalized Cox regression on mRNA expression. In simulations, I show that both methods select more true variables when analyzing omics-based survival data. Both also achieve better performance in predicting overall survival time for glioblastoma and lung adenocarcinoma patients in TCGA datasets.
(3) To integrate pathway or gene set information, my colleagues and I develop a redundancy removable pathway (RRP) based feature selection method for binary and multi-class classification problems. The strategies in (1) and (2) both share the limitation of treating genes (features) as independent variables and ignoring the hidden relations among them. Our method uses a greedy algorithm to search, within a specific pathway, for the gene set whose distinguishing power is maximized; pathway activities inferred from the expression of the selected genes are then used in a multi-class K-nearest-neighbor classifier. By testing our method on three sarcoma microarray datasets, we show that it is a robust feature selection method for multi-class classification.
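The greedy search described in (3) can be illustrated with a minimal sketch. The abstract does not define "distinguishing power" or how pathway activity is inferred, so everything here is an assumption for illustration: `activity` (mean expression over the selected genes), `separation` (absolute difference in class means divided by the summed class standard deviations), the stopping rule, and the restriction to two classes are all hypothetical choices, not the RRP method itself.

```python
from statistics import mean, pstdev


def activity(expr, genes):
    """Per-sample pathway activity, assumed here to be the mean expression
    of the selected genes (the abstract does not specify the inference)."""
    n_samples = len(expr[genes[0]])
    return [mean([expr[g][i] for g in genes]) for i in range(n_samples)]


def separation(values, labels):
    """Hypothetical distinguishing-power score for two classes (0 and 1):
    |difference of class means| / (sum of class standard deviations)."""
    a = [v for v, lab in zip(values, labels) if lab == 0]
    b = [v for v, lab in zip(values, labels) if lab == 1]
    spread = pstdev(a) + pstdev(b) + 1e-9  # small constant avoids division by zero
    return abs(mean(a) - mean(b)) / spread


def greedy_gene_set(expr, pathway_genes, labels):
    """Forward greedy search: repeatedly add the gene that most improves the
    score, and stop as soon as no remaining gene improves it."""
    selected, best = [], 0.0
    remaining = sorted(pathway_genes)
    while remaining:
        score, gene = max(
            (separation(activity(expr, selected + [g]), labels), g)
            for g in remaining
        )
        if score <= best:
            break  # adding any further gene would not help (treated as redundant)
        selected.append(gene)
        remaining.remove(gene)
        best = score
    return selected, best


# Toy data: four samples, two per class; values are made up for illustration.
expr = {
    "a": [1.0, 1.2, 3.0, 3.2],
    "b": [2.0, 1.0, 1.0, 2.0],
    "c": [1.0, 1.4, 2.8, 3.0],
}
labels = [0, 0, 1, 1]
selected, best = greedy_gene_set(expr, ["a", "b", "c"], labels)
```

In this toy example only gene `a` is kept, because adding `b` or `c` lowers the assumed score. The resulting pathway activities would then feed a downstream classifier such as K nearest neighbors.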
Overall, these studies provide more flexible, knowledge-integrated approaches for selecting biologically relevant features when analyzing high-throughput omics data. Their successful application to real-world datasets demonstrates that close interaction between biologists and statisticians is critical for deciphering the complex data generated in biomedical research.
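The rank-combination prescreening described for SKI in (1) can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's implementation: the abstract does not say how the two ranks are combined, so the weighted average of ranks, the `weight` and `keep` parameters, and the function names are all hypothetical. The downstream LASSO step is omitted.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def ranks_from_scores(scores):
    """Rank features by descending score; rank 1 is best, ties broken by name."""
    ordered = sorted(scores, key=lambda f: (-scores[f], f))
    return {f: i + 1 for i, f in enumerate(ordered)}


def ski_prescreen(features, response, knowledge_score, weight=0.5, keep=2):
    """Combine a data-driven rank with an external-knowledge rank.

    features:        dict mapping feature name -> list of per-sample values
    response:        list of per-sample outcomes
    knowledge_score: dict mapping feature name -> prior relevance (higher is better)
    weight:          assumed mixing weight on the data-driven rank (hypothetical)
    keep:            number of features passed on to the later selection step
    """
    data_score = {f: abs(pearson(v, response)) for f, v in features.items()}
    data_rank = ranks_from_scores(data_score)
    know_rank = ranks_from_scores(knowledge_score)
    combined = {
        f: weight * data_rank[f] + (1 - weight) * know_rank[f] for f in features
    }
    return sorted(features, key=lambda f: (combined[f], f))[:keep]


# Toy data, made up for illustration only.
features = {
    "g1": [1, 2, 3, 4, 5],
    "g2": [2, 1, 4, 3, 5],
    "g3": [5, 3, 4, 1, 2],
    "g4": [1, 1, 2, 1, 2],
}
response = [1.1, 1.9, 3.2, 3.9, 5.1]
knowledge = {"g1": 0.2, "g2": 0.9, "g3": 0.1, "g4": 0.8}
prescreened = ski_prescreen(features, response, knowledge)
```

With equal weighting, the toy example keeps `g2` and `g1`. Setting `weight=1.0` reduces the procedure to purely data-driven marginal screening, and `weight=0.0` to purely knowledge-driven screening.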
Quality Characteristics of a Pickled Tea Processed by Submerged Fermentation
Pickled tea is traditionally fermented under anaerobic conditions in many Asian countries, but its quality characteristics are still unclear. The dynamic qualities, chemical components, and volatile components of pickled tea processed by submerged fermentation were investigated. The results showed that the sensory qualities of the pickled tea gradually improved with increasing fermentation time. Sensory qualities decreased gradually after 7 d of fermentation, and the change was extremely significant (p < 0.01). Correspondingly, biochemical indicators (except for soluble extract) decreased continuously and significantly (p < 0.01) within 5–7 d of fermentation, but the decrease was much slower in the later stage. Proper control of fermentation time is a key step in obtaining the desired quality of pickled tea. The best sensory quality was obtained after 7 d of submerged fermentation: the taste of the pickled tea was less bitter and astringent, but distinctly acidic; the flavor was sour with sweet and floral notes; and the tea had a low caffeine content, a high gamma-aminobutyric acid content, and high contents of special volatile components, such as hotrienol and 2,6-bis(1,1-dimethylethyl)-4-methylphenol. These results indicate that submerged fermentation is potentially a new method of processing pickled tea to obtain the desired quality, and they contribute to the understanding of the microbial transformation of tea components under anaerobic conditions.
Convenient, Rapid and Accurate Measurement of SVOC Emission Characteristics in Experimental Chambers
Chamber tests are usually used to determine the source characteristics of semi-volatile organic compounds (SVOCs), which are critical for quantifying indoor exposure to SVOCs. In contrast to volatile organic compounds (VOCs), the sorption of SVOCs to chamber surfaces usually needs to be considered because of their much higher surface/air partition coefficients. This sorption results in a long time to reach steady state, frequently on the order of months, and complicates the mathematical analysis of the resulting data. A chamber test is also complicated if the material-phase concentration is not constant. This study shows how to design a chamber to overcome these limitations. A dimensionless mass transfer analysis is used to specify conditions for (1) neglecting the SVOC sorption effect to chamber surfaces, (2) neglecting the convective mass transfer resistance at sorption surfaces if the sorption effect cannot be neglected, and (3) regarding the material-phase concentration in the source as constant. Several practical and quantifiable ways to improve chamber design are proposed. The approach is illustrated by analyzing available data from three different chambers in terms of the accuracy with which the model parameters can be determined and the time needed to conduct the chamber test. The results should greatly facilitate the design of chambers to characterize SVOC emissions and the resulting exposure.
C–O Bond Cleavage of Dimethyl Ether by Transition Metal Ions: A Systematic Study on Catalytic Properties of Metals and Performance of DFT Functionals
Studies were focused on late 3d and 4d transition metal ion (Fe, Co, Ni, Cu, Ru, Rh, Pd, and Ag) mediated activation of dimethyl ether, to investigate the intrinsic catalytic properties of metals on C–O bond cleavage. A set of density functional theory (DFT) methods (BLYP, B3LYP, M06, M06-L, B97-1, B97-D, TPSS, and PBE0) with aug-cc-pVTZ were utilized, and the results were calibrated with CCSD(T)/CBS. The utility of CCSD(T)/CBS calculations for these systems was validated by MRCI/aug-cc-pVTZ calculations. Calculations showed an interesting energetic trend as a function of metal; earlier transition metals tend to give smaller reaction barriers and more exergonic reactions than later metals. This applies to both 3d and 4d systems. For the performance of DFT functionals, PBE0 gave the lowest root mean squared deviations (RMSDs) in terms of both reaction energies and barriers for both 3d and 4d systems, compared to the other functionals. Our studies found that the percentage of Hartree–Fock (HF) exchange plays an important role in the accuracy of DFT methods for these systems, and 26% HF exchange for 3d systems and 34% HF exchange for 4d systems gave the lowest RMSDs.
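The RMSD comparison of functionals against a reference method can be sketched as below. This is a minimal illustration only: the function names and the numbers in the usage example are hypothetical placeholders and are not values from the study.

```python
from math import sqrt


def rmsd(predicted, reference):
    """Root mean squared deviation between two equal-length lists of energies."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return sqrt(sum(squared) / len(squared))


def rank_functionals(results, reference):
    """Return (name, RMSD) pairs sorted from lowest to highest RMSD.

    results:   dict mapping a functional name -> list of computed energies
    reference: list of reference energies (for example, from a higher-level method)
    """
    scored = [(name, rmsd(values, reference)) for name, values in results.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Placeholder numbers for illustration only; not data from the study.
reference = [1.0, 4.0]
results = {"functional_A": [1.0, 2.0], "functional_B": [1.0, 4.5]}
ranking = rank_functionals(results, reference)
```

Here `ranking` lists `functional_B` first because its deviations from the placeholder reference are smaller.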
Schematic representation of SVOC source/sink behavior in a chamber.
The influence of the sink effect on gas-phase SVOC concentration.
(a) the mass transfer strength; (b) the sorption strength.
Results of a two-node DCM analysis applied to the flashing checkerboard experiment.
The coupling parameters calculated with actual are shown alongside the corresponding connections. The values in brackets are parameters estimated with assumed . in visual area V, in V and assumed in two areas. and represent external inputs into the system; and are the hemodynamic observations and arrows indicate connections.