497 research outputs found

    Investigation of Feature Selection Methods in High-Throughput Omics Data Analysis

    No full text
    High-throughput technologies such as microarrays and next-generation sequencing have accelerated the identification of previously unidentified biomarkers and the development of novel diagnostic approaches in precision medicine. Meanwhile, because a single experiment can now measure a very large number of biomarkers simultaneously, collecting enough biological samples has become the bottleneck in data accumulation. Feature selection is a common strategy for tackling this ‘small n and large p’ scenario. Most current feature selection methods are based purely on statistical theory. However, based on my experience analyzing high-throughput data in various projects, I believe biological knowledge can play an important role in feature selection. Therefore, in this dissertation, I present computational investigations of feature selection methods that integrate biological knowledge for high-dimensional omics data. First, I present two bioinformatics case studies of high-throughput data analysis in biomedical research: the characterization of H3K27ac profiles across different PM2.5 exposures, and an investigation of batch stability in iPSC technology. Inspired by these experiences, I then design three feature selection methods that integrate biomedical knowledge for high-dimensional omics data analysis. (1) To integrate domain knowledge, I develop SKI, in which two rankings are generated before feature selection: one based on marginal correlation in the omics data at hand, and another based on external knowledge provided by domain experts, the literature, or databases. The two rankings are combined into a new ranking that is used to prescreen biomarkers, after which a further feature selection approach such as LASSO is applied. In a simulation study, I show that SKI outperforms methods without knowledge integration. I then apply SKI to a gene expression dataset to predict drug response in different cell lines, where it achieves higher prediction accuracy than a regular LASSO-based method. (2) To integrate multi-omics data, such as methylation and copy number variants, for survival data analysis, I develop two methods, SKI-Cox and wLASSO-Cox. Cox regression is a common model for survival data analysis. SKI-Cox prescreens genes based on different levels of omics data and then selects genes in a transcriptome-based Cox regression model. wLASSO-Cox uses the marginal utilities derived from Cox regression models on the other omics data as penalty factors in a penalized Cox regression on mRNA expression. In simulations, I show that both methods select more true variables when analyzing omics-based survival data. Better performance is also achieved in predicting overall survival time for glioblastoma and lung adenocarcinoma patients using TCGA data. (3) To integrate pathway or gene set information, my colleagues and I develop a redundancy removable pathway (RRP) based feature selection method for binary and multi-class classification problems. The strategies in (1) and (2) share the limitation of treating genes (features) as independent variables and ignoring the hidden relations among them. Our method uses a greedy algorithm to search, within a specific pathway, for the gene set whose distinguishing power is maximized; pathway activities inferred from the expression of the selected genes are then used in a multi-class K-nearest-neighbor classifier. By testing our method on three sarcoma microarray datasets, we show that it is a robust feature selection method for multi-class classification. Overall, these studies provide more flexible approaches that integrate knowledge to select biologically relevant features when analyzing high-throughput omics data. Their successful application to real-world datasets demonstrates that close interaction between biologists and statisticians is critical to deciphering the complex biological data generated in biomedical research.
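
    The rank-combination prescreening described for SKI in the abstract above can be illustrated with a minimal sketch. The abstract does not say how the two rankings are combined, so the weighted average of ranks, the `alpha` weight, and the names `prescreen`, `ranks` and `knowledge_scores` are all assumptions made for illustration, not the dissertation's implementation. The features that survive would then be passed to a selector such as LASSO, which is not shown.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def ranks(scores):
    """Rank 1 goes to the highest score; ties are broken by feature order."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    result = [0] * len(scores)
    for position, j in enumerate(order, start=1):
        result[j] = position
    return result


def prescreen(X, y, knowledge_scores, keep, alpha=0.5):
    """Return the indices of the `keep` features with the best combined rank.

    X is a list of samples (rows). knowledge_scores holds one external
    relevance score per feature, higher meaning more relevant.
    Assumption: the combined rank is a weighted average of the two ranks.
    """
    n_features = len(X[0])
    # Data-driven ranking: absolute marginal correlation with the response.
    marginal = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    data_rank = ranks(marginal)
    # Knowledge-driven ranking from the external scores.
    knowledge_rank = ranks(knowledge_scores)
    combined = [alpha * d + (1 - alpha) * k
                for d, k in zip(data_rank, knowledge_rank)]
    order = sorted(range(n_features), key=lambda j: combined[j])
    return sorted(order[:keep])


# Toy data: 4 samples, 4 features, made up for illustration only.
X = [[1, 4, 1, 1], [2, 3, -1, 1], [3, 2, 1, 1], [4, 1, -1, 1]]
y = [1, 2, 3, 4]
knowledge = [0.0, 0.1, 0.9, 0.8]
selected = prescreen(X, y, knowledge, keep=2)
```

    Setting `alpha=1.0` reduces the sketch to purely data-driven screening, and `alpha=0.0` to purely knowledge-driven screening.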

    Cyclization/Hydrosilylation of Functionalized Diynes Catalyzed by a Cationic Rhodium Bis(phosphine) Complex

    No full text
    The cationic rhodium complex [Rh(BINAP)(COD)]+ BF4- (2) [BINAP = (±)-2,2′-bis(diphenylphosphino)binaphthyl] catalyzed the cyclization/hydrosilylation of 5,5-dicarbomethoxy-2,7-nonadiyne (1) and triethylsilane to form (E,Z)-1,1-dicarbomethoxy-3-ethylidene-4-(1-triethylsilylethylidene)cyclopentane (3) in 77% yield with ≥50:1 isomeric purity. A number of functionalized 1,6-diynes that possessed internal alkynes in addition to 1 underwent cyclization/hydrosilylation catalyzed by 2 to form silylated 1,2-dialkylidenecycloalkanes in moderate to good yield with high diastereoselectivity.

    Palladium-Catalyzed Cyclization/Carboalkoxylation of Alkenyl Indoles

    No full text
    Reaction of 1-methyl-2-(4-pentenyl)indole with a catalytic amount of PdCl2(CH3CN)2 (5 mol %) and a stoichiometric amount of CuCl2 (3 equiv) in methanol under CO (1 atm) at room temperature for 30 min led to cyclization/carboalkoxylation to form the corresponding tetrahydrocarbazole in 83% isolated yield as a single regioisomer. Palladium-catalyzed cyclization/carboalkoxylation of 2-(4-pentenyl)indoles tolerated substitution along the alkenyl chain and at the internal and cis-terminal olefinic positions. Palladium-catalyzed cyclization/carboalkoxylation tolerated a range of alcohols and was effective for the cyclization of 2-(3-butenyl)indoles, 3-(3-butenyl)indoles, 3-(4-pentenyl)indoles, and 2-(5-hexenyl)indoles

    Gold(I)-Catalyzed Intramolecular Enantioselective Hydroarylation of Allenes with Indoles

    No full text
    Treatment of 2-allenyl indole 4 with a catalytic 1:2 mixture of [(S)-2]Au2Cl2 [(S)-2 = (S)-3,5-tBu-4-MeO-MeOBIPHEP] and AgBF4 in toluene at −10 °C for 17 h led to isolation of tetrahydrocarbazole 5 in 88% yield with 92% ee. The protocol was effective for the cyclization of terminally disubstituted allenes and for the formation of seven-membered rings

    Hydrogen Activation by Silica-Supported Metal Ion Catalysts: Catalytic Properties of Metals and Performance of DFT Functionals

    No full text
    Single-site heterogeneous catalysts (SSHCs) have received increasing attention due to their well-defined active sites and potentially high specific activity. Detailed computational studies were carried out on a set of potential SSHCs, i.e., silica-supported metal ions, to investigate the reactivity of these catalysts with H2 as well as to evaluate the performance of density functional theory (DFT) methods in conjunction with triple-ζ quality basis sets (i.e., cc-pVTZ) on reaction energetics. The ions considered include 4d and 5d metals as well as several post-transition metal ions. A representative cluster model of silica is used to calculate reaction free energies of the metal hydride formation that results from the heterolytic cleavage of H2 on the M–O bond. The hydride formation free energy has previously been shown to be strongly correlated with the catalytic activity of such catalysts for alkene hydrogenation. ONIOM calculations (CCSD(T)//MP2) are used to assess the accuracy and reliability of the MP2 results, and MP2 is found to be a suitable level of theory for gauging the performance of DFT functionals. The performance of various DFT functionals is assessed relative to the MP2 results: the wB97xd and PBE0 functionals have the lowest standard deviation (STD) values, while the MN12SX and PBE functionals have the lowest mean absolute deviation (MAD) values. The B3LYP functional is shown to have MAD and STD values similar to those of the top-performing functionals. Potentially active SSHCs for exergonic hydrogen activation predicted in this study include mostly late and post-transition metal ions, i.e., Au3+, Pd2+, Pt4+, Pd4+, Ir4+, Hg2+, Rh3+, Pb4+, Tl3+, In3+, Ir3+, Os4+, Cd2+, Ru2+, and Ga3+. This study provides important guidance for future computational studies of such catalyst systems.

    Quality Characteristics of a Pickled Tea Processed by Submerged Fermentation

    No full text
    A traditional pickled tea is fermented under anaerobic conditions in many Asian countries, but its quality characteristics are still unclear. The dynamic qualities, chemical components, and volatile components of pickled tea processed by submerged fermentation were investigated. The results showed that the sensory qualities of the pickled tea gradually improved with increasing fermentation time. Sensory qualities decreased gradually after 7 d of fermentation, and the change was extremely significant (p < 0.01). Correspondingly, biochemical indicators (except for soluble extract) decreased continuously and significantly (p < 0.01) within 5–7 d of fermentation, but the decrease was much slower in the later stage. The proper control of fermentation time is a key step for obtaining the desired quality of pickled tea. By submerged fermentation for 7 d, the best sensory quality of pickled tea was obtained: the taste of pickled tea was less bitter and astringent, but certainly acidic; the flavor of pickled tea was sour with sweet and floral tastes; the tea had low caffeine content, high gamma-aminobutyric acid content, and high contents of special volatile components, such as hotrienol and 2,6-bis(1,1-dimethylethyl)-4-methylphenol. Results indicated that submerged fermentation is potentially a new method of processing pickled tea to obtain the desired quality. Our results contribute to the understanding of the microbial transformation of tea components under anaerobic conditions.

    Accelerating Molecular Dynamics Enrichments of High-Affinity Ligands for Proteins

    No full text
    Molecular docking algorithms are used to seek the most active compounds from a pool of ligands. In principle, molecular dynamics (MD) simulations with accurate physical potentials and sampling could yield better enrichments, but they are computationally expensive. Here, we describe a method called MELD-Bracket that utilizes biased replica exchange ladders in MD in order to compete different ligands against each other within a fast bracket style “binding tournament”. MELD-Bracket finds best-binders rapidly when ligands are well separated in their binding affinities

    The relationship between T_c* and H_m,si* and H_m,so* at K_s* = 10^4.

    No full text
    As T_c* is linearly proportional to K_s* when K_s* is higher than 0.1, it is easy to calculate T_c* for other values of K_s* using this figure.