
    Machine Learning Prediction of Cancer Cell Sensitivity to Drugs Based on Genomic and Chemical Properties

    Predicting the response of a specific cancer to a therapy is a major goal in modern oncology that should ultimately lead to personalised treatment. High-throughput screenings of potentially active compounds against a panel of genomically heterogeneous cancer cell lines have unveiled multiple relationships between genomic alterations and drug responses. Various computational approaches have been proposed to predict sensitivity based on genomic features, while others have used the chemical properties of the drugs to ascertain their effect. In an effort to integrate these complementary approaches, we developed machine learning models to predict the response of cancer cell lines to drug treatment, quantified through IC50 values, based on both the genomic features of the cell lines and the chemical properties of the considered drugs. Models predicted IC50 values in an 8-fold cross-validation and an independent blind test with coefficients of determination (R²) of 0.72 and 0.64, respectively. Furthermore, models were able to predict with comparable accuracy (R² of 0.61) the IC50s of cell lines from a tissue not used in the training stage. Our in silico models can be used to optimise the experimental design of drug-cell screenings by estimating a large proportion of missing IC50 values rather than experimentally measuring them. The implications of our results go beyond virtual drug screening design: potentially thousands of drugs could be probed in silico to systematically test their potential efficacy as anti-tumour agents based on their structure, thus providing a computational framework to identify new drug repositioning opportunities and, ultimately, to support personalised medicine by linking the genomic traits of patients to drug sensitivity.
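    The evaluation described in this abstract (k-fold cross-validation scored by R²) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic, the one-feature least-squares line is only a stand-in for the paper's actual models, and pooling the out-of-fold predictions before scoring is one common convention that the abstract does not confirm. The names `r_squared`, `fit_line` and `cross_validated_r2` are hypothetical.

```python
import random


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (stand-in for a real model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def cross_validated_r2(xs, ys, k=8, seed=0):
    """Pool out-of-fold predictions over k folds and score them with R^2."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering all samples
    preds = [0.0] * len(xs)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            preds[i] = a * xs[i] + b  # prediction made without seeing sample i
    return r_squared(ys, preds)


# Synthetic (hypothetical) data: one feature and a noisy log-IC50-like response.
rng = random.Random(42)
features = [rng.uniform(-2.0, 2.0) for _ in range(200)]
log_ic50 = [1.5 * x + 0.3 + rng.gauss(0.0, 0.5) for x in features]
cv_r2 = cross_validated_r2(features, log_ic50, k=8)
print(round(cv_r2, 3))
```

    Because every sample is predicted by a model that never saw it during fitting, the pooled R² estimates performance on unseen cell line and drug pairs, which is the quantity an experimenter would need before replacing measurements with predictions.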

    A computational model for anti-cancer drug sensitivity prediction

    Various methods have been developed to build models for predicting drug response in cancer treatment based on patient data through machine learning algorithms. Drug prediction models can offer better patient data classification, optimising sensitivity identification in cancer therapy for suitable drugs. In this paper, a computational model based on Deep Neural Networks has been designed for prediction of anti-cancer drug response based on gene expression data using publicly available drug profiling datasets from the Cancer Cell Line Encyclopedia (CCLE). The model consists of several parts, including continuous drug response prediction, discretization and a drug sensitivity result output. Regularization and compression of neuron connections are also implemented to make the model compact and efficient. The model outperforms other widely used algorithms, such as elastic net (EN), random forest (RF), support vector regression (SVR) and a simple artificial neural network (ANN), in sensitivity analysis and predictive accuracy.

    Towards the routine use of in silico screenings for drug discovery using metabolic modelling

    Currently, the development of new effective drugs for cancer therapy is hindered not only by development costs, drug efficacy, and drug safety but also by the rapid occurrence of drug resistance in cancer. Hence, new tools are needed to study the underlying mechanisms in cancer. Here, we discuss the current use of metabolic modelling approaches to identify cancer-specific metabolism and find possible new drug targets and drugs for repurposing. Furthermore, we list valuable resources that are needed for the reconstruction of cancer-specific models by integrating various available datasets with genome-scale metabolic reconstructions using model-building algorithms. We also discuss how new drug targets can be determined by using gene essentiality analysis, an in silico method to predict essential genes in a given condition such as cancer, and how synthetic lethality studies could greatly benefit cancer patients by suggesting drug combinations with reduced side effects.

    Predicting drug response of tumors from integrated genomic profiles by deep neural networks

    The study of high-throughput genomic profiles from a pharmacogenomics viewpoint has provided unprecedented insights into the oncogenic features modulating drug response. A recent screening of ~1,000 cancer cell lines against a collection of anti-cancer drugs illuminated the link between genotypes and vulnerability. However, due to essential differences between cell lines and tumors, the translation into predicting drug response in tumors remains challenging. Here we propose a DNN model to predict drug response based on the mutation and expression profiles of a cancer cell or a tumor. The model contains a mutation encoder and an expression encoder, both pre-trained using a large pan-cancer dataset to abstract core representations of high-dimensional data, followed by a drug response predictor network. Given a pair of mutation and expression profiles, the model predicts IC50 values of 265 drugs. We trained and tested the model on a dataset of 622 cancer cell lines and achieved an overall prediction performance of mean squared error at 1.96 (log-scale IC50 values). The performance was superior in prediction error or stability to two classical methods and four analog DNNs of our model. We then applied the model to predict drug response of 9,059 tumors of 33 cancer types. The model predicted both known drug targets, including EGFR inhibitors in non-small cell lung cancer and tamoxifen in ER+ breast cancer, and novel ones. The comprehensive analysis further revealed the molecular mechanisms underlying resistance to the chemotherapeutic drug docetaxel in a pan-cancer setting and the anti-cancer potential of a novel agent, CX-5461, in treating gliomas and hematopoietic malignancies. Overall, our model and findings improve the prediction of drug response and the identification of novel therapeutic options.
    Comment: Accepted for presentation at the International Conference on Intelligent Biology and Medicine (ICIBM 2018), Los Angeles, CA, USA. Currently under consideration for publication in a Supplement Issue of BMC Genomics.

    Computational approaches for finding high-dimensional multi-omics relationships using biological prior knowledge

    Doctoral thesis, Seoul National University Graduate School, College of Engineering, Department of Computer Science and Engineering, August 2021. Kim Sun. Understanding how cells function or respond to external stimuli is one of the most important questions in biology and medicine. Thanks to advances in instrumental technologies, scientists can routinely measure events within cells in single biological experiments. Notable examples are multi-omics data: sequencing of genomes, quantification of gene expression, and identification of epigenetic events that regulate the expression of genes. In order to better understand cellular mechanisms, it is essential to identify regulatory relationships between multi-omics regulators and genes. However, regulatory relationships are very complex and it is infeasible to validate all condition-specific relationships experimentally. Thus, there is an urgent need for an efficient computational method to extract relationships from different types of high-dimensional omics data. One way to address these high-dimensional data is to incorporate external biological knowledge, such as relationships between omics and functions of genes curated in various databases. In my doctoral study, I developed three computational approaches to identify regulatory relationships from multi-omics data utilizing biological prior knowledge. The first study proposes a method to predict one-to-m relationships between miRNA and genes. The computational challenge of miRNA target prediction is that there are many miRNA target candidates, and the ratio of false positives to false negatives needs to be adjusted. This challenge is addressed by utilizing literature knowledge to determine the association between a miRNA-gene pair and a given context. In this study, I developed ContextMMIA to predict miRNA-gene relationships from miRNA and gene expression data. 
ContextMMIA computes scores of miRNA-gene relationships based on statistical significance and literature relevance and prioritizes the relationships based on the scores. In experiments on breast cancer data with different prognoses, ContextMMIA predicted differentially activated miRNA-gene relationships in invasive breast cancer. The experimentally verified miRNA-gene relationships were predicted with high priority, and those genes are known to be involved in breast cancer-related pathways. The second study proposes a method to predict n-to-one relationships between regulators and genes in drug response. The computational challenge of drug response prediction is how to integrate multi-omics data of 20,000 genes to determine drug response mediator genes. This challenge is addressed by utilizing low-dimensional embedding methods, literature knowledge of drug-gene associations, and gene-gene interaction knowledge. For this problem, I developed DRIM to predict drug response relationships from multi-omics data and drug-induced time-series gene expression data. DRIM uses an autoencoder, tensor decomposition, and drug-gene associations to determine n-to-one relationships from multi-omics data. Then, regulatory relationships of mediator genes are determined by gene-gene interaction knowledge and cross-correlation of drug-induced time-series gene expression data. In experiments on breast cancer cell line data, DRIM extracted mediator genes relevant to drug response and regulatory relationships of genes involved in the PI3K-Akt pathway targeted by lapatinib. In addition, DRIM revealed distinct patterns of relationships in breast cancer cell lines with different lapatinib resistance. The third study proposes a method to predict n-to-m relationships between regulators and genes. 
In order to predict n-to-m relationships, this study formulated an objective function that measures the deviation between observed gene expression values and estimated gene expression values derived from gene regulatory networks. The computational challenge of minimizing the objective function is to navigate a search space of relationships that grows exponentially with the number of regulators and genes. This challenge is addressed by iterative local optimization with regulator-gene interaction knowledge. In this study, I developed a two-step iterative RL-based method to predict n-to-m relationships from regulator and gene expression data. The first step explores n-to-one gene-oriented relationships, selecting regulators with a reinforcement-learning-based heuristic to add edges to the network. The second step explores one-to-m regulator-oriented relationships, stochastically selecting genes to remove edges from the network. In experiments on breast cancer cell line data, the proposed method constructed breast cancer subtype-specific networks from the regulator and gene expression profiles with a more accurate gene expression estimation than previous combinatorial optimization methods. Moreover, regulatory relationships involved in the networks were associated with breast cancer subtypes. In summary, in this thesis, I proposed computational methods for predicting one-to-m, n-to-one, and n-to-m relationships between multi-omics regulators and genes utilizing external domain knowledge. 
The proposed methods are expected to deepen our knowledge of cellular mechanisms by clarifying gene regulatory interactions through analysis of the ever-increasing molecular biology data, such as The Cancer Genome Atlas and the Cancer Cell Line Encyclopedia.
    Contents:
    Chapter 1 Introduction: 1.1 Biological background (1.1.1 Multi-omics analysis; 1.1.2 Multi-omics relationships indicating cell state; 1.1.3 Biological prior knowledge); 1.2 Research problems for the multi-omics relationship; 1.3 Computational challenges and approaches in the exploring multiomics relationship; 1.4 Outline of the thesis
    Chapter 2 Literature-based condition-specific miRNA-mRNA target prediction: 2.1 Computational Problem & Evaluation criterion; 2.2 Related works; 2.3 Motivation; 2.4 Methods (2.4.1 Identifying genes and miRNAs based on the user-provided context; 2.4.2 Omics Score; 2.4.3 Context Score; 2.4.4 Confidence Score); 2.5 Results (2.5.1 Pathway analysis; 2.5.2 Reproducibility of validated targets in humans; 2.5.3 Sensitivity tests when different keywords are used); 2.6 Summary
    Chapter 3 DRIM: A web-based system for investigating drug response at the molecular level by condition-specific multi-omics data integration: 3.1 Computational Problem & Evaluation criterion; 3.2 Related works; 3.3 Motivation; 3.4 Methods (3.4.1 Step 1: Input; 3.4.2 Step 2: Identifying perturbed sub-pathway with time-series; 3.4.3 Step 3: Embedding multi-omics for selecting potential mediator genes; 3.4.4 Step 4: Construct TF-regulatory time-bounded network and identify regulatory path; 3.4.5 Step 5: Analysis result on the web); 3.5 Case study: Comparative analysis of breast cancer cell lines that have different sensitivity with lapatinib (3.5.1 Multi-omics analysis result before drug treatment; 3.5.2 Time-series gene expression analysis after drug treatment); 3.6 Summary
    Chapter 4 Combinatorial modeling and optimization using iterative RL search for inferring sample-specific regulatory network: 4.1 Computational Problem & Evaluation criterion; 4.2 Related works; 4.3 Motivation; 4.4 Methods (4.4.1 Formulating an objective function; 4.4.2 Overview of an iterative search method; 4.4.3 G-step for exploring n-to-one gene-oriented relationship; 4.4.4 R-step for exploring one-to-m regulator-oriented relationship); 4.5 Results (4.5.1 Cancer cell line data; 4.5.2 Hyperparameters; 4.5.3 Quantitative evaluation; 4.5.4 Qualitative evaluation); 4.6 Summary
    Chapter 5 Conclusions

    A Copula Based Approach for Design of Multivariate Random Forests for Drug Sensitivity Prediction

    Modeling sensitivity to drugs based on genetic characterizations is a significant challenge in the area of systems medicine. Ensemble-based approaches such as Random Forests have been shown to perform well in both individual sensitivity prediction studies and team-science-based prediction challenges. However, Random Forests generate a deterministic predictive model for each drug based on the genetic characterization of the cell lines and ignore the relationship between different drug sensitivities during model generation. This application motivates the need for multivariate ensemble learning techniques that can increase prediction accuracy and improve variable importance ranking by incorporating the relationships between different output responses. In this article, we propose a novel cost criterion that captures the dissimilarity in the output response structure between the training data and node samples as the difference in the two empirical copulas. We illustrate that copulas are suitable for capturing the multivariate structure of output responses independent of the marginal distributions, and that the copula-based multivariate random forest framework can provide higher accuracy prediction and improved variable selection. The proposed framework has been validated on the Genomics of Drug Sensitivity in Cancer and Cancer Cell Line Encyclopedia databases.
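    The idea of comparing output-response structure through empirical copulas can be sketched as follows. This is an illustration under stated assumptions, not the paper's cost criterion: it handles only two responses, evaluates the empirical copulas on a fixed grid, and summarises their difference as a mean squared difference, all of which are choices made here. The function names are hypothetical.

```python
def ranks(values):
    """1-based ranks; ties are broken by original order (assumes few ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def empirical_copula(y1, y2, grid):
    """C_n(u, v) = fraction of samples with rank(y1)/n <= u and rank(y2)/n <= v."""
    n = len(y1)
    u_obs = [r / n for r in ranks(y1)]
    v_obs = [r / n for r in ranks(y2)]
    return [
        [sum(1 for a, b in zip(u_obs, v_obs) if a <= u and b <= v) / n for v in grid]
        for u in grid
    ]


def copula_distance(sample_a, sample_b, grid_size=10):
    """Mean squared difference of two empirical copulas on a regular grid.

    Each sample is a pair (responses_to_drug_1, responses_to_drug_2).
    The squared-difference summary is an assumption of this sketch.
    """
    grid = [(i + 1) / grid_size for i in range(grid_size)]
    ca = empirical_copula(sample_a[0], sample_a[1], grid)
    cb = empirical_copula(sample_b[0], sample_b[1], grid)
    diffs = [
        (ca[i][j] - cb[i][j]) ** 2
        for i in range(grid_size)
        for j in range(grid_size)
    ]
    return sum(diffs) / len(diffs)


# Hypothetical two-drug responses: same dependence, different marginals.
training = ([1, 2, 3, 4, 5, 6, 7, 8], [10, 20, 30, 40, 50, 60, 70, 80])
node_same = ([0.1, 0.5, 0.9, 1.3, 2.0, 2.2, 3.1, 4.0], [5, 6, 7, 8, 9, 10, 11, 12])
# Same marginals as the training data, but reversed dependence.
node_reversed = ([1, 2, 3, 4, 5, 6, 7, 8], [80, 70, 60, 50, 40, 30, 20, 10])

d_same = copula_distance(training, node_same)
d_reversed = copula_distance(training, node_reversed)
print(d_same, d_reversed)
```

    Because the empirical copula depends only on ranks, `node_same` is at distance zero from the training data despite its different scales, while `node_reversed` is not. This is the property the abstract refers to as capturing multivariate structure independent of the marginal distributions.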

    Modeling and simulation applications with potential impact in drug development and patient care

    Indiana University-Purdue University Indianapolis (IUPUI). Model-based drug development has become an essential element for potentially making drug development more productive: it assesses data using mathematical and statistical approaches to construct and utilize models that increase the understanding of the drug and disease. The modeling and simulation approach not only quantifies the exposure-response relationship and the level of variability, but also identifies the potential contributors to the variability. I hypothesized that the modeling and simulation approach can: 1) leverage our understanding of the pharmacokinetic-pharmacodynamic (PK-PD) relationship from pre-clinical systems to humans; 2) quantitatively capture the drug impact on patients; 3) evaluate clinical trial designs; and 4) identify potential contributors to drug toxicity and efficacy. The major findings for these studies included: 1) a translational PK modeling approach that predicted clozapine and norclozapine central nervous system exposures in humans, relating these exposures to receptor binding kinetics at multiple receptors; 2) a population pharmacokinetic analysis of a study of sertraline in depressed elderly patients with Alzheimer's disease that identified site-specific differences in drug exposure contributing to the overall variability in sertraline exposure; 3) the utility of a longitudinal tumor dynamic model developed by the Food and Drug Administration for predicting survival in non-small cell lung cancer patients, including an exploration of the limitations of this approach; 4) a Monte Carlo clinical trial simulation approach that was used to evaluate a pre-defined oncology trial with a sparse drug concentration sampling schedule, with the aim of quantifying how well individual drug exposures, random variability, and the food effects of abiraterone and nilotinib were determined under these conditions; 5) a time-to-event analysis that facilitated the identification of candidate genes, including polymorphisms associated with vincristine-induced neuropathy, from several association analyses in childhood acute lymphoblastic leukemia (ALL) patients; and 6) a LASSO penalized regression model that predicted vincristine-induced neuropathy and relapse in ALL patients and provided the basis for a risk assessment of the population. Overall, results from this dissertation provide an improved understanding of treatment effect in patients, with a combined PK/PD assessment and a risk evaluation of drug toxicity and efficacy.

    WENDI: A tool for finding non-obvious relationships between compounds and biological properties, genes, diseases and scholarly publications

    Background: In recent years, there has been a huge increase in the amount of publicly-available and proprietary information pertinent to drug discovery. However, there is a distinct lack of data mining tools available to harness this information, and in particular for knowledge discovery across multiple information sources. At Indiana University we have an ongoing project with Eli Lilly to develop web-service based tools for integrative mining of chemical and biological information. In this paper, we report on the first of these tools, called WENDI (Web Engine for Non-obvious Drug Information) that attempts to find non-obvious relationships between a query compound and scholarly publications, biological properties, genes and diseases using multiple information sources.
    Results: We have created an aggregate web service that takes a query compound as input, calls multiple web services for computation and database search, and returns an XML file that aggregates this information. We have also developed a client application that provides an easy-to-use interface to this web service. Both the service and client are publicly available.
    Conclusions: Initial testing indicates this tool is useful in identifying potential biological applications of compounds that are not obvious, and in identifying corroborating and conflicting information from multiple sources. We encourage feedback on the tool to help us refine it further. We are now developing further tools based on this model.