249 research outputs found

    Integrated regulatory models for inference of subtype-specific susceptibilities in glioblastoma

    Glioblastoma multiforme (GBM) is a highly malignant form of cancer that lacks effective treatment options or well-defined strategies for personalized cancer therapy. The disease has been stratified into distinct molecular subtypes; however, the underlying regulatory circuitry that gives rise to such heterogeneity and its implications for therapy remain unclear. We developed a modular computational pipeline, Integrative Modeling of Transcription Regulatory Interactions for Systematic Inference of Susceptibility in Cancer (inTRINSiC), to dissect subtype-specific regulatory programs and predict genetic dependencies in individual patient tumors. Using a multilayer network consisting of 518 transcription factors (TFs), 10,733 target genes, and a signaling layer of 3,132 proteins, we were able to accurately identify differential regulatory activity of TFs that shape subtype-specific expression landscapes. Our models also allowed inference of mechanisms for altered TF behavior in different GBM subtypes. Most importantly, we were able to use the multilayer models to perform an in silico perturbation analysis to infer differential genetic vulnerabilities across GBM subtypes and pinpoint the MYB family member MYBL2 as a drug target specific for the Proneural subtype.

    Computational approaches for finding high-dimensional multi-omics relationships using biological prior knowledge

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, 2021. 8. Kim Sun. Understanding how cells function or respond to external stimuli is one of the most important questions in biology and medicine. Thanks to advances in instrumental technologies, scientists can routinely measure events within cells in single biological experiments. Notable examples are multi-omics data: sequencing of genomes, quantification of gene expression, and identification of epigenetic events that regulate gene expression. To better understand cellular mechanisms, it is essential to identify regulatory relationships between multi-omics regulators and genes. However, regulatory relationships are very complex, and it is infeasible to validate all condition-specific relationships experimentally. Thus, there is an urgent need for efficient computational methods to extract relationships from different types of high-dimensional omics data. One way to address these high-dimensional data is to incorporate external biological knowledge, such as relationships between omics and functions of genes curated in various databases. In my doctoral study, I developed three computational approaches to identify regulatory relationships from multi-omics data utilizing biological prior knowledge. The first study proposes a method to predict one-to-m relationships between miRNA and genes. The computational challenge of miRNA target prediction is that there are many miRNA target candidates, and the ratio of false positives to false negatives needs to be adjusted. This challenge is addressed by utilizing literature knowledge to determine the association between miRNA-gene pairs and a given context. In this study, I developed ContextMMIA to predict miRNA-gene relationships from miRNA and gene expression data. 
ContextMMIA computes scores for miRNA-gene relationships based on statistical significance and literature relevance, and prioritizes the relationships by these scores. In experiments on breast cancer data with different prognoses, ContextMMIA predicted differentially activated miRNA-gene relationships in invasive breast cancer. The experimentally verified miRNA-gene relationships were predicted with high priority, and those genes are known to be involved in breast cancer-related pathways. The second study proposes a method to predict n-to-one relationships between regulators and a gene in drug response. The computational challenge of drug response prediction is how to integrate multi-omics data of 20,000 genes to determine drug response mediator genes. This challenge is addressed by utilizing low-dimensional embedding methods, literature knowledge of drug-gene associations, and gene-gene interaction knowledge. For this problem, I developed DRIM to predict drug response relationships from multi-omics data and drug-induced time-series gene expression data. DRIM uses an autoencoder, tensor decomposition, and drug-gene associations to determine n-to-one relationships from multi-omics data. Then, regulatory relationships of mediator genes are determined from gene-gene interaction knowledge and the cross-correlation of drug-induced time-series gene expression data. In experiments on breast cancer cell line data, DRIM extracted mediator genes relevant to drug response and regulatory relationships of genes involved in the PI3K-Akt pathway targeted by lapatinib. In addition, DRIM revealed distinct patterns of relationships in breast cancer cell lines with different lapatinib resistance. The third study proposes a method to predict n-to-m relationships between regulators and genes. 
To predict n-to-m relationships, this study formulated an objective function that measures the deviation between observed gene expression values and estimated gene expression values derived from gene regulatory networks. The computational challenge of minimizing the objective function is navigating a search space of relationships that grows exponentially with the number of regulators and genes. This challenge is addressed by iterative local optimization with regulator-gene interaction knowledge. In this study, I developed a two-step iterative RL-based method to predict n-to-m relationships from regulator and gene expression data. The first step is gene-oriented: it explores n-to-one relationships, selecting regulators with a reinforcement-learning-based heuristic to add edges to the network. The second step is regulator-oriented: it explores one-to-m relationships, stochastically selecting genes to remove edges from the network. In experiments on breast cancer cell line data, the proposed method constructed breast cancer subtype-specific networks from the regulator and gene expression profiles with more accurate gene expression estimation than previous combinatorial optimization methods. Moreover, regulatory relationships in the networks were associated with breast cancer subtypes. In summary, in this thesis, I proposed computational methods for predicting one-to-m, n-to-one, and n-to-m relationships between multi-omics regulators and genes utilizing external domain knowledge. 
The proposed methods are expected to deepen our knowledge of cellular mechanisms by elucidating gene regulatory interactions through analysis of ever-growing molecular biology data resources such as The Cancer Genome Atlas and the Cancer Cell Line Encyclopedia.

    Contents:
    Chapter 1 Introduction
      1.1 Biological background
        1.1.1 Multi-omics analysis
        1.1.2 Multi-omics relationships indicating cell state
        1.1.3 Biological prior knowledge
      1.2 Research problems for the multi-omics relationship
      1.3 Computational challenges and approaches in the exploring multi-omics relationship
      1.4 Outline of the thesis
    Chapter 2 Literature-based condition-specific miRNA-mRNA target prediction
      2.1 Computational Problem & Evaluation criterion
      2.2 Related works
      2.3 Motivation
      2.4 Methods
        2.4.1 Identifying genes and miRNAs based on the user-provided context
        2.4.2 Omics Score
        2.4.3 Context Score
        2.4.4 Confidence Score
      2.5 Results
        2.5.1 Pathway analysis
        2.5.2 Reproducibility of validated targets in humans
        2.5.3 Sensitivity tests when different keywords are used
      2.6 Summary
    Chapter 3 DRIM: A web-based system for investigating drug response at the molecular level by condition-specific multi-omics data integration
      3.1 Computational Problem & Evaluation criterion
      3.2 Related works
      3.3 Motivation
      3.4 Methods
        3.4.1 Step 1: Input
        3.4.2 Step 2: Identifying perturbed sub-pathway with time-series
        3.4.3 Step 3: Embedding multi-omics for selecting potential mediator genes
        3.4.4 Step 4: Construct TF-regulatory time-bounded network and identify regulatory path
        3.4.5 Step 5: Analysis result on the web
      3.5 Case study: Comparative analysis of breast cancer cell lines that have different sensitivity with lapatinib
        3.5.1 Multi-omics analysis result before drug treatment
        3.5.2 Time-series gene expression analysis after drug treatment
      3.6 Summary
    Chapter 4 Combinatorial modeling and optimization using iterative RL search for inferring sample-specific regulatory network
      4.1 Computational Problem & Evaluation criterion
      4.2 Related works
      4.3 Motivation
      4.4 Methods
        4.4.1 Formulating an objective function
        4.4.2 Overview of an iterative search method
        4.4.3 G-step for exploring n-to-one gene-oriented relationship
        4.4.4 R-step for exploring one-to-m regulator-oriented relationship
      4.5 Results
        4.5.1 Cancer cell line data
        4.5.2 Hyperparameters
        4.5.3 Quantitative evaluation
        4.5.4 Qualitative evaluation
      4.6 Summary
    Chapter 5 Conclusions
    Abstract in Korean
    Ph.D.
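
    The third study in the abstract above describes an objective function that measures the deviation between observed gene expression and expression estimated from a regulatory network. The following is only a minimal sketch of that idea, not the thesis's actual formulation: it assumes a linear weighted-sum estimate and a squared-error deviation, and every name in it (`estimate_expression`, `objective`, `TF1`, `miR1`, `geneA`, `geneB`) is hypothetical. It compares two candidate edge sets rather than performing the RL-based search.

```python
# Hypothetical sketch: the linear estimate and squared-error deviation are
# assumptions made for illustration, not the formulation used in the thesis.

def estimate_expression(edges, regulator_expr):
    """Estimate each gene's expression as a weighted sum of its regulators.

    edges maps (regulator, gene) pairs to an edge weight.
    """
    estimates = {}
    for (regulator, gene), weight in edges.items():
        estimates[gene] = estimates.get(gene, 0.0) + weight * regulator_expr[regulator]
    return estimates


def objective(edges, regulator_expr, observed_expr):
    """Sum of squared deviations between observed and estimated expression."""
    estimates = estimate_expression(edges, regulator_expr)
    return sum(
        (observed_expr[gene] - estimates.get(gene, 0.0)) ** 2
        for gene in observed_expr
    )


# Toy data: two regulators and two target genes.
regulator_expr = {"TF1": 2.0, "miR1": 1.0}
observed_expr = {"geneA": 3.0, "geneB": 1.0}

# A sparse candidate network and a denser one with two extra edges.
sparse_edges = {("TF1", "geneA"): 1.0}
dense_edges = {
    ("TF1", "geneA"): 1.0,
    ("miR1", "geneA"): 1.0,
    ("TF1", "geneB"): 0.5,
}

sparse_score = objective(sparse_edges, regulator_expr, observed_expr)
dense_score = objective(dense_edges, regulator_expr, observed_expr)
print(sparse_score, dense_score)
```

    In this toy case the denser network has the lower score. A search procedure such as the add-edge and remove-edge steps described in the abstract would use a score of this kind to decide which candidate edges to keep.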

    Plato's Cave Algorithm: Inferring Functional Signaling Networks from Early Gene Expression Shadows

    Improving the ability to reverse engineer biochemical networks is a major goal of systems biology. Lesions in signaling networks lead to alterations in gene expression, which in principle should allow network reconstruction. However, the information about the activity levels of signaling proteins conveyed in overall gene expression is limited by the complexity of gene expression dynamics and of regulatory network topology. Two observations provide the basis for overcoming this limitation: (a) genes induced without de novo protein synthesis (early genes) show a linear accumulation of product in the first hour after the change in the cell's state; (b) the signaling components in the network largely function in the linear range of their stimulus-response curves. Therefore, unlike most genes or most time points, expression profiles of early genes at an early time point provide direct biochemical assays that represent the activity levels of upstream signaling components. Such expression data provide the basis for an efficient algorithm (Plato's Cave algorithm; PLACA) to reverse engineer functional signaling networks. Unlike conventional reverse engineering algorithms that use steady-state values, PLACA uses stimulated early gene expression measurements associated with systematic perturbations of signaling components, without measuring the signaling components themselves. Besides the reverse engineered network, PLACA also identifies the genes detecting the functional interaction, thereby facilitating validation of the predicted functional network. Using simulated datasets, the algorithm is shown to be robust to experimental noise. Using experimental data obtained from gonadotropes, PLACA reverse engineered the interaction network of six perturbed signaling components. The network recapitulated many known interactions and identified novel functional interactions that were validated by further experiment. PLACA uses the results of experiments that are feasible for any signaling network to predict the functional topology of the network and to identify novel relationships.

    Integrative methods for reconstruction of dynamic networks in chondrogenesis

    Application of human mesenchymal stem cells represents a promising approach in the field of regenerative medicine. Specific stimulation can give rise to chondrocytes, osteocytes or adipocytes. Investigation of the underlying biological processes which induce the observed cellular differentiation is essential to efficiently generate specific tissues for therapeutic purposes. Upon treatment with diverse stimuli, gene expression levels of cultivated human mesenchymal stem cells were monitored using time series microarray experiments for the three lineages. Gene network inference is a common approach to identify the regulatory dependencies among a set of investigated genes. This thesis applies the NetGenerator V2.0 tool, which can handle multiple time series datasets and model the effect of multiple external stimuli. The applied model is based on a system of linear ordinary differential equations, whose parameters are optimised to reproduce the given time series datasets. Several procedures in the inference process were adapted in this new version in order to allow for the integration of multiple datasets. Network inference was applied to in silico network examples as well as to multi-experiment microarray data of mesenchymal stem cells. The resulting chondrogenesis model was evaluated on the basis of several features including the model adaptation to the data, total number of connections, proportion of connections associated with prior knowledge and the model stability in a resampling procedure. Altogether, NetGenerator V2.0 has provided an automatic and efficient way to integrate experimental datasets and to enhance the interpretability and reliability of the resulting network. In a second chondrogenesis model, the miRNA and mRNA time series data were integrated for the purpose of network inference. One hypothesis of the model was verified by experiments, which demonstrated the negative effect of miR-524-5p on downstream genes.
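
    The abstract above describes a model built from linear ordinary differential equations whose parameters are fitted to time series. As a rough illustration only, and not NetGenerator's own code or API, the sketch below simulates a two-gene system of the assumed form dx/dt = W x + b u(t) with forward Euler and scores it against data with a squared error. The function names, the matrix `W`, the input vector `b`, and the toy parameter values are all assumptions.

```python
# Hypothetical sketch of a linear ODE network model, dx/dt = W x + b * u(t).
# This is not NetGenerator code; names and parameter values are illustrative.

def simulate_linear_network(W, b, x0, stimulus, dt, steps):
    """Integrate dx/dt = W x + b * u(t) with the forward Euler method."""
    n = len(x0)
    x = list(x0)
    trajectory = [list(x)]
    for k in range(steps):
        u = stimulus(k * dt)
        # Each gene's rate of change: weighted gene inputs plus stimulus input.
        dx = [sum(W[i][j] * x[j] for j in range(n)) + b[i] * u for i in range(n)]
        x = [x[i] + dt * dx[i] for i in range(n)]
        trajectory.append(list(x))
    return trajectory


def squared_error(simulated, observed):
    """Sum of squared differences between two equally shaped trajectories."""
    return sum(
        (s - o) ** 2
        for sim_row, obs_row in zip(simulated, observed)
        for s, o in zip(sim_row, obs_row)
    )


# Gene 0 is driven by the stimulus and decays; gene 1 is activated by gene 0.
W = [[-1.0, 0.0],
     [0.5, -1.0]]
b = [1.0, 0.0]

trajectory = simulate_linear_network(
    W, b, x0=[0.0, 0.0], stimulus=lambda t: 1.0, dt=0.01, steps=2000
)
print(trajectory[-1])
```

    With a constant stimulus this toy system settles near the steady state where W x + b = 0, that is, x = (1.0, 0.5). A fitting procedure would adjust the entries of `W` and `b` to reduce `squared_error` against measured time series.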

    Integrative microRNA and mRNA deep-sequencing expression profiling in endemic Burkitt lymphoma

    BACKGROUND: Burkitt lymphoma (BL) is characterized by overexpression of the c-myc oncogene, which in the vast majority of cases is a consequence of an IGH/MYC translocation. While myc is the seminal event, BL is a complex amalgam of genetic and epigenetic changes causing dysregulation of both coding and non-coding transcripts. Emerging evidence suggests that abnormal modulation of mRNA transcription via miRNAs might be a significant factor in lymphomagenesis. However, the alterations in these miRNAs and their correlations to their putative mRNA targets have not been extensively studied relative to normal germinal center (GC) B cells. METHODS: Using more sensitive and specific transcriptome deep sequencing, we compared previously published miRNA and mRNA data from a set of GC B cells and eBL tumors. MiRWalk2.0 was used to identify the validated target genes for the deregulated miRNAs, which would be important for understanding the regulatory networks associated with eBL development. RESULTS: We found 211 differentially expressed (DE) genes (79 upregulated and 132 downregulated) and 49 DE miRNAs (22 upregulated and 27 downregulated). Gene set enrichment analysis identified the enrichment of a set of MYC-regulated genes. A network propagation-based method and correlated miRNA-mRNA expression analysis identified dysregulated miRNAs, including miR-17~95 cluster members and their target genes, which have diverse oncogenic properties, as critical to eBL lymphomagenesis. Central to all these findings, we observed the downregulation of the ATM and NLK genes, which represent important regulators in response to DNA damage in eBL tumor cells. These tumor suppressors were targeted by multiple upregulated miRNAs (miR-19b-3p, miR-26a-5p, miR-30b-5p, miR-92a-5p and miR-27b-3p), which could account for their aberrant expression in eBL. CONCLUSION: Combined loss of p53 induction and function due to miRNA-mediated regulation of ATM and NLK, together with the upregulation of TFAP4, may represent a central role for human miRNAs in eBL oncogenesis. This facilitates survival of eBL tumor cells with the IGH/MYC chromosomal translocation and promotes MYC-induced cell cycle progression, initiating eBL lymphomagenesis. This characterization of miRNA-mRNA interactions in eBL relative to GC B cells provides new insights into miRNA-mediated transcript regulation in eBL, which are potentially useful for new, improved therapeutic strategies.

    Integrative approaches for systematic reconstruction of regulatory circuits in mammals

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Computational and Systems Biology Program, 2012. Cataloged from PDF version of thesis. Includes bibliographical references (p. 141-149). The reconstruction of regulatory networks is one of the most challenging tasks in systems biology. Although some models for inferring regulatory networks can make useful predictions about the wiring and mechanisms of molecular interactions, these approaches are still limited and there is a strong need to develop increasingly universal and accurate approaches for network reconstruction. This problem is particularly challenging in mammals, due to the higher complexity of mammalian regulatory networks and limitations in experimental manipulation. In this thesis, I present three systematic approaches to reconstruct, analyse and refine models of gene regulation. In Chapter 1, I devise a method for deriving an observational model from temporal genomic profiles. I use it to choose targets for perturbation experiments in order to determine a network controlling the responses of mouse primary dendritic cells to stimulation with pathogen components. In Chapter 2, I introduce the algorithm Exigo, which uses a network reduction strategy to identify essential interactions in regulatory networks reconstructed from experimental data where regulators have been silenced. Exigo outperforms previous approaches on simulated data, uncovers the core network structure when applied to real networks derived from perturbation studies in mammals, and improves the performance of network inference methods. Lastly, I introduce in Chapter 3 an approach to learn a module network from multiple high-throughput assays. Analysis of a diffuse large B-cell lymphoma dataset identifies candidate regulator genes, microRNAs and copy number aberrations with biological, and possibly therapeutic, importance. By Ana Paula Santos Botelho Oliveira Leite. Ph.D.