135 research outputs found

    Survey of Computational Algorithms for MicroRNA Target Prediction

    MicroRNAs (miRNAs) are non-coding RNAs of 19 to 25 nucleotides that have important post-transcriptional regulatory functions. Identifying the target genes that miRNAs regulate is important for understanding their specific biological functions. Usually, miRNAs down-regulate target genes by binding to complementary sites in the 3' untranslated region (UTR) of the targets. In part because of the large number of miRNAs and potential targets, a purely experimental approach to target identification would be extremely laborious and costly. However, because animal miRNAs do not bind their target sites with perfect one-to-one complementarity, it is difficult to predict their targets simply by assessing their alignment to the 3' UTRs of potential targets. Consequently, sophisticated computational approaches for miRNA target prediction are considered essential in miRNA research.
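
    The abstract above notes that animal miRNAs pair imperfectly with their targets, so whole-sequence alignment is a weak predictor. One common simplification, which the abstract itself does not describe, is to scan the 3' UTR only for a perfect match to the miRNA "seed", taken here as nucleotides 2 to 8. The sketch below is a minimal illustration under that assumption; the function names and the toy UTR are hypothetical, and practical tools combine seed matches with further features.

```python
# Minimal seed-match scan (illustrative sketch, not a full target predictor).
# Assumption: the "seed" is nucleotides 2-8 of the miRNA, read 5' to 3'.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (A/C/G/U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_sites(mirna: str, utr: str) -> list[int]:
    """Return 0-based UTR positions where a perfect seed site starts."""
    seed = mirna[1:8]                 # nucleotides 2-8 (0-based slice 1:8)
    site = reverse_complement(seed)   # sequence the seed would pair with
    return [
        i
        for i in range(len(utr) - len(site) + 1)
        if utr[i:i + len(site)] == site
    ]


# let-7a mature sequence; the UTR below is a made-up toy example.
let7a = "UGAGGUAGUAGGUUGUAUAGUU"
toy_utr = "AAACUACCUCAGGG"
hits = seed_sites(let7a, toy_utr)
print(hits)
```

    A perfect seed match is neither necessary nor sufficient for regulation, which is why the surveyed methods go beyond this kind of scan.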

    Activity of microRNAs and transcription factors in Gene Regulatory Networks

    In biological research, diverse high-throughput techniques enable the investigation of whole systems at the molecular level. New methods and algorithms are needed to analyze and interpret measurements of gene and protein expression and of interactions between genes and proteins. One of the challenges is the integrated analysis of gene expression and the associated regulatory mechanisms. The two most important types of regulators, transcription factors (TFs) and microRNAs (miRNAs), often cooperate in complex networks at the transcriptional and post-transcriptional levels and thus enable a combinatorial and highly complex regulation of cellular processes. For instance, TFs activate and inhibit the expression of other genes, including other TFs, whereas miRNAs can post-transcriptionally induce the degradation of transcribed RNA and impair the translation of mRNA into proteins. Identifying gene regulatory networks (GRNs) is essential for understanding the underlying control mechanisms. The expression of regulators is itself regulated, so regulators are activated or inhibited under varying conditions and perturbations. Thus, measurements of gene expression following targeted perturbations (knockouts or overexpressions) of these regulators are of particular importance. Predicting the activity states of the regulators and predicting their target genes are important first steps towards the construction of GRNs. This thesis deals with these first bioinformatics steps. Targets of TFs and miRNAs are determined as comprehensively and accurately as possible. The activity state of regulators is predicted for specific high-throughput data and specific contexts using appropriate statistical approaches. Moreover, (parts of) GRNs are inferred that explain the given measurements. The thesis describes new approaches for these tasks together with accompanying evaluations and validations.
    This immediately defines the three main goals of the thesis:
    1. The development of a comprehensive database of regulator-target relations. Regulators and targets are retrieved from public repositories, extracted from the literature via text mining, and collected in the miRSel database. In addition, relations can be predicted using various published methods. Comprehensive and accurate regulator-target relations are required both to determine the activity states of regulators (see 2.) and to infer GRNs (see 3.). Text mining was shown to enable the reliable extraction of miRNA, gene, and protein names as well as their relations from scientific free text. Overall, miRSel contains about three times more relations for the model organisms human, mouse, and rat than state-of-the-art databases (e.g. TarBase, one of the currently most used resources for miRNA-target relations).
    2. The prediction of activity states of regulators based on improved target sets. To investigate mechanisms of gene regulation, the experimental contexts in which the respective regulators become active have to be determined. A regulator is predicted to be active based on appropriate statistical tests applied to the expression values of its set of target genes. Various gene set enrichment (GSE) methods have been proposed for this task. Unfortunately, before an actual experiment it is unknown which genes are affected, and the lack of a gold standard has so far prevented the systematic assessment and evaluation of GSE tests. In contrast, the trigger of gene expression changes is known for experiments in which a particular regulator has been directly perturbed (i.e. by knockout, transfection, or overexpression). Based on such datasets, we systematically evaluated 12 current GSE tests. In our analysis, ANOVA and the Wilcoxon test performed best.
    3. The prediction of regulation cascades. Using gene expression measurements and given regulator-target relations (e.g. from the miRSel database), GRNs are derived. GSE tests are applied to determine TFs and miRNAs that change their activity as a cellular response to an overexpressed miRNA. Gene regulatory networks can be constructed iteratively. Our models show how miRNAs trigger gene expression changes: either directly or indirectly via cascades of miRNA-TF, miRNA-kinase-TF, and TF-TF relations. In this thesis we focus on measurements obtained after overexpression of miRNAs. Surprisingly, a number of cancer-relevant miRNAs influence a common core of TFs that are involved in processes such as proliferation and apoptosis.
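
    The thesis abstract above reports that the Wilcoxon test was among the best-performing GSE tests for predicting regulator activity. The sketch below illustrates the general idea with a two-sided Wilcoxon rank-sum test that compares expression changes of a regulator's targets against those of all other genes. It is a simplified illustration and not the thesis's evaluation pipeline: the function name, the toy log fold changes, and the use of a normal approximation without tie or continuity correction are all assumptions made here.

```python
import math


def rank_sum_test(targets, background):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (z, p). Ties receive average ranks; the variance is not
    tie-corrected and no continuity correction is applied.
    """
    n1, n2 = len(targets), len(background)
    # Label 0 marks a target gene, label 1 a background gene.
    pooled = sorted([(v, 0) for v in targets] + [(v, 1) for v in background])

    rank_sum_targets = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        n_targets_here = sum(1 for k in range(i, j) if pooled[k][1] == 0)
        rank_sum_targets += avg_rank * n_targets_here
        i = j

    u = rank_sum_targets - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return z, p


# Toy log fold changes after overexpressing a hypothetical miRNA:
# its targets tend to be down-regulated, other genes are unchanged.
target_lfc = [-1.2, -0.9, -1.5, -0.8, -1.1]
other_lfc = [0.1, -0.2, 0.3, 0.0, 0.2, -0.1, 0.4, 0.05]
z, p = rank_sum_test(target_lfc, other_lfc)
print(f"z = {z:.2f}, p = {p:.4f}")
```

    A strongly negative z with a small p would be read as the target set being shifted downwards, which is consistent with an active miRNA; for small gene sets an exact test is preferable to the normal approximation.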

    A computational approach to finding high-dimensional multi-omics relationships using biological prior knowledge

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, 2021.8. Kim Sun.
    Understanding how cells function or respond to external stimuli is one of the most important questions in biology and medicine. Thanks to advances in instrumentation, scientists can routinely measure events within cells in single biological experiments. Notable examples are multi-omics data: sequencing of genomes, quantification of gene expression, and identification of epigenetic events that regulate the expression of genes. To better understand cellular mechanisms, it is essential to identify regulatory relationships between multi-omics regulators and genes. However, regulatory relationships are very complex, and it is infeasible to validate all condition-specific relationships experimentally. Thus, there is an urgent need for efficient computational methods to extract relationships from different types of high-dimensional omics data. One way to handle such high-dimensional data is to incorporate external biological knowledge, such as relationships between omics and functions of genes curated in various databases. In my doctoral study, I developed three computational approaches to identify regulatory relationships from multi-omics data utilizing biological prior knowledge. The first study proposes a method to predict one-to-m relationships between miRNA and genes. The computational challenge of miRNA target prediction is that there are many miRNA target candidates, and the ratio of false positives to false negatives needs to be controlled. This challenge is addressed by utilizing literature knowledge to determine the association between a miRNA-gene pair and a given context. In this study, I developed ContextMMIA to predict miRNA-gene relationships from miRNA and gene expression data.
    ContextMMIA computes scores for miRNA-gene relationships based on statistical significance and literature relevance and prioritizes the relationships by these scores. In experiments on breast cancer data with different prognoses, ContextMMIA predicted differentially activated miRNA-gene relationships in invasive breast cancer. The experimentally verified miRNA-gene relationships were predicted with high priority, and those genes are known to be involved in breast cancer-related pathways. The second study proposes a method to predict n-to-one relationships between regulators and a gene in drug response. The computational challenge of drug response prediction is how to integrate multi-omics data of 20,000 genes to determine drug response mediator genes. This challenge is addressed by utilizing low-dimensional embedding methods, literature knowledge of drug-gene associations, and gene-gene interaction knowledge. For this problem, I developed DRIM to predict drug response relationships from multi-omics data and drug-induced time-series gene expression data. DRIM uses an autoencoder, tensor decomposition, and drug-gene associations to determine n-to-one relationships from multi-omics data. Then, regulatory relationships of mediator genes are determined from gene-gene interaction knowledge and the cross-correlation of drug-induced time-series gene expression data. In experiments on breast cancer cell line data, DRIM extracted mediator genes relevant to drug response and regulatory relationships of genes involved in the PI3K-Akt pathway targeted by lapatinib. In addition, DRIM revealed distinct patterns of relationships in breast cancer cell lines with different lapatinib resistance. The third study proposes a method to predict n-to-m relationships between regulators and genes.
    To predict n-to-m relationships, this study formulated an objective function that measures the deviation between observed gene expression values and estimated gene expression values derived from gene regulatory networks. The computational challenge of minimizing the objective function is to navigate the search space of relationships, which grows exponentially with the number of regulators and genes. This challenge is addressed by iterative local optimization with regulator-gene interaction knowledge. In this study, I developed a two-step iterative RL-based method to predict n-to-m relationships from regulator and gene expression data. The first step is an n-to-one, gene-oriented step that selects regulators with a reinforcement-learning-based heuristic to add edges to the network. The second step is a one-to-m, regulator-oriented step that stochastically selects genes to remove edges from the network. In experiments on breast cancer cell line data, the proposed method constructed breast cancer subtype-specific networks from the regulator and gene expression profiles with more accurate gene expression estimation than previous combinatorial optimization methods. Moreover, regulatory relationships in the networks were associated with breast cancer subtypes. In summary, in this thesis, I proposed computational methods for predicting one-to-m, n-to-one, and n-to-m relationships between multi-omics regulators and genes utilizing external domain knowledge.
    The proposed methods are expected to deepen our understanding of cellular mechanisms by uncovering gene regulatory interactions in the ever-increasing body of molecular biology data, such as The Cancer Genome Atlas and the Cancer Cell Line Encyclopedia.

    Contents:
    Chapter 1 Introduction
        1.1 Biological background
            1.1.1 Multi-omics analysis
            1.1.2 Multi-omics relationships indicating cell state
            1.1.3 Biological prior knowledge
        1.2 Research problems for the multi-omics relationship
        1.3 Computational challenges and approaches in the exploring multiomics relationship
        1.4 Outline of the thesis
    Chapter 2 Literature-based condition-specific miRNA-mRNA target prediction
        2.1 Computational Problem & Evaluation criterion
        2.2 Related works
        2.3 Motivation
        2.4 Methods
            2.4.1 Identifying genes and miRNAs based on the user-provided context
            2.4.2 Omics Score
            2.4.3 Context Score
            2.4.4 Confidence Score
        2.5 Results
            2.5.1 Pathway analysis
            2.5.2 Reproducibility of validated targets in humans
            2.5.3 Sensitivity tests when different keywords are used
        2.6 Summary
    Chapter 3 DRIM: A web-based system for investigating drug response at the molecular level by condition-specific multi-omics data integration
        3.1 Computational Problem & Evaluation criterion
        3.2 Related works
        3.3 Motivation
        3.4 Methods
            3.4.1 Step 1: Input
            3.4.2 Step 2: Identifying perturbed sub-pathway with time-series
            3.4.3 Step 3: Embedding multi-omics for selecting potential mediator genes
            3.4.4 Step 4: Construct TF-regulatory time-bounded network and identify regulatory path
            3.4.5 Step 5: Analysis result on the web
        3.5 Case study: Comparative analysis of breast cancer cell lines that have different sensitivity with lapatinib
            3.5.1 Multi-omics analysis result before drug treatment
            3.5.2 Time-series gene expression analysis after drug treatment
        3.6 Summary
    Chapter 4 Combinatorial modeling and optimization using iterative RL search for inferring sample-specific regulatory network
        4.1 Computational Problem & Evaluation criterion
        4.2 Related works
        4.3 Motivation
        4.4 Methods
            4.4.1 Formulating an objective function
            4.4.2 Overview of an iterative search method
            4.4.3 G-step for exploring n-to-one gene-oriented relationship
            4.4.4 R-step for exploring one-to-m regulator-oriented relationship
        4.5 Results
            4.5.1 Cancer cell line data
            4.5.2 Hyperparameters
            4.5.3 Quantitative evaluation
            4.5.4 Qualitative evaluation
        4.6 Summary
    Chapter 5 Conclusions
    Abstract in Korean
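
    The DRIM part of the abstract above mentions using the cross-correlation of drug-induced time-series gene expression data to determine regulatory relationships. As a rough illustration of that general idea only, the sketch below computes a lagged Pearson correlation between a regulator and a candidate target and reports the lag with the strongest correlation. The function names, the toy profiles, and the choice of Pearson correlation over non-negative lags are assumptions made here, not details taken from the thesis.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def best_lag(regulator, target, max_lag):
    """Return (lag, r) with the largest |r|, where the target profile is
    compared against the regulator profile `lag` time points earlier."""
    best = None
    for lag in range(max_lag + 1):
        x = regulator[:len(regulator) - lag]  # regulator, early time points
        y = target[lag:]                      # target, shifted by `lag`
        r = pearson(x, y)
        if best is None or abs(r) > abs(best[1]):
            best = (lag, r)
    return best


# Toy time series: the target repeats the regulator two time points later.
regulator_expr = [0, 1, 3, 2, 1, 0, 0, 0]
target_expr = [0, 0, 0, 1, 3, 2, 1, 0]
lag, r = best_lag(regulator_expr, target_expr, max_lag=3)
print(lag, round(r, 3))
```

    A high correlation at a positive lag is only suggestive of a regulator acting before its target; with few time points such correlations are noisy, which is presumably one reason the abstract combines them with gene-gene interaction knowledge.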

    Bioinformatics

    This book is divided into research areas relevant to bioinformatics, such as biological networks, next generation sequencing, high performance computing, molecular modeling, structural bioinformatics, and intelligent data analysis. Each section introduces the basic concepts and then explains their application to problems of great relevance, so both novice and expert readers can benefit from the information and research presented here.

    Computational Methods for the Pharmacogenetic Interpretation of Next Generation Sequencing Data

    Up to half of all patients do not respond to pharmacological treatment as intended. A substantial fraction of these inter-individual differences is due to heritable factors, and a growing number of associations between genetic variations and drug response phenotypes have been identified. Importantly, the rapid progress in Next Generation Sequencing technologies in recent years has unveiled the true complexity of the genetic landscape in pharmacogenes, with tens of thousands of rare genetic variants. As each individual was found to harbor numerous such rare variants, they are anticipated to be important contributors to the genetically encoded inter-individual variability in drug effects. The fundamental challenge, however, is their functional interpretation: the sheer scale of the problem renders systematic experimental characterization of these variants currently infeasible. Here, we review concepts and important progress in the development of computational prediction methods for evaluating the effect of amino acid sequence alterations in drug metabolizing enzymes and transporters. In addition, we discuss recent advances in the interpretation of functional effects of non-coding variants, such as variations in splice sites, regulatory regions, and miRNA binding sites. We anticipate that these methodologies will provide a useful toolkit to facilitate the integration of the vast extent of rare genetic variability into drug response predictions in a precision medicine framework.

    Functional characterization and annotation of trait-associated genomic regions by transcriptome analysis

    In this work, two novel implementations are presented that could assist in the design and data analysis of high-throughput genomic experiments. The first is an efficient and flexible tiling probe selection pipeline based on a penalized uniqueness score, which could be employed in genome tiling tasks of various types and scales. The second is a novel hidden semi-Markov model (HSMM) implementation, made available within the Bioconductor project, which provides a unified interface for segmenting genomic data in a wide range of research subjects.

    Discriminative Learning for Probabilistic Sequence Analysis


    Susceptibility of microRNAs 145, 143 and 133b to epigenetic regulation in colorectal cancer cell lines; prediction and functional analysis of putative targets to associated microRNAs

    A dissertation submitted to the Faculty of Health Sciences, University of Witwatersrand, Johannesburg, in fulfilment of the requirements for the degree of Master of Science in Medicine. Department of Internal Medicine, University of Witwatersrand, South Africa. Johannesburg, 2016.
    Colorectal cancer (CRC) is a significant health burden, maintaining its position as the third most diagnosed cancer in men and women worldwide. Despite improvements in treatments for CRC, mortality rates remain high. Genetic instability and epigenetic deregulation of gene expression are instigators of CRC development, resulting in genotype differences which herald treatment response variability and unpredictability. Over the past decade and a half, microRNAs (miRNAs) have emerged as key contributors to the perturbed proteome in cancer cells, including CRC. MiRNAs are small non-coding RNA molecules (consisting of approximately 22 nucleotides) targeted to specific mRNAs through various target recognition mechanisms to repress protein translation or to induce mRNA degradation. Three miRNAs, miR-143, -145 and -133b, are most commonly downregulated in CRC and have been proposed as potential tumour suppressors. Although downregulation of these miRNAs in CRC is to a large extent unexplained, epigenetic silencing has been postulated as a causative regulatory mechanism. Potential epigenetic modulation of miRNA expression, by means of histone acetylation and DNA methylation, was assessed in this study by treating early (SW1116) and late stage (DLD1) CRC cells with the DNA demethylating agent, 5-aza-2'-deoxycytidine (5-Aza-2'C), and the histone deacetylase (HDAC) inhibitor, Trichostatin A (TSA), respectively.
    Subsequent quantification of miRNA expression, using miRNA TaqMan® PCR assays for each of miR-143, -145 and -133b, revealed that while all of these miRNAs are susceptible to DNA demethylation in early and late stage CRC cells, the susceptibility to DNA demethylation is significantly more pronounced in the late stage DLD1 cells. Conversely, histone acetylation moderately affected miRNA expression in early stage CRC, but had only a marginal effect on the expression of miRNAs in late stage CRC cells. These associations have been argued to correlate with genotypic differences between the microsatellite stable (MSS) SW1116 cell line and the microsatellite instability (MSI) of the DLD1 cells. To further evaluate the role that these miRNAs play in CRC development, this study utilised in silico miRNA target prediction tools to generate lists of potential miRNA target genes. Once generated, these were strategically curated and filtered to allow for the selection of suitable candidates for functional analysis. This approach yielded three candidates, KRAS, FZD7 and FBXW11/β-TrCP, as the most probable targets for miR-143, -145 and -133b, respectively, further supported by their inverse correlations to the associated miRNA expression in CRC. Protein expression of the predicted targets, assessed before and after transfection of HET-1A cells with anti-miR™ sequences of the associated miRNA, revealed elevated protein expression with differential subcellular protein localization upon miRNA inhibition. Overall, this study has provided further understanding of the contribution of epigenetics to the regulation of putative tumour suppressor miRNAs in CRC. Additionally, KRAS targeting by miR-143 has been reaffirmed, while FZD7 and FBXW11/β-TrCP expression analysis after anti-miR-145 and anti-miR-133b transfection, respectively, provides substantial evidence for their role as potential direct miRNA targets.