2,490 research outputs found

    Combining Bayesian Approaches and Evolutionary Techniques for the Inference of Breast Cancer Networks

    Gene and protein networks are important tools for modeling complex, large-scale systems in molecular biology. Inferring, or reverse-engineering, such networks can be defined as the process of identifying gene/protein interactions from experimental data through computational analysis. However, this task is typically complicated by the very large number of unknowns relative to a rather small sample size. Furthermore, when the goal is to study causal relationships within the network, tools capable of overcoming the limitations of correlation networks are required. In this work, we use Bayesian graphical models to attack this problem and, specifically, we perform a comparative study of different state-of-the-art heuristics, analyzing their performance in inferring the structure of the Bayesian network from breast cancer data.

    An integrative, multi-scale, genome-wide model reveals the phenotypic landscape of Escherichia coli.

    Given the vast behavioral repertoire and biological complexity of even the simplest organisms, accurately predicting phenotypes in novel environments and unveiling their biological organization is a challenging endeavor. Here, we present an integrative modeling methodology that unifies under a common framework the various biological processes and their interactions across multiple layers. We trained this methodology on an extensive normalized compendium for the gram-negative bacterium Escherichia coli, which incorporates gene expression data for genetic and environmental perturbations, transcriptional regulation, signal transduction, and metabolic pathways, as well as growth measurements. Comparison with measured growth and high-throughput data demonstrates the enhanced ability of the integrative model to predict phenotypic outcomes in various environmental and genetic conditions, even in cases where their underlying functions are under-represented in the training set. This work paves the way toward integrative techniques that extract knowledge from a variety of biological data to achieve more than the sum of their parts in the context of prediction, analysis, and redesign of biological systems

    Data integration for microarrays: enhanced inference for gene regulatory networks

    Microarray technologies have been the basis of numerous important findings regarding gene expression in recent decades. Studies have generated large amounts of data describing various processes, which, due to the existence of public databases, are widely available for further analysis. Given their lower cost and higher maturity compared to newer sequencing technologies, these data continue to be produced, even though data quality has been the subject of some debate. However, given the large volume of data generated, integration can help overcome some issues related, for example, to noise or reduced time resolution, while providing additional insight into features not directly addressed by sequencing methods. Here we present an integration test case based on public Drosophila melanogaster datasets (gene expression, binding site affinities, known interactions). Using an evolutionary computation framework, we show how integration can enhance the ability to recover transcriptional gene regulatory networks from these data, as well as indicating which data types are more important for quantitative and qualitative network inference. Our results show a clear improvement in performance when multiple data sets are integrated, indicating that microarray data will remain a valuable and viable resource for some time to come.

    Machine Learning for Uncovering Biological Insights in Spatial Transcriptomics Data

    Development and homeostasis in multicellular systems both require exquisite control over spatial molecular pattern formation and maintenance. Advances in spatially-resolved and high-throughput molecular imaging methods such as multiplexed immunofluorescence and spatial transcriptomics (ST) provide exciting new opportunities to augment our fundamental understanding of these processes in health and disease. The large and complex datasets resulting from these techniques, particularly ST, have led to rapid development of innovative machine learning (ML) tools primarily based on deep learning techniques. These ML tools are now increasingly featured in integrated experimental and computational workflows to disentangle signals from noise in complex biological systems. However, it can be difficult to understand and balance the different implicit assumptions and methodologies of a rapidly expanding toolbox of analytical tools in ST. To address this, we summarize major ST analysis goals that ML can help address and current analysis trends. We also describe four major data science concepts and related heuristics that can help guide practitioners in their choices of the right tools for the right biological questions

    Can biological complexity be reverse engineered?

    Concerns with the use of engineering approaches in biology have recently been raised. I examine two related challenges to biological research that I call the synchronic and diachronic underdetermination problems. The former refers to challenges associated with the inference of design principles underlying system capacities when the synchronic relations between lower-level processes and higher-level system capacities are degenerate (many-to-many). The diachronic underdetermination problem concerns the problem of reverse engineering a system where the non-linear relations between system capacities and lower-level mechanisms are changing over time. Braun and Marom argue that recent insights into biological complexity leave the aim of reverse engineering hopeless, in principle as well as in practice. While I support their call for systemic approaches to capture the dynamic nature of living systems, I take issue with the conflation of reverse engineering with naïve reductionism. I clarify how the notion of design principles can be more broadly conceived and argue that reverse engineering is compatible with a dynamic view of organisms. It may even help to facilitate an integrated account that bridges the gap between mechanistic and systems approaches.

    Computational approaches for finding high-dimensional multi-omics relationships using biological prior knowledge

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, 2021.8. Sun Kim. Understanding how cells function or respond to external stimuli is one of the most important questions in biology and medicine. Thanks to advances in instrumental technologies, scientists can routinely measure events within cells in single biological experiments. Notable examples are multi-omics data: sequencing of genomes, quantification of gene expression, and identification of epigenetic events that regulate the expression of genes. To better understand cellular mechanisms, it is essential to identify regulatory relationships between multi-omics regulators and genes. However, regulatory relationships are very complex, and it is infeasible to validate all condition-specific relationships experimentally. Thus, there is an urgent need for efficient computational methods to extract relationships from different types of high-dimensional omics data. One way to handle such high-dimensional data is to incorporate external biological knowledge, such as gene functions and relationships between omics curated in various databases. In my doctoral study, I developed three computational approaches to identify regulatory relationships from multi-omics data utilizing biological prior knowledge. The first study proposes a method to predict one-to-m relationships between miRNA and genes. The computational challenge of miRNA target prediction is that there are many miRNA target candidates, and the ratio of false positives to false negatives needs to be adjusted. This challenge is addressed by utilizing literature knowledge to determine the association between a miRNA-gene pair and a given context. In this study, I developed ContextMMIA to predict miRNA-gene relationships from miRNA and gene expression data. 
ContextMMIA computes scores for miRNA-gene relationships based on statistical significance and literature relevance and prioritizes the relationships by these scores. In experiments on breast cancer data with different prognoses, ContextMMIA predicted differentially activated miRNA-gene relationships in invasive breast cancer. The experimentally verified miRNA-gene relationships were predicted with high priority, and those genes are known to be involved in breast cancer-related pathways. The second study proposes a method to predict n-to-one relationships between regulators and a gene in drug response. The computational challenge of drug response prediction is how to integrate multi-omics data of 20,000 genes to determine drug response mediator genes. This challenge is addressed by utilizing low-dimensional embedding methods, literature knowledge of drug-gene associations, and gene-gene interaction knowledge. For this problem, I developed DRIM to predict drug response relationships from multi-omics data and drug-induced time-series gene expression data. DRIM uses an autoencoder, tensor decomposition, and drug-gene associations to determine n-to-one relationships from multi-omics data. Then, regulatory relationships of mediator genes are determined from gene-gene interaction knowledge and the cross-correlation of drug-induced time-series gene expression data. In experiments on breast cancer cell line data, DRIM extracted mediator genes relevant to drug response and regulatory relationships of genes involved in the PI3K-Akt pathway targeted by lapatinib. In addition, DRIM revealed distinct patterns of relationships in breast cancer cell lines with different lapatinib resistance. The third study proposes a method to predict n-to-m relationships between regulators and genes. 
To predict n-to-m relationships, this study formulated an objective function that measures the deviation between observed gene expression values and estimated gene expression values derived from gene regulatory networks. The computational challenge of minimizing the objective function is navigating a search space of relationships that grows exponentially with the number of regulators and genes. This challenge is addressed by iterative local optimization with regulator-gene interaction knowledge. In this study, I developed a two-step iterative RL-based method to predict n-to-m relationships from regulator and gene expression data. The first step is a gene-oriented step that explores n-to-one relationships, selecting regulators with a reinforcement-learning-based heuristic to add edges to the network. The second step is a regulator-oriented step that explores one-to-m relationships, stochastically selecting genes to remove edges from the network. In experiments on breast cancer cell line data, the proposed method constructed breast cancer subtype-specific networks from the regulator and gene expression profiles with more accurate gene expression estimation than previous combinatorial optimization methods. Moreover, regulatory relationships in the networks were associated with breast cancer subtypes. In summary, in this thesis, I proposed computational methods for predicting one-to-m, n-to-one, and n-to-m relationships between multi-omics regulators and genes utilizing external domain knowledge. 
The proposed methods are expected to deepen our knowledge of cellular mechanisms by analyzing the ever-increasing molecular biology data, such as The Cancer Genome Atlas and the Cancer Cell Line Encyclopedia, to understand gene regulatory interactions.

    Table of contents:
    Chapter 1 Introduction
      1.1 Biological background
        1.1.1 Multi-omics analysis
        1.1.2 Multi-omics relationships indicating cell state
        1.1.3 Biological prior knowledge
      1.2 Research problems for the multi-omics relationship
      1.3 Computational challenges and approaches in the exploring multi-omics relationship
      1.4 Outline of the thesis
    Chapter 2 Literature-based condition-specific miRNA-mRNA target prediction
      2.1 Computational Problem & Evaluation criterion
      2.2 Related works
      2.3 Motivation
      2.4 Methods
        2.4.1 Identifying genes and miRNAs based on the user-provided context
        2.4.2 Omics Score
        2.4.3 Context Score
        2.4.4 Confidence Score
      2.5 Results
        2.5.1 Pathway analysis
        2.5.2 Reproducibility of validated targets in humans
        2.5.3 Sensitivity tests when different keywords are used
      2.6 Summary
    Chapter 3 DRIM: A web-based system for investigating drug response at the molecular level by condition-specific multi-omics data integration
      3.1 Computational Problem & Evaluation criterion
      3.2 Related works
      3.3 Motivation
      3.4 Methods
        3.4.1 Step 1: Input
        3.4.2 Step 2: Identifying perturbed sub-pathway with time-series
        3.4.3 Step 3: Embedding multi-omics for selecting potential mediator genes
        3.4.4 Step 4: Construct TF-regulatory time-bounded network and identify regulatory path
        3.4.5 Step 5: Analysis result on the web
      3.5 Case study: Comparative analysis of breast cancer cell lines that have different sensitivity with lapatinib
        3.5.1 Multi-omics analysis result before drug treatment
        3.5.2 Time-series gene expression analysis after drug treatment
      3.6 Summary
    Chapter 4 Combinatorial modeling and optimization using iterative RL search for inferring sample-specific regulatory network
      4.1 Computational Problem & Evaluation criterion
      4.2 Related works
      4.3 Motivation
      4.4 Methods
        4.4.1 Formulating an objective function
        4.4.2 Overview of an iterative search method
        4.4.3 G-step for exploring n-to-one gene-oriented relationship
        4.4.4 R-step for exploring one-to-m regulator-oriented relationship
      4.5 Results
        4.5.1 Cancer cell line data
        4.5.2 Hyperparameters
        4.5.3 Quantitative evaluation
        4.5.4 Qualitative evaluation
      4.6 Summary
    Chapter 5 Conclusions
    Abstract in Korean