    KEGG spider: interpretation of genomics data in the context of the global gene metabolic network

    KEGG spider is a web-based tool for interpreting experimentally derived gene lists in order to understand variations in metabolism at the genomic level. It implements a 'pathway-free' framework that overcomes a major bottleneck of enrichment analyses: it provides global models uniting genes from different metabolic pathways. Analyzing a number of experimentally derived gene lists, we demonstrate that KEGG spider provides deeper insights into metabolism variations than existing methods.

    eXamine: a Cytoscape app for exploring annotated modules in networks

    Background. Biological networks are increasingly important for the interpretation of high-throughput "omics" data. Statistical and combinatorial methods yield mechanistic insights by extracting smaller subnetwork modules. Further enrichment analyses provide set-based annotations of these modules. Results. We present eXamine, a set-oriented visual analysis approach for annotated modules that displays set membership as contours on top of a node-link layout. Our approach extends Self-Organizing Maps to simultaneously lay out nodes, links, and set contours. Conclusions. We implemented eXamine as a freely available Cytoscape app. Using eXamine, we study a module that is activated by the virally encoded G-protein coupled receptor US28 and formulate a novel hypothesis about its functioning.

    Chemical and genomic evolution of enzyme-catalyzed reaction networks.

    In prokaryotic genomes, a unit of enzyme genes in an operon-like structure tends to encode enzymes that catalyze a series of consecutive reactions in a metabolic pathway. Our recent analysis shows that this and other genomic units correspond to chemical units reflecting the chemical logic of organic reactions. From all known metabolic pathways in the KEGG database we identified chemical units, called reaction modules, as conserved sequences of chemical structure transformation patterns of small molecules. The extracted patterns suggest co-evolution of genomic units and chemical units. While the core of the metabolic network may have evolved through mechanisms involving individual enzymes and reactions, its extension may have been driven by modular units of enzymes and reactions.

    BioGraph: unsupervised biomedical knowledge discovery via automated hypothesis generation

    We present BioGraph, a data integration and data mining platform for the exploration and discovery of biomedical information. The platform offers prioritizations of putative disease genes, supported by functional hypotheses. We show that BioGraph can retrospectively confirm recently discovered disease genes and identify potential susceptibility genes, outperforming existing technologies, without requiring prior domain knowledge. Additionally, BioGraph allows for generic biomedical applications beyond gene discovery. BioGraph is accessible at http://www.biograph.be

    A study on the quantification of pathway activity using RNA-seq data

    Doctoral thesis, Seoul National University Graduate School, College of Natural Sciences, Interdisciplinary Program in Bioinformatics, August 2019. 김선.

    Measuring the dynamics of RNA transcripts using RNA-seq data has become routine in bioinformatics analyses. However, RNA-seq produces high-dimensional transcriptome data on more than 20,000 genes in humans, which makes the data extremely difficult to interpret when only a relatively small set of samples is available. It is therefore desirable to use well-summarized and widely used information such as biological pathways for better biological comprehension. However, summarizing transcriptome data in terms of biological pathways is very challenging for several reasons. First, a large amount of information is lost when transcriptome data are transformed to pathway space: in humans, for example, only one third of all genes are present in KEGG pathways. Second, each pathway consists of many genes, so measuring pathway activity requires a strategy for summarizing the expression profiles of the component genes into a single value while considering the relationships among them. My doctoral study aimed to develop a new method for pathway activity measurement and to evaluate existing pathway activity measurement tools extensively against multiple criteria. In addition, a cloud-based system was constructed to deploy these tools so that users can easily analyze their own data.
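The summarization problem described above (collapsing the expression profiles of a pathway's genes into one value per sample) can be illustrated with a deliberately simple baseline. The sketch below averages per-gene z-scores; it is not the SAS method of the thesis and ignores gene relationships entirely. The function names, gene names, and toy values are all hypothetical.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize one gene's expression across samples (population SD)."""
    mu = mean(values)
    sd = pstdev(values)
    # A gene with constant expression carries no signal; map it to zeros.
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def pathway_activity(expression, pathway_genes):
    """Average the z-scores of a pathway's measured genes for each sample.

    expression: dict mapping gene name -> list of values, one per sample.
    pathway_genes: iterable of gene names; genes without data are skipped,
    which mirrors the information loss mentioned in the abstract.
    """
    measured = [g for g in pathway_genes if g in expression]
    if not measured:
        raise ValueError("no pathway gene has expression data")
    standardized = [zscores(expression[g]) for g in measured]
    n_samples = len(standardized[0])
    return [mean(row[i] for row in standardized) for i in range(n_samples)]


# Toy data: three genes measured in four samples (made-up values).
toy_expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0],
    "GENE_C": [5.0, 5.0, 5.0, 5.0],
}
# GENE_X has no measurements and is silently skipped.
toy_scores = pathway_activity(toy_expression, ["GENE_A", "GENE_B", "GENE_X"])
```

Because both measured genes rise monotonically across the four samples, the resulting scores also rise from the first sample to the last. A method that accounts for gene interactions, as the abstract calls for, would replace the plain average with a network-aware weighting.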
    The first study developed a new method that summarizes transcriptome data in terms of pathways by using explicit transcript quantity information and considering the relationships among genes in terms of their interactions. In this study, I propose a novel concept of decomposing biological pathways into subsystems by utilizing a protein interaction network, pathway information, and RNA-seq data. A subsystem activation score (SAS) was designed to measure the degree of activation for each subsystem and each patient. This method revealed distinctive genome-wide activation patterns, or landscapes, of subsystems that are differentially activated among samples as well as among breast cancer subtypes. Next, SAS information was used for prognostic modeling by classification and regression tree (CART) analysis. Eleven subgroups of patients, defined by the 10 most significant subsystems, were identified with maximal discrepancy in survival outcome. The model not only defined patient subgroups with similar survival outcomes, but also provided patient-specific decision paths determined by SAS status, suggesting functionally informative gene sets in breast cancer.

    The second study aimed to systematically compare and evaluate thirteen pathway activity inference tools on five comparison criteria using a pan-cancer data set. Although many pathway activity tools are available, no comparative study has examined how effectively these tools produce useful information at the cohort level, where many samples are compared. This study makes two major contributions. First, it provides a comprehensive survey of the computational techniques used by existing pathway activity inference tools. Existing tools use different strategies and make different assumptions about the data: input transformation, use of labels, necessity of cohort-level input data, use of gene relations, and scoring metrics.
    Second, extensive evaluations of these tools were conducted using five comparison criteria. These ranged from measuring how well a tool maintains the characteristics of the original gene expression profile to investigating robustness by introducing noise into the gene expression data. Classification tasks on three clinical variables (tumor versus normal, survival, and cancer subtype) were performed to evaluate the clinical utility of the tools.

    The third study built a cloud-based system (PathwayCloud) in which a user provides transcriptome data and measures pathway activities using the tools from the comparative study. When a user uploads input data and selects the analysis tools to run, the system automatically generates pathway activity values for each tool as well as a summary of the performance comparison for the selected tools. Users can also investigate which pathways are significant in terms of the given sample information and visually inspect genes within a pathway through the KEGG REST API.

    In conclusion, in my thesis I sought to develop an analysis method for biological pathways using high-throughput gene expression data, to compare different types of tools with comprehensive criteria, and to arrange the tools in a cloud-based system that is easily accessible. As pathways aggregate various molecular events among genes into a single entity, the suggested approaches will aid interpretation of high-throughput data and facilitate integration of diverse data layers such as miRNA or DNA methylation profiles.

    Table of contents:
    Chapter 1 Introduction
      1.1 Biological background
        1.1.1 Biological pathways
        1.1.2 Gene expression
        1.1.3 Pathway-based analysis
        1.1.4 Pathway activity measurement
      1.2 Challenges in pathway activity measurement
        1.2.1 Calculating effective pathway activity values from RNA-seq data
        1.2.2 Lack of comparative criteria to evaluate pathway activity tools
        1.2.3 Absence of a user-friendly environment of pathway activity inference tools
      1.3 Outline of the thesis
    Chapter 2 Measuring pathway activity from RNA-seq data to identify breast cancer subsystems using protein-protein interaction network
      2.1 Related works
      2.2 Motivation
      2.3 Methods
        2.3.1 Breast cancer subsystems
        2.3.2 Subsystem Activation Score
        2.3.3 Prognostic modeling
        2.3.4 Hierarchical clustering of patients and subsystems
        2.3.5 Tools used in this study
      2.4 Results
        2.4.1 Pathways were decomposed into coherent functional units (subsystems)
        2.4.2 Landscape of subsystems reflects the breast cancer biology
        2.4.3 SAS revealed patient clusters associated with PAM50 subtypes
        2.4.4 Prognostic modeling by subsystems showed 11 patient subgroups with distinct survival outcome
        2.4.5 Relapse rate and CNVs were enriched in worse prognostic subgroups
      2.5 Discussion
    Chapter 3 Comprehensive evaluation of pathway activity measurement tools on pan-cancer data
      3.1 Related works
      3.2 Motivation
      3.3 Materials and methods
        3.3.1 Pathway activity inference tools
        3.3.2 Data sets
        3.3.3 Pathway database
        3.3.4 Notations
      3.4 Comparative approach
        3.4.1 Radar chart criteria
        3.4.2 Similarity among the tools
      3.5 Results
        3.5.1 Distance preservation
        3.5.2 Robustness against noise
        3.5.3 Classification: tumor vs normal
        3.5.4 Classification: survival information
        3.5.5 Classification: cancer subtypes
        3.5.6 Similarity among the tools
      3.6 Discussion
    Chapter 4 A cloud-based system of pathway activity inference tools using high-throughput gene expression data
      4.1 Related works
      4.2 Motivation
      4.3 Implementation
      4.4 Results
        4.4.1 Calculating pathway activity values
        4.4.2 Identification of significant pathways
        4.4.3 Visualization in KEGG pathways
        4.4.4 Comparison of the tools
      4.5 Discussion
    Chapter 5 Conclusion
    Abstract (in Korean)
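Two of the evaluation criteria named in this record, distance preservation and robustness against noise, can be sketched roughly as follows. This is only one plausible reading of those criteria, not the thesis's actual procedure: the toy pathway score (mean expression), the use of Pearson correlation, the Gaussian noise model, and every name and value below are assumptions made for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def mean_activity(samples, gene_idx):
    """Toy pathway score: mean expression of the pathway's genes per sample."""
    return [sum(s[i] for i in gene_idx) / len(gene_idx) for s in samples]


def pairwise_distances(vectors):
    """Euclidean distances between all unordered pairs of vectors."""
    n = len(vectors)
    return [math.dist(vectors[i], vectors[j])
            for i in range(n) for j in range(i + 1, n)]


def distance_preservation(samples, pathways):
    """Correlate sample-to-sample distances in gene space and pathway space."""
    pathway_space = list(zip(*[mean_activity(samples, p) for p in pathways]))
    return pearson(pairwise_distances(samples),
                   pairwise_distances(pathway_space))


def noise_robustness(samples, gene_idx, sd, seed=0):
    """Correlate pathway scores before and after adding Gaussian noise."""
    rng = random.Random(seed)
    noisy = [[v + rng.gauss(0.0, sd) for v in s] for s in samples]
    return pearson(mean_activity(samples, gene_idx),
                   mean_activity(noisy, gene_idx))


# Toy cohort: five samples, four genes, two pathways of two genes each.
toy_samples = [
    [1.0, 2.0, 8.0, 9.0],
    [2.0, 3.0, 7.0, 8.0],
    [5.0, 5.0, 5.0, 5.0],
    [8.0, 7.0, 2.0, 3.0],
    [9.0, 9.0, 1.0, 1.0],
]
toy_pathways = [[0, 1], [2, 3]]
preservation = distance_preservation(toy_samples, toy_pathways)
robustness = noise_robustness(toy_samples, toy_pathways[0], sd=0.1)
```

A value near 1 for either quantity indicates that the pathway-level scores retain the structure of the gene-level data, or stay stable under perturbation. Comparing these values across scoring functions is the kind of comparison the second study describes.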